Context Graphs: The Missing Memory Layer for AI
Vector search finds similar things. Context graphs find relevant, connected, contextual things. That distinction changes everything.
Every time you call an LLM, it wakes up with amnesia. No memory of what it did five seconds ago, no awareness of the twenty documents that matter to your question, no understanding of how those documents relate to each other. The context window is all it has — and right now, most systems fill that window badly.
The dominant approach, Retrieval-Augmented Generation (RAG), works like this: embed a query, find the K most similar text chunks, stuff them into the prompt. It’s simple. It works well enough for FAQ-style questions. But it falls apart the moment you need the AI to reason across multiple connected pieces of information.
Ask a standard RAG system: “What impact did our Q3 product launch have on revenue, and how does that compare to the projections Alice made in her June forecast?” You need the launch docs, the revenue data, Alice’s forecast, and the relationships between them. Vector similarity alone can’t reconstruct that web.
This is where context graphs come in.
What Exactly Is a Context Graph?
A graph, at its simplest, is a set of nodes (things) connected by edges (relationships).
[Alice] --authored--> [Q3 Report]
[Q3 Report] --discusses--> [Revenue Growth]
[Revenue Growth] --driven_by--> [Product Launch]
[Product Launch] --led_by--> [Alice]
A knowledge graph stores facts this way. Google’s Knowledge Graph, Wikidata, and enterprise knowledge bases all use the pattern: subject-predicate-object triples representing static truths about the world.
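In code, the triple pattern is tiny. A minimal sketch, assuming nothing more than Python tuples for the four edges above:

```python
# Minimal triple store: (subject, predicate, object) tuples.
triples = [
    ("Alice", "authored", "Q3 Report"),
    ("Q3 Report", "discusses", "Revenue Growth"),
    ("Revenue Growth", "driven_by", "Product Launch"),
    ("Product Launch", "led_by", "Alice"),
]

def neighbors(node):
    """Nodes reachable from `node` in one hop, with the connecting predicate."""
    return [(pred, obj) for subj, pred, obj in triples if subj == node]

print(neighbors("Alice"))  # [('authored', 'Q3 Report')]
```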
A context graph takes this foundation and adds three properties that transform it:
Relevance is first-class. Every node and edge carries a relevance score that shifts based on who’s asking, what they’re doing, and when. The fact that Earth orbits the Sun is always true, but it’s not always relevant.
Time is built in. Context isn’t static — it evolves. A context graph tracks when information was created, when it was last accessed, and how its relevance decays or grows over time.
Scope is dynamic. You don’t query the whole graph. You produce views — focused subgraphs tailored to a specific question at a specific moment.
The result is a data structure that doesn’t just store what’s true — it stores what’s relevant right now for this particular purpose.
The Three-Layer Mental Model
Think of a context graph as having three concentric layers:
At the bottom sits your total knowledge — everything the system has ever ingested. Millions of nodes, mostly dormant. Above that is accessible context — things that could become relevant given the current situation. And at the top is active context — the small, focused subgraph that actually matters for this specific query.
When a question arrives, the system identifies seed nodes in the total knowledge layer, traverses outward through accessible context scoring relevance as it goes, and produces an active context subgraph — the minimal, maximally relevant view. This becomes what the LLM sees.
Why Relevance Scoring Is the Secret Sauce
The thing that separates a context graph from a plain knowledge graph is how it decides what matters. Relevance isn’t a single number. It’s a composite of signals, each capturing a different dimension of “why this information matters right now.”
Semantic similarity measures how closely a node’s content matches the query, computed via embedding cosine similarity. This is what vector search already does. It’s necessary but not sufficient.
Structural proximity measures distance in the graph. Nodes one hop away from your query’s focal point are almost always more relevant than nodes five hops away. But edge weights matter — a critical dependency four hops out might trump a tangential neighbor.
Temporal recency captures freshness. Information decays. Conversational context has a half-life of minutes. Project context decays over weeks. Foundational facts persist indefinitely. A good context graph applies different decay curves to different types of information automatically.
Frequency captures reinforcement. Paths that get traversed often become stronger, like neural pathways. The graph learns from usage patterns — an emergent form of memory.
Authority captures trust. Primary sources outweigh secondary ones. Verified information outweighs auto-extracted guesses. Expert-authored content outweighs hearsay.
A practical relevance function blends all five:
relevance(node, query, time) =
α · semantic_similarity
+ β · structural_proximity
+ γ · temporal_recency
+ δ · access_frequency
+ ε · authority_score
The weights (α, β, γ, δ, ε) shift depending on the use case. Conversational AI leans heavy on semantic match and recency. Research and analysis emphasizes structural depth and authority. Real-time decision-making prioritizes recency and frequency above all else.
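As a sketch, here is that blend in Python. The default weights, the per-node half-life field, and the 1 / (1 + hops) proximity formula are illustrative assumptions, and semantic similarity is passed in precomputed rather than derived from embeddings:

```python
import math

def relevance(node, query_sim, now, weights=(0.35, 0.25, 0.2, 0.1, 0.1)):
    """Blend the five signals; weights and per-signal formulas are illustrative."""
    a, b, g, d, e = weights
    recency = math.exp(-(now - node["last_access"]) / node["half_life"])
    frequency = min(math.log1p(node["access_count"]) / 10.0, 1.0)
    proximity = 1.0 / (1.0 + node["hops"])  # 1 hop away -> 0.5, 4 hops -> 0.2
    return (a * query_sim + b * proximity + g * recency
            + d * frequency + e * node["authority"])
```

Tuning for a use case is then just a matter of passing a different weight tuple.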
Retrieval: From Graph to Context Window
The core operation of a context graph system is retrieval — given a query, produce the right subgraph.
The most common approach is seed-and-expand: use semantic search to find seed nodes, expand outward N hops scoring relevance at each step, prune anything below a threshold. What you get is a connected subgraph, not a bag of isolated chunks. The relationships between pieces of information survive the retrieval process.
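Seed-and-expand fits in a few lines. In the sketch below the toy adjacency list, the decay-by-edge-weight scoring, and the pruning threshold are all assumptions for illustration:

```python
from collections import deque

# Toy weighted graph: node -> [(neighbor, edge_weight)].
GRAPH = {
    "launch": [("revenue", 0.9), ("press", 0.3)],
    "revenue": [("forecast", 0.8)],
    "press": [], "forecast": [],
}

def seed_and_expand(seeds, max_hops=2, threshold=0.25):
    """Expand outward from seeds, decaying scores by edge weight; prune below threshold."""
    scores = {s: 1.0 for s in seeds}
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nbr, weight in GRAPH.get(node, []):
            score = scores[node] * weight
            if score >= threshold and score > scores.get(nbr, 0.0):
                scores[nbr] = score
                frontier.append((nbr, hops + 1))
    return scores
```

The returned dict is the connected subgraph with a score per node, ready for packing.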
For complex questions, query decomposition works well. An LLM breaks the question into sub-queries, each one seeds its own traversal, and the resulting subgraphs get merged. Nodes that appear in multiple subgraphs — the intersection points — receive a relevance boost, because information that’s relevant from multiple angles is almost always important.
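One possible shape for the merge step, with a flat boost per extra subgraph a node appears in (the boost size is an arbitrary choice):

```python
def merge_subgraphs(subgraph_scores, boost=0.2):
    """Merge per-sub-query score dicts; nodes hit by multiple traversals get boosted."""
    merged, hits = {}, {}
    for scores in subgraph_scores:
        for node, s in scores.items():
            merged[node] = max(merged.get(node, 0.0), s)
            hits[node] = hits.get(node, 0) + 1
    for node, count in hits.items():
        if count > 1:  # intersection point: relevant from multiple angles
            merged[node] = min(1.0, merged[node] + boost * (count - 1))
    return merged
```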
Then comes the packing problem. LLM context windows have a token budget, and you need to spend it wisely. The most effective pattern is hierarchical: include the top nodes by relevance in full, the next tier as compressed summaries, and everything else as reference titles. Preserve the graph structure in some form — even a simple triple notation — because the relationships are often more valuable than the content of any individual node.
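Hierarchical packing might look like the sketch below. The tier key names and the whitespace word count standing in for a real tokenizer are assumptions:

```python
def pack_context(ranked_nodes, budget):
    """Spend the token budget tier by tier: full text, then summary, then title.

    Each node gets the richest representation that still fits; nodes whose
    title alone would bust the budget are dropped entirely.
    """
    out, spent = [], 0
    for node in ranked_nodes:
        for text in (node["full"], node["summary"], node["title"]):
            cost = len(text.split())  # crude token estimate
            if spent + cost <= budget:
                out.append(text)
                spent += cost
                break
    return "\n".join(out)
```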
Context Graphs vs. Vector RAG
This isn’t a competition where one replaces the other. It’s about understanding what each does well.
Vector stores excel at broad semantic recall. “Find me everything that sounds like this query.” They’re fast, simple, and good enough for many use cases.
Context graphs excel at structured, multi-hop, temporally aware retrieval. “Find me the connected web of information that’s relevant to this question, considering what’s recent, what’s authoritative, and how things relate to each other.”
The practical sweet spot is using both. Vector search for the initial broad sweep, graph traversal for structured depth. The vector search finds your seed nodes; the graph traversal discovers the context those seed nodes live within.
Where context graphs show clear advantages:
Multi-hop reasoning. “Who has context on the topics discussed in reports authored by people on Alice’s team?” This requires traversing several edge types in sequence — team membership, authorship, topic extraction. Vector search can’t express this.
Temporal awareness. “What was our understanding of this issue before the latest update?” A context graph with temporal edges can reconstruct past states. A vector store treats everything as equally current.
Coherence. RAG retrieves a flat list of chunks. A context graph retrieves a connected subgraph where relationships are preserved. The LLM receives not just facts but the structure connecting them.
Learning from use. Every query strengthens relevant paths and decays unused ones. Over time, the graph becomes better at surfacing what matters for your specific workflows.
Building One: What It Takes
The implementation story is surprisingly approachable if you sequence it right.
Start with the data model. Define your node types — entities, documents, concepts, events — and your edge types. Use a property graph model where both nodes and edges carry key-value metadata. Store everything in adjacency lists (not matrices), because context graphs are sparse.
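A minimal property-graph data model under those choices: plain dicts for node metadata, per-node adjacency lists for edges, with every name below illustrative:

```python
# Nodes and edges both carry key-value metadata; edges live in
# per-node adjacency lists because context graphs are sparse.
nodes = {
    "alice":     {"type": "entity",   "label": "Alice"},
    "q3_report": {"type": "document", "label": "Q3 Report"},
}
adjacency = {
    "alice":     [{"to": "q3_report", "type": "authored", "weight": 0.9}],
    "q3_report": [],
}

def out_edges(node_id, edge_type=None):
    """Edges leaving node_id, optionally filtered by edge type."""
    return [e for e in adjacency[node_id]
            if edge_type is None or e["type"] == edge_type]
```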
Add vector search early. Attach embeddings to your nodes and index them with HNSW for approximate nearest neighbor search. This gives you the semantic similarity signal and the ability to find seed nodes for traversal.
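To keep the sketch dependency-free, here is brute-force cosine search standing in for an HNSW index; at scale you would swap the body of find_seeds for a real ANN library while keeping the same interface:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def find_seeds(query_vec, node_vecs, k=2):
    """Top-k node ids by similarity to the query: the seeds for traversal."""
    return sorted(node_vecs,
                  key=lambda n: cosine(query_vec, node_vecs[n]),
                  reverse=True)[:k]
```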
Implement basic traversal and scoring. Breadth-first expansion from seed nodes, with edge weights controlling which paths to follow. Even a simple relevance function (similarity × proximity) produces good results.
Then layer in temporal decay. Track creation time, last access time, and access count on every node and edge. Apply exponential decay to recency and logarithmic scaling to frequency. Different edge types get different decay rates.
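A sketch of the decay math, with per-type half-lives that are purely illustrative numbers:

```python
import math

# Illustrative half-lives per information type, in seconds.
HALF_LIFE = {
    "conversation": 600,            # minutes
    "project": 14 * 86400,          # weeks
    "foundational": float("inf"),   # never decays
}

def recency_score(info_type, age_seconds):
    """Exponential decay: the score halves once per half-life."""
    hl = HALF_LIFE[info_type]
    return 0.5 ** (age_seconds / hl) if math.isfinite(hl) else 1.0

def frequency_score(access_count):
    """Logarithmic scaling so heavy reuse saturates instead of dominating."""
    return math.log1p(access_count)
```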
Build the serialization layer. This is where many systems underinvest. How you present graph context to the LLM matters enormously. Structured natural language with clear relationship markers tends to outperform both raw triples and dense JSON.
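One possible serializer, rendering edges as short natural-language lines with explicit relationship markers; the exact phrasing is an assumption, and real systems should A/B test their format:

```python
def serialize_for_llm(triples):
    """Render graph edges as structured natural language for the prompt."""
    lines = ["Known relationships:"]
    for subj, pred, obj in triples:
        lines.append(f"- {subj} {pred.replace('_', ' ')} {obj}")
    return "\n".join(lines)

context = serialize_for_llm([
    ("Alice", "authored", "Q3 Report"),
    ("Revenue Growth", "driven_by", "Product Launch"),
])
```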
Add persistence and concurrency last. For real-time AI workloads, in-memory is the right default. Add write-ahead logging for durability, snapshots for recovery, and read-write locks for concurrency. Only move to distributed architecture when a single machine’s RAM isn’t enough.
The Feedback Loop
The most powerful property of a context graph is that it can learn.
When the LLM produces a response and the user accepts it, boost the relevance of the context nodes that contributed to it. When the user rejects a response or has to ask a follow-up, reduce the scores. When the user explicitly references something, that’s a strong reinforcement signal. When information goes unaccessed for weeks, let it decay gracefully.
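That update rule can be as simple as the sketch below; the step size and the clamping to [0, 1] are assumptions:

```python
def apply_feedback(scores, used_nodes, accepted, step=0.05):
    """Nudge relevance for nodes that fed a response: up on acceptance, down on rejection."""
    delta = step if accepted else -step
    for node in used_nodes:
        scores[node] = min(1.0, max(0.0, scores.get(node, 0.5) + delta))
    return scores
```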
Over time, this creates a virtuous cycle: better context produces better responses, which generates clearer feedback signals, which further improves context selection. The graph becomes tuned to its users’ actual information needs rather than theoretical importance.
Where This Is Going
The trajectory of AI memory is clear: bigger context windows are necessary but not sufficient. A million-token window doesn’t help if 900,000 of those tokens are irrelevant. The bottleneck isn’t capacity — it’s selection.
Context graphs are the selection layer. They sit between the vast ocean of potentially relevant information and the focused beam of what the LLM actually needs to see. They make AI systems that don’t just retrieve similar text, but reconstruct the web of relevant, connected, temporally aware context that enables real reasoning.
The future of AI memory isn’t bigger windows. It’s smarter context selection. And that’s a graph problem.
If you’re building AI systems that need to reason across large, evolving bodies of information — not just answer FAQ-style questions — context graphs are worth your attention. The gap between “find similar chunks” and “reconstruct relevant context” is where the next generation of AI applications will be won.