The knowledge graph


Derived from the honeycomb knowledge base, captured 2026-06. Written for an external practitioner. Confirm any version-specific detail against your installed version.

#The concept

honeycomb keeps two graphs, and they answer different questions. The memory knowledge graph (the ontology) captures what was learned: the entities, claims, and relationships distilled from your sessions. The codebase graph captures how your code is actually wired: files, symbols, and the edges between them, extracted straight from source. Recall over raw traces tells an agent what was discussed; these graphs tell it what is true and how the code connects.


#Part one: the memory ontology

The pipeline that distills memories also writes graph structure. The ontology is a set of related concepts:

  • Entities are canonical things (a service, a person, a convention), keyed by canonical name and carrying a type and an agent scope.
  • Aspects are weighted dimensions of an entity.
  • Attributes are claim values about an entity. Each attribute carries a kind, a status, a claim key that names the slot it fills, a group key, and a version.
  • Dependencies are audited edges between entities, each with a type, a strength, a confidence, and a required reason for loose links.
  • Mentions join a memory to the entities it references.
  • Assertions record the epistemic act: who claimed, believed, observed, decided, preferred, denied, or questioned something.
  • Proposals are the audited control plane for changes to the ontology.

#Supersession instead of mutation

The storage backend cannot safely update a row in place, so the ontology never mutates a claim. When a claim changes, a new attribute version is appended and the prior one is marked superseded. Readers resolve a claim slot by its highest version, so the current value wins and the history is preserved. This is the same append-only discipline the rest of honeycomb uses, and it is why recall can keep a stale claim off the result set with a pure version comparison rather than a destructive edit.
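A minimal sketch of the append-and-resolve pattern described above. The `AttributeVersion` class, its field names, and the in-memory list standing in for the store are hypothetical; honeycomb's actual schema may differ. The sketch derives "superseded" at read time from the version comparison rather than writing a status flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttributeVersion:
    # Hypothetical row shape; the real attribute also has kind, status, group key.
    entity: str
    claim_key: str
    value: str
    version: int


def append_claim(log: List[AttributeVersion], entity: str, claim_key: str, value: str) -> AttributeVersion:
    """Append a new version for a claim slot; existing rows are never edited."""
    prior = [a.version for a in log if a.entity == entity and a.claim_key == claim_key]
    row = AttributeVersion(entity, claim_key, value, max(prior, default=0) + 1)
    log.append(row)
    return row


def resolve(log: List[AttributeVersion], entity: str, claim_key: str) -> Optional[AttributeVersion]:
    """Return the current value of a slot: the row with the highest version."""
    rows = [a for a in log if a.entity == entity and a.claim_key == claim_key]
    return max(rows, key=lambda a: a.version) if rows else None


log: List[AttributeVersion] = []
append_claim(log, "billing-service", "owner", "team-a")
append_claim(log, "billing-service", "owner", "team-b")
current = resolve(log, "billing-service", "owner")
```

Both rows remain in the log, so the history is intact, while `resolve` only ever returns the latest one.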

#How the ontology earns its place in recall

Graph traversal is one of recall's candidate channels: a high-degree entity can surface related memory identifiers. As with every other channel, it produces identifiers only, and the scope filter authorizes them before any content loads, so a strong graph hit can never leak content past an agent's read policy. Graph reads, traversal, and recall boosting are gated by the pipeline's graph flags, so an operator can run the engine with or without graph influence.
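The ordering above (identifiers first, authorization second, content last) can be sketched as follows. All names here (`graph_candidates`, `recall`, the `mentions` and `store` dictionaries, the `graph_enabled` flag) are illustrative assumptions, not honeycomb's API.

```python
from typing import Dict, List, Set


def graph_candidates(mentions: Dict[str, List[str]], entity: str, graph_enabled: bool) -> List[str]:
    """Graph channel: yields memory identifiers only, never content."""
    if not graph_enabled:  # stands in for the pipeline's graph flags
        return []
    return list(mentions.get(entity, []))


def recall(
    mentions: Dict[str, List[str]],
    store: Dict[str, str],
    entity: str,
    readable: Set[str],
    graph_enabled: bool = True,
) -> Dict[str, str]:
    """Authorize candidate identifiers before loading any content."""
    candidate_ids = graph_candidates(mentions, entity, graph_enabled)
    allowed = [m for m in candidate_ids if m in readable]  # scope filter
    return {m: store[m] for m in allowed}  # content loads only for allowed ids


mentions = {"billing-service": ["m1", "m2"]}
store = {"m1": "billing owns invoices", "m2": "restricted note"}
visible = recall(mentions, store, "billing-service", readable={"m1"})
```

Because the store is only touched after the filter runs, a memory outside the agent's read scope is dropped while it is still just an identifier.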


#Part two: the codebase graph

The codebase graph subsystem extracts files, symbols, and relationships directly from source, so an agent can ask "who calls this function", "what is the blast radius of changing this symbol", or "walk me through this subsystem" and get answers grounded in the current checkout rather than in prose.

The output mirrors the NetworkX node-link JSON format (a directed multigraph), so any tool that understands NetworkX graphs can consume a snapshot. The feature is AST-only: it uses tree-sitter parsers, never a language server, a type checker, or an LLM, which keeps builds fast and deterministic. Nine languages are supported: TypeScript, JavaScript, Python, Go, Rust, Java, Ruby, C, and C++.
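For orientation, here is a small hand-written snapshot in the general node-link shape, read with the standard library only. The node ids, the `relation` attribute name, and the use of the `links` key are assumptions for illustration (NetworkX has historically used `links` for the edge list, and newer releases can also emit `edges`); check a real snapshot from your installed version before relying on any of them.

```python
import json

# Illustrative snapshot; not output from a real build.
snapshot = {
    "directed": True,
    "multigraph": True,
    "graph": {},
    "nodes": [
        {"id": "src/auth.ts:login:function", "label": "login", "kind": "function"},
        {"id": "src/app.ts:main:function", "label": "main", "kind": "function"},
    ],
    "links": [
        {
            "source": "src/app.ts:main:function",
            "target": "src/auth.ts:login:function",
            "key": 0,
            "relation": "calls",
        },
    ],
}


def callers_of(snap: dict, node_id: str) -> list:
    """Answer "who calls this function" from the edge list alone."""
    return sorted(
        link["source"]
        for link in snap["links"]
        if link["target"] == node_id and link.get("relation") == "calls"
    )


# The structure is plain JSON, so it survives a serialize/parse round trip.
round_tripped = json.loads(json.dumps(snapshot))
callers = callers_of(round_tripped, "src/auth.ts:login:function")
```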

#The build pipeline

A build walks the repository, extracts every supported source file, aggregates one snapshot, and writes it to disk. Source discovery prefers git's own ignore engine so it honors .gitignore exactly, with a manual walk as a fallback when git is unavailable. Each file is content-hashed and looked up in a per-file cache before extraction, so a rebuild after a one-file change takes tens of milliseconds rather than seconds. Extraction routes each file to a language-appropriate extractor that produces a uniform shape, which keeps the snapshot builder language-agnostic.
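The per-file cache step might look like the sketch below. The SHA-256 choice, the cache key of path plus digest, and the `extract_with_cache` name are assumptions; the source only states that each file is content-hashed and looked up before extraction.

```python
import hashlib
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str]


def extract_with_cache(
    path: str,
    content: bytes,
    cache: Dict[CacheKey, dict],
    extractor: Callable[[str, bytes], dict],
) -> dict:
    """Run the extractor only when this exact file content has not been seen."""
    digest = hashlib.sha256(content).hexdigest()
    key = (path, digest)
    if key not in cache:
        cache[key] = extractor(path, content)
    return cache[key]


calls = []


def fake_extractor(path: str, content: bytes) -> dict:
    # Stand-in for a language-specific extractor; records each invocation.
    calls.append(path)
    return {"file": path, "bytes": len(content)}


cache: Dict[CacheKey, dict] = {}
extract_with_cache("a.py", b"x = 1\n", cache, fake_extractor)
extract_with_cache("a.py", b"x = 1\n", cache, fake_extractor)  # cache hit
extract_with_cache("a.py", b"x = 2\n", cache, fake_extractor)  # content changed
```

Unchanged files cost one hash and one dictionary lookup, which is why a rebuild after a one-file change does so little work.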

#The node and edge model

A node represents one code construct, with an id formatted <file>:<symbol>:<kind>.

| Node field | Meaning |
| --- | --- |
| id | Unique key within a snapshot |
| label | Display name |
| kind | function, class, method, interface, type_alias, enum, const, variable, or module |
| source_file | Repository-relative path |
| source_location | A line or line range |
| language | One of the nine supported languages |
| exported | Whether the symbol is exported |
| fan_in, fan_out, is_entrypoint | Derived after cross-file resolution |

Edges are directed and typed. The relation is one of imports, calls, extends, implements, or method_of. Each edge also carries a confidence value; in practice, current edges are almost entirely concrete AST facts.
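As a rough illustration of the model, the sketch below composes a node id in the `<file>:<symbol>:<kind>` format and derives degree counts from an edge list. Treating fan_in and fan_out as plain incoming and outgoing edge counts is an assumption, as are the helper names and the dictionary shape of an edge.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def node_id(source_file: str, symbol: str, kind: str) -> str:
    """Compose an id in the <file>:<symbol>:<kind> format."""
    return f"{source_file}:{symbol}:{kind}"


def degree_counts(edges: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count incoming and outgoing edges per node (assumed meaning of fan_in/fan_out)."""
    fan_in: Counter = Counter()
    fan_out: Counter = Counter()
    for edge in edges:
        fan_out[edge["source"]] += 1
        fan_in[edge["target"]] += 1
    return fan_in, fan_out


main = node_id("src/app.ts", "main", "function")
login = node_id("src/auth.ts", "login", "function")
session = node_id("src/auth.ts", "Session", "class")
edges = [
    {"source": main, "target": login, "relation": "calls"},
    {"source": login, "target": session, "relation": "calls"},
    {"source": main, "target": session, "relation": "calls"},
]
fan_in, fan_out = degree_counts(edges)
```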

#Cross-file resolution is high-confidence only

After every file is extracted, three passes turn per-file placeholders into real cross-file edges, and ambiguous cases are dropped rather than guessed. The calls pass resolves a call only when it matches a named or namespace import whose export exists in a resolvable local file; default imports, bare package specifiers, path aliases, barrel re-exports, instance dispatch, and dynamic imports are deliberately skipped. The imports pass repoints an import edge to the real module when the specifier resolves to a known repository file and keeps an external: marker otherwise, so "our code versus a dependency" stays distinguishable. The heritage pass resolves extends and implements to a same-file or named-import base type.

This is an honest limitation worth surfacing to consumers: because cross-file calls are resolved only for relative named and namespace imports, a symbol reading "incoming (0)" is not proof of dead code. A caller may reach it through an unresolved import path.
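The "resolve or drop" rule for the calls pass can be sketched as below. This is a simplified illustration covering named imports only; the binding tuple, the `exports` index, and the extension probing order are all hypothetical, and namespace imports are omitted for brevity.

```python
import posixpath
from typing import Dict, Optional, Set, Tuple

# local name -> (specifier, imported name, import style)
Imports = Dict[str, Tuple[str, str, str]]


def resolve_call(
    caller_file: str,
    callee_name: str,
    imports: Imports,
    exports: Dict[str, Set[str]],
) -> Optional[Tuple[str, str]]:
    """Return (file, symbol) for an unambiguous call target, else None."""
    binding = imports.get(callee_name)
    if binding is None:
        return None
    specifier, imported_name, style = binding
    # Skip default imports and bare package specifiers rather than guess.
    if style != "named" or not specifier.startswith("."):
        return None
    base = posixpath.normpath(posixpath.join(posixpath.dirname(caller_file), specifier))
    for extension in (".ts", ".js"):  # assumed probing order
        candidate = base + extension
        if imported_name in exports.get(candidate, set()):
            return candidate, imported_name
    return None  # export not found in a resolvable local file: drop the edge


exports = {"src/auth.ts": {"login"}}
imports: Imports = {
    "login": ("./auth", "login", "named"),
    "debounce": ("lodash", "debounce", "named"),
    "App": ("./app", "default", "default"),
}
resolved = resolve_call("src/app.ts", "login", imports, exports)
```

Every branch that cannot prove the target returns `None`, which is the behavior that makes an "incoming (0)" count a lower bound rather than proof of dead code.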

#Deterministic, content-addressed snapshots

A snapshot is canonicalized before it is hashed or written: nodes and edges are sorted, and the JSON is serialized with sorted keys and no inserted whitespace, so the same code always serializes to the same bytes. The content hash covers only the stable graph fields and deliberately excludes volatile observation metadata (timestamp, branch, worktree, generator version), so two builds of identical code on different worktrees or at different times produce the same hash and dedup correctly. Snapshots are written atomically, so a crash leaves either the old file or the new one, never a partial.
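A sketch of the canonicalize, hash, and atomic-write steps using only the standard library. The field names, the SHA-256 choice, the sort keys, and the temp-file-then-rename approach are assumptions about one reasonable way to get the properties described, not a description of honeycomb's code.

```python
import hashlib
import json
import os
import tempfile


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and no inserted whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(snapshot: dict) -> str:
    """Hash only the stable graph fields; timestamp, branch, etc. are ignored."""
    stable = {
        "nodes": sorted(snapshot["nodes"], key=lambda n: n["id"]),
        "links": sorted(
            snapshot["links"],
            key=lambda e: (e["source"], e["target"], e["relation"]),
        ),
    }
    return hashlib.sha256(canonical_bytes(stable)).hexdigest()


def write_atomic(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


build_one = {"nodes": [{"id": "b"}, {"id": "a"}], "links": [], "timestamp": "t1", "branch": "main"}
build_two = {"nodes": [{"id": "a"}, {"id": "b"}], "links": [], "timestamp": "t2", "branch": "feature"}
same = content_hash(build_one) == content_hash(build_two)
```

Here `same` is true even though the two builds differ in node order, timestamp, and branch, which is the property that lets identical code dedup across worktrees.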

#Cloud sync and the query surface

A successful build best-effort pushes the snapshot to the cloud when you are authenticated; the local snapshot is the source of truth, and a push failure never blocks the build. The push uses a select-before-insert with drift detection: an identical hash is a no-op, a different hash for the same commit logs a drift warning and refuses to overwrite (because the same commit producing different content means extractor drift a human should investigate), and a missing row inserts. A teammate can pull the freshest snapshot for the current HEAD.
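The three push outcomes can be sketched with a dictionary standing in for the remote table. The function names, the return strings, and keying by commit are illustrative assumptions; a real implementation would issue a select and an insert against the cloud store.

```python
from typing import Dict, Tuple

Remote = Dict[str, Tuple[str, bytes]]  # commit -> (content hash, payload)


def push_snapshot(remote: Remote, commit: str, content_hash: str, payload: bytes) -> str:
    """Select before insert: never overwrite an existing row."""
    existing = remote.get(commit)
    if existing is None:
        remote[commit] = (content_hash, payload)
        return "inserted"
    if existing[0] == content_hash:
        return "noop"
    # Same commit, different content: likely extractor drift; leave the row alone.
    return "drift"


def push_best_effort(remote: Remote, commit: str, content_hash: str, payload: bytes) -> str:
    """A push failure is reported, never raised, so it cannot block the build."""
    try:
        return push_snapshot(remote, commit, content_hash, payload)
    except Exception:
        return "failed"


remote: Remote = {}
first = push_best_effort(remote, "abc123", "hash-1", b"{}")
second = push_best_effort(remote, "abc123", "hash-1", b"{}")
third = push_best_effort(remote, "abc123", "hash-2", b"{...}")
```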

Agents read the graph through a synthesized query surface that renders text on the fly from the local snapshot:

| Query | Returns |
| --- | --- |
| Overview | Commit, node and edge counts, kind breakdowns, top files, limitations |
| Find | Substring and fuzzy search on node id and label |
| Show | Full node detail plus incoming and outgoing edges by relation |
| Impact | Transitive dependents (blast radius) of a symbol |
| Neighborhood | Symbols in a file plus their cross-file neighbors |
| Layers | Architectural subsystem grouping by path heuristic |
| Tour | A deterministic dependency-ordered walkthrough |
| Path | The shortest path between two symbol patterns |

The renderers carry an honest caveat: a snapshot whose source files have been edited since the build is stale and should be cross-checked against live source.
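To make the Impact query concrete, here is one way transitive dependents could be computed: a breadth-first walk over reversed edges. This is a generic graph technique shown under assumed inputs (edges as `(source, target)` pairs meaning "source depends on target"); it is not taken from honeycomb's renderer.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def impact(edges: Iterable[Tuple[str, str]], target: str) -> List[str]:
    """Return every node that directly or transitively depends on `target`."""
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for source, dest in edges:
        dependents[dest].add(source)  # reverse index: who points at dest

    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != target and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)  # sorted for deterministic output


edges = [("a", "b"), ("b", "c"), ("d", "c"), ("c", "a")]
blast_radius = impact(edges, "c")
```

The `seen` set keeps the walk finite when the graph has cycles, as in the `c -> a -> b -> c` loop in this example. Any result is only as complete as the resolved edges in the snapshot.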