Recall and retrieval

Derived from the honeycomb knowledge base, captured 2026-06. Written for an external practitioner. Confirm any version-specific detail, including default shaping-stage states, against your installed version.

#The concept

Recall is the moment honeycomb earns its keep: before the next turn, it hands the assistant the right context. That has to be four things at once. Cheap, because it cannot run a model on every query by default. Scoped, because it must never return a memory the requesting agent is not allowed to see. Current, because a superseded fact must not outrank the fact that replaced it. And shaped, because the few results that reach the assistant should be the distinct, fresh, relevant ones, not five paraphrases of one fact, and not a six-month-old claim above last week's.

#How a query flows

flowchart TD
    query["Recall query"] --> lexical["Lexical arms over the memory tables (full-text or substring)"]
    query --> semantic["Semantic arms: cosine per table (optional)"]
    lexical --> fuse["Reciprocal-rank fusion plus provenance weights"]
    semantic --> fuse
    fuse --> rerank["Rerank top-k (default off)"]
    rerank --> dedup["Collapse near-duplicates (default on)"]
    dedup --> recency["Recency dampening (default neutral)"]
    recency --> budget["Token budget plus diversity (opt-in)"]
    budget --> scope["Scope filter: org and workspace partition, agent read policy"]
    scope --> out["Ranked, shaped results, with an honest degraded flag"]

#Lexical and semantic arms

Recall runs a combined query over three tables: the durable distilled facts, the per-session summaries, and the raw dialogue rows. Each arm is separately guarded, so a missing sibling table degrades that one arm to empty rather than failing the whole recall. The lexical arms use full-text search when the index is present and fall back to a substring match when it is not.
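The per-arm guard and the lexical fallback described above can be sketched roughly as follows. This is a minimal in-memory illustration, not honeycomb's code: the table names, the row shape, and the functions `lexical_arm`, `guarded`, and `recall_lexical` are all hypothetical.

```python
def substring_search(rows, query):
    """Fallback path: case-insensitive substring match."""
    q = query.lower()
    return [r for r in rows if q in r["text"].lower()]

def lexical_arm(rows, query, fulltext_index=None):
    """Use the full-text index when one is present, else fall back to substring."""
    if fulltext_index is not None:
        return fulltext_index(query)
    return substring_search(rows, query)

def guarded(arm):
    """Run one arm; any failure degrades that arm to empty, not the whole recall."""
    try:
        return list(arm())
    except Exception:
        return []

def recall_lexical(tables, query):
    """tables: hypothetical map of table name -> zero-argument row loader."""
    results = {}
    for name, load_rows in tables.items():
        # The lambda is invoked immediately, so it sees this iteration's loader.
        results[name] = guarded(lambda: lexical_arm(load_rows(), query))
    return results
```

If the loader for one table raises (for example because a sibling table is missing), that table's entry comes back as an empty list while the other arms still answer.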

When embeddings are enabled, the query is embedded and a cosine arm runs per table, scored as a normalized cosine in the range 0 to 1. Vectors are stored as tensor columns and searched on the GPU-backed engine, so the similarity filter and the scope filter run in one query rather than against a separate vector index.
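To make "normalized cosine in the range 0 to 1" and "similarity filter plus scope filter in one pass" concrete, here is a small sketch. The mapping `(1 + cos) / 2` is one common normalization and is an assumption here, as are the row fields, the `min_score` threshold, and the function names. The in-memory loop only stands in for the backend query.

```python
import math

def cosine(a, b):
    """Plain cosine similarity in [-1, 1]; zero vectors score 0."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def normalized_cosine(a, b):
    """Assumed normalization: shift and scale cosine into [0, 1]."""
    return (1.0 + cosine(a, b)) / 2.0

def semantic_arm(rows, query_vec, workspace, min_score=0.6):
    """Apply the scope predicate and the similarity predicate in the same pass."""
    hits = []
    for row in rows:
        if row["workspace"] != workspace:   # scope filter
            continue
        score = normalized_cosine(query_vec, row["vec"])
        if score >= min_score:              # similarity filter
            hits.append((row["id"], score))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits
```

The point of the single pass is that an out-of-scope row is never scored in isolation and then filtered later; both predicates belong to the same query.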

#Semantic recall is the default

A fresh signed-in user gets hybrid lexical plus 768-dimension semantic recall out of the box. Sign-in provisions and warms the local embedding model in the background, so the cosine path is what a real user hits. The system is honest about when it is degraded. Recall reports degraded: false when the semantic arm actually ran, and degraded: true only on a genuine fallback: embeddings explicitly off, the model still warming, the embed worker unreachable or crashed, a per-call timeout, or a malformed response. In every degraded case recall still answers from the lexical arms. Recall never throws and never hangs on the embedding path, because a degraded answer beats an error for an agent's turn.
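The fallback behavior can be sketched as below. This is an illustration under stated assumptions rather than the real implementation: `try_embed`, `recall`, the callables passed in, and the default timeout are hypothetical, and a thread pool is just one way to bound a call in time.

```python
from concurrent.futures import ThreadPoolExecutor

def try_embed(embed, query, timeout_s, dim):
    """Return a query vector, or None on any of the degraded cases."""
    if embed is None:                      # embeddings explicitly off
        return None
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        vec = pool.submit(embed, query).result(timeout=timeout_s)
    except Exception:                      # timeout, crash, unreachable worker
        return None
    finally:
        pool.shutdown(wait=False)          # never block on a stuck worker
    if not isinstance(vec, list) or len(vec) != dim:
        return None                        # malformed response
    return vec

def recall(query, lexical_search, semantic_search, embed, timeout_s=0.2, dim=768):
    """Always answer from the lexical arms; add semantic hits when available."""
    hits = list(lexical_search(query))
    vec = try_embed(embed, query, timeout_s, dim)
    if vec is None:
        return {"hits": hits, "degraded": True}
    try:
        semantic_hits = list(semantic_search(vec))
    except Exception:
        return {"hits": hits, "degraded": True}
    return {"hits": hits + semantic_hits, "degraded": False}
```

Every failure on the embedding path collapses to the same outcome: lexical results plus `degraded: True`. The caller never sees an exception from that path.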

#Fusion: provenance-forward ranking

Recall hits carry a real, comparable score, and results are ordered by relevance, never by arm order and never by a client-side fabrication. The per-arm ranked lists are blended with Reciprocal Rank Fusion, which is scale-free and needs no calibration between the lexical and semantic score scales. Two shaping rules ride the fusion:

  • Provenance weights fold source quality into the rank: distilled summaries weight higher than raw session rows, so a raw tool-call blob needs a materially stronger signal to outrank a clean distilled fact. Ranking distilled facts above raw dumps is the product-correct order.
  • Identity dedup collapses the same source-plus-id across arms, and every hit keeps its source and scope provenance.
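A compact sketch of the fusion step with both shaping rules follows. Reciprocal Rank Fusion itself is standard (each arm contributes `1 / (k + rank)`), but the weight values, the choice to apply them multiplicatively to the fused score, the constant `k = 60`, and the names `PROVENANCE_WEIGHT` and `fuse` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical weights: distilled sources count for more than raw session rows.
PROVENANCE_WEIGHT = {"memory": 1.0, "summary": 0.8, "session": 0.5}

def fuse(arms, k=60):
    """arms: ranked lists of (source, id) keys, best first, one list per arm."""
    scores = defaultdict(float)
    for ranked in arms:
        for rank, key in enumerate(ranked, start=1):
            # Identity dedup: the same (source, id) from several arms
            # accumulates into a single entry instead of appearing twice.
            scores[key] += 1.0 / (k + rank)
    fused = [
        (key, score * PROVENANCE_WEIGHT.get(key[0], 1.0))
        for key, score in scores.items()
    ]
    fused.sort(key=lambda item: item[1], reverse=True)
    return fused
```

Because only ranks enter the formula, the lexical and semantic score scales never need to be calibrated against each other. With these example weights a session row that ties a memory on rank loses, and it wins only when its rank signal is clearly stronger.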

A note for practitioners: the storage backend ships a native hybrid operator that fuses vector and full-text in one statement. honeycomb deliberately does not use it, because measured evaluation found it did not beat the in-house fusion. "Hybrid" here means SQL for structure plus vector for similarity, fused in honeycomb's own reciprocal-rank step, not the backend's native operator. Treat that as a settled decision unless your installed version's documentation says otherwise.

#The shaping stages

Above the fusion floor, recall runs four shaping stages in a fixed order. Each is wired into the live pipeline behind an honest default whose effect was measured (in some cases measured as neutral) on a committed evaluation set. The defaults are deliberately conservative: ship the behavior that measurably helps, and leave the rest opt-in.

| Stage | Default | What it does |
| --- | --- | --- |
| Reranker | off (fusion order unchanged) | Re-scores the top-k by raw cosine of the query against candidate embeddings. Real and wired, dormant by default after a measured near-zero lift. Timeout-budgeted; on timeout it keeps the prior order. |
| Semantic dedup | on | Collapses near-duplicate hits whose embeddings exceed a similarity threshold, keeping the highest-provenance copy (memory over summary over session). Fails soft to the un-deduped list. |
| Recency dampening | neutral (near-infinite half-life) | A multiplicative age decay on the fused score; demotes stale rows, never a hard cutoff, never drops a row by age. Neutral until a caller tunes it. |
| Token budget and diversity | opt-in (engages on a positive token budget) | Fills a token budget with a maximal-marginal-relevance selection, trading a little pure relevance for diversity. With no budget, the unchanged top-k path runs. |

Default stage states can change between versions; confirm them against your installed version.
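Three of the stages are easy to picture in code. The sketch below is illustrative only: the hit fields (`source`, `vec`, `score`, `tokens`), the 0.95 threshold, the `lam` trade-off, and all function names are hypothetical, and an infinite half-life stands in for the "near-infinite" neutral default.

```python
import math

PROVENANCE_RANK = {"memory": 0, "summary": 1, "session": 2}  # lower is better

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_dedup(hits, threshold=0.95):
    """Collapse near-duplicates, keeping the highest-provenance copy."""
    kept = []
    for hit in hits:                      # hits arrive in ranked order
        for i, other in enumerate(kept):
            if cos(hit["vec"], other["vec"]) >= threshold:
                if PROVENANCE_RANK[hit["source"]] < PROVENANCE_RANK[other["source"]]:
                    kept[i] = hit         # better provenance takes the slot
                break
        else:
            kept.append(hit)
    return kept

def recency_dampen(score, age_days, half_life_days=math.inf):
    """Multiplicative decay; never a cutoff. Infinite half-life is neutral."""
    return score * 0.5 ** (age_days / half_life_days)

def mmr_fill(hits, token_budget, lam=0.7):
    """Greedy maximal-marginal-relevance selection under a token budget."""
    if token_budget <= 0:
        return list(hits)                 # no budget: unchanged top-k path
    selected, remaining, used = [], list(hits), 0
    while remaining:
        def mmr(h):
            redundancy = max((cos(h["vec"], s["vec"]) for s in selected), default=0.0)
            return lam * h["score"] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        remaining.remove(best)
        if used + best["tokens"] <= token_budget:
            selected.append(best)
            used += best["tokens"]
    return selected
```

With a finite half-life, a row exactly one half-life old keeps half its fused score, which demotes it without removing it. In `mmr_fill`, a candidate that nearly duplicates an already selected hit pays a redundancy penalty, so a slightly less relevant but distinct hit can take the slot.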

#The authorization boundary

Recall is where scoping has to be exactly right, because the candidate channels (full-text, vector, graph traversal, hints) cast a wide net. The defense is ordering: those channels produce memory identifiers only, and the scope filter authorizes candidates before any content loads. Every content-bearing stage that follows (reranking, summaries, transcript expansion, access tracking) runs only on the authorized set. A strong vector hit or a high-degree entity can surface an identifier, but it cannot leak content past the read policy. The outer ring (organization and workspace) is enforced at the storage partition beneath this, so even a buggy inner clause cannot cross a workspace boundary. The full scope model is in the security model.
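The ordering argument (identifiers first, authorization second, content last) can be sketched as follows. The `Scope` type, the `acl` map, and the function names are hypothetical, and the storage-partition outer ring is not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scope:
    org: str
    workspace: str
    agent: str

def authorize(candidate_ids, scope, acl):
    """acl: hypothetical map of id -> (org, workspace, set of reader agents)."""
    allowed = []
    for mid in candidate_ids:
        entry = acl.get(mid)
        if entry is None:
            continue
        org, workspace, readers = entry
        if org == scope.org and workspace == scope.workspace and scope.agent in readers:
            allowed.append(mid)
    return allowed

def recall_content(channels, scope, acl, load_content):
    """Candidate channels yield identifiers only; content loads after authorization."""
    candidates, seen = [], set()
    for channel in channels:
        for mid in channel():
            if mid not in seen:
                seen.add(mid)
                candidates.append(mid)
    authorized = authorize(candidates, scope, acl)
    # Only the authorized set ever reaches a content-bearing step.
    return {mid: load_content(mid) for mid in authorized}
```

Because `load_content` is called only for authorized identifiers, a strong hit in any candidate channel can nominate a memory but cannot cause its content to be read.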

#Currentness

Superseded facts are kept off the result set by the append-only model itself: a soft-delete flag and a superseded status exclude stale versions at query time, and readers resolve by the highest version, so a newer fact in the same slot outranks the one it replaced. Recency dampening is a soft freshness signal layered on top of this hard version invariant; the two are complementary, not redundant.
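A minimal sketch of the hard version invariant is shown below. The field names (`slot`, `version`, `deleted`, `status`) and the function `current_facts` are assumptions, not the actual schema.

```python
def current_facts(rows):
    """Resolve each slot to its highest live version.

    Soft-deleted and superseded rows are excluded before version
    comparison, mirroring a query-time filter on an append-only table.
    """
    latest = {}
    for row in rows:
        if row["deleted"] or row["status"] == "superseded":
            continue
        best = latest.get(row["slot"])
        if best is None or row["version"] > best["version"]:
            latest[row["slot"]] = row
    return latest
```

This filter is binary: a stale version is simply absent from the result. Recency dampening, by contrast, only rescales the scores of rows that survive it.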

#The browse surface

Beyond scored recall, agents can browse memory as a virtual filesystem: ordinary shell commands against a memory mount, intercepted and routed to scoped queries. From the agent's point of view it is browsing files; underneath, each operation is a query against the session and memory tables. This is the explicit, agent-driven recall that bypasses the inject-on-confidence rule. Either way, scored recall or browse, the same authorization boundary applies before any content is returned. The browse mechanics are covered in data and storage.

#How recall is measured

Every ranking change is provable on a committed evaluation set, not asserted. A harness scores a hand-curated set of query-to-expected-memory pairs (deliberately including pairs with no surface-token overlap, so the set exercises the semantic lift) on recall at k, mean reciprocal rank, and a position-discounted graded-relevance metric. A committed baseline is enforced: a change that regresses it fails. This is what lets the "semantic on by default" posture and each shaping default be defended by measurement rather than by claim.
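The three metrics and the baseline gate can be sketched as below. The source does not name the graded-relevance metric; nDCG with linear gain is used here as one common position-discounted choice, and the function names and the `tolerance` parameter are hypothetical.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant ids that appear in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant id, or 0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, grades, k):
    """grades: id -> graded relevance (0 when absent). Linear gain, log2 discount."""
    dcg = sum(
        grades.get(item, 0) / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0

def passes_baseline(metrics, baseline, tolerance=0.0):
    """A change fails if any tracked metric regresses below the baseline."""
    return all(metrics[name] >= baseline[name] - tolerance for name in baseline)
```

Mean reciprocal rank is the average of `reciprocal_rank` over all query pairs in the set; a harness would compute each metric per query, average them, and feed the averages to `passes_baseline`.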