Capture and memory

Derived from the honeycomb knowledge base, captured 2026-06. Written for an external practitioner. Confirm any version-specific detail, including default flag states, against your installed version.

#The concept

A memory starts life as raw text: a prompt, a tool call, a response. On its own that is searchable but dumb. The capture-and-memory path turns it into something the retrieval layer can reason over: discrete facts with confidence scores, entities and the relationships between them, and hints about which questions the memory could answer later. The one rule that never bends is that a slow or failing model must never cost you a memory. The raw content is committed first; everything after that is enrichment that runs asynchronously, off the write path.
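The ordering rule can be shown with a minimal Python sketch. It assumes a SQLite table named `raw_sessions` and uses an in-process `queue.Queue` as a stand-in for the durable job queue. Both names are hypothetical, not honeycomb's schema or API. The raw row is committed before any job is queued, so an enrichment failure cannot remove it.

```python
import queue
import sqlite3


def capture(conn: sqlite3.Connection, jobs: queue.Queue, content: str) -> int:
    """Commit the raw memory first, then hand enrichment to a background queue."""
    with conn:  # the write path: one local insert, no model call
        cur = conn.execute("INSERT INTO raw_sessions (content) VALUES (?)", (content,))
    memory_id = cur.lastrowid
    jobs.put(memory_id)  # enrichment is deferred, off the write path
    return memory_id


def enrich_one(jobs: queue.Queue, enrich) -> bool:
    """Run one enrichment job; a model failure never touches the raw row."""
    memory_id = jobs.get()
    try:
        enrich(memory_id)
        return True
    except Exception:
        return False  # a real pipeline would retry or dead-letter the job
```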

#Capture: from a session event to a row

Every assistant fires lifecycle events. honeycomb's per-assistant hooks are thin clients: when an event fires, the hook reads the credential, normalizes the assistant's native payload into the shape the daemon expects, and makes a local request to the daemon. The daemon writes one row per event into the raw sessions table. The hook builds no SQL, holds no storage handle, and decides no scope; it states what happened and lets the daemon persist it.

Three event types are captured:

Prompt events record the user's prompt text.
Tool-call events record the tool name, its input, and its response.
Assistant-response events record the assistant's last message.

Each request carries session metadata (session ID, working directory, permission mode, native event name, and agent_id) and, optionally, a message embedding. Capture is append-only: readers reconstruct a session by concatenating its rows in order.
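Here is a minimal sketch of the hook side. The native payload keys (`hook_event_name`, `prompt`, `tool_name`, `last_assistant_message`, and so on) are hypothetical, as is the `CaptureRequest` shape. The sketch only translates a payload into a request. It opens no storage handle and builds no SQL, which matches the division of labor described above.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical mapping from native event names to the three captured types;
# each assistant uses its own names.
EVENT_TYPES = {
    "UserPromptSubmit": "prompt",
    "PostToolUse": "tool_call",
    "Stop": "assistant_response",
}


@dataclass
class CaptureRequest:
    event_type: str
    session_id: str
    cwd: str
    permission_mode: str
    native_event: str
    agent_id: str
    body: dict[str, Any]
    embedding: Optional[list[float]] = None  # optional message embedding


def normalize_event(native: dict[str, Any], agent_id: str) -> CaptureRequest:
    """Map a native hook payload onto an assumed daemon request shape."""
    native_event = native["hook_event_name"]
    event_type = EVENT_TYPES.get(native_event)
    if event_type is None:
        raise ValueError(f"event {native_event!r} is not captured")
    if event_type == "prompt":
        body = {"prompt": native.get("prompt", "")}
    elif event_type == "tool_call":
        body = {
            "tool_name": native.get("tool_name", ""),
            "tool_input": native.get("tool_input"),
            "tool_response": native.get("tool_response"),
        }
    else:
        body = {"message": native.get("last_assistant_message", "")}
    return CaptureRequest(
        event_type=event_type,
        session_id=native["session_id"],
        cwd=native.get("cwd", ""),
        permission_mode=native.get("permission_mode", ""),
        native_event=native_event,
        agent_id=agent_id,
        body=body,
    )
```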

#Capture opt-out

Setting HONEYCOMB_CAPTURE=false puts honeycomb in read-only mode for sensitive workflows. In that mode the capture hooks still run, but they do not ask the daemon to write any trace data, and the table-ensure step is skipped. Recall and search still work. This is a per-session escape hatch for work involving credentials, PII-heavy files, or regulated data.
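The following sketch shows how a hook might honor the opt-out. It assumes that only the value `false` (case-insensitive) disables capture. `send` is a hypothetical callable that posts the request to the daemon.

```python
import os
from typing import Callable, Mapping, Optional


def capture_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    # Assumption: only the literal "false" (any case) turns capture off.
    return env.get("HONEYCOMB_CAPTURE", "true").strip().lower() != "false"


def on_event(request: dict, send: Callable[[dict], None],
             env: Optional[Mapping[str, str]] = None) -> bool:
    """Run a capture hook; return True if a write was requested."""
    if not capture_enabled(env):
        return False  # read-only mode: the hook runs but requests no write
    send(request)
    return True
```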

#The pipeline: from a raw memory to a distilled fact

Once a raw memory is written, the daemon makes it smart. The work runs as durable jobs with a lease, complete, fail, and dead-letter lifecycle, plus exponential backoff between retries and a reaper that reclaims stale leases. Jobs survive a daemon restart.
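The lifecycle can be sketched in memory. Note that honeycomb's jobs are durable and this toy queue is not. The retry limit, base delay, and lease length are hypothetical parameters. The current time is passed in explicitly so the transitions are easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: int
    payload: str
    state: str = "pending"     # pending | leased | done | dead
    attempts: int = 0
    available_at: float = 0.0  # earliest time the job may be leased again
    lease_expires: float = 0.0


class JobQueue:
    def __init__(self, max_attempts: int = 5, base_delay: float = 1.0,
                 lease_seconds: float = 30.0) -> None:
        self.jobs: dict[int, Job] = {}
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.lease_seconds = lease_seconds
        self._next_id = 1

    def enqueue(self, payload: str) -> int:
        job = Job(self._next_id, payload)
        self.jobs[job.id] = job
        self._next_id += 1
        return job.id

    def lease(self, now: float) -> Optional[Job]:
        for job in self.jobs.values():
            if job.state == "pending" and job.available_at <= now:
                job.state = "leased"
                job.attempts += 1
                job.lease_expires = now + self.lease_seconds
                return job
        return None

    def complete(self, job_id: int) -> None:
        self.jobs[job_id].state = "done"

    def fail(self, job_id: int, now: float) -> None:
        job = self.jobs[job_id]
        if job.attempts >= self.max_attempts:
            job.state = "dead"  # dead-letter: kept for inspection, never retried
        else:
            job.state = "pending"
            # Exponential backoff: 1x, 2x, 4x ... base_delay after each failure.
            job.available_at = now + self.base_delay * 2 ** (job.attempts - 1)

    def reap(self, now: float) -> int:
        """Return expired leases to pending, e.g. after a worker crash."""
        reaped = 0
        for job in self.jobs.values():
            if job.state == "leased" and job.lease_expires <= now:
                job.state = "pending"
                reaped += 1
        return reaped
```

A durable version would persist each state transition, which is what lets jobs survive a restart as described above.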

flowchart TD
    capture["Raw memory written"] --> extract["Extraction (model): facts plus entity triples"]
    extract --> decide["Decision (model): add / update / delete / none"]
    decide --> writes["Controlled writes to the distilled-memory table"]
    writes --> graph["Graph persistence (separate write)"]
    graph --> hints["Prospective hints (model)"]
    hints --> done["Done"]

#Extraction

The extraction worker leases a job and asks the model to decompose the memory into facts (each with content, a type, and a confidence between 0 and 1) and entities (triples of source, relationship, target). Input is capped and output is bounded (roughly 20 facts and 50 entities, with per-fact length limits). Invalid fields are logged and dropped rather than failing the whole job.
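This sketch shows the "log and drop, don't fail" validation. It assumes the model returns a dict with `facts` and `entities` keys. The caps mirror the rough numbers above. `MAX_FACT_CHARS` is a hypothetical length limit, and truncating over-long facts (rather than dropping them) is an assumption.

```python
from typing import Any, Callable

MAX_FACTS = 20        # "roughly 20" per the text; real caps may differ
MAX_ENTITIES = 50     # "roughly 50" per the text
MAX_FACT_CHARS = 500  # hypothetical per-fact length limit


def validate_extraction(
    raw: dict[str, Any], log: Callable[[str], None] = print
) -> tuple[list[dict[str, Any]], list[tuple[str, ...]]]:
    """Keep well-formed facts and triples; log and drop the rest."""
    facts: list[dict[str, Any]] = []
    for item in raw.get("facts", []):
        if not isinstance(item, dict):
            log(f"dropping non-object fact: {item!r}")
            continue
        content, confidence = item.get("content"), item.get("confidence")
        if not isinstance(content, str) or not content.strip():
            log(f"dropping fact with empty or non-text content: {item!r}")
            continue
        if (isinstance(confidence, bool) or not isinstance(confidence, (int, float))
                or not 0.0 <= confidence <= 1.0):
            log(f"dropping fact with out-of-range confidence: {item!r}")
            continue
        facts.append({
            "content": content.strip()[:MAX_FACT_CHARS],  # assumption: truncate
            "type": item.get("type"),
            "confidence": float(confidence),
        })
        if len(facts) == MAX_FACTS:
            break
    entities: list[tuple[str, ...]] = []
    for triple in raw.get("entities", []):
        if (isinstance(triple, (list, tuple)) and len(triple) == 3
                and all(isinstance(p, str) and p.strip() for p in triple)):
            entities.append(tuple(p.strip() for p in triple))
        else:
            log(f"dropping malformed triple: {triple!r}")
        if len(entities) == MAX_ENTITIES:
            break
    return facts, entities
```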

#Decision

For each extracted fact, the decision stage runs a hybrid search for a small set of existing candidate memories. It then asks the model to choose add, update, delete, or none, along with a target memory, a confidence, and a reason. With no candidates, it proposes an add immediately, without a model call. Every proposal, applied or not, is recorded in an audit history. That history is what makes shadow mode and audits possible.
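Here is a sketch of the decision step. `search` and `ask_model` are hypothetical callables supplied by the caller. The sketch also assumes that a no-candidate add inherits the fact's own confidence. The history append happens on every path, which is what lets shadow mode log proposals without applying them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    action: str               # "add" | "update" | "delete" | "none"
    target_id: Optional[int]  # existing memory the action applies to, if any
    confidence: float
    reason: str


def decide(fact: str, fact_confidence: float,
           search: Callable[[str, int], list],
           ask_model: Callable[[str, list], Proposal],
           history: list, k: int = 5) -> Proposal:
    candidates = search(fact, k)  # hybrid search; k is a hypothetical default
    if not candidates:
        # Nothing to compare against: propose an add without a model call.
        proposal = Proposal("add", None, fact_confidence, "no existing candidates")
    else:
        proposal = ask_model(fact, candidates)
    history.append((fact, proposal))  # audited whether or not it is applied
    return proposal
```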

#Controlled writes

This is the only stage that mutates the distilled-memory table. Embeddings are prefetched before the write so no network call happens while committing. An add proposal must clear a minimum fact confidence (default 0.7), have non-empty normalized content, and not collide with an existing content hash (a content-hash check returns the existing memory rather than inserting a duplicate). Updates and deletes run a contradiction check, are flagged for review, and apply only when explicitly allowed, landing as append-only, version-bumped writes rather than in-place edits.
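This is a minimal in-memory sketch of the add and update gates. It assumes that normalization means collapsing whitespace and lowercasing, and that SHA-256 is the content hash; neither detail comes from the source. `allowed` stands in for the explicit permission, and the contradiction check is elided.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

MIN_FACT_CONFIDENCE = 0.7  # documented default for add proposals


def normalize_content(text: str) -> str:
    # Assumption: normalization collapses whitespace and lowercases.
    return " ".join(text.split()).lower()


def content_hash(text: str) -> str:
    return hashlib.sha256(normalize_content(text).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Row:
    memory_id: int
    version: int
    content: str
    digest: str


class DistilledStore:
    """In-memory stand-in for the distilled-memory table (hypothetical)."""

    def __init__(self) -> None:
        self.rows: list[Row] = []  # append-only: no row is ever edited in place
        self._next_id = 1

    def latest(self, memory_id: int) -> Row:
        versions = [r for r in self.rows if r.memory_id == memory_id]
        return max(versions, key=lambda r: r.version)

    def apply_add(self, content: str, confidence: float) -> Optional[int]:
        if confidence < MIN_FACT_CONFIDENCE:
            return None  # below the confidence floor
        if not normalize_content(content):
            return None  # empty after normalization
        digest = content_hash(content)
        for row in self.rows:
            if row.digest == digest:
                return row.memory_id  # duplicate: return the existing memory
        memory_id = self._next_id
        self._next_id += 1
        self.rows.append(Row(memory_id, 1, content, digest))
        return memory_id

    def apply_update(self, memory_id: int, content: str, allowed: bool) -> bool:
        if not allowed:
            return False  # flagged for review, not applied
        previous = self.latest(memory_id)
        self.rows.append(
            Row(memory_id, previous.version + 1, content, content_hash(content))
        )
        return True
```

Deletes would follow the same append-only, version-bumped pattern rather than removing rows.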

#Graph persistence and prospective hints

After the memory write commits, graph structure is written separately: entities upsert by canonical name, relationships by their triple, and mention links insert-or-ignore so reprocessing is idempotent. A failure here logs a warning and does not revert the facts already written, because the facts matter more than the edges. Finally, if hints are enabled, a pass generates hypothetical future queries the memory would answer and indexes them, so retrieval can match a query against the hint, not only the literal text.
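This sketch shows the idempotent graph write using Python's built-in `sqlite3`. It assumes a hypothetical three-table schema and a naive canonicalization (whitespace-collapsed, lowercased names). `INSERT OR IGNORE` against unique keys makes reprocessing a no-op. The graph write runs in its own transaction, so a failure rolls back only the edges and is logged rather than raised.

```python
import logging
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS entities (
    id INTEGER PRIMARY KEY,
    canonical_name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS relationships (
    source_id INTEGER NOT NULL,
    relation TEXT NOT NULL,
    target_id INTEGER NOT NULL,
    UNIQUE (source_id, relation, target_id)
);
CREATE TABLE IF NOT EXISTS mentions (
    memory_id INTEGER NOT NULL,
    entity_id INTEGER NOT NULL,
    UNIQUE (memory_id, entity_id)
);
"""


def canonical(name: str) -> str:
    return " ".join(name.split()).lower()  # assumed canonicalization


def entity_id(conn: sqlite3.Connection, name: str) -> int:
    key = canonical(name)
    conn.execute("INSERT OR IGNORE INTO entities (canonical_name) VALUES (?)", (key,))
    return conn.execute(
        "SELECT id FROM entities WHERE canonical_name = ?", (key,)
    ).fetchone()[0]


def persist_graph(conn: sqlite3.Connection, memory_id: int, triples) -> bool:
    """Write graph structure after the memory commit; never raises."""
    try:
        with conn:  # one transaction covering only the graph write
            for source, relation, target in triples:
                s, t = entity_id(conn, source), entity_id(conn, target)
                conn.execute("INSERT OR IGNORE INTO relationships VALUES (?, ?, ?)",
                             (s, relation, t))
                for eid in (s, t):
                    conn.execute("INSERT OR IGNORE INTO mentions VALUES (?, ?)",
                                 (memory_id, eid))
        return True
    except sqlite3.Error:
        logging.warning("graph persistence failed for memory %s", memory_id,
                        exc_info=True)
        return False
```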

#Default posture: nothing surprises you with model spend

The pipeline worker is constructed and started on every daemon boot, but the stage handlers default off by design, so no model spend happens without an explicit opt-in. Stages are enabled individually through HONEYCOMB_PIPELINE_* environment variables (or the equivalent agent.yaml flags).

Flag	Effect
`enabled`	Master switch. Off means no extraction jobs are processed.
`shadowMode`	Run extraction and decision but write nothing; proposals are logged to history.
`mutationsFrozen`	Emergency read-only brake; supersedes shadow mode.
`graph.enabled`	Enable graph reads, traversal, and recall boosting.
`graph.extractionWritesEnabled`	Let background extraction persist entity triples.
`autonomous.enabled`	Allow scheduled maintenance and retention.
`autonomous.frozen`	Hard stop on maintenance even when autonomous is enabled.
`hints.enabled`	Run prospective-hint generation at write time.

Default flag states can change between versions; confirm the defaults that ship with your installed version.
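Here is a sketch of how the flags might resolve into an effective mode. The environment variable names are hypothetical expansions of the `HONEYCOMB_PIPELINE_*` prefix, and the accepted truthy values are an assumption. Check your installed version for the real names. Every flag defaults to off here, matching the opt-in posture.

```python
import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted "on" spellings


def flag(env: Mapping[str, str], name: str) -> bool:
    """Read one boolean flag; a missing variable means off."""
    return env.get(name, "").strip().lower() in TRUTHY


def pipeline_mode(env: Mapping[str, str] = os.environ) -> str:
    if not flag(env, "HONEYCOMB_PIPELINE_ENABLED"):
        return "off"     # master switch: no extraction jobs are processed
    if flag(env, "HONEYCOMB_PIPELINE_MUTATIONS_FROZEN"):
        return "frozen"  # emergency brake; checked before shadow mode
    if flag(env, "HONEYCOMB_PIPELINE_SHADOW_MODE"):
        return "shadow"  # extract and decide, log proposals, write nothing
    return "live"


def maintenance_allowed(env: Mapping[str, str] = os.environ) -> bool:
    # The autonomous frozen flag is a hard stop even when autonomous is enabled.
    return (flag(env, "HONEYCOMB_PIPELINE_AUTONOMOUS_ENABLED")
            and not flag(env, "HONEYCOMB_PIPELINE_AUTONOMOUS_FROZEN"))
```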

#Embeddings

When embeddings are enabled (the default), captured turns and deliberately stored memories land with a real 768-dimension vector from a local embedding model (nomic-embed-text-v1.5), downloaded once and warmed in the background. The vector dimension is locked end to end against the storage columns and the model output; a vector of the wrong dimension is rejected rather than silently written. Turning embeddings off (HONEYCOMB_EMBEDDINGS=false) makes recall use its lexical fallback. The retrieval side is covered in Recall and retrieval.
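This sketch shows the dimension lock at write time. It assumes `embed` is a hypothetical callable wrapping the local model. A wrong-size vector raises instead of being stored, and with embeddings disabled no vector is produced at all.

```python
from typing import Callable, Optional, Sequence

EMBEDDING_DIM = 768  # output size of nomic-embed-text-v1.5, per the text


class DimensionMismatch(ValueError):
    """Raised instead of silently storing a wrong-size vector."""


def check_vector(vector: Sequence[float]) -> list[float]:
    if len(vector) != EMBEDDING_DIM:
        raise DimensionMismatch(f"expected {EMBEDDING_DIM} dims, got {len(vector)}")
    return [float(x) for x in vector]


def embedding_for_write(text: str, embed: Callable[[str], Sequence[float]],
                        enabled: bool = True) -> Optional[list[float]]:
    if not enabled:
        return None  # no vector: recall uses its lexical fallback
    return check_vector(embed(text))
```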

#The other workers

Beyond the write-path stages, the daemon runs background workers on their own schedules:

A document worker ingests URLs and files.
A retention worker runs batch-limited purges.
A maintenance worker runs diagnostics and either logs recommendations or executes repairs.
A summary worker writes the canonical transcript and summary at session end.
A synthesis worker regenerates a rebuildable MEMORY.md projection from durable memories and the session ledger.
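A batch-limited purge can be illustrated with a SQLite sketch. It assumes a hypothetical `raw_sessions` table with a `created_at` column and uses an age cutoff as the retention rule; honeycomb's actual retention criteria are not described here. Each call deletes at most `batch_size` rows, so one pass stays short and bounded. The worker can call it again on its next tick.

```python
import sqlite3


def purge_batch(conn: sqlite3.Connection, cutoff: float, batch_size: int = 500) -> int:
    """Delete at most batch_size rows older than cutoff; return rows deleted."""
    with conn:  # commit each batch on its own
        cur = conn.execute(
            "DELETE FROM raw_sessions WHERE rowid IN ("
            " SELECT rowid FROM raw_sessions WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
    return cur.rowcount
```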

#Where to read next

Recall and retrieval: how distilled memory is found and shaped.
The knowledge graph: the ontology these writes feed.
Data and storage: the table catalog.
Harness integrations: the hooks that feed capture.