Data and storage
Derived from the honeycomb knowledge base, captured 2026-06. Written for an external practitioner. The DDL shapes shown are the logical table shapes from the knowledge base; the runtime source of truth is the daemon's schema module, which the lazy heal pass converges every table toward. Confirm exact columns against your installed version.
#The concept
All of honeycomb's durable state lives in tables on a GPU-backed SQL and vector store. The daemon is the only process that opens that store; everything else reaches it through the daemon. The storage layer has a few unusual properties that shape every table and every write pattern, so it pays to understand them before reading the catalog.
#Storage properties a practitioner must know
- Lazy schema healing. Tables and columns are created on first write, not through an upfront migration. A new column added with a safe default is filled in on the next heal pass, so adding a field does not require a migration step ahead of the worker that writes it. Schema changes are additive.
- No parameterized queries. The query endpoint takes no bound parameters, so the daemon builds SQL by string composition and escapes every value itself through dedicated helpers. This is why all SQL construction lives in one place (the daemon) and never in a client.
- Append-only, version-bumped writes. The backend coalesces updates in a way that can silently drop concurrent edits, so honeycomb does not lean on naive in-place updates for hot tables. The current state of a versioned row is its highest version; a change appends a new version rather than mutating the old one.
- Select-before-insert with drift detection. Writes that must be unique check for an existing row first and re-verify after, making concurrent-writer races observable rather than silent, because the backend has no server-side unique constraint to lean on.
- Tenant isolation at the storage layer. Organization and workspace isolation is enforced at the storage partition, so two workspaces never share a row, partition, or index. Most tables therefore do not need explicit tenancy columns; a few cross-cutting tables carry explicit organization and workspace ids.
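Two of the properties above, string-composed SQL and select-before-insert with drift detection, can be sketched in a few lines. This is a minimal illustration, not the daemon's code: `sql_quote`, `insert_unique`, and `FakeStore` are hypothetical names, the escaping rule assumes standard SQL quote doubling (the real backend may need more), and the store is an in-memory stand-in with no unique constraint.

```python
from dataclasses import dataclass, field


def sql_quote(value: str) -> str:
    """Render a string as a SQL string literal (hypothetical helper).

    Assumes standard SQL quoting, where a single quote inside a literal
    is written as two single quotes.
    """
    if "\x00" in value:
        raise ValueError("NUL bytes are not allowed in SQL literals")
    return "'" + value.replace("'", "''") + "'"


@dataclass
class FakeStore:
    """In-memory stand-in for the store; like the backend, it enforces no uniqueness."""
    rows: list = field(default_factory=list)

    def select_by_key(self, key: str) -> list:
        return [r for r in self.rows if r["key"] == key]

    def insert(self, row: dict) -> None:
        self.rows.append(dict(row))


def insert_unique(store: FakeStore, key: str, payload: str) -> str:
    """Select-before-insert, then re-verify so a racing writer is observable.

    Returns "exists", "inserted", or "drift".
    """
    if store.select_by_key(key):
        return "exists"
    store.insert({"key": key, "payload": payload})
    # Re-verify: more than one row for the key means another writer raced us.
    if len(store.select_by_key(key)) > 1:
        return "drift"
    return "inserted"
```

A caller would compose a statement such as `f'SELECT id FROM "memories" WHERE content_hash = {sql_quote(h)}'` and treat a `"drift"` result as a signal to reconcile duplicates rather than as a silent success.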
#The three "memory" tables
Three tables are easy to confuse because they all hold something called memory. Pin down which is which before reading further.
| Table | Holds | Written by |
|---|---|---|
| sessions | The raw capture stream, one row per event | Capture |
| memories | The distilled engine output, the facts the pipeline decided to keep | The pipeline |
| memory | Wiki summaries and the virtual-filesystem file rows | The summary worker |
Capture writes sessions; the pipeline reads sessions and writes memories; the summary worker writes memory.
flowchart LR
sessions["sessions (raw events)"] --> pipeline["pipeline"]
pipeline --> memories["memories (distilled facts)"]
pipeline --> entities["entities plus ontology"]
sessions --> summary["summary worker"]
summary --> memory["memory (wiki plus browse)"]
memories --> skills["skillify -> skills"]

- sessions holds one row per prompt, tool call, or response. Its message body is structured JSON, with an optional vector. Rows are append-only inserts; readers concatenate by path in time order.
- memory holds wiki summaries and browse-surface file rows. It is update-or-insert keyed by path and carries a one-line key for fast session priming.
- memories is the engine's distilled output, with confidence, importance, provenance, a dedup hash, a soft-delete flag, and scope columns. It is the table recall ranks over. Each row carries a durable one-sentence key written at distillation time so the session-priming digest can skim durable keys with a pure SQL select and no generation at read time.
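As a rough illustration of how a reader might turn the append-only sessions rows into one view, the sketch below filters by path and concatenates in time order. The field names (`path`, `created_at`, `message`) and the JSON shape (`role`, `text`) are assumptions made for the example, as is the premise that timestamps are same-format ISO-8601 strings that sort lexicographically.

```python
import json


def read_session_file(rows: list[dict], path: str) -> str:
    """Concatenate one session's event rows into a single text view.

    Assumes `created_at` is an ISO-8601 string in one consistent format,
    so plain string ordering matches time ordering.
    """
    events = sorted(
        (r for r in rows if r["path"] == path),
        key=lambda r: r["created_at"],
    )
    lines = []
    for row in events:
        body = json.loads(row["message"])  # message body is structured JSON
        lines.append(f'{body["role"]}: {body["text"]}')
    return "\n".join(lines)
```

Because rows are only ever inserted, the reader needs no locking or version checks: sorting by time is enough to reconstruct the stream.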
#The distilled-memory schema (illustrative)
CREATE TABLE IF NOT EXISTS "memories" (
id TEXT NOT NULL DEFAULT '',
type TEXT NOT NULL DEFAULT 'fact',
content TEXT NOT NULL DEFAULT '',
key TEXT NOT NULL DEFAULT '',
normalized_content TEXT NOT NULL DEFAULT '',
content_hash TEXT NOT NULL DEFAULT '',
confidence FLOAT4 NOT NULL DEFAULT 1.0,
importance FLOAT4 NOT NULL DEFAULT 0.5,
tags TEXT NOT NULL DEFAULT '[]',
project TEXT NOT NULL DEFAULT '',
project_id TEXT NOT NULL DEFAULT '',
source_id TEXT NOT NULL DEFAULT '',
source_type TEXT NOT NULL DEFAULT '',
pinned BIGINT NOT NULL DEFAULT 0,
is_deleted BIGINT NOT NULL DEFAULT 0,
agent_id TEXT NOT NULL DEFAULT 'default',
visibility TEXT NOT NULL DEFAULT 'global',
content_embedding FLOAT4[],
created_at TEXT NOT NULL DEFAULT '',
updated_at TEXT NOT NULL DEFAULT ''
) USING deeplake;

The key column is additive and heal-compatible; a row with no derived key falls back to its content at read time, so a legacy un-keyed row is still primeable.
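The read-time fallback for un-keyed rows could look something like the sketch below. The function names are hypothetical, and both the truncation to the first line and the ordering by importance are assumptions added for the example; the source states only that a row without a key falls back to its content.

```python
def priming_key(row: dict, max_len: int = 120) -> str:
    """Return the row's durable key, or fall back to its content."""
    key = (row.get("key") or "").strip()
    if key:
        return key
    content = (row.get("content") or "").strip()
    # Assumption: use the first line of the content, capped at max_len.
    first_line = content.splitlines()[0] if content else ""
    return first_line[:max_len]


def priming_digest(rows: list[dict], limit: int = 5) -> list[str]:
    """Skim keys from live rows; no generation happens at read time."""
    live = [r for r in rows if not r.get("is_deleted")]
    # Assumption: most important rows first; 0.5 mirrors the column default.
    live.sort(key=lambda r: r.get("importance", 0.5), reverse=True)
    return [priming_key(r) for r in live[:limit]]
```

The point of the fallback is that a heal pass can add the column with an empty default and readers keep working before any row has been re-distilled.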
#The rest of the catalog
| Group | Tables | What they hold |
|---|---|---|
| Engine support | memory_history, memory_jobs, embeddings | The audit trail of every proposal, the durable distillation job queue, and the vectors mirrored for GPU search. |
| Knowledge graph | entities, entity_aspects, entity_attributes, entity_dependencies, memory_entity_mentions, epistemic_assertions, ontology_proposals | The ontology, with supersession by appended attribute version. |
| Sources and documents | memory_artifacts, documents, document_memories, connectors | Source-backed rows keyed by source id, the ingest lifecycle, the document-to-chunk join, and external-connector sync cursors. |
| Product tables | skills, rules, goals, kpis, codebase | Mined skill versions, org-wide rules, goals and KPIs, and codebase-graph snapshots. |
| Tenancy and auth | agents, api_keys, projects, synced_assets | The within-workspace agent roster and read policies, hashed connector keys, the per-workspace project registry, and the team asset-sync substrate. |
| Telemetry | (opt-in counters and an optional recall-quality ledger) | Usage counters and diagnostics; never carries secrets or request bodies. |
Skills and rules are append-only and version-bumped (the current state for a logical key is the highest version). Goals and KPIs are update-or-insert by their logical key. Snapshots in codebase are one row per repository-checkout identity, deduped by a content hash.
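The append-only, version-bumped pattern used for skills and rules can be sketched as follows. The row shape (`name`, `version`, `body`) and the function names are hypothetical; a plain list stands in for the table.

```python
from typing import Optional


def current_version(rows: list[dict], logical_key: str) -> Optional[dict]:
    """The current state of a logical key is its highest-version row."""
    versions = [r for r in rows if r["name"] == logical_key]
    return max(versions, key=lambda r: r["version"], default=None)


def append_version(rows: list[dict], logical_key: str, body: str) -> dict:
    """Record a change by appending a new version; old rows are never mutated."""
    latest = current_version(rows, logical_key)
    next_version = 1 if latest is None else latest["version"] + 1
    row = {"name": logical_key, "version": next_version, "body": body}
    rows.append(row)
    return row
```

In this single-writer sketch the read and the append cannot interleave; with concurrent writers, two appends could pick the same version number, which is the kind of race the select-and-re-verify discipline is meant to surface.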
#Per-project scoping
Tenancy has a third, soft ring inside a workspace: the project. A projects registry records the projects a folder can bind to. Memory and skills carry a resolved project id that the scope clause segments on, defaulting to a reserved per-workspace inbox so a capture is never dropped when no project resolves. A project is a registry-backed identity, not a repository id; a canonical git remote is only an optional auto-bind signal. Cross-project sharing of a skill is an explicit, auditable opt-in recorded directly on the row.
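A minimal sketch of project resolution with the inbox fallback follows. The `INBOX_PROJECT_ID` value, the two lookup dictionaries, and the precedence of an explicit folder binding over the git-remote signal are all assumptions made for illustration; the source states only that an unresolved capture lands in a reserved inbox and that the remote is an optional auto-bind signal.

```python
from typing import Optional

INBOX_PROJECT_ID = "inbox"  # hypothetical reserved per-workspace id


def resolve_project_id(
    folder: str,
    folder_bindings: dict[str, str],
    remote_bindings: dict[str, str],
    git_remote: Optional[str] = None,
) -> str:
    """Resolve a capture's project id, never returning 'no project'."""
    # An explicit folder binding from the registry wins (assumed precedence).
    if folder in folder_bindings:
        return folder_bindings[folder]
    # A canonical git remote is only an optional auto-bind signal.
    if git_remote and git_remote in remote_bindings:
        return remote_bindings[git_remote]
    # Nothing resolved: fall back to the inbox so the capture is not dropped.
    return INBOX_PROJECT_ID
```

Returning a concrete id in every case means the scope clause always has a value to segment on.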
#The memory virtual filesystem
honeycomb presents the team-shared database as an ordinary directory and intercepts the shell commands that touch that mount, so an assistant browses memory with cat, ls, grep, and find while every operation is really a scoped query. No real files exist at these paths: every read hits an in-memory cache, a pending-write buffer, or a query, and every write is buffered and flushed on a timer.
Three things the intercept hides from the agent:
- Write batching. A read immediately after a write reads from the pending buffer, so the agent sees its own write even before it reaches storage.
- The multi-row session layout. A session "file" is dozens of rows concatenated transparently. Session files are read-only at this layer; attempts to write, append, remove, copy, or move them are rejected, because they are an append-only event log owned by capture.
- The structured goals and KPIs tables. Goals and KPIs appear as plain markdown files, so an agent manages objectives with file operations while the CLI reads the same state from typed columns. Goal lifecycle is expressed through file verbs: removing a goal file is a soft close (status flipped, the row preserved for the audit trail), and moving a goal between status folders is a status transition that may change only the status component.
A synthesized index file at the mount root lists the most recent summaries and sessions, and a synthesized subtree renders the codebase-graph queries from the local snapshot. The same browse view is produced by both the long-lived shell object and the stateless pre-tool hook, sharing one renderer so they never disagree.
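The write batching and read-only session behavior described above can be illustrated with a small in-memory model. `MemoryMount`, the `/sessions/` prefix, and the explicit `flush()` call (standing in for the timer) are hypothetical, and the read cache is left out for brevity; a dictionary stands in for the scoped queries.

```python
class MemoryMount:
    """Toy model of the mount: buffered writes, read-your-writes, read-only sessions."""

    def __init__(self, storage: dict, session_prefix: str = "/sessions/"):
        self.storage = storage            # stands in for scoped queries
        self.pending: dict = {}           # path -> content not yet flushed
        self.session_prefix = session_prefix

    def write(self, path: str, content: str) -> None:
        # Session files are an append-only log owned by capture.
        if path.startswith(self.session_prefix):
            raise PermissionError(f"{path} is read-only at this layer")
        self.pending[path] = content

    def read(self, path: str) -> str:
        # Check the pending buffer first so a writer sees its own write.
        if path in self.pending:
            return self.pending[path]
        if path in self.storage:
            return self.storage[path]
        raise FileNotFoundError(path)

    def flush(self) -> int:
        """Push buffered writes to storage; returns how many were flushed."""
        count = len(self.pending)
        self.storage.update(self.pending)
        self.pending.clear()
        return count
```

Reading through the pending buffer is what lets an agent `cat` a file it has just written, even though the row has not reached storage yet.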
#Retention
Because the backend exposes no transactions at this layer, retention runs as batched, idempotent sweeps in a daemon worker rather than cascading deletes.
| Data | Default behavior |
|---|---|
| sessions raw events | Pruned by the sessions-prune operation; summaries retained in memory |
| memories | Soft-delete window before purge; history retained longer |
| memory_jobs | Completed jobs purged after a window; dead jobs later |
| memory_artifacts | Soft-delete on source-file removal, hard purge on source disconnect by source id |
| skills / rules | Append-only version history retained |
| Embeddings / vectors | Purged with their owning row during retention sweeps |
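A batched, idempotent sweep of the kind described above might be structured as in this sketch. The function names, the `deleted_at` field (epoch seconds), and the window and batch parameters are hypothetical; the actual windows and worker loop are not specified here.

```python
def sweep_soft_deleted(
    rows: list[dict], now: float, window_seconds: float, batch_size: int = 100
) -> int:
    """One batch: hard-purge rows soft-deleted at least `window_seconds` ago.

    Returns the number of rows purged in this batch.
    """
    expired = [
        r for r in rows
        if r["is_deleted"] and now - r["deleted_at"] >= window_seconds
    ][:batch_size]
    expired_ids = {id(r) for r in expired}
    # Rewrite the list in place, dropping only this batch's rows.
    rows[:] = [r for r in rows if id(r) not in expired_ids]
    return len(expired)


def run_retention(
    rows: list[dict], now: float, window_seconds: float, batch_size: int = 100
) -> int:
    """Repeat batches until nothing is left to purge; safe to re-run."""
    total = 0
    while True:
        purged = sweep_soft_deleted(rows, now, window_seconds, batch_size)
        if purged == 0:
            return total
        total += purged
```

Each batch selects its own work from current state, so a sweep interrupted halfway leaves nothing to roll back and a repeat run simply finds fewer rows.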
#Where to read next
- Capture and memory: the pipeline that writes these tables.
- Recall and retrieval: how recall ranks over them.
- The knowledge graph: the ontology and codebase tables.
- Security model: how rows stay in their tenant lane.