Doctor architecture

The technical picture of Doctor for builders and auditors: the supervision model, the repair ladder, health classification, and the trust posture. Grounded in the Doctor technical manual.

#Overview

Doctor is a zero-dependency watchdog. Its package declares no runtime dependencies and uses only the platform's own building blocks: HTTP probing uses the built-in HTTP client, telemetry reads use the built-in SQLite support opened read-only, and shell-outs pass argument arrays, never a shell command string. Build tooling is never shipped.
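
As a rough illustration of two of these building blocks, the sketch below runs a command from an argument array and opens a SQLite file read-only. It uses only Python's standard library. The language is chosen purely for illustration and is not a statement about how Doctor is written, and the function names `run_argv` and `open_read_only` are hypothetical.

```python
import sqlite3
import subprocess
import sys
from pathlib import Path
from typing import List


def run_argv(argv: List[str]) -> str:
    """Run a command from an argument array and return its stdout.

    No shell is involved, so characters such as ';' or '$' inside an
    argument are passed through literally instead of being interpreted.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file read-only using a file: URI.

    Any attempt to write through this connection raises
    sqlite3.OperationalError.
    """
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

For example, `run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo injected"])` passes the string `a; echo injected` through as a single literal argument, because no shell ever parses it.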

#Supervision model

The operating system supervises Doctor, and Doctor supervises everything else. The OS service manager restarts Doctor on a crash and starts it on boot. Doctor deliberately has no "restart myself" code path, because a self-restart would put the watchdog inside a failure it is supposed to stand outside of.

#Multi-daemon registry

Doctor reads a static registry of daemons and spawns one fully independent supervisor per entry, each with its own probe, backoff, ladder, state, and incident record. A daemon that is down is still supervised, because "should exist" survives independently of "is running." If the registry is missing, Doctor falls back to the memory daemon as the primary. A malformed registry does not cause a crash loop: Doctor falls back, logs the fallback, and records a needs-attention banner.
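
The fallback behavior can be sketched as follows. This is a minimal illustration under several assumptions: the registry is a JSON file containing a list of objects with a `name` field, and the `RegistryLoad` type, the `load_registry` function, and the banner text are all hypothetical. Only the memory daemon's port (3850) comes from this document.

```python
import json
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional

# Hypothetical fallback entry; the port is the memory daemon's documented port.
FALLBACK = [{"name": "memory", "port": 3850}]


@dataclass
class RegistryLoad:
    daemons: List[dict]
    used_fallback: bool = False
    banner: Optional[str] = None  # needs-attention text, if any


def load_registry(path: Path) -> RegistryLoad:
    """Load the registry, falling back instead of raising."""
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Missing registry: supervise the memory daemon as the primary.
        return RegistryLoad(list(FALLBACK), used_fallback=True)
    try:
        entries = json.loads(raw)  # JSONDecodeError is a ValueError
        valid = isinstance(entries, list) and len(entries) > 0 and all(
            isinstance(e, dict) and "name" in e for e in entries
        )
        if not valid:
            raise ValueError("expected a non-empty list of named entries")
    except ValueError as exc:
        # Malformed registry: fall back, log it, and record a banner.
        logging.warning("registry malformed, using fallback: %s", exc)
        return RegistryLoad(
            list(FALLBACK),
            used_fallback=True,
            banner=f"needs attention: malformed registry ({exc})",
        )
    return RegistryLoad(entries)
```

Because every failure path returns a usable result, a caller can always start at least one supervisor, which is what keeps a bad registry from turning into a crash loop.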

#Health classification

Each probe is a single HTTP request that resolves to exactly one of four kinds: healthy, degraded with per-subsystem reasons, unreachable because the connection was refused, or unreachable because it timed out. The distinction between refused and timed out (a daemon that is down versus one that is wedged) lets Doctor choose a targeted repair rather than a blind restart. The probe never throws.
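
A probe with this shape might look like the sketch below, written against Python's standard library. The response format (a JSON object with `status` and `reasons` fields) is an assumption, as are the `Health` type and the function names. Two mappings are also assumptions rather than documented behavior: a non-2xx HTTP response is treated as degraded, and any transport failure other than a timeout is reported as refused.

```python
import json
import socket
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import List


@dataclass
class Health:
    kind: str  # "healthy" | "degraded" | "refused" | "timeout"
    reasons: List[str] = field(default_factory=list)


def classify_body(body: bytes) -> Health:
    """Map a response body to healthy or degraded (hypothetical JSON shape)."""
    try:
        doc = json.loads(body)
    except ValueError:
        return Health("degraded", ["unparseable health body"])
    if isinstance(doc, dict) and doc.get("status") == "ok":
        return Health("healthy")
    reasons = doc.get("reasons", []) if isinstance(doc, dict) else []
    if not isinstance(reasons, list):
        reasons = [reasons]
    return Health("degraded", [str(r) for r in reasons] or ["unspecified"])


def classify_error(exc: BaseException) -> Health:
    """Map a transport failure to one of the two unreachable kinds."""
    # urllib wraps socket errors in URLError; the cause is in .reason.
    cause = getattr(exc, "reason", exc)
    if isinstance(cause, (socket.timeout, TimeoutError)):
        return Health("timeout")
    if isinstance(cause, ConnectionRefusedError):
        return Health("refused")
    # Assumption: any other failure is reported as refused, with detail.
    return Health("refused", [repr(cause)])


def probe(url: str, timeout_s: float = 2.0) -> Health:
    """Issue one request and always return a Health value, never raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return classify_body(resp.read())
    except urllib.error.HTTPError as exc:
        # Assumption: the daemon answered, so it is reachable but unwell.
        return Health("degraded", [f"http {exc.code}"])
    except Exception as exc:
        return classify_error(exc)
```

Keeping classification in pure functions and catching every exception at the edge is one way to get a probe that never throws; the caller branches on `kind` instead of handling exceptions.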

#The repair ladder

When a daemon is sick, Doctor climbs:

  1. Restart it.
  2. If restarts keep failing, reinstall it.
  3. If a conflicting global package is detected, remove it.
  4. Escalate.

Backoff between rungs is geometric, with a floor and a ceiling, and the ladder stops the instant health returns. Escalation is the terminal hand-off rather than a repair rung: it builds a record containing the diagnosis, the steps tried, the recommended action, and, for any deferred action, a note of what Doctor would have done.
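
The climb and its backoff can be sketched as below. The growth factor of 2 is an assumption (the text says only that backoff is geometric), the floor and ceiling use the documented defaults of 1 and 30 seconds, and the record fields and the recommended-action text are hypothetical. The sketch also simplifies to one attempt per rung and omits the note about deferred actions.

```python
import time
from typing import Callable, List, Tuple


def backoff_delay(step: int, floor_s: float = 1.0, ceiling_s: float = 30.0,
                  factor: float = 2.0) -> float:
    """Geometric delay for a zero-based step, clamped to [floor, ceiling]."""
    return max(floor_s, min(ceiling_s, floor_s * factor ** step))


def climb(rungs: List[Tuple[str, Callable[[], None]]],
          is_healthy: Callable[[], bool],
          diagnosis: str,
          sleep: Callable[[float], None] = time.sleep) -> dict:
    """Try each rung in order, stopping as soon as health returns."""
    tried: List[str] = []
    for step, (name, action) in enumerate(rungs):
        if is_healthy():
            return {"resolved": True, "tried": tried}
        action()
        tried.append(name)
        sleep(backoff_delay(step))
    if is_healthy():
        return {"resolved": True, "tried": tried}
    # Terminal hand-off: build an escalation record, automate nothing more.
    return {
        "resolved": False,
        "diagnosis": diagnosis,
        "tried": tried,
        "recommended": "operator review",  # hypothetical placeholder text
    }
```

With these parameters, the delays for steps 0 through 6 are 1, 2, 4, 8, 16, 30, and 30 seconds: the sixth value would be 32 but is clamped to the ceiling.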

#The blessed-update gate

Doctor auto-updates the memory daemon only behind a gate: a version must be explicitly approved for rollout, the update is verified healthy afterward, and a failed verify rolls back to the last working version. A bad release cannot spread itself. Doctor never auto-updates its own package; a single explicit command is the only way that happens.
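
The gate reduces to three checks in sequence: approval, verification, and rollback. The sketch below shows that control flow only. The function name `gated_update`, the representation of approved versions as a set of strings, and the injected `install` and `verify_healthy` callables are all hypothetical.

```python
from typing import Callable, Set


def gated_update(current: str,
                 candidate: str,
                 blessed: Set[str],
                 install: Callable[[str], None],
                 verify_healthy: Callable[[], bool]) -> str:
    """Return the version that is running after the update attempt."""
    if candidate not in blessed:
        # Not explicitly approved for rollout: do nothing.
        return current
    install(candidate)
    if verify_healthy():
        return candidate
    # Failed verification: roll back to the last working version.
    install(current)
    return current
```

Passing `install` and `verify_healthy` in as parameters keeps the gate logic testable without touching a real package manager.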

#Ports and the status page

Doctor runs a single HTTP listener on 127.0.0.1:3852 that serves the human-readable status page, the machine-readable status feed, and the live health stream. It is read-only by construction: no route mutates, proxies, or triggers an action, and nothing binds to a public address. The daemons it watches sit on their own ports: the memory daemon on 3850, its embeddings child on 3851, the portal on 3853, and the codebase daemon on 3854.
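
One way to picture "read-only by construction" is a route table that contains only read endpoints and a dispatcher that rejects every mutating method. In the sketch below, the port numbers and the loopback address come from this section; the route paths, the dictionary keys, and the `dispatch` function are hypothetical.

```python
# Port assignments as documented in this section.
FLEET_PORTS = {
    "memory": 3850,
    "embeddings": 3851,
    "doctor": 3852,
    "portal": 3853,
    "codebase": 3854,
}

BIND_HOST = "127.0.0.1"  # loopback only, never a public address

# Hypothetical paths for the three documented read-only surfaces.
READ_ONLY_ROUTES = {
    "/": "status page",
    "/status.json": "status feed",
    "/events": "health stream",
}


def dispatch(method: str, path: str) -> int:
    """Return the HTTP status code a read-only listener would answer with."""
    if method not in ("GET", "HEAD"):
        return 405  # no route accepts a mutating method
    return 200 if path in READ_ONLY_ROUTES else 404
```

Since the table holds no handler that writes, proxies, or triggers anything, there is no request that can cause a side effect, whatever its path or body.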

#Telemetry, single source of truth

Each service writes non-sensitive telemetry to its own local database. Doctor polls those databases read-only, merges them with health into one authoritative in-memory picture of the fleet, and feeds exactly one stream to the portal. Anything that leaves the machine passes through a single chokepoint with allow-list scrubbing and layered opt-out gates.
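
The chokepoint idea can be sketched as a single function that every outbound record must pass through. The field names in the allow-list, the `scrub` and `outbound` functions, and the modelling of each gate as a callable that returns `True` when sending is permitted are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical allow-list: only these keys may ever leave the machine.
ALLOWED_FIELDS = frozenset({"service", "version", "health", "uptime_s"})


def scrub(record: dict) -> dict:
    """Keep allow-listed keys only; unknown keys are dropped by default."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def outbound(record: dict,
             gates: Iterable[Callable[[], bool]]) -> Optional[dict]:
    """Single chokepoint: every gate must permit sending, then scrub.

    Returns None when any gate reports an opt-out, so nothing is sent.
    """
    if not all(gate() for gate in gates):
        return None
    return scrub(record)
```

An allow-list fails closed: a field added to a telemetry record later is withheld until someone deliberately lists it, which a deny-list cannot guarantee.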

#Defaults

By default, Doctor probes every 30 seconds with a 2-second per-probe timeout, allows a 60-second startup grace so that a booting daemon is not judged dead, gives up on restarts after three consecutive failures, waits through a 5-second post-restart cooldown, and backs off between a floor of 1 second and a ceiling of 30 seconds.
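
Collected into one immutable structure, the defaults might be expressed like this. The values are the ones listed above; the `Defaults` class, its field names, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defaults:
    probe_interval_s: float = 30.0
    probe_timeout_s: float = 2.0
    startup_grace_s: float = 60.0
    max_restart_failures: int = 3
    post_restart_cooldown_s: float = 5.0
    backoff_floor_s: float = 1.0
    backoff_ceiling_s: float = 30.0


def in_startup_grace(started_at: float, now: float,
                     cfg: Defaults = Defaults()) -> bool:
    """True while a booting daemon should not yet be judged dead."""
    return (now - started_at) < cfg.startup_grace_s


def should_give_up_restarting(consecutive_failures: int,
                              cfg: Defaults = Defaults()) -> bool:
    """True once restarts have failed the configured number of times in a row."""
    return consecutive_failures >= cfg.max_restart_failures
```

For instance, a daemon started 45 seconds ago is still inside its grace window, while one started 60 seconds ago is not.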

#Credential safety

There is no code path in Doctor that reads, writes, or deletes the credentials file. A suspected credential fault is escalated with a recommendation, never automated, and there is deliberately no command to clear credentials.

#License

Released under the GNU Affero General Public License, version 3.0 or later.