Files
Vetting/docs/database.md
T
josh 8367ec2a9f
CI / Lint + build + test (push) Successful in 1m36s
Release / detect (push) Successful in 5s
Release / build-live-image (push) Has been skipped
Release / bundle (push) Successful in 49s
docs: comprehensive documentation expansion
Add 4 new doc files (configuration reference, development guide, API
reference with full request/response schemas, database schema), expand
the README with a feature list and how-it-works walkthrough, fix
missing Firmware and Burn stages in architecture.md and test-suite.md,
add threshold engine and host-mode agent sections, and add godoc
comments to 11 packages and 6 model types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-23 18:37:26 -04:00

12 KiB

Database schema

The orchestrator uses SQLite via modernc.org/sqlite — a pure Go driver with no cgo dependency. The database file is created on first startup at the path in database.path (default ./var/vetting.db).

Pragmas set at open time:

  • PRAGMA journal_mode = WAL — write-ahead logging for concurrent readers.
  • PRAGMA foreign_keys = ON — enforced referential integrity.

Migrations are embedded via go:embed in internal/db/ and applied in filename order at startup. A schema_migrations table tracks which migrations have run.


Tables

hosts

Registered hardware nodes in the vetting cluster.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
name TEXT NOT NULL UNIQUE Human-readable host name.
mac TEXT NOT NULL UNIQUE Lowercase colon form (e.g. aa:bb:cc:dd:ee:ff).
wol_broadcast_ip TEXT NOT NULL LAN broadcast IP for Wake-on-LAN magic packets.
wol_port INTEGER NOT NULL 9 WoL UDP port.
expected_spec_yaml TEXT NOT NULL YAML describing expected hardware (CPU, memory, disks, firmware).
pdu_config_json TEXT PDU power control config (future use).
ipmi_config_json TEXT IPMI config (future use).
notes TEXT NOT NULL '' Operator notes.
created_at TIMESTAMP NOT NULL CURRENT_TIMESTAMP
updated_at TIMESTAMP NOT NULL CURRENT_TIMESTAMP
last_seen_at TIMESTAMP Host-mode agent heartbeat timestamp. NULL = never seen.

runs

Vetting run instances. Each run belongs to one host and walks through the state machine.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
host_id INTEGER NOT NULL FK → hosts(id) CASCADE
state TEXT NOT NULL Current RunState (see internal/model).
result TEXT pass or fail once terminal.
failed_stage TEXT Stage name that halted the pipeline.
next_boot_target TEXT linux, memtest, etc. (future use).
agent_token_hash TEXT NOT NULL SHA-256 hash of the bearer token.
started_at TIMESTAMP NOT NULL CURRENT_TIMESTAMP
completed_at TIMESTAMP Set when run reaches a terminal state.
report_path TEXT Path to report.json on disk.
hold_ip TEXT Agent IP during FailedHolding (for SSH command).
override_flags_json TEXT JSON blob (e.g. {"wipe": true}).
non_destructive INTEGER NOT NULL 0 1 = skip badblocks + wipe probe.
profile TEXT NOT NULL 'quick' quick, deep, or soak.

Indices:

  • idx_runs_host on (host_id)
  • idx_runs_state on (state)

stages

Per-stage results within a run. Seeded at /claim time with one row per stage in DefaultStageOrder.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
name TEXT NOT NULL Stage name (e.g. SMART, CPUStress).
ordinal INTEGER NOT NULL 0-based position in the pipeline.
state TEXT NOT NULL pending, running, passed, failed, skipped.
started_at TIMESTAMP Set when the stage begins.
completed_at TIMESTAMP Set when the stage finishes.
summary_json TEXT Arbitrary JSON from the agent's result.

Indices:

  • idx_stages_run_ordinal on (run_id, ordinal)

sub_steps

Finer-grained units within a stage (per-disk SMART, per-NIC iperf, CPU/memory pass, per-GPU run). Not every stage has sub-steps.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
stage_name TEXT NOT NULL Parent stage name.
ordinal INTEGER NOT NULL 0-based within (run_id, stage_name).
name TEXT NOT NULL Human label (e.g. sda SMART, eth0 iperf).
state TEXT NOT NULL 'pending' pending, running, passed, failed, skipped.
started_at TIMESTAMP
completed_at TIMESTAMP
summary_json TEXT NOT NULL '{}'

Constraints: UNIQUE (run_id, stage_name, ordinal) Indices: idx_sub_steps_run on (run_id, stage_name, ordinal)

measurements

Time-series sensor data from the thermal sidecar and stage executors.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
stage_id INTEGER FK → stages(id) SET NULL Optional link to a specific stage.
ts TIMESTAMP NOT NULL Sample timestamp.
kind TEXT NOT NULL temp, power, iperf, fio, smart_attr, psu_volt, fan, etc.
key TEXT NOT NULL Source identifier (e.g. cpu/0, +12V).
value REAL Numeric sample.
unit TEXT Display unit.

Indices: idx_measurements_run_kind_ts on (run_id, kind, ts)

artifacts

On-disk file references (reports, fio logs, iperf logs, hold keys).

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
stage_id INTEGER FK → stages(id) SET NULL
kind TEXT NOT NULL inventory, report, report_html, hold_key, fio, iperf.
path TEXT NOT NULL Absolute path on disk.
sha256 TEXT NOT NULL SHA-256 hex digest.
size_bytes INTEGER NOT NULL File size.

spec_diffs

Expected-vs-actual hardware divergences from SpecValidate.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
field TEXT NOT NULL Dotted path (e.g. memory.total_gib, cpu.logical_cores).
expected TEXT Expected value from the host's spec YAML.
actual TEXT Observed value from the inventory probe.
severity TEXT NOT NULL critical, warning, info.
ignored INTEGER NOT NULL 0 1 = operator chose to ignore this diff.

thresholds

Per-run threshold rules, seeded from the ProfileRegistry + per-host overrides at run creation. Immutable for the run's lifetime.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
stage_name TEXT NOT NULL * matches any stage.
kind TEXT NOT NULL Measurement kind to match.
key TEXT NOT NULL Key selector (glob-ish).
op TEXT NOT NULL lt, lte, gt, gte, within_pct.
threshold REAL NOT NULL Limit value.
nominal REAL NOT NULL 0 Reference for within_pct.
unit TEXT NOT NULL '' Display unit.
severity TEXT NOT NULL critical or warning.
source TEXT NOT NULL profile or host_override.

Indices:

  • idx_thresholds_run on (run_id)
  • idx_thresholds_kind on (run_id, stage_name, kind)

threshold_evaluations

Per-sample pass/fail results from threshold evaluation. Drives report badges and pipeline verdict rendering.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
threshold_id INTEGER NOT NULL FK → thresholds(id) CASCADE
stage_name TEXT NOT NULL Stage the sample belongs to.
kind TEXT NOT NULL Measurement kind.
key TEXT NOT NULL Source key.
ts TIMESTAMP NOT NULL Sample timestamp.
observed REAL NOT NULL Observed value.
passed INTEGER NOT NULL 1 = within threshold, 0 = breach.

Indices: idx_threshold_evals_run on (run_id, passed)

firmware_snapshots

Per-run firmware version captures (BIOS, BMC, NIC, HBA, microcode, NVMe). Populated by the Firmware stage; consumed by SpecValidate for firmware version diffing.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER NOT NULL FK → runs(id) CASCADE
component TEXT NOT NULL bios, bmc, nic, hba, microcode, nvme_fw.
identifier TEXT NOT NULL Slot, serial, or device path distinguishing this component.
version TEXT NOT NULL Firmware version string.
vendor TEXT NOT NULL ''
raw_json TEXT NOT NULL '{}' Additional metadata.

Indices: idx_firmware_run on (run_id, component)

events

Event log table. Reserved for future use.

Column Type Constraints Default Description
id INTEGER PK AUTOINCREMENT
run_id INTEGER FK → runs(id) CASCADE
host_id INTEGER FK → hosts(id) CASCADE
ts TIMESTAMP NOT NULL
level TEXT NOT NULL
kind TEXT NOT NULL
message TEXT NOT NULL
data_json TEXT

settings

Key-value store for orchestrator-level settings.

Column Type Constraints Description
key TEXT PK
value TEXT NOT NULL

Entity relationships

hosts 1───N runs
              ├──N stages
              │     └──(FK) measurements (stage_id, SET NULL)
              │     └──(FK) artifacts (stage_id, SET NULL)
              ├──N sub_steps
              ├──N measurements (run_id)
              ├──N artifacts (run_id)
              ├──N spec_diffs
              ├──N thresholds
              │     └──N threshold_evaluations
              └──N firmware_snapshots

All foreign keys use ON DELETE CASCADE (except stage_id references which use SET NULL). Deleting a host cascades through its runs and all dependent rows.

Data retention

The janitor goroutine prunes on-disk files (artifacts, logs) based on artifacts.retention_days and logs.retention_days. Database rows are never deleted by the janitor — run histories, measurement time-series, spec diffs, and threshold evaluations survive cleanups indefinitely.

See architecture.md § Data retention and configuration.md § janitor.

Migration history

File What it adds
0001_init.sql Core schema: hosts, runs, stages, measurements, artifacts, spec_diffs, events, settings.
0002_add_hosts_last_seen_at.sql hosts.last_seen_at column for host-mode agent heartbeats.
0003_add_runs_non_destructive.sql runs.non_destructive boolean flag.
0004_add_sub_steps.sql sub_steps table for per-disk/per-NIC granular stage detail.
0005_profiles_thresholds_firmware.sql runs.profile column, thresholds + threshold_evaluations tables, firmware_snapshots table.

All migrations are additive — no schema deletions or renames.