Add 4 new doc files (configuration reference, development guide, API reference with full request/response schemas, database schema), expand the README with a feature list and how-it-works walkthrough, fix missing Firmware and Burn stages in architecture.md and test-suite.md, add threshold engine and host-mode agent sections, and add godoc comments to 11 packages and 6 model types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
12 KiB
Database schema
The orchestrator uses SQLite via
modernc.org/sqlite — a pure
Go driver with no cgo dependency. The database file is created on
first startup at the path in database.path
(default ./var/vetting.db).
Pragmas set at open time:
PRAGMA journal_mode = WAL— write-ahead logging for concurrent readers.PRAGMA foreign_keys = ON— enforced referential integrity.
Migrations are embedded via go:embed in internal/db/ and
applied in filename order at startup. A schema_migrations table
tracks which migrations have run.
Tables
hosts
Registered hardware nodes in the vetting cluster.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
name |
TEXT | NOT NULL UNIQUE | Human-readable host name. | |
mac |
TEXT | NOT NULL UNIQUE | Lowercase colon form (e.g. aa:bb:cc:dd:ee:ff). |
|
wol_broadcast_ip |
TEXT | NOT NULL | LAN broadcast IP for Wake-on-LAN magic packets. | |
wol_port |
INTEGER | NOT NULL | 9 |
WoL UDP port. |
expected_spec_yaml |
TEXT | NOT NULL | YAML describing expected hardware (CPU, memory, disks, firmware). | |
pdu_config_json |
TEXT | PDU power control config (future use). | ||
ipmi_config_json |
TEXT | IPMI config (future use). | ||
notes |
TEXT | NOT NULL | '' |
Operator notes. |
created_at |
TIMESTAMP | NOT NULL | CURRENT_TIMESTAMP |
|
updated_at |
TIMESTAMP | NOT NULL | CURRENT_TIMESTAMP |
|
last_seen_at |
TIMESTAMP | Host-mode agent heartbeat timestamp. NULL = never seen. |
runs
Vetting run instances. Each run belongs to one host and walks through the state machine.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
host_id |
INTEGER | NOT NULL FK → hosts(id) CASCADE | ||
state |
TEXT | NOT NULL | Current RunState (see internal/model). |
|
result |
TEXT | pass or fail once terminal. |
||
failed_stage |
TEXT | Stage name that halted the pipeline. | ||
next_boot_target |
TEXT | linux, memtest, etc. (future use). |
||
agent_token_hash |
TEXT | NOT NULL | SHA-256 hash of the bearer token. | |
started_at |
TIMESTAMP | NOT NULL | CURRENT_TIMESTAMP |
|
completed_at |
TIMESTAMP | Set when run reaches a terminal state. | ||
report_path |
TEXT | Path to report.json on disk. |
||
hold_ip |
TEXT | Agent IP during FailedHolding (for SSH command). | ||
override_flags_json |
TEXT | JSON blob (e.g. {"wipe": true}). |
||
non_destructive |
INTEGER | NOT NULL | 0 |
1 = skip badblocks + wipe probe. |
profile |
TEXT | NOT NULL | 'quick' |
quick, deep, or soak. |
Indices:
idx_runs_hoston(host_id)idx_runs_stateon(state)
stages
Per-stage results within a run. Seeded at /claim time with one row
per stage in DefaultStageOrder.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
name |
TEXT | NOT NULL | Stage name (e.g. SMART, CPUStress). |
|
ordinal |
INTEGER | NOT NULL | 0-based position in the pipeline. | |
state |
TEXT | NOT NULL | pending, running, passed, failed, skipped. |
|
started_at |
TIMESTAMP | Set when the stage begins. | ||
completed_at |
TIMESTAMP | Set when the stage finishes. | ||
summary_json |
TEXT | Arbitrary JSON from the agent's result. |
Indices:
idx_stages_run_ordinalon(run_id, ordinal)
sub_steps
Finer-grained units within a stage (per-disk SMART, per-NIC iperf, CPU/memory pass, per-GPU run). Not every stage has sub-steps.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
stage_name |
TEXT | NOT NULL | Parent stage name. | |
ordinal |
INTEGER | NOT NULL | 0-based within (run_id, stage_name). |
|
name |
TEXT | NOT NULL | Human label (e.g. sda SMART, eth0 iperf). |
|
state |
TEXT | NOT NULL | 'pending' |
pending, running, passed, failed, skipped. |
started_at |
TIMESTAMP | |||
completed_at |
TIMESTAMP | |||
summary_json |
TEXT | NOT NULL | '{}' |
Constraints: UNIQUE (run_id, stage_name, ordinal)
Indices: idx_sub_steps_run on (run_id, stage_name, ordinal)
measurements
Time-series sensor data from the thermal sidecar and stage executors.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
stage_id |
INTEGER | FK → stages(id) SET NULL | Optional link to a specific stage. | |
ts |
TIMESTAMP | NOT NULL | Sample timestamp. | |
kind |
TEXT | NOT NULL | temp, power, iperf, fio, smart_attr, psu_volt, fan, etc. |
|
key |
TEXT | NOT NULL | Source identifier (e.g. cpu/0, +12V). |
|
value |
REAL | Numeric sample. | ||
unit |
TEXT | Display unit. |
Indices: idx_measurements_run_kind_ts on (run_id, kind, ts)
artifacts
On-disk file references (reports, fio logs, iperf logs, hold keys).
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
stage_id |
INTEGER | FK → stages(id) SET NULL | ||
kind |
TEXT | NOT NULL | inventory, report, report_html, hold_key, fio, iperf. |
|
path |
TEXT | NOT NULL | Absolute path on disk. | |
sha256 |
TEXT | NOT NULL | SHA-256 hex digest. | |
size_bytes |
INTEGER | NOT NULL | File size. |
spec_diffs
Expected-vs-actual hardware divergences from SpecValidate.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
field |
TEXT | NOT NULL | Dotted path (e.g. memory.total_gib, cpu.logical_cores). |
|
expected |
TEXT | Expected value from the host's spec YAML. | ||
actual |
TEXT | Observed value from the inventory probe. | ||
severity |
TEXT | NOT NULL | critical, warning, info. |
|
ignored |
INTEGER | NOT NULL | 0 |
1 = operator chose to ignore this diff. |
thresholds
Per-run threshold rules, seeded from the ProfileRegistry + per-host
overrides at run creation. Immutable for the run's lifetime.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
stage_name |
TEXT | NOT NULL | * matches any stage. |
|
kind |
TEXT | NOT NULL | Measurement kind to match. | |
key |
TEXT | NOT NULL | Key selector (glob-ish). | |
op |
TEXT | NOT NULL | lt, lte, gt, gte, within_pct. |
|
threshold |
REAL | NOT NULL | Limit value. | |
nominal |
REAL | NOT NULL | 0 |
Reference for within_pct. |
unit |
TEXT | NOT NULL | '' |
Display unit. |
severity |
TEXT | NOT NULL | critical or warning. |
|
source |
TEXT | NOT NULL | profile or host_override. |
Indices:
idx_thresholds_runon(run_id)idx_thresholds_kindon(run_id, stage_name, kind)
threshold_evaluations
Per-sample pass/fail results from threshold evaluation. Drives report badges and pipeline verdict rendering.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
threshold_id |
INTEGER | NOT NULL FK → thresholds(id) CASCADE | ||
stage_name |
TEXT | NOT NULL | Stage the sample belongs to. | |
kind |
TEXT | NOT NULL | Measurement kind. | |
key |
TEXT | NOT NULL | Source key. | |
ts |
TIMESTAMP | NOT NULL | Sample timestamp. | |
observed |
REAL | NOT NULL | Observed value. | |
passed |
INTEGER | NOT NULL | 1 = within threshold, 0 = breach. |
Indices: idx_threshold_evals_run on (run_id, passed)
firmware_snapshots
Per-run firmware version captures (BIOS, BMC, NIC, HBA, microcode, NVMe). Populated by the Firmware stage; consumed by SpecValidate for firmware version diffing.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | NOT NULL FK → runs(id) CASCADE | ||
component |
TEXT | NOT NULL | bios, bmc, nic, hba, microcode, nvme_fw. |
|
identifier |
TEXT | NOT NULL | Slot, serial, or device path distinguishing this component. | |
version |
TEXT | NOT NULL | Firmware version string. | |
vendor |
TEXT | NOT NULL | '' |
|
raw_json |
TEXT | NOT NULL | '{}' |
Additional metadata. |
Indices: idx_firmware_run on (run_id, component)
events
Event log table. Reserved for future use.
| Column | Type | Constraints | Default | Description |
|---|---|---|---|---|
id |
INTEGER | PK AUTOINCREMENT | ||
run_id |
INTEGER | FK → runs(id) CASCADE | ||
host_id |
INTEGER | FK → hosts(id) CASCADE | ||
ts |
TIMESTAMP | NOT NULL | ||
level |
TEXT | NOT NULL | ||
kind |
TEXT | NOT NULL | ||
message |
TEXT | NOT NULL | ||
data_json |
TEXT |
settings
Key-value store for orchestrator-level settings.
| Column | Type | Constraints | Description |
|---|---|---|---|
key |
TEXT | PK | |
value |
TEXT | NOT NULL |
Entity relationships
hosts 1───N runs
├──N stages
│ └──(FK) measurements (stage_id, SET NULL)
│ └──(FK) artifacts (stage_id, SET NULL)
├──N sub_steps
├──N measurements (run_id)
├──N artifacts (run_id)
├──N spec_diffs
├──N thresholds
│ └──N threshold_evaluations
└──N firmware_snapshots
All foreign keys use ON DELETE CASCADE (except stage_id references
which use SET NULL). Deleting a host cascades through its runs and
all dependent rows.
Data retention
The janitor goroutine prunes on-disk files (artifacts, logs) based
on artifacts.retention_days and logs.retention_days. Database
rows are never deleted by the janitor — run histories, measurement
time-series, spec diffs, and threshold evaluations survive cleanups
indefinitely.
See architecture.md § Data retention and configuration.md § janitor.
Migration history
| File | What it adds |
|---|---|
0001_init.sql |
Core schema: hosts, runs, stages, measurements, artifacts, spec_diffs, events, settings. |
0002_add_hosts_last_seen_at.sql |
hosts.last_seen_at column for host-mode agent heartbeats. |
0003_add_runs_non_destructive.sql |
runs.non_destructive boolean flag. |
0004_add_sub_steps.sql |
sub_steps table for per-disk/per-NIC granular stage detail. |
0005_profiles_thresholds_firmware.sql |
runs.profile column, thresholds + threshold_evaluations tables, firmware_snapshots table. |
All migrations are additive — no schema deletions or renames.