# Database schema The orchestrator uses SQLite via [modernc.org/sqlite](https://pkg.go.dev/modernc.org/sqlite) — a pure Go driver with no cgo dependency. The database file is created on first startup at the path in `database.path` (default `./var/vetting.db`). **Pragmas set at open time:** - `PRAGMA journal_mode = WAL` — write-ahead logging for concurrent readers. - `PRAGMA foreign_keys = ON` — enforced referential integrity. **Migrations** are embedded via `go:embed` in `internal/db/` and applied in filename order at startup. A `schema_migrations` table tracks which migrations have run. --- ## Tables ### `hosts` Registered hardware nodes in the vetting cluster. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `name` | TEXT | NOT NULL UNIQUE | | Human-readable host name. | | `mac` | TEXT | NOT NULL UNIQUE | | Lowercase colon form (e.g. `aa:bb:cc:dd:ee:ff`). | | `wol_broadcast_ip` | TEXT | NOT NULL | | LAN broadcast IP for Wake-on-LAN magic packets. | | `wol_port` | INTEGER | NOT NULL | `9` | WoL UDP port. | | `expected_spec_yaml` | TEXT | NOT NULL | | YAML describing expected hardware (CPU, memory, disks, firmware). | | `pdu_config_json` | TEXT | | | PDU power control config (future use). | | `ipmi_config_json` | TEXT | | | IPMI config (future use). | | `notes` | TEXT | NOT NULL | `''` | Operator notes. | | `created_at` | TIMESTAMP | NOT NULL | `CURRENT_TIMESTAMP` | | | `updated_at` | TIMESTAMP | NOT NULL | `CURRENT_TIMESTAMP` | | | `last_seen_at` | TIMESTAMP | | | Host-mode agent heartbeat timestamp. NULL = never seen. | ### `runs` Vetting run instances. Each run belongs to one host and walks through the state machine. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `host_id` | INTEGER | NOT NULL FK → hosts(id) CASCADE | | | | `state` | TEXT | NOT NULL | | Current `RunState` (see `internal/model`). | | `result` | TEXT | | | `pass` or `fail` once terminal. | | `failed_stage` | TEXT | | | Stage name that halted the pipeline. | | `next_boot_target` | TEXT | | | `linux`, `memtest`, etc. (future use). | | `agent_token_hash` | TEXT | NOT NULL | | SHA-256 hash of the bearer token. | | `started_at` | TIMESTAMP | NOT NULL | `CURRENT_TIMESTAMP` | | | `completed_at` | TIMESTAMP | | | Set when run reaches a terminal state. | | `report_path` | TEXT | | | Path to `report.json` on disk. | | `hold_ip` | TEXT | | | Agent IP during FailedHolding (for SSH command). | | `override_flags_json` | TEXT | | | JSON blob (e.g. `{"wipe": true}`). | | `non_destructive` | INTEGER | NOT NULL | `0` | `1` = skip badblocks + wipe probe. | | `profile` | TEXT | NOT NULL | `'quick'` | `quick`, `deep`, or `soak`. | **Indices:** - `idx_runs_host` on `(host_id)` - `idx_runs_state` on `(state)` ### `stages` Per-stage results within a run. Seeded at `/claim` time with one row per stage in `DefaultStageOrder`. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `name` | TEXT | NOT NULL | | Stage name (e.g. `SMART`, `CPUStress`). | | `ordinal` | INTEGER | NOT NULL | | 0-based position in the pipeline. | | `state` | TEXT | NOT NULL | | `pending`, `running`, `passed`, `failed`, `skipped`. | | `started_at` | TIMESTAMP | | | Set when the stage begins. | | `completed_at` | TIMESTAMP | | | Set when the stage finishes. | | `summary_json` | TEXT | | | Arbitrary JSON from the agent's result. | **Indices:** - `idx_stages_run_ordinal` on `(run_id, ordinal)` ### `sub_steps` Finer-grained units within a stage (per-disk SMART, per-NIC iperf, CPU/memory pass, per-GPU run). Not every stage has sub-steps. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `stage_name` | TEXT | NOT NULL | | Parent stage name. | | `ordinal` | INTEGER | NOT NULL | | 0-based within `(run_id, stage_name)`. | | `name` | TEXT | NOT NULL | | Human label (e.g. `sda SMART`, `eth0 iperf`). | | `state` | TEXT | NOT NULL | `'pending'` | `pending`, `running`, `passed`, `failed`, `skipped`. | | `started_at` | TIMESTAMP | | | | | `completed_at` | TIMESTAMP | | | | | `summary_json` | TEXT | NOT NULL | `'{}'` | | **Constraints:** `UNIQUE (run_id, stage_name, ordinal)` **Indices:** `idx_sub_steps_run` on `(run_id, stage_name, ordinal)` ### `measurements` Time-series sensor data from the thermal sidecar and stage executors. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `stage_id` | INTEGER | FK → stages(id) SET NULL | | Optional link to a specific stage. | | `ts` | TIMESTAMP | NOT NULL | | Sample timestamp. | | `kind` | TEXT | NOT NULL | | `temp`, `power`, `iperf`, `fio`, `smart_attr`, `psu_volt`, `fan`, etc. | | `key` | TEXT | NOT NULL | | Source identifier (e.g. `cpu/0`, `+12V`). | | `value` | REAL | | | Numeric sample. | | `unit` | TEXT | | | Display unit. | **Indices:** `idx_measurements_run_kind_ts` on `(run_id, kind, ts)` ### `artifacts` On-disk file references (reports, fio logs, iperf logs, hold keys). | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `stage_id` | INTEGER | FK → stages(id) SET NULL | | | | `kind` | TEXT | NOT NULL | | `inventory`, `report`, `report_html`, `hold_key`, `fio`, `iperf`. | | `path` | TEXT | NOT NULL | | Absolute path on disk. | | `sha256` | TEXT | NOT NULL | | SHA-256 hex digest. | | `size_bytes` | INTEGER | NOT NULL | | File size. | ### `spec_diffs` Expected-vs-actual hardware divergences from SpecValidate. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `field` | TEXT | NOT NULL | | Dotted path (e.g. `memory.total_gib`, `cpu.logical_cores`). | | `expected` | TEXT | | | Expected value from the host's spec YAML. | | `actual` | TEXT | | | Observed value from the inventory probe. | | `severity` | TEXT | NOT NULL | | `critical`, `warning`, `info`. | | `ignored` | INTEGER | NOT NULL | `0` | `1` = operator chose to ignore this diff. | ### `thresholds` Per-run threshold rules, seeded from the `ProfileRegistry` + per-host overrides at run creation. Immutable for the run's lifetime. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `stage_name` | TEXT | NOT NULL | | `*` matches any stage. | | `kind` | TEXT | NOT NULL | | Measurement kind to match. | | `key` | TEXT | NOT NULL | | Key selector (glob-ish). | | `op` | TEXT | NOT NULL | | `lt`, `lte`, `gt`, `gte`, `within_pct`. | | `threshold` | REAL | NOT NULL | | Limit value. | | `nominal` | REAL | NOT NULL | `0` | Reference for `within_pct`. | | `unit` | TEXT | NOT NULL | `''` | Display unit. | | `severity` | TEXT | NOT NULL | | `critical` or `warning`. | | `source` | TEXT | NOT NULL | | `profile` or `host_override`. | **Indices:** - `idx_thresholds_run` on `(run_id)` - `idx_thresholds_kind` on `(run_id, stage_name, kind)` ### `threshold_evaluations` Per-sample pass/fail results from threshold evaluation. Drives report badges and pipeline verdict rendering. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `threshold_id` | INTEGER | NOT NULL FK → thresholds(id) CASCADE | | | | `stage_name` | TEXT | NOT NULL | | Stage the sample belongs to. | | `kind` | TEXT | NOT NULL | | Measurement kind. | | `key` | TEXT | NOT NULL | | Source key. | | `ts` | TIMESTAMP | NOT NULL | | Sample timestamp. | | `observed` | REAL | NOT NULL | | Observed value. | | `passed` | INTEGER | NOT NULL | | `1` = within threshold, `0` = breach. | **Indices:** `idx_threshold_evals_run` on `(run_id, passed)` ### `firmware_snapshots` Per-run firmware version captures (BIOS, BMC, NIC, HBA, microcode, NVMe). Populated by the Firmware stage; consumed by SpecValidate for firmware version diffing. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | NOT NULL FK → runs(id) CASCADE | | | | `component` | TEXT | NOT NULL | | `bios`, `bmc`, `nic`, `hba`, `microcode`, `nvme_fw`. | | `identifier` | TEXT | NOT NULL | | Slot, serial, or device path distinguishing this component. | | `version` | TEXT | NOT NULL | | Firmware version string. | | `vendor` | TEXT | NOT NULL | `''` | | | `raw_json` | TEXT | NOT NULL | `'{}'` | Additional metadata. | **Indices:** `idx_firmware_run` on `(run_id, component)` ### `events` Event log table. Reserved for future use. | Column | Type | Constraints | Default | Description | |--------|------|-------------|---------|-------------| | `id` | INTEGER | PK AUTOINCREMENT | | | | `run_id` | INTEGER | FK → runs(id) CASCADE | | | | `host_id` | INTEGER | FK → hosts(id) CASCADE | | | | `ts` | TIMESTAMP | NOT NULL | | | | `level` | TEXT | NOT NULL | | | | `kind` | TEXT | NOT NULL | | | | `message` | TEXT | NOT NULL | | | | `data_json` | TEXT | | | | ### `settings` Key-value store for orchestrator-level settings. | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `key` | TEXT | PK | | | `value` | TEXT | NOT NULL | | --- ## Entity relationships ``` hosts 1───N runs ├──N stages │ └──(FK) measurements (stage_id, SET NULL) │ └──(FK) artifacts (stage_id, SET NULL) ├──N sub_steps ├──N measurements (run_id) ├──N artifacts (run_id) ├──N spec_diffs ├──N thresholds │ └──N threshold_evaluations └──N firmware_snapshots ``` All foreign keys use `ON DELETE CASCADE` (except `stage_id` references which use `SET NULL`). Deleting a host cascades through its runs and all dependent rows. ## Data retention The janitor goroutine prunes **on-disk files** (artifacts, logs) based on `artifacts.retention_days` and `logs.retention_days`. **Database rows are never deleted** by the janitor — run histories, measurement time-series, spec diffs, and threshold evaluations survive cleanups indefinitely. See [architecture.md § Data retention](architecture.md#data-retention) and [configuration.md § janitor](configuration.md#janitor). ## Migration history | File | What it adds | |------|-------------| | `0001_init.sql` | Core schema: `hosts`, `runs`, `stages`, `measurements`, `artifacts`, `spec_diffs`, `events`, `settings`. | | `0002_add_hosts_last_seen_at.sql` | `hosts.last_seen_at` column for host-mode agent heartbeats. | | `0003_add_runs_non_destructive.sql` | `runs.non_destructive` boolean flag. | | `0004_add_sub_steps.sql` | `sub_steps` table for per-disk/per-NIC granular stage detail. | | `0005_profiles_thresholds_firmware.sql` | `runs.profile` column, `thresholds` + `threshold_evaluations` tables, `firmware_snapshots` table. | All migrations are additive — no schema deletions or renames.