deep profile + threshold gating + firmware stage + Burn super-stage
Ships all five phases of the deep-profile overhaul together. Runs now carry a profile (quick/deep/soak); every profile walks the same 11-stage order (Inventory → Firmware → SpecValidate → SMART → CPUStress → Storage → Network → Burn → GPU → PSU → Reporting), with only per-stage durations and concurrency scaled.

Phase 1: profiles.ProfileRegistry loaded from vetting.yaml; runs.profile column + CreateWithProfile; threshold table + evaluator seeded per-run from the shared vetting.thresholds block; breach flips result at /sensor + /result.

Phase 2: upgraded CPUStress (stress-ng --cpu-method=all --verify + EDAC/MCE poll), Storage (fio --verify=md5 + SMART start/end delta), Network (sustained iperf + /proc/net/dev deltas) with per-profile knobs from Deps.

Phase 3: Burn super-stage with goroutine fan-out for CPU + memory + fio + iperf, PSU rails sampled across the Burn window, SensorMux (2 s flush, 500-sample cap) to absorb backpressure.

Phase 4: Firmware stage + firmware_snapshots table; probes dmidecode (BIOS), ipmitool (BMC), ethtool -i (NIC), nvme (sysfs + id-ctrl), lspci (HBA), /proc/cpuinfo (microcode). spec.DiffFirmware folds into SpecValidate with pin-by-identifier and fan-out-across-component matching; mismatches park the run in FailedHolding.

Phase 5: profile radio on the host start form, profile chip on the run header, Firmware section in the HTML report, coverage artifact uploaded from CI, agent/tests/fakes/ scaffold with Deps.LookPath seam + stress_ng and dmidecode example fakes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,57 @@
-- Phase-1 groundwork for profile-aware, threshold-gated vetting.
--
-- Adds:
--   * runs.profile          - which profile the run is executing
--                             (quick|deep|soak; defaults to quick for
--                             backfill of older rows + tests).
--   * thresholds            - seeded per run at creation from the
--                             ProfileRegistry + per-host overrides;
--                             immutable for that run so a late config
--                             edit can't retroactively pass/fail it.
--   * threshold_evaluations - one row per observed sample vs threshold;
--                             drives the report + pipeline badges.
--   * firmware_snapshots    - per-run BIOS/BMC/NIC/HBA/microcode/NVMe
--                             version captures used by SpecValidate
--                             diffing in Phase 4.

ALTER TABLE runs ADD COLUMN profile TEXT NOT NULL DEFAULT 'quick';

CREATE TABLE IF NOT EXISTS thresholds (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id      INTEGER NOT NULL REFERENCES runs(id) ON DELETE CASCADE,
    stage_name  TEXT NOT NULL,            -- "*" matches any stage
    kind        TEXT NOT NULL,            -- temp|psu_volt|iperf|fio_p99_us|nic_retrans|edac_ce|edac_ue|mce|...
    key         TEXT NOT NULL,            -- "*" or glob-ish match (prefix* / *suffix / exact)
    op          TEXT NOT NULL,            -- lt|lte|gt|gte|within_pct
    threshold   REAL NOT NULL,
    nominal     REAL NOT NULL DEFAULT 0,  -- used by within_pct; 0 elsewhere
    unit        TEXT NOT NULL DEFAULT '',
    severity    TEXT NOT NULL,            -- critical|warning
    source      TEXT NOT NULL             -- profile|host_override
);
CREATE INDEX IF NOT EXISTS idx_thresholds_run ON thresholds(run_id);
CREATE INDEX IF NOT EXISTS idx_thresholds_kind ON thresholds(run_id, stage_name, kind);

CREATE TABLE IF NOT EXISTS threshold_evaluations (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id        INTEGER NOT NULL REFERENCES runs(id) ON DELETE CASCADE,
    threshold_id  INTEGER NOT NULL REFERENCES thresholds(id) ON DELETE CASCADE,
    stage_name    TEXT NOT NULL,
    kind          TEXT NOT NULL,
    key           TEXT NOT NULL,
    ts            TIMESTAMP NOT NULL,
    observed      REAL NOT NULL,
    passed        INTEGER NOT NULL        -- 1 = sample within threshold, 0 = breach
);
CREATE INDEX IF NOT EXISTS idx_threshold_evals_run ON threshold_evaluations(run_id, passed);

CREATE TABLE IF NOT EXISTS firmware_snapshots (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id      INTEGER NOT NULL REFERENCES runs(id) ON DELETE CASCADE,
    component   TEXT NOT NULL,            -- bios|bmc|nic|hba|microcode|nvme_fw
    identifier  TEXT NOT NULL,            -- slot/serial/device path that distinguishes this component
    version     TEXT NOT NULL,
    vendor      TEXT NOT NULL DEFAULT '',
    raw_json    TEXT NOT NULL DEFAULT '{}'
);
CREATE INDEX IF NOT EXISTS idx_firmware_run ON firmware_snapshots(run_id, component);
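
The `stage_name`, `key`, `op`, `threshold`, and `nominal` columns imply a small matching and comparison routine. The sketch below shows one way a row could be applied to a sample; it is not the evaluator shipped in this commit. The `Threshold` struct, `matchKey`, `Applies`, and `Passed` are hypothetical names. It also assumes that each op names the condition a healthy sample satisfies (so `lt` passes when observed < threshold) and that `within_pct` reads `threshold` as a percentage of `nominal`; the schema comments do not spell out either point.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Threshold mirrors the thresholds columns that matter for evaluation.
type Threshold struct {
	StageName string // "*" matches any stage
	Kind      string
	Key       string // "*", "prefix*", "*suffix", or an exact key
	Op        string // lt|lte|gt|gte|within_pct
	Threshold float64
	Nominal   float64 // only read by within_pct
}

// matchKey handles the "*", prefix*, *suffix, and exact forms from the schema comment.
func matchKey(pattern, key string) bool {
	switch {
	case pattern == "*":
		return true
	case strings.HasSuffix(pattern, "*"):
		return strings.HasPrefix(key, strings.TrimSuffix(pattern, "*"))
	case strings.HasPrefix(pattern, "*"):
		return strings.HasSuffix(key, strings.TrimPrefix(pattern, "*"))
	default:
		return pattern == key
	}
}

// Applies reports whether t covers a sample from the given stage, kind, and key.
func (t Threshold) Applies(stage, kind, key string) bool {
	return (t.StageName == "*" || t.StageName == stage) &&
		t.Kind == kind && matchKey(t.Key, key)
}

// Passed reports whether observed satisfies the threshold.
// Assumption: within_pct means |observed - nominal| <= |nominal| * threshold / 100.
func (t Threshold) Passed(observed float64) (bool, error) {
	switch t.Op {
	case "lt":
		return observed < t.Threshold, nil
	case "lte":
		return observed <= t.Threshold, nil
	case "gt":
		return observed > t.Threshold, nil
	case "gte":
		return observed >= t.Threshold, nil
	case "within_pct":
		return math.Abs(observed-t.Nominal) <= math.Abs(t.Nominal)*t.Threshold/100, nil
	default:
		return false, fmt.Errorf("unknown op %q", t.Op)
	}
}

func main() {
	// Hypothetical 12 V rail check: pass while within 5% of nominal.
	rail := Threshold{StageName: "*", Kind: "psu_volt", Key: "12V*",
		Op: "within_pct", Threshold: 5, Nominal: 12}
	for _, v := range []float64{12.3, 11.2} {
		ok, err := rail.Passed(v)
		fmt.Println(rail.Applies("Burn", "psu_volt", "12V_main"), v, ok, err)
	}
}
```

Under these assumptions, each `Passed` result would map onto one `threshold_evaluations` row, with `passed` stored as 1 or 0.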