deep profile + threshold gating + firmware stage + Burn super-stage
Ships all five phases of the deep-profile overhaul together. Runs now carry a profile (quick/deep/soak). Every profile walks the same 11-stage order (Inventory → Firmware → SpecValidate → SMART → CPUStress → Storage → Network → Burn → GPU → PSU → Reporting); only per-stage durations and concurrency are scaled.

Phase 1: profiles.ProfileRegistry loaded from vetting.yaml; runs.profile column + CreateWithProfile; threshold table + evaluator seeded per-run from the shared vetting.thresholds block; a breach flips the result at /sensor + /result.

Phase 2: upgraded CPUStress (stress-ng --cpu-method=all --verify + EDAC/MCE poll), Storage (fio --verify=md5 + SMART start/end delta), and Network (sustained iperf + /proc/net/dev deltas), with per-profile knobs from Deps.

Phase 3: Burn super-stage with goroutine fan-out for CPU + memory + fio + iperf; PSU rails sampled across the Burn window; SensorMux (2 s flush, 500-sample cap) to absorb backpressure.

Phase 4: Firmware stage + firmware_snapshots table; probes dmidecode (BIOS), ipmitool (BMC), ethtool -i (NIC), nvme (sysfs + id-ctrl), lspci (HBA), and /proc/cpuinfo (microcode). spec.DiffFirmware folds into SpecValidate with pin-by-identifier and fan-out-across-component matching; mismatches park the run in FailedHolding.

Phase 5: profile radio on the host start form, profile chip on the run header, Firmware section in the HTML report, coverage artifact uploaded from CI, and an agent/tests/fakes/ scaffold with a Deps.LookPath seam plus stress_ng and dmidecode example fakes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -102,6 +102,21 @@ templ HostActions(d HostPageData) {
	<div class="host-actions-row">
		if hostCanStart(d) {
			<form method="post" action={ templ.SafeURL(fmt.Sprintf("/hosts/%d/start", d.Host.ID)) } class="inline host-start-form">
				<fieldset class="host-profile-picker">
					<legend>Profile</legend>
					<label title="~10 min — post-repair sanity: all probes + gates, short budgets">
						<input type="radio" name="profile" value="quick" checked/>
						quick
					</label>
					<label title="~8–12 h — overnight soak: long CPU/RAM, full-disk fio verify, 30 min network">
						<input type="radio" name="profile" value="deep"/>
						deep
					</label>
					<label title="≥24 h — week-long burn-in; opt-in when you suspect intermittent faults">
						<input type="radio" name="profile" value="soak"/>
						soak
					</label>
				</fieldset>
				<label class="host-nd-toggle">
					<input type="checkbox" name="non_destructive" value="1"/>
					Non-destructive (skip wipe-probe + disk writes)
@@ -258,6 +273,16 @@ func hostCanStartIfOnline(d HostPageData) bool {
	return d.ActiveRun == nil
}

// profileChipValue normalizes a Run.Profile string for display on the
// run-page chip. Older runs with an empty column predate Phase 1; show
// them as "quick" (the prior implicit default).
func profileChipValue(p string) string {
	if p == "" {
		return "quick"
	}
	return p
}

// runDuration formats the elapsed time for a run using the same buckets
// as stageDuration. In-flight runs clock from StartedAt to now so the
// run-page header + runs-table row keep ticking on each SSE push.

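profileChipValue's fallback for legacy runs is easy to pin down with a small table of inputs. This standalone program copies the function from the diff and checks the empty-string case (a run created before Phase 1 added the profile column) alongside the three profile names. The case table is an illustrative addition, not part of the commit.

```go
package main

import "fmt"

// profileChipValue is copied from the diff: an empty Run.Profile is shown as "quick".
func profileChipValue(p string) string {
	if p == "" {
		return "quick"
	}
	return p
}

func main() {
	cases := []struct{ in, want string }{
		{"", "quick"}, // legacy run that predates the profile column
		{"quick", "quick"},
		{"deep", "deep"},
		{"soak", "soak"},
	}
	for _, c := range cases {
		got := profileChipValue(c.in)
		fmt.Printf("%q -> %q ok=%v\n", c.in, got, got == c.want)
	}
}
```

Note that any other non-empty string passes through unchanged, so the chip will display whatever value is stored in runs.profile.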