deep profile + threshold gating + firmware stage + Burn super-stage

Ships all five phases of the deep-profile overhaul together. Runs now carry a profile (quick/deep/soak); every profile walks the same 11-stage order (Inventory → Firmware → SpecValidate → SMART → CPUStress → Storage → Network → Burn → GPU → PSU → Reporting), with only per-stage durations and concurrency scaled.

Phase 1: profiles.ProfileRegistry loaded from vetting.yaml; runs.profile column + CreateWithProfile; threshold table + evaluator seeded per-run from the shared vetting.thresholds block; breach flips result at /sensor + /result.

Phase 2: upgraded CPUStress (stress-ng --cpu-method=all --verify + EDAC/MCE poll), Storage (fio --verify=md5 + SMART start/end delta), Network (sustained iperf + /proc/net/dev deltas) with per-profile knobs from Deps.

Phase 3: Burn super-stage with goroutine fan-out for CPU + memory + fio + iperf, PSU rails sampled across the Burn window, SensorMux (2 s flush, 500-sample cap) to absorb backpressure.

Phase 4: Firmware stage + firmware_snapshots table; probes dmidecode (BIOS), ipmitool (BMC), ethtool -i (NIC), nvme (sysfs + id-ctrl), lspci (HBA), /proc/cpuinfo (microcode). spec.DiffFirmware folds into SpecValidate with pin-by-identifier and fan-out-across-component matching; mismatches park the run in FailedHolding.

Phase 5: profile radio on the host start form, profile chip on the run header, Firmware section in the HTML report, coverage artifact uploaded from CI, agent/tests/fakes/ scaffold with Deps.LookPath seam + stress_ng and dmidecode example fakes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,85 @@
package probes

import (
	"bufio"
	"io"
	"os"
	"strconv"
	"strings"
)

// NetDevSnapshot is the per-interface counter row from /proc/net/dev at
// a single instant. Used by the Network stage to compute deltas across
// an iperf window: a rising rx_errors or tx_dropped during a loaded
// link is a real NIC problem, not general noise.
type NetDevSnapshot struct {
	Iface   string
	RxBytes uint64
	RxErrs  uint64
	RxDrop  uint64
	TxBytes uint64
	TxErrs  uint64
	TxDrop  uint64
}

// NetDev reads /proc/net/dev and returns one snapshot per non-loopback
// interface. Returns nil on read/parse failure (best-effort: a missing
// /proc is survivable; the caller skips delta reporting that tick).
func NetDev() []NetDevSnapshot {
	f, err := os.Open("/proc/net/dev")
	if err != nil {
		return nil
	}
	defer func() { _ = f.Close() }()
	return parseNetDev(f)
}

// parseNetDev is split from NetDev so tests can feed a fixture without
// touching the real /proc. The /proc/net/dev format is two header lines
// followed by rows of "iface: rx_bytes rx_packets rx_errs rx_drop ... tx_bytes tx_packets tx_errs tx_drop ..."
// (16 whitespace-separated counters, of which we pull a curated six).
func parseNetDev(r io.Reader) []NetDevSnapshot {
	var out []NetDevSnapshot
	sc := bufio.NewScanner(r)
	// Skip the two header lines (iface || bytes ... || bytes ...).
	for i := 0; i < 2 && sc.Scan(); i++ {
	}
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		colon := strings.IndexByte(line, ':')
		if colon < 0 {
			continue
		}
		iface := strings.TrimSpace(line[:colon])
		if iface == "" || iface == "lo" {
			continue
		}
		fields := strings.Fields(line[colon+1:])
		if len(fields) < 16 {
			continue
		}
		// /proc/net/dev columns:
		// 0 rx_bytes 1 rx_packets 2 rx_errs 3 rx_drop 4 fifo 5 frame 6 compressed 7 multicast
		// 8 tx_bytes 9 tx_packets 10 tx_errs 11 tx_drop 12 fifo 13 colls 14 carrier 15 compressed
		snap := NetDevSnapshot{Iface: iface}
		snap.RxBytes = parseU64(fields[0])
		snap.RxErrs = parseU64(fields[2])
		snap.RxDrop = parseU64(fields[3])
		snap.TxBytes = parseU64(fields[8])
		snap.TxErrs = parseU64(fields[10])
		snap.TxDrop = parseU64(fields[11])
		out = append(out, snap)
	}
	return out
}

func parseU64(s string) uint64 {
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}
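
The NetDevSnapshot comment says the Network stage computes deltas across an iperf window, but the delta step itself is not part of this file. A minimal sketch of how two snapshot slices might be compared follows. `NetDevDelta`, `DiffNetDev`, `counterDelta` and `Suspect` are hypothetical names introduced for illustration, and clamping a backwards counter to zero is an assumption rather than the commit's documented behaviour.

```go
package main

// NetDevSnapshot mirrors the struct from this file so the sketch compiles alone.
type NetDevSnapshot struct {
	Iface                                            string
	RxBytes, RxErrs, RxDrop, TxBytes, TxErrs, TxDrop uint64
}

// NetDevDelta (hypothetical) holds the counter increases for one interface
// between a start and an end snapshot.
type NetDevDelta struct {
	Iface                                            string
	RxBytes, RxErrs, RxDrop, TxBytes, TxErrs, TxDrop uint64
}

// counterDelta returns end-start, or 0 when the counter went backwards
// (assumed here to mean the counter was reset, e.g. by a driver reload).
func counterDelta(start, end uint64) uint64 {
	if end < start {
		return 0
	}
	return end - start
}

// DiffNetDev (hypothetical) pairs snapshots by interface name. Interfaces
// that are missing from the start slice are skipped.
func DiffNetDev(start, end []NetDevSnapshot) []NetDevDelta {
	before := make(map[string]NetDevSnapshot, len(start))
	for _, s := range start {
		before[s.Iface] = s
	}
	var out []NetDevDelta
	for _, e := range end {
		s, ok := before[e.Iface]
		if !ok {
			continue
		}
		out = append(out, NetDevDelta{
			Iface:   e.Iface,
			RxBytes: counterDelta(s.RxBytes, e.RxBytes),
			RxErrs:  counterDelta(s.RxErrs, e.RxErrs),
			RxDrop:  counterDelta(s.RxDrop, e.RxDrop),
			TxBytes: counterDelta(s.TxBytes, e.TxBytes),
			TxErrs:  counterDelta(s.TxErrs, e.TxErrs),
			TxDrop:  counterDelta(s.TxDrop, e.TxDrop),
		})
	}
	return out
}

// Suspect reports whether any error or drop counter rose during the window.
func (d NetDevDelta) Suspect() bool {
	return d.RxErrs > 0 || d.RxDrop > 0 || d.TxErrs > 0 || d.TxDrop > 0
}
```

A caller would take one `NetDev()` snapshot before the iperf run and one after, pass both to `DiffNetDev`, and treat any delta for which `Suspect()` is true as a candidate NIC fault.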
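
The parseNetDev comment notes that the parser is split out so tests can feed it a fixture instead of the real /proc. The sketch below shows what that could look like. It carries a condensed copy of the parser so that it compiles alone; the fixture contents and the `parseNetDevString` helper are made up for illustration and are not part of the commit.

```go
package main

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// sampleNetDev is an invented fixture in the /proc/net/dev layout:
// two header lines, then one row of 16 counters per interface.
const sampleNetDev = `Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 40 2 1 0 0 0 0 7000 55 3 4 0 0 0 0
`

// NetDevSnapshot mirrors the struct from this file.
type NetDevSnapshot struct {
	Iface                                            string
	RxBytes, RxErrs, RxDrop, TxBytes, TxErrs, TxDrop uint64
}

// parseU64 returns 0 for anything that is not a valid unsigned integer.
func parseU64(s string) uint64 {
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

// parseNetDev is a condensed copy of the parser in this file: it skips the
// two header lines, the loopback row, and any row with fewer than 16 counters.
func parseNetDev(r io.Reader) []NetDevSnapshot {
	var out []NetDevSnapshot
	sc := bufio.NewScanner(r)
	for i := 0; i < 2 && sc.Scan(); i++ {
	}
	for sc.Scan() {
		name, rest, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		name = strings.TrimSpace(name)
		fields := strings.Fields(rest)
		if !ok || name == "" || name == "lo" || len(fields) < 16 {
			continue
		}
		out = append(out, NetDevSnapshot{
			Iface:   name,
			RxBytes: parseU64(fields[0]),
			RxErrs:  parseU64(fields[2]),
			RxDrop:  parseU64(fields[3]),
			TxBytes: parseU64(fields[8]),
			TxErrs:  parseU64(fields[10]),
			TxDrop:  parseU64(fields[11]),
		})
	}
	return out
}

// parseNetDevString (hypothetical helper) feeds an in-memory fixture to the parser.
func parseNetDevString(s string) []NetDevSnapshot {
	return parseNetDev(strings.NewReader(s))
}
```

In the real package the same fixture would presumably live in a `_test.go` file and be passed to `parseNetDev` via `strings.NewReader`, with assertions that the `lo` row is dropped and the six selected counters land in the right fields.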