josh 5e9ad7f569
probes: sanitize disk serials and normalize GPU model for stable spec keys
Two related bugs were producing different map keys for identical
hardware depending on whether the inventory probe ran in the reporter
on the Proxmox host or in the live-image agent after PXE boot.

1. diskSerial read /sys/block/<dev>/device/{serial,vpd_pg80} and only
   TrimSpace'd the result. vpd_pg80 is a binary SCSI VPD page with a
   4-byte header, and some SSDs leak NUL/control bytes into the text
   serial file. Those bytes survive into the Go string, pass through
   lowercasing unchanged, and produce a garbage map key that never
   matches the clean serial the reporter read. Sanitize to the
   ASCII-printable range at ingest.

2. probeGPUs built the model slug from fields[2] + " " + fields[3] of
   `lspci -mm -nnk` output. fields[3] is subsystem vendor/device info,
   which varies between otherwise-identical cards and carries the
   `-rXX` revision marker, so it is stable enough for display but not for
   identity. Use fields[2] alone, strip the trailing `[NNNN]` PCI
   device-ID that lspci -nn appends, and sanitize for consistency.
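
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `sanitizeASCII` and `normalizeGPUModel`, the sample serial, and the sample device string are all assumptions. It keeps only bytes in the printable ASCII range and strips a trailing four-hex-digit `[NNNN]` ID from the device name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// trailingPCIID matches a "[NNNN]" hex device ID at the very end of a
// device name, as appended by lspci -nn.
var trailingPCIID = regexp.MustCompile(`\s*\[[0-9a-fA-F]{4}\]\s*$`)

// sanitizeASCII (hypothetical name) keeps only printable ASCII bytes
// (0x20 through 0x7e) and trims surrounding whitespace, so NUL and
// control bytes never reach a map key.
func sanitizeASCII(raw string) string {
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		if c := raw[i]; c >= 0x20 && c <= 0x7e {
			b.WriteByte(c)
		}
	}
	return strings.TrimSpace(b.String())
}

// normalizeGPUModel (hypothetical name) takes the device-name field only,
// sanitizes it, and drops the trailing PCI device ID. Sanitizing first
// means stray control bytes after the ID cannot defeat the end anchor.
func normalizeGPUModel(device string) string {
	return trailingPCIID.ReplaceAllString(sanitizeASCII(device), "")
}

func main() {
	// Sample inputs are made up for illustration.
	fmt.Printf("%q\n", sanitizeASCII("  S4EWNX0M\x00\x00123\x01\n"))
	fmt.Printf("%q\n", normalizeGPUModel("GA102 [GeForce RTX 3090] [2204]"))
}
```

One caveat with a pure byte filter: a VPD header byte that happens to fall in the printable range would survive it, so skipping the 4-byte header before filtering may be the safer order for `vpd_pg80`.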

After deploying the new orchestrator + re-running the configure step
on each registered host, SpecValidate will match cleanly. Disk diffs
self-resolve because the reporter already stored clean serials; GPU
diffs need one reporter re-run because the old expected slug still
carries subsystem noise.
2026-04-18 16:06:18 -04:00

Vetting

Post-repair hardware validation pipeline for Proxmox cluster hosts. Register a host, click Start Vetting, and the orchestrator will PXE-boot it into a custom Linux live image and run it through a consistent battery of tests (CPU stress, RAM stress, SMART, disk I/O, network throughput, GPU, PSU telemetry). Pass → auto-shutdown + HTML report. Fail → pipeline halts, SSH drops in, notification fires.
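
The pass/fail behaviour described above amounts to a sequential runner that stops at the first failing stage. The sketch below is illustrative only: `Stage`, `Outcome`, `runPipeline`, `demoStages`, and the stage names are hypothetical and not taken from the project's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage is one test in the battery (hypothetical type).
type Stage struct {
	Name string
	Run  func() error
}

// Outcome records how far the pipeline got (hypothetical type).
type Outcome struct {
	Passed      bool
	Completed   []string
	FailedStage string
	Err         error
}

// runPipeline runs stages in order and halts at the first failure, so
// later stages never execute against hardware already known to be bad.
func runPipeline(stages []Stage) Outcome {
	var out Outcome
	for _, s := range stages {
		if err := s.Run(); err != nil {
			out.FailedStage, out.Err = s.Name, err
			return out
		}
		out.Completed = append(out.Completed, s.Name)
	}
	out.Passed = true
	return out
}

// demoStages builds a stand-in battery in which the stage named failAt
// returns an error; an empty failAt means every stage passes.
func demoStages(failAt string) []Stage {
	names := []string{"cpu-stress", "ram-stress", "smart", "disk-io", "network", "gpu", "psu"}
	stages := make([]Stage, 0, len(names))
	for _, name := range names {
		name := name // capture per iteration for older Go versions
		stages = append(stages, Stage{Name: name, Run: func() error {
			if name == failAt {
				return errors.New("simulated failure in " + name)
			}
			return nil
		}})
	}
	return stages
}

func main() {
	out := runPipeline(demoStages("smart"))
	fmt.Println(out.Passed, out.Completed, out.FailedStage, out.Err)
}
```

In a real run, the pass branch would trigger the shutdown and report, and the halt branch would trigger the SSH access and notification; those hooks are left out here.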

Built for solo-operator home labs: one Go binary, SQLite + flat files, HTMX + SSE UI, bundled dnsmasq, optional ntfy / Discord / SMTP notifications.

Documentation

Quick start (local, against QEMU)

make all
./bin/vetting --config deploy/vetting.example.yaml
# → http://localhost:8080

The UI has no built-in auth — bind to loopback or LAN only, or front the service with a reverse proxy (Caddy/nginx basic-auth) if you want a password. The agent↔orchestrator channel keeps its own bearer-token auth and is unaffected.
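
If running Caddy or nginx is not convenient, the same idea can be sketched in Go with the standard library. This is a stand-in for the reverse proxies named above, not something the project ships: `basicAuth`, `newFrontend`, `credentialsMatch`, `demoStatuses`, and the credentials are all hypothetical.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// credentialsMatch compares both fields in constant time.
func credentialsMatch(gotUser, gotPass, wantUser, wantPass string) bool {
	u := subtle.ConstantTimeCompare([]byte(gotUser), []byte(wantUser))
	p := subtle.ConstantTimeCompare([]byte(gotPass), []byte(wantPass))
	return u&p == 1
}

// basicAuth rejects requests without the expected credentials and passes
// everything else to next.
func basicAuth(next http.Handler, user, pass string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || !credentialsMatch(u, p, user, pass) {
			w.Header().Set("WWW-Authenticate", `Basic realm="vetting"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newFrontend wraps a reverse proxy to the orchestrator UI in basic auth.
// A negative FlushInterval flushes after every write, which keeps
// server-sent events from being buffered.
func newFrontend(upstream *url.URL, user, pass string) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.FlushInterval = -1
	return basicAuth(proxy, user, pass)
}

// demoStatuses exercises basicAuth against a stub handler, without any
// network access, and returns the status code for each case.
func demoStatuses() (noCreds, badCreds, goodCreds int) {
	stub := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "vetting UI")
	})
	front := basicAuth(stub, "operator", "s3cret")
	probe := func(user, pass string) int {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		if user != "" {
			req.SetBasicAuth(user, pass)
		}
		rec := httptest.NewRecorder()
		front.ServeHTTP(rec, req)
		return rec.Code
	}
	return probe("", ""), probe("operator", "wrong"), probe("operator", "s3cret")
}

func main() {
	fmt.Println(demoStatuses())
	// To serve for real (assumed addresses), something like:
	//   target, _ := url.Parse("http://127.0.0.1:8080")
	//   http.ListenAndServe(":8443", newFrontend(target, "operator", "s3cret"))
}
```

Basic auth sends credentials in every request, so a front end like this should sit behind TLS. Agent traffic would need to bypass it or reach the orchestrator directly, since the agent authenticates with its bearer token instead.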

For a full end-to-end QEMU walk-through (bridge setup, host registration, PXE boot), see docs/operations.md § First vetting run.

Production install (Proxmox LXC)

On a fresh Debian/Ubuntu LXC, as root:

curl -fsSL https://gitea.thewrightserver.net/josh/Vetting/raw/branch/main/deploy/proxmox-install.sh | bash

That installs Go (if missing), clones the repo to /opt/vetting-src, builds vetting-linux-amd64, and hands off to deploy/install.sh — which lays down the binary, systemd unit, example config, and vetting service user. Then:

# Edit /etc/vetting/vetting.yaml (server.bind + server.public_url)
sudo systemctl enable --now vetting
journalctl -fu vetting

Prefer to build yourself? The manual path:

make orchestrator-linux
scp -r bin deploy lxc:/opt/vetting/
ssh lxc "cd /opt/vetting && sudo ./deploy/install.sh"
ssh lxc "sudo systemctl enable --now vetting"

See docs/operations.md § Install for the full walkthrough.

Repository layout

cmd/                  orchestrator + agent entrypoints
internal/             core packages (see docs/architecture.md for the map)
agent/                in-image agent logic (claim loop, stage dispatch, probes)
live-image/           mkosi config for the PXE-bootable Debian live image
deploy/               systemd unit + install.sh + example config
docs/                 operator + developer docs
test/e2e/             build-tag-gated QEMU + PXE full-stack test
tools/                small CLI helpers

Development

  • make test — Go unit + smoke tests (cross-platform)
  • make vet — go vet on the whole module
  • make live-image — Linux-only; run under WSL from Windows
  • make e2e — requires Linux root + live image + running orchestrator
  • make run — build + launch the orchestrator with the example config

Windows hosts: everything except live-image and e2e works natively. The live image build calls mkosi which needs a real Linux userspace, so use WSL for those targets.

Status

All six phases in the original plan are implemented. The E2E QEMU harness is wired in test/e2e/qemu_test.go but requires a running orchestrator + registered host + queued run as preconditions — it's a developer-facing integration harness, not a unit test.
