7 Commits

Author SHA1 Message Date
josh 481b67fb69 feat(firmware): install probe tools in live image + surface nic/hba gaps
CI / Lint + build + test (push) Successful in 1m42s
Release / release (push) Successful in 11m25s
mkosi.conf: add ipmitool, ethtool, nvme-cli so the Firmware stage
can actually read BMC revisions, NIC firmware versions, and fall
back to nvme-cli when sysfs firmware_rev is missing.

firmware.go: probeNICFirmware and probeHBAFirmware now return
(snapshots, warning) so a missing ethtool/lspci surfaces in the
stage log the same way probeBIOS/probeBMC already do. Before, a
host without ethtool silently reported "bios=1 nvme_fw=1
microcode=1" with no hint that nic coverage was dropped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 21:56:18 -04:00
josh e73e31af92 live-image: install stage tools and fail loudly if any are missing
CI / Lint + build + test (push) Successful in 1m32s
Release / release (push) Successful in 6m28s
The live image was still carrying the Phase 2 package list, so SMART,
CPUStress, and Network each hit a LookPath miss and returned
pass-with-skip. A run that skipped every real check still ended in
"completed" — nothing on the report said the image was broken.

Add smartmontools, stress-ng, fio, iperf3, lshw, lm-sensors,
e2fsprogs, and util-linux to mkosi.conf. Flip the three stages from
skip-pass to fail when their binary is missing so any future
packaging regression blocks the run instead of whispering past it.
Legitimate "no hardware" skips (no GPU, no hwmon, no disks,
non-destructive) are untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 16:39:28 -04:00
josh 6c6d20710f live-image: fix check-initrd size measurement; add zstd to image
CI / Lint + build + test (push) Successful in 1m28s
Release / release (push) Failing after 4m10s
Previous run actually built the 518 MB rootfs with firmware-misc-nonfree
et al. installed — the real payload is working. Two follow-ups:

- check-initrd was reading stat on a symlink path and getting 30 bytes
  (the symlink's own size), not the 6.1.0-44-amd64 kernel initrd it
  points to. Switched to wc -c, which follows symlinks, and to du -hL
  for the OK message.
- Add zstd to Packages= so COMPRESS=zstd in initramfs.conf can be
  honored; without it update-initramfs falls back to gzip with a
  "No zstd in PATH" warning.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 14:00:07 -04:00
josh 28918bad15 live-image: fix firmware so i915 actually loads at boot
CI / Lint + build + test (push) Successful in 1m35s
Release / release (push) Failing after 22s
Previous attempt (c962d6d) added firmware-linux-nonfree to mkosi.conf,
but the CI bundle was still 63 MB and Tiger Lake wedged on tgl_guc.
Two reasons: (1) firmware-linux-nonfree on bookworm is a thin
metapackage that doesn't include firmware-misc-nonfree, which is where
i915 GuC/HuC blobs actually live; (2) Ubuntu's apt-packaged mkosi is
old enough that Repositories=non-free-firmware shorthand likely isn't
wired through to the debootstrap invocation, so firmware packages
silently miss the bootstrap step entirely.

Changes:
- Enumerate firmware packages explicitly in mkosi.conf (firmware-
  misc-nonfree, firmware-iwlwifi, firmware-realtek, firmware-amd-
  graphics, firmware-intel-sound, intel/amd64-microcode).
- Ship mkosi.sources.d/debian.sources with explicit deb822 so the
  non-free-firmware component is unambiguously available.
- Install mkosi 24.3 via pip in CI instead of apt's older build.
- Pin MODULES=most and COMPRESS=zstd via a tracked initramfs-tools
  config under mkosi.extra/.
- Narrow .gitignore so only the generated agent binary is ignored,
  not the whole mkosi.extra/ tree.
- New check-initrd Makefile target asserts both size (>=150 MB) and
  actual presence of i915/tgl_guc_*.bin inside the built initrd, so
  a silent firmware-drop regression fails the build loudly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 13:38:40 -04:00
josh c962d6d8ab live-image: bundle nonfree firmware (i915 GuC et al.)
CI / Lint + build + test (push) Successful in 2m19s
Release / release (push) Successful in 3m28s
Tiger Lake and later Intel iGPUs need i915/tgl_guc_*.bin; without
it the i915 init wedges and floods the console. Same story on most
modern wifi/NIC hardware. Pull firmware-linux-nonfree (metapackage
covering misc-nonfree, iwlwifi, realtek, amd-graphics, …) from the
bookworm non-free-firmware repo — single line fix, ~500MB cost to
the squashfs, worth it for booting arbitrary repaired hosts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 13:14:19 -04:00
josh 5aa245cd85 live-image: disable mkosi Bootable (PXE doesn't need a bootloader)
CI / Lint + build + test (push) Successful in 1m36s
Release / release (push) Successful in 1m56s
mkosi was failing with "systemd-boot was not found at
usr/lib/systemd/boot/efi" because Bootable=yes expects systemd-boot
installed *inside* the image for EFI boot. This image is only ever
PXE-booted — iPXE loads vmlinuz+initrd from TFTP directly, so the
rootfs itself needs no bootloader.

Switching to Bootable=no drops the EFI-image assembly step; the
linux-image-amd64 postinst still creates /vmlinuz and /initrd.img
symlinks that the top-level Makefile copies into the bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:18:49 -04:00
josh 9bb4b09a04 Initial commit: full Phases 1-6 implementation
CI / Lint + build + test (push) Has been cancelled
Post-repair hardware validation pipeline for Proxmox cluster hosts.
Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq
PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
2026-04-17 21:32:10 -04:00