Commit Graph

9 Commits

Author SHA1 Message Date
josh 28918bad15 live-image: fix firmware so i915 actually loads at boot
CI / Lint + build + test (push) Successful in 1m35s
Release / release (push) Failing after 22s
Previous attempt (c962d6d) added firmware-linux-nonfree to mkosi.conf,
but the CI bundle was still 63 MB and Tiger Lake wedged on tgl_guc.
Two reasons: (1) firmware-linux-nonfree on bookworm is a thin
metapackage that doesn't include firmware-misc-nonfree, which is where
i915 GuC/HuC blobs actually live; (2) Ubuntu's apt-packaged mkosi is
old enough that Repositories=non-free-firmware shorthand likely isn't
wired through to the debootstrap invocation, so firmware packages
silently miss the bootstrap step entirely.

Changes:
- Enumerate firmware packages explicitly in mkosi.conf (firmware-
  misc-nonfree, firmware-iwlwifi, firmware-realtek, firmware-amd-
  graphics, firmware-intel-sound, intel/amd64-microcode).
- Ship mkosi.sources.d/debian.sources with explicit deb822 so the
  non-free-firmware component is unambiguously available.
- Install mkosi 24.3 via pip in CI instead of apt's older build.
- Pin MODULES=most and COMPRESS=zstd via a tracked initramfs-tools
  config under mkosi.extra/.
- Narrow .gitignore so only the generated agent binary is ignored,
  not the whole mkosi.extra/ tree.
- New check-initrd Makefile target asserts both size (>=150 MB) and
  actual presence of i915/tgl_guc_*.bin inside the built initrd, so
  a silent firmware-drop regression fails the build loudly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 13:38:40 -04:00
josh c962d6d8ab live-image: bundle nonfree firmware (i915 GuC et al.)
CI / Lint + build + test (push) Successful in 2m19s
Release / release (push) Successful in 3m28s
Tiger Lake and later Intel iGPUs need i915/tgl_guc_*.bin; without
it the i915 init wedges and floods the console. Same story on most
modern wifi/NIC hardware. Pull firmware-linux-nonfree (metapackage
covering misc-nonfree, iwlwifi, realtek, amd-graphics, …) from the
bookworm non-free-firmware repo — single line fix, ~500MB cost to
the squashfs, worth it for booting arbitrary repaired hosts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 13:14:19 -04:00
josh caebd00d8d live-image: symlink /initrd.img to match /vmlinuz
CI / Lint + build + test (push) Successful in 1m47s
Release / release (push) Successful in 2m14s
The linux-image-amd64 postinst creates /vmlinuz but the paired
/initrd.img symlink only shows up via an initramfs-tools hook that
doesn't fire when we call update-initramfs ourselves. Without it,
the top-level Makefile's `cp live-image/build/initrd.img` fails and
`make release` aborts with a broken bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:54:25 -04:00
josh 41a273b47f live-image: generate initrd explicitly; fail release on missing files
CI / Lint + build + test (push) Successful in 1m47s
Release / release (push) Failing after 2m28s
Two bugs chained together to ship a broken bundle:

1. With Bootable=no, mkosi skips update-initramfs, so no
   /boot/initrd.img-<kver> ever gets generated inside the rootfs.
   The postinst now runs update-initramfs via chroot to produce it.

2. The `make release` recipe chained its `cp` calls with `;`, so
   a missing live-image/build/initrd.img silently failed and the
   bundle still got tarred + uploaded. Adding `set -e` at the top
   of the recipe makes any missing component fail the build loudly
   instead of shipping a half-bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:47:26 -04:00
josh 5aa245cd85 live-image: disable mkosi Bootable (PXE doesn't need a bootloader)
CI / Lint + build + test (push) Successful in 1m36s
Release / release (push) Successful in 1m56s
mkosi was failing with "systemd-boot was not found at
usr/lib/systemd/boot/efi" because Bootable=yes expects systemd-boot
installed *inside* the image for EFI boot. This image is only ever
PXE-booted — iPXE loads vmlinuz+initrd from TFTP directly, so the
rootfs itself needs no bootloader.

Switching to Bootable=no drops the EFI-image assembly step; the
linux-image-amd64 postinst still creates /vmlinuz and /initrd.img
symlinks that the top-level Makefile copies into the bundle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 10:18:49 -04:00
josh a893b0d817 live-image: stage agent binary via mkosi.extra
CI / Lint + build + test (push) Successful in 1m33s
Release / release (push) Failing after 1m43s
mkosi only mounts live-image/ as /work/src, so the postinst couldn't
reach the repo-root bin/vetting-agent.linux-amd64 — the build failed
in CI with `install: cannot stat '/work/src/bin/vetting-agent.linux-amd64'`.

The Makefile now copies the prebuilt agent into mkosi.extra/, which
mkosi merges into the image root automatically. The postinst is
reduced to creating the multi-user.target.wants symlink.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 03:13:38 -04:00
josh 4dda1dad83 live-image: mark mkosi.postinst executable in git index
CI / Lint + build + test (push) Successful in 1m38s
Release / release (push) Failing after 1m4s
mkosi refuses to run a non-executable postinst. git was tracking it
as 100644 because it was added from Windows (no POSIX exec bit on the
FS), so CI saw a non-executable file even though WSL/Linux had been
treating it fine locally. Same fix applied earlier to install.sh +
pxe-setup.sh.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 02:41:40 -04:00
josh a5055b3c7a Automate PXE setup: release bundle + pxe-setup.sh + startup validation
CI / Lint + build + test (push) Has been cancelled
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:

  make release                   # dev box (Linux/WSL)
  scp bundle.tar.gz lxc:/tmp/
  sudo ./install.sh              # base install, unchanged
  sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...

pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.

Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.

Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:38:43 -04:00
josh 9bb4b09a04 Initial commit: full Phases 1-6 implementation
CI / Lint + build + test (push) Has been cancelled
Post-repair hardware validation pipeline for Proxmox cluster hosts.
Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq
PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.
2026-04-17 21:32:10 -04:00