mkosi.conf: add ipmitool, ethtool, nvme-cli so the Firmware stage
can actually read BMC revisions, NIC firmware versions, and fall
back to nvme-cli when sysfs firmware_rev is missing.
firmware.go: probeNICFirmware and probeHBAFirmware now return
(snapshots, warning) so a missing ethtool/lspci surfaces in the
stage log the same way probeBIOS/probeBMC already do. Before, a
host without ethtool silently reported "bios=1 nvme_fw=1
microcode=1" with no hint that nic coverage was dropped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The live image was still carrying the Phase 2 package list, so SMART,
CPUStress, and Network each hit a LookPath miss and returned
pass-with-skip. A run that skipped every real check still ended in
"completed" — nothing on the report said the image was broken.
Add smartmontools, stress-ng, fio, iperf3, lshw, lm-sensors,
e2fsprogs, and util-linux to mkosi.conf. Flip the three stages from
skip-pass to fail when their binary is missing so any future
packaging regression blocks the run instead of whispering past it.
Legitimate "no hardware" skips (no GPU, no hwmon, no disks,
non-destructive) are untouched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Belt-and-braces for the kernel-cmdline systemd.firstboot=off fix.
mkosi ships /etc/machine-id empty, which triggers firstboot's
interactive locale/timezone/root-password prompt on every PXE boot;
with the agent running unattended there's nobody to answer and
sysinit.target blocks indefinitely.
Mask via a /dev/null symlink in /etc/systemd/system so the service
is unstartable regardless of cmdline — rules out the failure mode
where an older orchestrator binary serves an iPXE script without
the off-switch arg.
Host boots past kernel init and then stalls silently. ACPI DSDT error
about TXHC.RHUB.SS01 is benign noise (Tiger Lake firmware bug) — the
actual problem is that nothing between kernel handoff and (maybe)
systemd is visible on the console.
Two changes:
1. Replace the /init → sbin/init symlink with a real shell script
   (live-image/mkosi.extra/init) that mounts /proc /sys /dev /dev/pts
   /dev/shm /run before execing systemd. Systemd has fallback mount
   code for these, but when it fails the failure is silent. Doing it
   explicitly in /init keeps failures visible and avoids the fragile
   symlink-resolution trick.
2. Drop 'quiet' from the kernel cmdline and add loglevel=7 plus
   systemd.log_target=kmsg + journald.forward_to_console=1 so every
   early-boot message reaches both tty0 and ttyS0. Will be dialed
   back once boot is stable.
Also: .gitattributes pins LF on live-image/, .gitea/, Makefile, and
*.sh so Windows checkouts don't break shell scripts and Makefile
recipes with CRLF. /init also gets chmod 0755 in repack-initrd as
belt-and-braces protection against mode loss on non-Linux checkouts.
update-initramfs produces a boot stub (~50 MB) that expects to mount a
separate rootfs over squashfs/disk/NFS. Our PXE channel only ships
vmlinuz+initrd.img, so the stub had nothing to pivot to — kernel
finished hand-off and the system wedged with firmware, modules, and
userspace stranded in the 545 MB rootfs dir we never delivered.
Replace with an everything-in-initramfs build: cpio.zst the full
rootfs (minus /boot) as the initrd, add /init -> sbin/init for the
kernel's runtime entrypoint, materialize the kernel symlink into a
real file. Bump check-initrd floor to 200 MB and switch the firmware
grep from unmkinitramfs (boot-stub-specific) to zstd | cpio -t.
Also add cpio to the CI apt deps.
Previous run actually built the 518 MB rootfs with firmware-misc-nonfree
et al. installed — the real payload is working. Two follow-ups:
- check-initrd was reading stat on a symlink path and getting 30 bytes
  (the symlink's own size), not the 6.1.0-44-amd64 kernel initrd it
  points to. Switched to wc -c, which follows symlinks, and to du -hL
  for the OK message.
- Add zstd to Packages= so COMPRESS=zstd in initramfs.conf can be
  honored; without it update-initramfs falls back to gzip with a
  "No zstd in PATH" warning.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previous attempt (c962d6d) added firmware-linux-nonfree to mkosi.conf,
but the CI bundle was still 63 MB and Tiger Lake wedged on tgl_guc.
Two reasons: (1) firmware-linux-nonfree on bookworm is a thin
metapackage that doesn't include firmware-misc-nonfree, which is where
i915 GuC/HuC blobs actually live; (2) Ubuntu's apt-packaged mkosi is
old enough that Repositories=non-free-firmware shorthand likely isn't
wired through to the debootstrap invocation, so firmware packages
silently miss the bootstrap step entirely.
Changes:
- Enumerate firmware packages explicitly in mkosi.conf
  (firmware-misc-nonfree, firmware-iwlwifi, firmware-realtek,
  firmware-amd-graphics, firmware-intel-sound, intel/amd64-microcode).
- Ship mkosi.sources.d/debian.sources with explicit deb822 so the
  non-free-firmware component is unambiguously available.
- Install mkosi 24.3 via pip in CI instead of apt's older build.
- Pin MODULES=most and COMPRESS=zstd via a tracked initramfs-tools
  config under mkosi.extra/.
- Narrow .gitignore so only the generated agent binary is ignored,
  not the whole mkosi.extra/ tree.
- New check-initrd Makefile target asserts both size (>=150 MB) and
  actual presence of i915/tgl_guc_*.bin inside the built initrd, so
  a silent firmware-drop regression fails the build loudly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tiger Lake and later Intel iGPUs need i915/tgl_guc_*.bin; without
it the i915 init wedges and floods the console. Same story on most
modern wifi/NIC hardware. Pull firmware-linux-nonfree (metapackage
covering misc-nonfree, iwlwifi, realtek, amd-graphics, …) from the
bookworm non-free-firmware repo. A single-line fix at ~500 MB cost to
the squashfs, worth it for booting arbitrary repaired hosts.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The linux-image-amd64 postinst creates /vmlinuz but the paired
/initrd.img symlink only shows up via an initramfs-tools hook that
doesn't fire when we call update-initramfs ourselves. Without it,
the top-level Makefile's `cp live-image/build/initrd.img` fails and
`make release` aborts with a broken bundle.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two bugs chained together to ship a broken bundle:
1. With Bootable=no, mkosi skips update-initramfs, so no
   /boot/initrd.img-<kver> ever gets generated inside the rootfs.
   The postinst now runs update-initramfs via chroot to produce it.
2. The `make release` recipe chained its `cp` calls with `;`, so
   a missing live-image/build/initrd.img silently failed and the
   bundle still got tarred + uploaded. Adding `set -e` at the top
   of the recipe makes any missing component fail the build loudly
   instead of shipping a half-bundle.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkosi was failing with "systemd-boot was not found at
usr/lib/systemd/boot/efi" because Bootable=yes expects systemd-boot
installed *inside* the image for EFI boot. This image is only ever
PXE-booted — iPXE loads vmlinuz+initrd from TFTP directly, so the
rootfs itself needs no bootloader.
Switching to Bootable=no drops the EFI-image assembly step; the
linux-image-amd64 postinst still creates /vmlinuz and /initrd.img
symlinks that the top-level Makefile copies into the bundle.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkosi only mounts live-image/ as /work/src, so the postinst couldn't
reach the repo-root bin/vetting-agent.linux-amd64 — the build failed
in CI with `install: cannot stat '/work/src/bin/vetting-agent.linux-amd64'`.
The Makefile now copies the prebuilt agent into mkosi.extra/, which
mkosi merges into the image root automatically. The postinst is
reduced to creating the multi-user.target.wants symlink.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkosi refuses to run a non-executable postinst. git was tracking it
as 100644 because it was added from Windows (no POSIX exec bit on the
FS), so CI saw a non-executable file even though WSL/Linux had been
treating it fine locally. Same fix applied earlier to install.sh +
pxe-setup.sh.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:
    make release                    # dev box (Linux/WSL)
    scp bundle.tar.gz lxc:/tmp/
    sudo ./install.sh               # base install, unchanged
    sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...
pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.
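
The fail-closed pin check can be illustrated in Go as below. pxe-setup.sh
itself is a shell script, so this is only a sketch of the idea; the layout
of deploy/ipxe-shas.txt is assumed here to be the "<hex digest>  <filename>"
form that sha256sum emits, and the function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// parsePins reads "<hex digest>  <filename>" lines (assumed layout) into a
// filename -> digest map, skipping blank lines and # comments.
func parsePins(text string) map[string]string {
	pins := make(map[string]string)
	for _, line := range strings.Split(text, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 || strings.HasPrefix(fields[0], "#") {
			continue
		}
		pins[fields[1]] = strings.ToLower(fields[0])
	}
	return pins
}

// verifyPinned fails closed: a file with no pinned digest is rejected
// rather than accepted unverified.
func verifyPinned(pins map[string]string, name string, data []byte) error {
	want, ok := pins[name]
	if !ok {
		return fmt.Errorf("%s: no pinned SHA256, refusing to install", name)
	}
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if got != want {
		return fmt.Errorf("%s: SHA256 mismatch: got %s, want %s", name, got, want)
	}
	return nil
}
```

Treating "not in the pin file" as an error is what makes the check
fail-closed: a newly fetched binary cannot slip through because nobody
pinned it yet.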
Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.
Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>