Vetting

Author	SHA1	Message	Date
josh	026923075c	pxe: disable systemd-firstboot so the live image doesn't prompt. systemd-firstboot.service is an interactive wizard that asks for locale, timezone, and root password when /etc/machine-id isn't populated — i.e. every PXE boot of a mkosi-built image. It sits on sysinit.target waiting for input that will never arrive, blocking the agent service and every other downstream unit indefinitely. systemd.firstboot=off on the kernel cmdline is the documented kill switch; no image-side changes needed.	2026-04-18 15:35:24 -04:00
josh	c45349f62c	pxe: mask serial-getty@ttyS0 so hosts without serial don't wait 90s. systemd-getty-generator reads console=ttyS0 off the kernel cmdline and auto-creates serial-getty@ttyS0.service, which BindsTo dev-ttyS0.device. On hardware without a physical serial port the device node never shows up, systemd waits its full default 90s timeout, and only then proceeds. systemd.mask= on the kernel cmdline is a first-class option — masks the unit before the generator's link even gets activated. Kernel messages still go to ttyS0 if a port is present; we just don't try to spawn a login prompt there.	2026-04-18 14:47:03 -04:00
josh	a88e24bef4	live-image: real /init + verbose boot for first-boot diagnosis. Host boots past kernel init and then stalls silently. ACPI DSDT error about TXHC.RHUB.SS01 is benign noise (Tiger Lake firmware bug) — the actual problem is that nothing between kernel handoff and (maybe) systemd is visible on the console. Two changes: 1. Replace the /init → sbin/init symlink with a real shell script (live-image/mkosi.extra/init) that mounts /proc /sys /dev /dev/pts /dev/shm /run before execing systemd. Systemd has fallback mount code for these, but when it fails the failure is silent. Doing it explicitly in /init keeps failures visible and avoids the fragile symlink-resolution trick. 2. Drop 'quiet' from the kernel cmdline and add loglevel=7 plus systemd.log_target=kmsg + journald.forward_to_console=1 so every early-boot message reaches both tty0 and ttyS0. Will be dialed back once boot is stable. Also: .gitattributes pins LF on live-image/, .gitea/, Makefile, and *.sh so Windows checkouts don't break shell scripts and Makefile recipes with CRLF. /init also gets chmod 0755 in repack-initrd as a belt-and-braces against mode loss on non-Linux checkouts.	2026-04-18 14:31:40 -04:00
josh	2c440fce8a	pxe: move dhcp-host allowlist into a SIGHUP-reloadable file. dnsmasq's SIGHUP re-reads /etc/ethers and any --dhcp-hostsfile= paths, but NOT dhcp-host= lines from the main conf. Reload() was faithfully rewriting dnsmasq.conf with the new MAC, sending SIGHUP, and then dnsmasq kept serving its startup view — so a freshly-registered host still showed up as "proxy-ignored, tags: eth0" with no "known" tag. Split the allowlist into ${RuntimeDir}/dhcp-hosts, referenced from the main conf via dhcp-hostsfile=. writeConf() is static-ish now; Reload just rewrites the hosts file and SIGHUPs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 12:41:27 -04:00
josh	157b70f536	pxe: split subnet into network+netmask for dnsmasq proxy-DHCP. dnsmasq's proxy-DHCP syntax is `dhcp-range=<network-ip>,proxy[,<mask>]`, not a CIDR. Passing "192.168.1.0/24,proxy" made dnsmasq refuse to start with "bad dhcp-range at line 12". Parse the CIDR once in writeConf() and render Network + Netmask as separate template fields. The config surface (pxe.subnet) stays CIDR because that's the right shape for humans; we just unpack it before handing to dnsmasq. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 12:17:10 -04:00
josh	506c856046	pxe: switch dnsmasq to proxy-DHCP mode on the LAN. Previously the orchestrator ran a full DHCP server on a dedicated br-vetting bridge (10.77.0.0/24), which required a hypervisor-level bridge + physical cabling onto that bridge for every repaired host. Real-world bite: the LXC's br-vetting had no L2 path to the target host's PXE NIC, so DHCPDISCOVERs never reached eth1 and PXE silently timed out. dnsmasq's proxy-DHCP mode is the idiomatic answer: it coexists with the LAN's existing DHCP server (UniFi, etc.), never assigns an IP itself, and only supplements the PXE options. No dedicated bridge, no VLAN, no cabling changes — dnsmasq binds to the LAN interface and layers option 66/67 + the PXE BINL on top of the real DHCP exchange. The MAC allowlist still gates replies, so random LAN clients booting from network get nothing. Template switches dhcp-range=<start,end,lease> to dhcp-range=<cidr>,proxy and replaces dhcp-boot= for first-boot ROM clients with pxe-service= directives (the correct proxy-mode chainload form). Validation drops the dhcp_range regex for a net.ParseCIDR check on pxe.subnet. Config, production/example yaml, and pxe-setup.sh swap --dhcp-range for --subnet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 12:02:49 -04:00
josh	6a1d5c3bed	pxe: route dnsmasq lease + pid files into RuntimeDir. Without explicit dhcp-leasefile and pid-file, dnsmasq reaches for its distro defaults (/var/lib/misc/dnsmasq.leases, /run/dnsmasq.pid) — both outside the systemd unit's ReadWritePaths=/var/lib/vetting /var/log/vetting sandbox, causing 'Read-only file system' on every start. RuntimeDir is already writable by construction (Supervisor.Start mkdir's it), so writing both files there keeps dnsmasq entirely inside the sandbox. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 11:31:37 -04:00
josh	a5055b3c7a	Automate PXE setup: release bundle + pxe-setup.sh + startup validation. Collapses the LXC side of PXE enablement from a six-step manual dance (build, fetch iPXE, scp, bridge, hand-edit yaml) into: `make release` (dev box, Linux/WSL); `scp bundle.tar.gz lxc:/tmp/`; `sudo ./install.sh` (base install, unchanged); `sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...`. pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img from the bundle, and rewrites only the pxe: block of vetting.yaml. Idempotent; --force gates overwriting a hand-edited block. Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd configs fail at orchestrator startup with clear errors naming the missing file or yaml key, instead of silently serving broken TFTP until a real host tries to PXE-boot. Nine tests cover missing files, bogus interface, malformed dhcp_range, bad orchestrator_url, and aggregate reporting. Hypervisor bridge creation stays documented (LXC can't do it) but everything downstream of the bridge is now scripted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-18 01:38:43 -04:00
josh	9bb4b09a04	Initial commit: full Phases 1-6 implementation. Post-repair hardware validation pipeline for Proxmox cluster hosts. Go orchestrator + in-image agent + mkosi live image + bundled dnsmasq PXE + SQLite + HTMX/SSE UI + notify registry + janitor + full docs.	2026-04-17 21:32:10 -04:00
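
The CIDR unpacking described in 157b70f536 might look roughly like the Go sketch below. The names splitCIDR and dhcpRangeLine are hypothetical and not taken from the repository; only net.ParseCIDR and the `dhcp-range=<network-ip>,proxy[,<mask>]` form come from the commit messages. The sketch assumes an IPv4 subnet.

```go
package main

import (
	"fmt"
	"net"
)

// splitCIDR unpacks an IPv4 CIDR such as "192.168.1.0/24" into the
// network address and a dotted-quad netmask. Hypothetical helper.
func splitCIDR(cidr string) (string, string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", "", fmt.Errorf("pxe.subnet: %w", err)
	}
	if ipnet.IP.To4() == nil {
		return "", "", fmt.Errorf("pxe.subnet: %q is not an IPv4 subnet", cidr)
	}
	// ipnet.IP is already masked to the network address; the mask is
	// 4 bytes for IPv4, so converting it to net.IP prints dotted-quad.
	return ipnet.IP.String(), net.IP(ipnet.Mask).String(), nil
}

// dhcpRangeLine renders the proxy-DHCP directive in the
// network,proxy,mask form rather than as a CIDR.
func dhcpRangeLine(cidr string) (string, error) {
	network, netmask, err := splitCIDR(cidr)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("dhcp-range=%s,proxy,%s", network, netmask), nil
}

func main() {
	line, err := dhcpRangeLine("192.168.1.0/24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line) // dhcp-range=192.168.1.0,proxy,255.255.255.0
}
```

Keeping the user-facing setting as a CIDR and converting only at render time means a malformed value is rejected by the same parser that produces the two template fields.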

9 Commits
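
As an illustration of the reload path described in 2c440fce8a, the Go sketch below rewrites a hosts file and then signals dnsmasq. It is a guess at the shape, not the repository's code: renderHosts and reload are hypothetical names, the one-bare-MAC-per-line file body is an assumption about what the allowlist contains, and the dnsmasq PID is assumed to be known to the caller. syscall.Kill is Unix-only.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// renderHosts builds the hosts-file body, one normalized MAC per line.
// Assumption: each allowlist entry is a bare hardware address.
func renderHosts(macs []string) (string, error) {
	var b strings.Builder
	for _, m := range macs {
		hw, err := net.ParseMAC(m)
		if err != nil {
			return "", fmt.Errorf("bad MAC %q: %w", m, err)
		}
		b.WriteString(hw.String()) // lowercase, colon-separated
		b.WriteByte('\n')
	}
	return b.String(), nil
}

// reload rewrites <runtimeDir>/dhcp-hosts and sends SIGHUP to dnsmasq.
// The main conf is left untouched; it is assumed to point at this file
// through dhcp-hostsfile=.
func reload(runtimeDir string, dnsmasqPID int, macs []string) error {
	body, err := renderHosts(macs)
	if err != nil {
		return err
	}
	path := filepath.Join(runtimeDir, "dhcp-hosts")
	tmp := path + ".tmp"
	// Write to a temp file and rename so dnsmasq never reads a partial file.
	if err := os.WriteFile(tmp, []byte(body), 0o644); err != nil {
		return err
	}
	if err := os.Rename(tmp, path); err != nil {
		return err
	}
	return syscall.Kill(dnsmasqPID, syscall.SIGHUP)
}

func main() {
	body, err := renderHosts([]string{"AA:BB:CC:DD:EE:FF"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(body) // aa:bb:cc:dd:ee:ff
}
```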