dnsmasq's proxy-DHCP syntax is `dhcp-range=<network-ip>,proxy[,<mask>]`,
not a CIDR. Passing "192.168.1.0/24,proxy" made dnsmasq refuse to start
with "bad dhcp-range at line 12". Parse the CIDR once in writeConf()
and render Network + Netmask as separate template fields.
The config surface (pxe.subnet) stays CIDR because that's the right
shape for humans; we just unpack it before handing to dnsmasq.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously the orchestrator ran a full DHCP server on a dedicated
br-vetting bridge (10.77.0.0/24), which required a hypervisor-level
bridge + physical cabling onto that bridge for every repaired host.
Real-world bite: the LXC's br-vetting had no L2 path to the target
host's PXE NIC, so DHCPDISCOVERs never reached eth1 and PXE silently
timed out.
dnsmasq's proxy-DHCP mode is the idiomatic answer: it coexists with
the LAN's existing DHCP server (UniFi, etc.), never assigns an IP
itself, and only supplements the PXE options. No dedicated bridge,
no VLAN, no cabling changes \u2014 dnsmasq binds to the LAN interface
and layers option 66/67 + the PXE BINL on top of the real DHCP
exchange. The MAC allowlist still gates replies, so random LAN
clients booting from network get nothing.
Template switches dhcp-range=<start,end,lease> to
dhcp-range=<cidr>,proxy and replaces dhcp-boot= for first-boot ROM
clients with pxe-service= directives (the correct proxy-mode
chainload form). Validation drops the dhcp_range regex for a
net.ParseCIDR check on pxe.subnet. Config, production/example yaml,
and pxe-setup.sh swap --dhcp-range for --subnet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without explicit dhcp-leasefile and pid-file, dnsmasq reaches for
its distro defaults (/var/lib/misc/dnsmasq.leases,
/run/dnsmasq.pid) — both outside the systemd unit's
ReadWritePaths=/var/lib/vetting /var/log/vetting sandbox, causing
'Read-only file system' on every start.
RuntimeDir is already writable by construction (Supervisor.Start
mkdir's it), so writing both files there keeps dnsmasq entirely
inside the sandbox.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Collapses the LXC side of PXE enablement from a six-step manual dance
(build, fetch iPXE, scp, bridge, hand-edit yaml) into:
make release # dev box (Linux/WSL)
scp bundle.tar.gz lxc:/tmp/
sudo ./install.sh # base install, unchanged
sudo ./pxe-setup.sh --interface ... --dhcp-range ... --orchestrator-url ...
pxe-setup.sh fetches iPXE from boot.ipxe.org, verifies against pinned
SHA256s in deploy/ipxe-shas.txt (fail-closed), places vmlinuz/initrd.img
from the bundle, and rewrites only the pxe: block of vetting.yaml.
Idempotent; --force gates overwriting a hand-edited block.
Adds Supervisor.Validate() — called before dnsmasq spawn — so typo'd
configs fail at orchestrator startup with clear errors naming the
missing file or yaml key, instead of silently serving broken TFTP
until a real host tries to PXE-boot. Nine tests cover missing files,
bogus interface, malformed dhcp_range, bad orchestrator_url, and
aggregate reporting.
Hypervisor bridge creation stays documented (LXC can't do it) but
everything downstream of the bridge is now scripted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>