pxe: switch dnsmasq to proxy-DHCP mode on the LAN

Previously the orchestrator ran a full DHCP server on a dedicated
br-vetting bridge (10.77.0.0/24), which required a hypervisor-level
bridge + physical cabling onto that bridge for every repaired host.
Real-world bite: the LXC's br-vetting had no L2 path to the target
host's PXE NIC, so DHCPDISCOVERs never reached eth1 and PXE silently
timed out.

dnsmasq's proxy-DHCP mode is the idiomatic answer: it coexists with
the LAN's existing DHCP server (UniFi, etc.), never assigns an IP
itself, and only supplements the PXE options. No dedicated bridge,
no VLAN, no cabling changes: dnsmasq binds to the LAN interface
and layers option 66/67 + the PXE BINL on top of the real DHCP
exchange. The MAC allowlist still gates replies, so random LAN
clients booting from network get nothing.
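
For illustration, a minimal shell sketch of deriving a proxy-mode
dhcp-range line from a LAN CIDR. It assumes the address-plus-netmask
form from dnsmasq's documented dhcp-range syntax; the template's actual
rendering is not shown here and may differ. proxy_dhcp_range and its
two helpers are hypothetical names, not part of this commit.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of this commit.
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  # 10# forces base 10 so an octet like "08" is not parsed as octal.
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Convert a 32-bit integer back to dotted-quad form.
int_to_ip() {
  local n="$1"
  printf '%d.%d.%d.%d\n' \
    $(( (n >> 24) & 255 )) $(( (n >> 16) & 255 )) \
    $(( (n >> 8) & 255 )) $(( n & 255 ))
}

# Print a proxy-mode dhcp-range line for a CIDR such as 192.168.1.0/24.
# Host bits in the address are masked off to get the network address.
proxy_dhcp_range() {
  local addr="${1%/*}" prefix="${1#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - 10#${prefix})) & 0xFFFFFFFF ))
  local net=$(( $(ip_to_int "${addr}") & mask ))
  printf 'dhcp-range=%s,proxy,%s\n' "$(int_to_ip "${net}")" "$(int_to_ip "${mask}")"
}
```

Under these assumptions, `proxy_dhcp_range 192.168.1.0/24` prints
`dhcp-range=192.168.1.0,proxy,255.255.255.0`. The input is assumed to
have already passed a CIDR validity check.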

Template switches dhcp-range=<start,end,lease> to
dhcp-range=<cidr>,proxy and replaces dhcp-boot= for first-boot ROM
clients with pxe-service= directives (the correct proxy-mode
chainload form). Validation drops the dhcp_range regex for a
net.ParseCIDR check on pxe.subnet. Config, production/example yaml,
and pxe-setup.sh swap --dhcp-range for --subnet.
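
The shell-side shape regex in pxe-setup.sh only checks digit counts, so
a value such as 999.1.1.1/99 passes it and is left for later validation
to reject. If a stricter pre-check were wanted in the script, it could
look roughly like this sketch; is_valid_cidr is a hypothetical helper,
not part of this commit.

```shell
#!/usr/bin/env bash
# Sketch only: is_valid_cidr is a hypothetical helper, not in pxe-setup.sh.

# Return 0 if $1 is an IPv4 CIDR with octets 0-255 and prefix 0-32.
is_valid_cidr() {
  local cidr="$1" addr prefix octet
  # Same shape check as the script: four dotted groups, slash, prefix.
  [[ "${cidr}" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]] || return 1
  addr="${cidr%/*}"
  prefix="${cidr#*/}"
  # 10# forces base 10 so values like "08" are not parsed as octal.
  (( 10#${prefix} <= 32 )) || return 1
  local IFS=.
  for octet in ${addr}; do
    (( 10#${octet} <= 255 )) || return 1
  done
  return 0
}
```

A caller would use it the same way as the existing check, for example
`is_valid_cidr "${SUBNET}" || { echo "ERROR: bad --subnet" >&2; exit 2; }`.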

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 12:02:49 -04:00
parent b809bf5f3e
commit 506c856046
7 changed files with 63 additions and 68 deletions
+25 -22
@@ -3,9 +3,12 @@
#
# Run AFTER deploy/install.sh on the LXC (or wherever the orchestrator
# lives). Fetches pinned iPXE binaries, places the live image, and
-# writes the pxe: block of /etc/vetting/vetting.yaml. Does NOT create
-# the PXE bridge — that's a hypervisor-level step, see
-# docs/operations.md.
+# writes the pxe: block of /etc/vetting/vetting.yaml.
+#
+# dnsmasq runs in proxy-DHCP mode: it coexists with whatever DHCP
+# server already serves your LAN (UniFi, pfSense, Asus, etc.) and
+# only supplements the PXE options. No dedicated bridge, no VLAN,
+# no cabling changes.
#
# Idempotent: safe to re-run with the same args. A second run with
# different args overwrites the pxe: block; pass --force to override
@@ -13,9 +16,9 @@
#
# Usage:
# sudo ./pxe-setup.sh \
-# --interface eth1 \
-# --dhcp-range 10.77.0.100,10.77.0.200,12h \
-# --orchestrator-url http://10.77.0.2:8080
+# --interface eth0 \
+# --subnet 192.168.1.0/24 \
+# --orchestrator-url http://192.168.1.135:8080
#
# Optional:
# --tftp-root DIR default /var/lib/vetting/tftp
@@ -26,7 +29,7 @@
set -euo pipefail
INTERFACE=""
-DHCP_RANGE=""
+SUBNET=""
ORCH_URL=""
TFTP_ROOT="/var/lib/vetting/tftp"
LIVE_DIR="/var/lib/vetting/live"
@@ -38,13 +41,13 @@ SERVICE_USER="vetting"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
usage() {
-sed -n '2,24p' "${BASH_SOURCE[0]}"
+sed -n '2,28p' "${BASH_SOURCE[0]}"
}
while [[ $# -gt 0 ]]; do
case "$1" in
--interface) INTERFACE="$2"; shift 2 ;;
---dhcp-range) DHCP_RANGE="$2"; shift 2 ;;
+--subnet) SUBNET="$2"; shift 2 ;;
--orchestrator-url) ORCH_URL="$2"; shift 2 ;;
--tftp-root) TFTP_ROOT="$2"; shift 2 ;;
--live-dir) LIVE_DIR="$2"; shift 2 ;;
@@ -61,9 +64,9 @@ if [[ $EUID -ne 0 ]]; then
exit 1
fi
-[[ -z "${INTERFACE}" ]] && { echo "ERROR: --interface is required" >&2; exit 2; }
-[[ -z "${DHCP_RANGE}" ]] && { echo "ERROR: --dhcp-range is required" >&2; exit 2; }
-[[ -z "${ORCH_URL}" ]] && { echo "ERROR: --orchestrator-url is required" >&2; exit 2; }
+[[ -z "${INTERFACE}" ]] && { echo "ERROR: --interface is required" >&2; exit 2; }
+[[ -z "${SUBNET}" ]] && { echo "ERROR: --subnet is required (e.g. 192.168.1.0/24)" >&2; exit 2; }
+[[ -z "${ORCH_URL}" ]] && { echo "ERROR: --orchestrator-url is required" >&2; exit 2; }
# --- sanity checks -----------------------------------------------------
@@ -73,10 +76,10 @@ if ! ip link show "${INTERFACE}" >/dev/null 2>&1; then
exit 1
fi
-# "start_ip,end_ip,lease" — dnsmasq will still validate, but catch the
-# obvious shape errors before we write anything to disk.
-if [[ ! "${DHCP_RANGE}" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3},[^[:space:]]+$ ]]; then
-echo "ERROR: --dhcp-range must be start_ip,end_ip,lease (e.g. 10.77.0.100,10.77.0.200,12h)" >&2
+# CIDR shape check — dnsmasq will re-validate, but catch the obvious
+# errors before we write anything to disk.
+if [[ ! "${SUBNET}" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]]; then
+echo "ERROR: --subnet must be CIDR form (e.g. 192.168.1.0/24), got '${SUBNET}'" >&2
exit 2
fi
@@ -226,17 +229,17 @@ extract_yaml_value() {
' "${path}"
}
-existing_iface="$(extract_yaml_value interface "${CONFIG}")"
-existing_range="$(extract_yaml_value dhcp_range "${CONFIG}")"
+existing_iface="$(extract_yaml_value interface "${CONFIG}")"
+existing_subnet="$(extract_yaml_value subnet "${CONFIG}")"
if [[ -n "${existing_iface}" && "${existing_iface}" != "${INTERFACE}" && ${FORCE} -eq 0 ]]; then
echo "ERROR: pxe.interface in ${CONFIG} is already set to ${existing_iface}, which" >&2
echo " differs from --interface ${INTERFACE}. Pass --force to overwrite." >&2
exit 1
fi
-if [[ -n "${existing_range}" && "${existing_range}" != "${DHCP_RANGE}" && ${FORCE} -eq 0 ]]; then
-echo "ERROR: pxe.dhcp_range in ${CONFIG} is already ${existing_range}, which" >&2
-echo " differs from --dhcp-range ${DHCP_RANGE}. Pass --force to overwrite." >&2
+if [[ -n "${existing_subnet}" && "${existing_subnet}" != "${SUBNET}" && ${FORCE} -eq 0 ]]; then
+echo "ERROR: pxe.subnet in ${CONFIG} is already ${existing_subnet}, which" >&2
+echo " differs from --subnet ${SUBNET}. Pass --force to overwrite." >&2
exit 1
fi
@@ -244,7 +247,7 @@ new_block=$(cat <<EOF
pxe:
enabled: true
interface: "${INTERFACE}"
-dhcp_range: "${DHCP_RANGE}"
+subnet: "${SUBNET}"
orchestrator_url: "${ORCH_URL}"
tftp_root: "${TFTP_ROOT}"
live_dir: "${LIVE_DIR}"
+3 -3
@@ -35,9 +35,9 @@ dispatcher:
pxe:
enabled: false
-interface: "" # e.g. "eth0"
-dhcp_range: "" # e.g. "10.77.0.100,10.77.0.200,12h"
-orchestrator_url: "" # e.g. "http://10.77.0.1:8080"
+interface: "" # LAN NIC, e.g. "eth0"
+subnet: "" # LAN CIDR, e.g. "192.168.1.0/24"; proxy-DHCP scope
+orchestrator_url: "" # e.g. "http://192.168.1.135:8080"
tftp_root: "" # holds ipxe.efi + undionly.kpxe
live_dir: "" # holds vmlinuz + initrd.img; served at /live/*
+3 -3
@@ -33,9 +33,9 @@ dispatcher:
pxe:
enabled: false
-interface: "" # e.g. "eth0"
-dhcp_range: "" # e.g. "10.77.0.100,10.77.0.200,12h"
-orchestrator_url: "" # e.g. "http://10.77.0.1:8080"
+interface: "" # LAN NIC, e.g. "eth0"
+subnet: "" # LAN CIDR, e.g. "192.168.1.0/24"; dnsmasq runs in proxy-DHCP mode scoped to this subnet, coexisting with the LAN's existing DHCP server
+orchestrator_url: "" # e.g. "http://192.168.1.135:8080"
tftp_root: "/var/lib/vetting/tftp" # holds ipxe.efi + undionly.kpxe
live_dir: "/var/lib/vetting/live" # holds vmlinuz + initrd.img; served at /live/*