Heartbeat-first dispatch: retire WoL-as-default, add WaitingReboot
CI / Lint + build + test (push) Has been cancelled

Every supported host runs vetting-reporter in-OS and heartbeats every
30s. WoL was never the thing that started vetting — the heartbeat
response's reboot_for_vetting command was. Firing WoL first only
crowded the run log with misleading diagnostics when the real failure
mode is "reporter isn't installed."

- StartRun 409s if the host hasn't heartbeated within 60s, pointing
  the operator at /register/quick.sh.
- Dispatcher re-checks LastSeenAt at dispatch time (run may sit in
  Queued long enough for the host to go offline); stale hosts mark
  the run Failed with failed_stage=dispatch instead of looping.
- New StateWaitingReboot + TriggerRebootCommanded capture the actual
  semantics. StateWaitingWoL kept as the hook point for a future
  manual-override button.
- Tile disables the Start button with a quick.sh tooltip when the
  host is offline, matching the server-side 409.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-04-18 01:10:34 -04:00
parent c9927ca2bf
commit d0bfae14c8
17 changed files with 632 additions and 155 deletions
+2 -2
View File
@@ -25,7 +25,7 @@ type PipelineNode struct {
// pre-stage timestamps.
var preStageOrder = []model.RunState{
model.StateQueued,
model.StateWaitingWoL,
model.StateWaitingReboot,
model.StateBooting,
}
@@ -37,7 +37,7 @@ func runStateRank(s model.RunState) int {
order := []model.RunState{
model.StateRegistered,
model.StateQueued,
model.StateWaitingWoL,
model.StateWaitingReboot,
model.StateBooting,
model.StateInventoryCheck,
model.StateSpecValidate,