Heartbeat-first dispatch: retire WoL-as-default, add WaitingReboot

Every supported host runs vetting-reporter in-OS and heartbeats every
30s. WoL was never the thing that started vetting — the heartbeat
response's reboot_for_vetting command was. Firing WoL first only
crowded the run log with misleading diagnostics when the real failure
mode is "reporter isn't installed."

- StartRun 409s if the host hasn't heartbeated within 60s, pointing
  the operator at /register/quick.sh.
- Dispatcher re-checks LastSeenAt at dispatch time (run may sit in
  Queued long enough for the host to go offline); stale hosts mark
  the run Failed with failed_stage=dispatch instead of looping.
- New StateWaitingReboot + TriggerRebootCommanded capture the actual
  semantics. StateWaitingWoL kept as the hook point for a future
  manual-override button.
- Tile disables the Start button with a quick.sh tooltip when the
  host is offline, matching the server-side 409.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-18 01:10:34 -04:00
parent c9927ca2bf
commit d0bfae14c8
17 changed files with 632 additions and 155 deletions
@@ -76,6 +76,19 @@ func (r *Runs) MarkFailed(ctx context.Context, runID int64, failedStage, holdIP
 	return err
 }
 
+// MarkDispatchFailed records a terminal failure discovered before the run
+// ever reached a live image, e.g. the dispatcher refused to start because
+// the host isn't heartbeating. Goes to StateFailed (not FailedHolding)
+// because there's no live image to ssh into.
+func (r *Runs) MarkDispatchFailed(ctx context.Context, runID int64, failedStage, result string) error {
+	now := time.Now().UTC()
+	_, err := r.DB.ExecContext(ctx, `
+		UPDATE runs SET state = ?, result = ?, failed_stage = ?, completed_at = ?
+		WHERE id = ?
+	`, string(model.StateFailed), result, failedStage, now, runID)
+	return err
+}
+
 func (r *Runs) MarkCompleted(ctx context.Context, runID int64, reportPath string) error {
 	now := time.Now().UTC()
 	_, err := r.DB.ExecContext(ctx, `
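The new state and trigger, together with the terminal `StateFailed` that `MarkDispatchFailed` writes, can be pictured as a small transition table. This is a sketch only. The names mirror the commit (`Queued`, `StateWaitingReboot`, `StateWaitingWoL`, `StateFailed`, `TriggerRebootCommanded`). The string values, `TriggerDispatchFailed`, and the `Next` function are assumptions.

```go
package main

import "fmt"

// State and Trigger are hypothetical local stand-ins for the model package's types.
type State string
type Trigger string

const (
	StateQueued        State = "queued"
	StateWaitingWoL    State = "waiting_wol" // kept only as a hook for a future manual override
	StateWaitingReboot State = "waiting_reboot"
	StateFailed        State = "failed" // terminal; written by MarkDispatchFailed

	TriggerRebootCommanded Trigger = "reboot_commanded"
	TriggerDispatchFailed  Trigger = "dispatch_failed" // hypothetical name
)

// transitions lists the default-path edges. StateWaitingWoL deliberately has none.
var transitions = map[State]map[Trigger]State{
	StateQueued: {
		TriggerRebootCommanded: StateWaitingReboot,
		TriggerDispatchFailed:  StateFailed,
	},
}

// Next returns the state reached from `from` on trigger t, or an error if the
// edge is not defined. Indexing a missing inner map is safe in Go (nil map read).
func Next(from State, t Trigger) (State, error) {
	if to, ok := transitions[from][t]; ok {
		return to, nil
	}
	return from, fmt.Errorf("no transition from %s on %s", from, t)
}
```

Keeping `StateWaitingWoL` in the enum, but off the default path, matches the commit's intent. The hook stays available, while a dispatched run moves straight from `Queued` to `WaitingReboot` once the reboot command is issued.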