Fix Fill Capacity exceeding DC slot limit due to double-counted failed racks

computeRacksFailed was incremented on production failure and never decremented when repaired racks came back online, while repair cohorts also tracked the same racks. This caused usedSlots to inflate past the DC capacity over time.

Fix: derive computeRacksFailed from repair cohorts each tick instead of maintaining it as a running counter. Include repair cohorts in pipeline slot accounting so all racks are counted exactly once.

Also fixes power limit in fillDCToCapacity to only count online racks (pipeline racks don't draw power).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
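
The accounting described above can be illustrated with a minimal sketch. It is not the code from this commit: the `Cohort` and `DataCenter` shapes are simplified assumptions, the helper names `deriveFailedRacks` and `usedSlots` are hypothetical, and the `'install'` stage name is made up for the example. Only the `stage === 'repair'` check, `count`, `deploymentCohorts`, and `computeRacksOnline` come from the diff.

```typescript
// Simplified, assumed shapes; the real types in the repository may differ.
interface Cohort {
  stage: string;
  count: number;
}

interface DataCenter {
  computeRacksOnline: number;
  deploymentCohorts: Cohort[];
}

// Hypothetical helper: derive the failed-rack count from repair cohorts
// instead of keeping a separate running counter.
function deriveFailedRacks(dc: DataCenter): number {
  return dc.deploymentCohorts
    .filter((c) => c.stage === 'repair')
    .reduce((sum, c) => sum + c.count, 0);
}

// Hypothetical helper: online racks plus every cohort (repair included),
// so each rack is counted exactly once.
function usedSlots(dc: DataCenter): number {
  const inCohorts = dc.deploymentCohorts.reduce((sum, c) => sum + c.count, 0);
  return dc.computeRacksOnline + inCohorts;
}

const exampleDc: DataCenter = {
  computeRacksOnline: 90,
  deploymentCohorts: [
    { stage: 'install', count: 6 }, // made-up stage name for illustration
    { stage: 'repair', count: 4 },
  ],
};

const failed = deriveFailedRacks(exampleDc);
const slots = usedSlots(exampleDc);
```

In this example the repair cohorts are the single source of truth for failed racks, so adding a separate failed counter on top of `usedSlots` would count those four racks twice.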
2026-04-25 00:00:46 -04:00
parent 9c49a10b31
commit b7d23c8872
4 changed files with 28 additions and 22 deletions
@@ -71,11 +71,15 @@ function resetRackFailures() {
      ...cluster,
      campuses: cluster.campuses.map((campus) => ({
        ...campus,
-        dataCenters: campus.dataCenters.map((dc) => ({
-          ...dc,
-          computeRacksOnline: dc.computeRacksOnline + dc.computeRacksFailed,
-          computeRacksFailed: 0,
-        })),
+        dataCenters: campus.dataCenters.map((dc) => {
+          const repairCount = dc.deploymentCohorts.filter(c => c.stage === 'repair').reduce((sum, c) => sum + c.count, 0);
+          return {
+            ...dc,
+            computeRacksOnline: dc.computeRacksOnline + repairCount,
+            computeRacksFailed: 0,
+            deploymentCohorts: dc.deploymentCohorts.filter(c => c.stage !== 'repair'),
+          };
+        }),
      })),
    }));
    return { infrastructure: { ...s.infrastructure, clusters: newClusters } };