Fix Fill Capacity exceeding DC slot limit due to double-counted failed racks
CI / build-and-push (push) Successful in 38s
computeRacksFailed was incremented when racks failed in production but never decremented when the repaired racks came back online, even though repair cohorts were also tracking the same racks. As a result, usedSlots inflated past the DC capacity over time.

Fix: derive computeRacksFailed from the repair cohorts on each tick instead of maintaining it as a running counter, and include repair cohorts in pipeline slot accounting so that every rack is counted exactly once.

Also fixes the power limit in fillDCToCapacity so that it counts only online racks (pipeline racks don't draw power).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
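A minimal sketch of the bookkeeping problem and the per-tick fix described above, assuming a simplified model in which racks under repair sit in a cohort whose stage is `'repair'`. The `Stage`, `Cohort`, and `DcState` types, the stage names, and the helpers `failRacks`, `completeRepair`, `deriveFailed`, and `usedSlots` are hypothetical illustrations, not the project's actual simulation code.

```typescript
// Hypothetical, simplified model; stage names and types are assumptions.
type Stage = 'install' | 'repair' | 'decommission';

interface Cohort {
  stage: Stage;
  count: number;
}

interface DcState {
  computeRacksOnline: number;
  computeRacksFailed: number;
  deploymentCohorts: Cohort[];
}

// Old pattern: bump a running counter on failure and also open a repair cohort.
function failRacks(dc: DcState, racks: number): void {
  dc.computeRacksOnline -= racks;
  dc.computeRacksFailed += racks; // the running counter the fix removes
  dc.deploymentCohorts.push({ stage: 'repair', count: racks });
}

// Repair finishes: racks return online and the cohort disappears,
// but nothing decrements computeRacksFailed.
function completeRepair(dc: DcState, cohort: Cohort): void {
  dc.computeRacksOnline += cohort.count;
  dc.deploymentCohorts = dc.deploymentCohorts.filter(c => c !== cohort);
}

// New pattern: recompute the failed count from repair cohorts each tick.
function deriveFailed(dc: DcState): number {
  return dc.deploymentCohorts
    .filter(c => c.stage === 'repair')
    .reduce((sum, c) => sum + c.count, 0);
}

// Each rack is either online or in one non-decommission cohort
// (repair cohorts included), so it is counted exactly once.
function usedSlots(dc: DcState): number {
  const pipeline = dc.deploymentCohorts
    .filter(c => c.stage !== 'decommission')
    .reduce((sum, c) => sum + c.count, 0);
  return dc.computeRacksOnline + pipeline;
}

const dc: DcState = { computeRacksOnline: 10, computeRacksFailed: 0, deploymentCohorts: [] };
failRacks(dc, 2);
completeRepair(dc, dc.deploymentCohorts[0]);

const staleFailed = dc.computeRacksFailed; // still 2 after the repair
const inflated = usedSlots(dc) + staleFailed; // adding the stale counter overcounts
dc.computeRacksFailed = deriveFailed(dc); // per-tick recompute resets it to 0
const derivedFailed = dc.computeRacksFailed;

console.log(staleFailed, inflated, derivedFailed, usedSlots(dc)); // 2 12 0 10
```

Deriving the value instead of mutating it means there is no decrement to forget: whatever the repair cohorts hold on a given tick is, by construction, the failed count.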
@@ -113,7 +113,7 @@ function DeploymentProgressBar({ dc }: { dc: DataCenter }) {
 const tierConfig = DC_TIER_CONFIGS[dc.tier];
 const maxCompute = maxComputeRacks(tierConfig.rackSlots);
 const pipelineRacks = dc.deploymentCohorts.filter(c => c.stage !== 'decommission').reduce((s, c) => s + c.count, 0);
-const totalTarget = dc.computeRacksOnline + dc.computeRacksFailed + pipelineRacks;
+const totalTarget = dc.computeRacksOnline + pipelineRacks;
 const pct = totalTarget > 0 ? (dc.computeRacksOnline / totalTarget) * 100 : 0;

 if (totalTarget === 0 && dc.status === 'operational') return null;
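To illustrate why dropping computeRacksFailed from totalTarget matters, here is a sketch of the progress-bar percentage before and after the change. It assumes, as the commit message implies, that a failed rack also sits in a repair cohort, which the `stage !== 'decommission'` filter already counts as pipeline. The function names and the rack numbers are illustrative only.

```typescript
interface Cohort {
  stage: string;
  count: number;
}

// Same filter as the diff: every cohort except decommission counts as pipeline.
function pipelineRacks(cohorts: Cohort[]): number {
  return cohorts
    .filter(c => c.stage !== 'decommission')
    .reduce((s, c) => s + c.count, 0);
}

// Before the fix: a rack in repair was counted by the failed counter
// and again by its repair cohort, inflating the denominator.
function deploymentPctOld(online: number, failed: number, cohorts: Cohort[]): number {
  const totalTarget = online + failed + pipelineRacks(cohorts);
  return totalTarget > 0 ? (online / totalTarget) * 100 : 0;
}

// After the fix: each non-online rack is counted once, through its cohort.
function deploymentPct(online: number, cohorts: Cohort[]): number {
  const totalTarget = online + pipelineRacks(cohorts);
  return totalTarget > 0 ? (online / totalTarget) * 100 : 0;
}

// Illustrative numbers: 6 racks online, 2 in a repair cohort,
// and a failed counter of 2 mirroring that cohort.
const repairing: Cohort[] = [{ stage: 'repair', count: 2 }];
const before = deploymentPctOld(6, 2, repairing); // 6 / 10, roughly 60
const after = deploymentPct(6, repairing); // 6 / 8 = 75
console.log(before, after);
```

With the double count, the bar understates progress (about 60% instead of 75%) for as long as the repair is in flight, and for good once the stale counter outlives the cohort.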
@@ -642,8 +642,8 @@ function DataCenterDetailView({ clusterId, campusId, datacenterId }: {

 const tierConfig = DC_TIER_CONFIGS[dc.tier];
 const maxCompute = maxComputeRacks(tierConfig.rackSlots);
-const existingCompute = dc.computeRacksOnline + dc.computeRacksFailed
-    + dc.deploymentCohorts.filter(c => c.stage !== 'decommission').reduce((s, c) => s + c.count, 0);
+const pipelineCount = dc.deploymentCohorts.filter(c => c.stage !== 'decommission').reduce((s, c) => s + c.count, 0);
+const existingCompute = dc.computeRacksOnline + pipelineCount;
 const availableSlots = maxCompute - existingCompute;
 const sku = dc.rackSkuId ? RACK_SKU_CONFIGS[dc.rackSkuId] : null;
 const netSlots = networkSlotsRequired(existingCompute);
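The same filter/reduce expression now appears in both DeploymentProgressBar and DataCenterDetailView. One way to keep every call site consistent is a single shared helper. The sketch below uses the hypothetical names `pipelineRackCount`, `computeSlotUsage`, and `assertWithinCapacity` and a simplified `DataCenterLike` shape; it does not reproduce `maxComputeRacks` or `networkSlotsRequired`, whose definitions are not part of this diff, and the stage name `'install'` is an assumption.

```typescript
interface Cohort {
  stage: string;
  count: number;
}

interface DataCenterLike {
  computeRacksOnline: number;
  deploymentCohorts: Cohort[];
}

// Hypothetical shared helper for the expression both components inline.
// Decommission cohorts are excluded, matching the diff's filter.
function pipelineRackCount(dc: DataCenterLike): number {
  return dc.deploymentCohorts
    .filter(c => c.stage !== 'decommission')
    .reduce((s, c) => s + c.count, 0);
}

interface SlotUsage {
  existingCompute: number;
  availableSlots: number;
}

// maxCompute is a parameter because maxComputeRacks() is defined elsewhere.
function computeSlotUsage(dc: DataCenterLike, maxCompute: number): SlotUsage {
  const existingCompute = dc.computeRacksOnline + pipelineRackCount(dc);
  return { existingCompute, availableSlots: maxCompute - existingCompute };
}

// Optional guard: negative availableSlots is the over-capacity symptom
// this commit fixes, so fail loudly if it ever reappears.
function assertWithinCapacity(usage: SlotUsage): void {
  if (usage.availableSlots < 0) {
    throw new Error(`over capacity by ${-usage.availableSlots} racks`);
  }
}

const dc: DataCenterLike = {
  computeRacksOnline: 30,
  deploymentCohorts: [
    { stage: 'install', count: 5 },
    { stage: 'repair', count: 3 },
    { stage: 'decommission', count: 4 },
  ],
};
const usage = computeSlotUsage(dc, 40);
assertWithinCapacity(usage);
console.log(usage); // { existingCompute: 38, availableSlots: 2 }
```

Routing every count through one helper means a future change to what "pipeline" includes only has to be made once, rather than in each component that repeats the expression.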
@@ -814,7 +814,7 @@ function DataCenterDetailView({ clusterId, campusId, datacenterId }: {
 ) : (
 <>
 <p className="text-sm text-surface-400">
-Retrofit swaps all {dc.computeRacksOnline + dc.computeRacksFailed} <span className="text-surface-200">{sku!.name}</span> racks to a new SKU.
+Retrofit swaps all {dc.computeRacksOnline + pipelineCount} <span className="text-surface-200">{sku!.name}</span> racks to a new SKU.
 The DC goes offline during retrofit.
 </p>
 <div className="space-y-2">
@@ -906,7 +906,7 @@ function DataCenterDetailView({ clusterId, campusId, datacenterId }: {
 {confirmRetrofit && (
 <ConfirmModal
 title="Confirm Retrofit"
-message={`This will decommission all ${dc.computeRacksOnline + dc.computeRacksFailed} ${sku?.name} racks and install ${RACK_SKU_CONFIGS[confirmRetrofit].name}. The DC will go offline during this process.`}
+message={`This will decommission all ${dc.computeRacksOnline + pipelineCount} ${sku?.name} racks and install ${RACK_SKU_CONFIGS[confirmRetrofit].name}. The DC will go offline during this process.`}
 confirmLabel="Start Retrofit"
 onConfirm={() => { retrofitDC(datacenterId, confirmRetrofit); setConfirmRetrofit(null); }}
 onCancel={() => setConfirmRetrofit(null)}