Replace decorative overload policy with real serving pipeline and dedicated Serving page

The old overload policy had dead controls (maxQueueDepth and rateLimitPerCustomer were
never read) and trivial flat penalties. This commit replaces it with a full serving pipeline
in which deployed models form a fleet, requests route through priority and degradation
logic, and policy choices create meaningful strategic tradeoffs.

New serving pipeline: fleet building from deployed models (size/quant/MoE multipliers),
demand categorization into five priority tiers, enterprise capacity reservation,
priority-ordered serving with overflow behaviors (queue/reject/degrade), auto-degradation
to faster models under load, and a Batch API that fills idle capacity at discounted rates.
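
To illustrate the priority-ordered serving step, here is a minimal sketch, not the code
from this commit. Every name in it (Overflow, TierDemand, TierResult, serveTick, the tier
labels) is hypothetical, and it assumes a single pooled capacity number per tick. It only
counts degraded requests; rerouting them to a faster model is left out.

```typescript
// Hypothetical types for illustration; none of these names come from the commit.
type Overflow = 'queue' | 'reject' | 'degrade';

interface TierDemand {
  tier: string;       // illustrative label, e.g. 'enterprise'
  priority: number;   // lower number is served first
  requests: number;   // demand arriving this tick
  overflow: Overflow; // what happens to demand that does not fit
}

interface TierResult {
  tier: string;
  served: number;
  queued: number;
  rejected: number;
  degraded: number;
}

// Serve tiers in priority order until capacity runs out, then apply each
// tier's overflow behavior to whatever could not be served.
function serveTick(capacity: number, demand: TierDemand[]): TierResult[] {
  let remaining = capacity;
  const ordered = [...demand].sort((a, b) => a.priority - b.priority);
  return ordered.map((d) => {
    const served = Math.min(d.requests, remaining);
    remaining -= served;
    const excess = d.requests - served;
    return {
      tier: d.tier,
      served,
      queued: d.overflow === 'queue' ? excess : 0,
      rejected: d.overflow === 'reject' ? excess : 0,
      degraded: d.overflow === 'degrade' ? excess : 0,
    };
  });
}

const results = serveTick(100, [
  { tier: 'free', priority: 3, requests: 50, overflow: 'reject' },
  { tier: 'enterprise', priority: 0, requests: 60, overflow: 'queue' },
  { tier: 'pro', priority: 1, requests: 50, overflow: 'degrade' },
]);
```

With 100 units of capacity, the enterprise tier is served in full, the pro tier is partly
served with the rest marked degraded, and the free tier is rejected entirely.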

Four new research nodes gate features progressively: Intelligent Request Routing, Priority
Queue System, Request Batching, and Auto-Scaling. The new dedicated Serving page shows
pipeline metrics, model fleet utilization, and research-gated policy controls.
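
The progressive gating can be sketched as a prerequisite check. The prerequisite lists
below are copied from the four nodes added in this commit; the helpers canResearch and
availableNodes are hypothetical, and the sketch assumes a node becomes available once all
of its prerequisites are completed.

```typescript
// Prerequisites copied from the new TECH_TREE nodes; the helpers are hypothetical.
const PREREQS: Record<string, string[]> = {
  'request-routing': ['inference-optimization'],
  'priority-queues': ['request-routing'],
  'request-batching': ['inference-optimization'],
  'auto-scaling': ['request-routing'],
};

// A node can be researched if it is known, not yet completed,
// and every prerequisite is already completed.
function canResearch(id: string, completed: Set<string>): boolean {
  const prereqs = PREREQS[id];
  if (prereqs === undefined) return false;
  return !completed.has(id) && prereqs.every((p) => completed.has(p));
}

// List the serving nodes currently open for research.
function availableNodes(completed: Set<string>): string[] {
  return Object.keys(PREREQS).filter((id) => canResearch(id, completed));
}

const afterOptimization = availableNodes(new Set(['inference-optimization']));
const afterRouting = availableNodes(
  new Set(['inference-optimization', 'request-routing']),
);
```

Under these assumptions, request routing and request batching open up together, while
priority queues and auto-scaling stay locked until request routing is done.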

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-25 12:42:09 -04:00
parent d7d77238b9
commit 901db02a6b
17 changed files with 1349 additions and 229 deletions
@@ -433,6 +433,48 @@ export const TECH_TREE: ResearchNode[] = [
    effects: [{ type: 'unlock_product_line', target: 'agents-platform', value: 1 }],
  },
  // === SERVING INFRASTRUCTURE ===
  {
    id: 'request-routing',
    name: 'Intelligent Request Routing',
    description: 'Route requests to optimal model size/variant. Unlocks routing strategy and per-tier rate limits.',
    era: 'scaleup',
    category: 'efficiency',
    prerequisites: ['inference-optimization'],
    cost: { researchPoints: 2, compute: 25, ticks: 150 },
    effects: [{ type: 'unlock_feature', target: 'request-routing', value: 1 }],
  },
  {
    id: 'priority-queues',
    name: 'Priority Queue System',
    description: 'SLA-aware scheduling with granular priority controls. Unlocks priority ordering and overflow policies.',
    era: 'scaleup',
    category: 'efficiency',
    prerequisites: ['request-routing'],
    cost: { researchPoints: 3, compute: 30, ticks: 180 },
    effects: [{ type: 'unlock_feature', target: 'priority-queues', value: 1 }],
  },
  {
    id: 'request-batching',
    name: 'Request Batching',
    description: 'Group inference requests for higher throughput. Unlocks Batch API product line at 50% discount.',
    era: 'scaleup',
    category: 'efficiency',
    prerequisites: ['inference-optimization'],
    cost: { researchPoints: 2, compute: 20, ticks: 120 },
    effects: [{ type: 'unlock_feature', target: 'request-batching', value: 1 }],
  },
  {
    id: 'auto-scaling',
    name: 'Auto-Scaling Infrastructure',
    description: 'Dynamically reallocate compute during demand spikes. +20% effective capacity headroom.',
    era: 'bigtech',
    category: 'efficiency',
    prerequisites: ['request-routing'],
    cost: { researchPoints: 4, compute: 60, ticks: 300 },
    effects: [{ type: 'efficiency_boost', target: 'auto_scaling', value: 0.2 }],
  },
  // === DATA ===
  {
    id: 'data-pipeline',