Replace decorative overload policy with real serving pipeline and dedicated Serving page
CI / build-and-push (push) Successful in 28s
The old overload policy had dead controls (maxQueueDepth and rateLimitPerCustomer were never read) and trivial flat penalties. This commit replaces it with a full serving pipeline in which deployed models form a fleet, requests route through priority and degradation logic, and policy choices create meaningful strategic tradeoffs.

New serving pipeline: fleet building from deployed models (size/quant/MoE multipliers), demand categorization by 5 priority tiers, enterprise capacity reservation, priority-ordered serving with overflow behaviors (queue/reject/degrade), auto-degradation to faster models under load, and a Batch API to fill idle capacity at discounted rates.

Four new research nodes gate features progressively: Intelligent Request Routing, Priority Queue System, Request Batching, and Auto-Scaling.

New dedicated Serving page with pipeline metrics, model fleet utilization, and research-gated policy controls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
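For readers skimming the message, the priority-ordered serving step with its three overflow behaviors might look roughly like the sketch below. This is an illustration only, not the code in this commit: `TierDemand`, `ServeResult`, `serveTick`, the tier names, and the single "fallback" pool standing in for faster models are all hypothetical, and enterprise reservation and the Batch API are left out.

```typescript
// Hypothetical sketch; none of these names are taken from the commit.
type Overflow = 'queue' | 'reject' | 'degrade';

interface TierDemand {
  tier: string;       // illustrative tier name
  priority: number;   // lower number is served first
  requests: number;   // demand for this tick
  overflow: Overflow; // what happens to requests that do not fit
}

interface ServeResult {
  served: Record<string, number>; // requests served on the primary fleet, per tier
  queued: number;
  rejected: number;
  degraded: number;               // requests served on the faster fallback models
}

function serveTick(
  demands: TierDemand[],
  primaryCapacity: number,
  fallbackCapacity: number,
): ServeResult {
  const result: ServeResult = { served: {}, queued: 0, rejected: 0, degraded: 0 };
  let primary = primaryCapacity;
  let fallback = fallbackCapacity;

  // Serve tiers in priority order so high-priority demand claims capacity first.
  const ordered = [...demands].sort((a, b) => a.priority - b.priority);
  for (const d of ordered) {
    const onPrimary = Math.min(d.requests, primary);
    primary -= onPrimary;
    result.served[d.tier] = onPrimary;

    let overflow = d.requests - onPrimary;
    if (overflow === 0) continue;

    if (d.overflow === 'degrade') {
      // Move overflow to the fallback pool; reject whatever still does not fit.
      const onFallback = Math.min(overflow, fallback);
      fallback -= onFallback;
      result.degraded += onFallback;
      overflow -= onFallback;
      result.rejected += overflow;
    } else if (d.overflow === 'queue') {
      result.queued += overflow;
    } else {
      result.rejected += overflow;
    }
  }
  return result;
}

// Example: 100 units of primary capacity, 20 units of fallback capacity.
const example = serveTick(
  [
    { tier: 'free', priority: 2, requests: 40, overflow: 'reject' },
    { tier: 'enterprise', priority: 0, requests: 60, overflow: 'queue' },
    { tier: 'pro', priority: 1, requests: 50, overflow: 'degrade' },
  ],
  100,
  20,
);
```

In this example the enterprise tier is served in full, the pro tier spills 10 requests onto the fallback pool, and the free tier is rejected entirely. That is the kind of tradeoff the message says the policy controls are meant to expose.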
@@ -433,6 +433,48 @@ export const TECH_TREE: ResearchNode[] = [
     effects: [{ type: 'unlock_product_line', target: 'agents-platform', value: 1 }],
   },

+  // === SERVING INFRASTRUCTURE ===
+  {
+    id: 'request-routing',
+    name: 'Intelligent Request Routing',
+    description: 'Route requests to optimal model size/variant. Unlocks routing strategy and per-tier rate limits.',
+    era: 'scaleup',
+    category: 'efficiency',
+    prerequisites: ['inference-optimization'],
+    cost: { researchPoints: 2, compute: 25, ticks: 150 },
+    effects: [{ type: 'unlock_feature', target: 'request-routing', value: 1 }],
+  },
+  {
+    id: 'priority-queues',
+    name: 'Priority Queue System',
+    description: 'SLA-aware scheduling with granular priority controls. Unlocks priority ordering and overflow policies.',
+    era: 'scaleup',
+    category: 'efficiency',
+    prerequisites: ['request-routing'],
+    cost: { researchPoints: 3, compute: 30, ticks: 180 },
+    effects: [{ type: 'unlock_feature', target: 'priority-queues', value: 1 }],
+  },
+  {
+    id: 'request-batching',
+    name: 'Request Batching',
+    description: 'Group inference requests for higher throughput. Unlocks Batch API product line at 50% discount.',
+    era: 'scaleup',
+    category: 'efficiency',
+    prerequisites: ['inference-optimization'],
+    cost: { researchPoints: 2, compute: 20, ticks: 120 },
+    effects: [{ type: 'unlock_feature', target: 'request-batching', value: 1 }],
+  },
+  {
+    id: 'auto-scaling',
+    name: 'Auto-Scaling Infrastructure',
+    description: 'Dynamically reallocate compute during demand spikes. +20% effective capacity headroom.',
+    era: 'bigtech',
+    category: 'efficiency',
+    prerequisites: ['request-routing'],
+    cost: { researchPoints: 4, compute: 60, ticks: 300 },
+    effects: [{ type: 'efficiency_boost', target: 'auto_scaling', value: 0.2 }],
+  },
+
   // === DATA ===
   {
     id: 'data-pipeline',
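The prerequisite chain in this hunk determines the order in which the serving features become researchable. As a rough illustration only, the sketch below copies the four node ids and their `prerequisites` from the diff and computes which nodes are open given a set of completed research. `NodeLite`, `SERVING_NODES`, and `availableNodes` are hypothetical names, not the project's real helpers, and era gating (`scaleup` vs `bigtech`) is deliberately ignored.

```typescript
// Hypothetical sketch; only the ids and prerequisites come from the diff.
interface NodeLite {
  id: string;
  prerequisites: string[];
}

const SERVING_NODES: NodeLite[] = [
  { id: 'request-routing', prerequisites: ['inference-optimization'] },
  { id: 'priority-queues', prerequisites: ['request-routing'] },
  { id: 'request-batching', prerequisites: ['inference-optimization'] },
  { id: 'auto-scaling', prerequisites: ['request-routing'] },
];

// A node is available when it is not yet completed and all of its
// prerequisites are. Era restrictions are not modeled here.
function availableNodes(nodes: NodeLite[], completed: Set<string>): string[] {
  return nodes
    .filter((n) => !completed.has(n.id) && n.prerequisites.every((p) => completed.has(p)))
    .map((n) => n.id);
}

const afterOptimization = availableNodes(
  SERVING_NODES,
  new Set(['inference-optimization']),
);

const afterRouting = availableNodes(
  SERVING_NODES,
  new Set(['inference-optimization', 'request-routing']),
);
```

Under these assumptions, finishing `inference-optimization` opens `request-routing` and `request-batching`. Finishing `request-routing` then additionally opens `priority-queues` and `auto-scaling`.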