Replace decorative overload policy with real serving pipeline and dedicated Serving page
CI / build-and-push (push) Successful in 28s

The old overload policy had dead controls (maxQueueDepth, rateLimitPerCustomer never read)
and trivial flat penalties. This replaces it with a full serving pipeline where deployed
models form a fleet, requests route through priority/degradation logic, and policy choices
create meaningful strategic tradeoffs.

New serving pipeline: fleet building from deployed models (size/quant/MoE multipliers),
demand categorization by 5 priority tiers, enterprise capacity reservation, priority-ordered
serving with overflow behaviors (queue/reject/degrade), auto-degradation to faster models
under load, and Batch API to fill idle capacity at discounted rates.

4 new research nodes gate features progressively: Intelligent Request Routing, Priority
Queue System, Request Batching, and Auto-Scaling. New dedicated Serving page with pipeline
metrics, model fleet utilization, and research-gated policy controls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-04-25 12:42:09 -04:00
parent d7d77238b9
commit 901db02a6b
17 changed files with 1349 additions and 229 deletions
+1 -1
View File
@@ -48,7 +48,7 @@ import {
import { INITIAL_RIVALS } from '@ai-tycoon/game-engine';
export type ActivePage = 'dashboard' | 'infrastructure' | 'research' | 'models'
| 'market' | 'talent' | 'data' | 'competitors' | 'finance' | 'achievements' | 'leaderboard' | 'settings';
| 'market' | 'serving' | 'talent' | 'data' | 'competitors' | 'finance' | 'achievements' | 'leaderboard' | 'settings';
export type InfraNavLevel = 'clusters' | 'cluster' | 'campus' | 'datacenter';