Replace decorative overload policy with real serving pipeline and dedicated Serving page
CI / build-and-push (push) Successful in 28s
The old overload policy had dead controls (maxQueueDepth, rateLimitPerCustomer never read) and trivial flat penalties. This replaces it with a full serving pipeline where deployed models form a fleet, requests route through priority/degradation logic, and policy choices create meaningful strategic tradeoffs.

New serving pipeline:
- fleet building from deployed models (size/quant/MoE multipliers)
- demand categorization by 5 priority tiers
- enterprise capacity reservation
- priority-ordered serving with overflow behaviors (queue/reject/degrade)
- auto-degradation to faster models under load
- Batch API to fill idle capacity at discounted rates

4 new research nodes gate features progressively: Intelligent Request Routing, Priority Queue System, Request Batching, and Auto-Scaling.

New dedicated Serving page with pipeline metrics, model fleet utilization, and research-gated policy controls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
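The diff shown on this page only covers the research bonus plumbing, so the serving logic itself is not visible here. As a rough illustration of what "priority-ordered serving with overflow behaviors (queue/reject/degrade)" could look like, here is a minimal sketch. Every name in it (`TierDemand`, `serveByPriority`, `fallbackCapacity`, the tier labels) is hypothetical and not taken from the commit.

```typescript
// Hypothetical sketch: none of these names come from the commit itself.
type OverflowBehavior = 'queue' | 'reject' | 'degrade';

interface TierDemand {
  tier: string;      // label for the priority tier
  priority: number;  // lower number is served first
  requests: number;  // demand for this tick
}

interface ServeResult {
  served: number;    // handled by the primary fleet
  degraded: number;  // handled by a faster fallback model
  queued: number;    // held for a later tick
  rejected: number;  // dropped
}

function serveByPriority(
  demand: TierDemand[],
  capacity: number,
  fallbackCapacity: number,
  overflow: OverflowBehavior,
): ServeResult {
  const result: ServeResult = { served: 0, degraded: 0, queued: 0, rejected: 0 };
  let remaining = capacity;
  let fallback = fallbackCapacity;

  // Serve tiers in priority order without mutating the caller's array.
  const ordered = [...demand].sort((a, b) => a.priority - b.priority);
  for (const d of ordered) {
    const served = Math.min(d.requests, remaining);
    remaining -= served;
    result.served += served;

    let excess = d.requests - served;
    if (excess === 0) continue;

    if (overflow === 'degrade') {
      // Spill onto the fallback model first; whatever is left is rejected.
      const degraded = Math.min(excess, fallback);
      fallback -= degraded;
      result.degraded += degraded;
      excess -= degraded;
      result.rejected += excess;
    } else if (overflow === 'queue') {
      result.queued += excess;
    } else {
      result.rejected += excess;
    }
  }
  return result;
}
```

With 100 units of primary capacity, 30 units of fallback capacity, and two tiers requesting 60 and 80, the higher-priority tier is served in full and the lower-priority tier absorbs the whole shortfall, which is the tradeoff the policy choice is meant to expose.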
@@ -18,6 +18,7 @@ export interface ResearchBonuses {
   reputationBonus: number;

   safetyBonus: number;
+  autoScalingBonus: number;
 }

 export function getResearchBonuses(completedResearch: string[]): ResearchBonuses {
@@ -37,6 +38,7 @@ export function getResearchBonuses(completedResearch: string[]): ResearchBonuses
     agentsBonus: 0,
     reputationBonus: 0,
     safetyBonus: 0,
+    autoScalingBonus: 0,
   };

   for (const id of completedResearch) {
@@ -53,6 +55,7 @@ export function getResearchBonuses(completedResearch: string[]): ResearchBonuses
           case 'pipeline_speed': bonuses.pipelineSpeedBonus += effect.value; break;
           case 'data_quality': bonuses.dataQualityBonus += effect.value; break;
           case 'sdk_coverage': bonuses.sdkCoverageBonus += effect.value; break;
+          case 'auto_scaling': bonuses.autoScalingBonus += effect.value; break;
         }
         break;
       case 'capability_boost':
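The three hunks above follow one pattern: declare the field, initialize it to 0, and accumulate it from completed research. A self-contained sketch of that pattern is below, assuming a hypothetical research table and a trimmed-down bonuses type. The table `RESEARCH_EFFECTS`, the node ids, the outer case name `'infrastructure'`, and the function name `sketchResearchBonuses` are all made up for illustration; only `autoScalingBonus`, `'auto_scaling'`, `'capability_boost'`, and `effect.value` appear in the diff.

```typescript
// Trimmed-down, hypothetical version of ResearchBonuses (the real one has more fields).
interface SketchBonuses {
  sdkCoverageBonus: number;
  autoScalingBonus: number;
  capabilityBoost: number; // hypothetical field for the 'capability_boost' case
}

// Hypothetical effect shapes; the diff only shows `effect.value` and the case labels.
type SketchEffect =
  | { type: 'infrastructure'; target: 'sdk_coverage' | 'auto_scaling'; value: number }
  | { type: 'capability_boost'; value: number };

// Hypothetical research table; ids and values are illustrative only.
const RESEARCH_EFFECTS: Record<string, SketchEffect | undefined> = {
  auto_scaling_basic: { type: 'infrastructure', target: 'auto_scaling', value: 0.25 },
  auto_scaling_advanced: { type: 'infrastructure', target: 'auto_scaling', value: 0.5 },
  sdk_python: { type: 'infrastructure', target: 'sdk_coverage', value: 0.125 },
  bigger_context: { type: 'capability_boost', value: 1 },
};

function sketchResearchBonuses(completedResearch: string[]): SketchBonuses {
  // Every bonus starts at 0 so unresearched features contribute nothing.
  const bonuses: SketchBonuses = { sdkCoverageBonus: 0, autoScalingBonus: 0, capabilityBoost: 0 };

  for (const id of completedResearch) {
    const effect = RESEARCH_EFFECTS[id];
    if (!effect) continue; // unknown ids are ignored in this sketch

    switch (effect.type) {
      case 'infrastructure':
        // Inner switch mirrors the per-target accumulation shown in the diff.
        switch (effect.target) {
          case 'sdk_coverage': bonuses.sdkCoverageBonus += effect.value; break;
          case 'auto_scaling': bonuses.autoScalingBonus += effect.value; break;
        }
        break;
      case 'capability_boost':
        bonuses.capabilityBoost += effect.value;
        break;
    }
  }
  return bonuses;
}
```

Because the bonuses are additive, completing both hypothetical auto-scaling nodes stacks their values rather than taking the larger one; whether the real game data relies on that is not visible in this diff.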