A. Highlights
What we build:
A Complexity Classifier that scores every incoming query on a [0.0–1.0] scale using token count, vocabulary entropy, and keyword heuristics
A Hybrid Compute Router that sends simple queries to
gemini-2.0-flash(SLM tier) and complex ones togemini-1.5-pro(GFM tier) — with semantic cache as the zero-cost first layerA Real-time Cost Ledger backed by SQLite that tracks per-request spend, model tier, latency, and cumulative savings vs. a naive all-GFM baseline
A Budget Enforcement Engine with configurable daily caps that blocks GFM calls and downgrades to SLM when spend limits approach
A React Cost Dashboard with live Recharts visualizations: spend by tier, route decision histogram, savings-vs-baseline gauge, and tail-latency charts
Connection to L71 (Runtime Guardrails & Security): The FastAPI gateway inherited from L71 runs every request through the Guardrails AI validation pipeline before the router sees it. A guardrail violation short-circuits the pipeline immediately — eliminating the cost of routing and inference on policy-violating content. This is cost optimization at the earliest possible stage.
Enables L73 (A/B Testing & Canary Deployments): The routing layer we build here is the exact traffic-splitting substrate L73 needs. The
route_decisionfield logged to the cost ledger becomes the experiment assignment record; the cost metadata becomes the primary metric. You’ll add a thin experiment config on top of this router — no architecture changes required.
Preparing for a distributed systems interview?
→Download the free Interview Pack
→ Subscribe now to access source code repository - 200 + coding lessons


