Rate limiter at scale

A multi-tenant token-bucket limiter, one bucket per API key. Drag the total request rate up toward 100M req/s, change how many keys (tenants) share the traffic, and push more of it onto a single "hot" key, then watch the busiest key's bucket drain and the 429s appear. This is the algorithm from the rate-limiting lesson, made live.

Green area = tokens in the busiest key's bucket (full at top, empty at bottom). Red lines = requests dropped (429) because that key's bucket was empty.

What's happening — the math

Each API key gets its own bucket with capacity B (the burst it can spend at once), refilling at r tokens/second (its sustained allowed rate). Applying that per-key limit to each key's share of the traffic gives the fleet-level numbers:

# Traffic split across K keys; one "hot" key takes hot% of it (as a fraction)
hot_qps      = QPS × hot%
per_key_qps  = (QPS − hot_qps) / (K − 1)        # the other K − 1 keys share the rest evenly

# In steady state a key can sustain at most r req/s (B only absorbs short bursts); the excess is throttled
allowed      = min(hot_qps, r) + min(per_key_qps, r) × (K − 1)
throttled    = QPS − allowed
rate_429     = throttled / QPS
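Here is the same steady-state arithmetic as a runnable sketch. Python is our choice, since the formulas above are language-neutral. `steady_state_429_rate` is a hypothetical helper name, and `hot_share` is hot% written as a fraction. The value r = 500 req/s mirrors the limit in the "Try this" steps. Like the formulas, it ignores B, so it describes long-run behaviour rather than short bursts.

```python
def steady_state_429_rate(qps: float, k: int, hot_share: float, r: float) -> float:
    """Fraction of requests throttled in steady state under the per-key model.

    qps:       total request rate across all keys (req/s)
    k:         number of API keys, one of which is the hot key (needs k >= 2)
    hot_share: fraction of qps sent by the hot key, between 0 and 1
    r:         per-key sustained refill rate (tokens/s); burst capacity B is ignored
    """
    if k < 2:
        raise ValueError("the model needs one hot key plus at least one other key")
    hot_qps = qps * hot_share
    per_key_qps = (qps - hot_qps) / (k - 1)      # the other k - 1 keys split the rest evenly
    allowed = min(hot_qps, r) + min(per_key_qps, r) * (k - 1)
    throttled = max(0.0, qps - allowed)         # clamp tiny negative float rounding error
    return throttled / qps


# Two "Try this" style scenarios, assuming r = 500 req/s per key.
# hot_share = 0.0 stands in for an even spread (an assumption about step 1).
print(f"{steady_state_429_rate(100e6, 1_000_000, 0.0, 500):.2%}")  # ≈ 0%
print(f"{steady_state_429_rate(100e6, 1_000_000, 0.5, 500):.2%}")  # ≈ 50%
```

With a 50% hot share, the hot key gets 500 of its 50M req/s through. The other 999,999 keys each see about 50 req/s and pass untouched, so almost exactly half of all traffic is throttled.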

The key insight you can see: raising total QPS doesn't throttle anyone until a single key's share crosses its own limit r. Spreading the same load over more keys (raise K) shrinks each key's share until it fits under r. Concentrating it on one hot key (raise Hot key share) throttles that key while the others sail through. That is per-key isolation in action.
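This is a minimal in-process sketch of per-key isolation. It assumes a single process and a simulated clock passed in as `now`. `TokenBucket`, `PerKeyLimiter`, and the traffic numbers are hypothetical choices for illustration, not the simulator's code. The numbers are B = 10, r = 10 tokens/s, one hot key at 100 req/s, and five quiet keys at 5 req/s. A real fleet serving 100M req/s would also need shared or sharded bucket state, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # B: the most tokens the bucket can hold (max burst)
    refill_rate: float   # r: tokens added per second (sustained rate)
    tokens: float
    last: float          # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Lazily refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False     # bucket empty: the caller should answer 429


class PerKeyLimiter:
    """One independent token bucket per API key, created full on first use."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str, now: float) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_rate, self.capacity, now)
            self.buckets[key] = bucket
        return bucket.allow(now)


limiter = PerKeyLimiter(capacity=10, refill_rate=10)
drops: dict[str, int] = {}
for tick in range(1000):                 # 10 simulated seconds in 10 ms steps
    now = tick / 100
    senders = ["hot"]                    # hot key: one request every 10 ms = 100 req/s
    if tick % 20 == 0:                   # quiet keys: one request every 200 ms = 5 req/s
        senders += [f"quiet-{i}" for i in range(5)]
    for key in senders:
        if not limiter.allow(key, now):
            drops[key] = drops.get(key, 0) + 1
print(drops)
```

Only `hot` appears in `drops`, with roughly 890 of its 1,000 requests rejected. The quiet keys never get close to their limit, because each key drains only its own bucket.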

✅ Try this

1. Set QPS to 100M with K = 1,000,000 keys → per-key rate is ~100/s, under a 500/s limit → ~0% 429s even at 100M.
2. Now drag Hot key share to 50% → that one key tries ~50M/s against a 500/s limit → its bucket pins empty (solid red) and the global 429 rate jumps to ~50%, even though the other 999,999 keys are fine.
3. Raise r (give keys a bigger sustained allowance) or B (a bigger burst) and watch the red thin out.
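Step 3 treats r and B as two different levers. The sketch below models a single key in coarse 1-second ticks. The numbers are hypothetical: r = 500 tokens/s, traffic at 2,000 req/s, and `count_drops` is an illustrative helper name. It shows that a bigger B absorbs a one-off spike entirely. Under sustained overload, though, B only buys a one-time allowance of extra requests.

```python
def count_drops(arrivals_per_tick: list[int], capacity: int, refill_per_tick: int) -> int:
    """Simulate one key's token bucket in 1-second ticks; return the number of 429s.

    The bucket starts full. Each tick it refills by refill_per_tick (capped at
    capacity), then spends one token per arriving request it can serve.
    """
    tokens = capacity
    drops = 0
    for arrivals in arrivals_per_tick:
        tokens = min(capacity, tokens + refill_per_tick)
        served = min(arrivals, tokens)
        tokens -= served
        drops += arrivals - served
    return drops


r = 500                          # hypothetical sustained limit: 500 tokens/s
spike = [2000] + [0] * 9         # one 2,000-request spike, then 9 quiet seconds
sustained = [2000] * 60          # a full minute at 2,000 req/s

for b in (500, 2000):
    print(f"B={b}: spike drops={count_drops(spike, b, r)}, "
          f"sustained drops={count_drops(sustained, b, r)}")
# B=500: spike drops=1500, sustained drops=90000
# B=2000: spike drops=0, sustained drops=88500
```

Raising B from 500 to 2,000 removes every 429 from the spike. Against a minute of sustained overload, it saves only the extra 1,500 tokens of capacity once. The long-run drop rate is set by r, which is why the steady-state formulas above leave B out.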

⚠️ Modeled, not measured

This is a first-principles model of the token-bucket algorithm, not a capture of any company's production traffic. It shows the behaviour (how throttling responds to load, key count, and hot keys) — the real limits, key counts, and traffic shapes at Stripe/AWS/etc. are not public. Treat the numbers as illustrative.

Sources & further reading