Production at Scale · Simulator 02

Cache hit ratio at scale

A cache in front of your origin is often the highest-leverage latency lever you have. Move the hit-ratio slider from 0% → 99% and watch effective latency collapse: even a slow cache helps, as long as it is faster than origin. Then push QPS into the millions and see how the origin is shielded. This is the model behind the caching lesson, made live.

Interactive · Drag the sliders · Models rel-07

Top bars: "no cache" vs "with cache" latency, scaled to origin latency. Bottom bar: how much of total QPS hits the origin (red) vs is served from cache (green).

What's happening — the math

Every request is served either from cache (probability h) or falls through to origin (probability 1 − h). The effective latency is the weighted average:

eff_latency  = h × L_cache + (1 − h) × L_origin

origin_QPS   = QPS × (1 − h)          # only misses reach origin
origin_load  = 1 − h                   # fraction of total traffic
speedup      = L_origin / eff_latency  # how many times faster vs no cache

# Worked example: h = 0.9, L_cache = 1 ms, L_origin = 50 ms
eff_latency  = 0.9 × 1 + 0.1 × 50  =  0.9 + 5.0  =  5.9 ms
origin_load  = 1 − 0.9             =  10%          # origin sees only 1 in 10 requests
speedup      = 50 / 5.9            ≈  8.5×
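The same formulas can be written as a few small functions. This is a minimal sketch of the model above, not the simulator's actual code: the function names are made up for illustration, and the 1,000,000 QPS figure is an assumed input, since the worked example does not specify a QPS.

```python
def effective_latency(h: float, l_cache: float, l_origin: float) -> float:
    """Expected per-request latency for hit ratio h (0..1), in the inputs' units."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio h must be between 0 and 1")
    return h * l_cache + (1.0 - h) * l_origin


def origin_qps(qps: float, h: float) -> float:
    """Requests per second that miss the cache and reach origin."""
    return qps * (1.0 - h)


def speedup(h: float, l_cache: float, l_origin: float) -> float:
    """How many times faster than sending every request to origin."""
    return l_origin / effective_latency(h, l_cache, l_origin)


if __name__ == "__main__":
    # Worked example from the text: h = 0.9, L_cache = 1 ms, L_origin = 50 ms.
    # The QPS value is an assumption for illustration only.
    print(f"eff_latency = {effective_latency(0.9, 1.0, 50.0):.1f} ms")
    print(f"origin_QPS  = {origin_qps(1_000_000, 0.9):,.0f} req/s")
    print(f"speedup     = {speedup(0.9, 1.0, 50.0):.1f}x")
```

Run with the worked-example inputs, it should reproduce the 5.9 ms and roughly 8.5× figures shown above.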

The speedup is nonlinear in the hit rate, and that is the key insight: going from 90% → 99% hit rate cuts origin load 10× (10% → 1%) and, with the example latencies above, cuts effective latency about 4× (5.9 ms → 1.49 ms). The remaining 1% of misses still pays the full L_origin, so misses dominate tail latency when L_origin is large. Keep origin fast even behind a good cache.
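To see how the numbers move as the hit rate climbs, it can help to tabulate the model at a few hit ratios and to invert it for a latency target. The sketch below assumes the worked-example latencies (L_cache = 1 ms, L_origin = 50 ms); `sweep` and `required_hit_ratio` are hypothetical helper names, not part of the simulator.

```python
def effective_latency(h, l_cache, l_origin):
    """Weighted-average latency: hits pay l_cache, misses pay l_origin."""
    return h * l_cache + (1.0 - h) * l_origin


def sweep(hit_ratios, l_cache, l_origin):
    """Return (hit_ratio, miss_ratio, eff_latency, speedup) for each hit ratio."""
    rows = []
    for h in hit_ratios:
        eff = effective_latency(h, l_cache, l_origin)
        rows.append((h, 1.0 - h, eff, l_origin / eff))
    return rows


def required_hit_ratio(target, l_cache, l_origin):
    """Solve target = h*l_cache + (1-h)*l_origin for h."""
    if l_cache >= l_origin:
        raise ValueError("model assumes the cache is faster than origin")
    if not l_cache <= target <= l_origin:
        raise ValueError("target must lie between l_cache and l_origin")
    return (l_origin - target) / (l_origin - l_cache)


if __name__ == "__main__":
    print(f"{'hit':>7} {'miss':>7} {'eff (ms)':>9} {'speedup':>8}")
    for h, miss, eff, gain in sweep([0.5, 0.9, 0.99, 0.999], 1.0, 50.0):
        print(f"{h:>7.1%} {miss:>7.1%} {eff:>9.2f} {gain:>7.1f}x")
    # Hit ratio needed to bring effective latency down to 2 ms
    print(f"needed for 2 ms: {required_hit_ratio(2.0, 1.0, 50.0):.1%}")
```

Reading the table by miss ratio rather than hit ratio makes the pattern easier to spot: each 10× cut in misses is a 10× cut in origin traffic, while effective latency flattens out toward L_cache.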

✅ Try this

1. Set QPS to 10M, hit rate 90%, L_origin 200 ms, with L_cache at 1 ms (the value these figures assume) → origin sees 1M req/s, eff latency ~21 ms.
2. Raise hit rate to 99% → origin drops to 100k req/s, eff latency ~3 ms. The same origin capacity now handles 10× more total QPS.
3. Now drag L_cache up to 10 ms → eff latency climbs back to ~12 ms, about 4× worse than step 2. A slow cache gives back much of the benefit, so keep your cache fast and your hit rate high.
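The three steps can also be checked without the sliders. This is a rough sketch under stated assumptions: `Scenario` and `max_total_qps` are illustrative names, L_cache = 1 ms for the first two steps is inferred from the quoted results, and the 1M req/s origin capacity is a hypothetical figure chosen to match step 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    qps: float
    hit_ratio: float
    l_cache_ms: float
    l_origin_ms: float

    @property
    def origin_qps(self) -> float:
        # Only misses reach origin.
        return self.qps * (1.0 - self.hit_ratio)

    @property
    def eff_latency_ms(self) -> float:
        return (self.hit_ratio * self.l_cache_ms
                + (1.0 - self.hit_ratio) * self.l_origin_ms)


def max_total_qps(origin_capacity_qps: float, hit_ratio: float) -> float:
    """Total QPS the system can take before misses exceed origin capacity."""
    miss = 1.0 - hit_ratio
    return float("inf") if miss <= 0.0 else origin_capacity_qps / miss


if __name__ == "__main__":
    steps = [
        Scenario(10e6, 0.90, 1.0, 200.0),   # step 1
        Scenario(10e6, 0.99, 1.0, 200.0),   # step 2
        Scenario(10e6, 0.99, 10.0, 200.0),  # step 3: slower cache
    ]
    for i, s in enumerate(steps, start=1):
        print(f"step {i}: origin {s.origin_qps:,.0f} req/s, "
              f"eff latency {s.eff_latency_ms:.1f} ms")
    # Assumed origin capacity of 1M req/s (hypothetical figure)
    for h in (0.90, 0.99):
        print(f"h={h:.0%}: max total QPS ≈ {max_total_qps(1e6, h):,.0f}")
```

The `max_total_qps` helper makes the capacity point explicit: for a fixed origin, the supportable total QPS scales with 1 / (1 − h).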

⚠️ Modeled, not measured

This is a first-principles model of cache latency, not a capture of any company's production traffic. It assumes steady-state hit ratio, no cold-start, no cache stampedes, and no replication lag. Real systems see lower effective hit rates during deploys, TTL expirations, and traffic spikes. Treat the numbers as illustrative.
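One way to get a feel for the steady-state caveat is to split a period into phases with different hit ratios and compare the time-weighted average against the worst phase. The sketch below is illustrative only: it assumes constant QPS, and the phase durations and hit ratios (a 5-minute window at 60% after a deploy, within an hour otherwise at 99%) are invented numbers, not measurements.

```python
def blended_hit_ratio(phases):
    """Time-weighted hit ratio over (duration_seconds, hit_ratio) phases.

    Assumes constant QPS, so time share equals request share.
    """
    total = sum(duration for duration, _ in phases)
    if total <= 0:
        raise ValueError("phases must cover a positive duration")
    return sum(duration * h for duration, h in phases) / total


def peak_origin_qps(qps, phases):
    """Origin load during the phase with the lowest hit ratio."""
    return max(qps * (1.0 - h) for _, h in phases)


if __name__ == "__main__":
    # Hypothetical hour: 55 min steady at 99%, 5 min degraded at 60%.
    phases = [(3300, 0.99), (300, 0.60)]
    qps = 1_000_000  # assumed constant total QPS
    steady_origin = qps * (1.0 - 0.99)
    print(f"blended hit ratio : {blended_hit_ratio(phases):.2%}")
    print(f"steady origin QPS : {steady_origin:,.0f}")
    print(f"peak origin QPS   : {peak_origin_qps(qps, phases):,.0f}")
```

In this made-up case the hourly average still looks healthy, yet the origin has to absorb many times its steady-state load during the degraded window, which is why origin capacity is usually sized for the worst phase rather than the average.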

Sources & further reading

Caching strategies (the lesson this models)
AWS — Caching best practices · Redis client-side caching
Facebook — Scaling Memcache (NSDI 2013)