Reliability & Scale · Lesson 13
Capacity estimation (designing for N users)
Back-of-the-envelope calculation is the secret handshake of system design interviews — interviewers aren't testing your arithmetic, they're watching whether you can impose structure on uncertainty and arrive at an order-of-magnitude answer that actually guides architecture decisions.
By the end you'll be able to
- Apply the five-step derivation chain to convert a user count into QPS, storage, bandwidth, and server requirements.
- Explain why the peak multiplier matters and how the read:write ratio shapes your scaling strategy.
- Work through a complete capacity estimate for a realistic system and sanity-check the result against known resource limits.
Why back-of-the-envelope matters
Imagine a chef asked to cook for 10,000 people instead of 10. They don't need to calculate grams to the decimal place — they need to know whether the kitchen can fit the pots, whether the delivery truck can carry the ingredients, and whether the gas line can handle 50 burners running at once. Getting the order of magnitude right is everything; being off by a factor of 1.3 is irrelevant.
Capacity estimation works the same way. The goal is not a precise number — it's a number that's in the right power-of-ten bucket so that you know you need two app servers versus two hundred, and whether your storage bill is $50/month or $50,000/month. Powers of ten are your unit of measurement. An estimate within 3× of reality is an excellent estimate.
In interviews, the estimator who states their assumptions aloud and walks through a structured chain of reasoning signals that they can handle ambiguity without freezing — a critical skill for any engineer designing systems that don't exist yet.
The derivation chain: five steps from users to resources
Every capacity estimate follows the same logical spine. Each step takes the output of the previous one and produces a new quantity. Run through these in order and you can derive any resource requirement from scratch.
- Step 1 — Total registered users → DAU (daily active users). Not every registered account is active every day. A typical consumer app sees 10–30% of registered users as daily actives. For a 10M-registered-user platform, that's 1–3M DAU. Choose a single number and state it. Example: 10M registered × 10% DAU rate = 1M DAU.
- Step 2 — DAU → requests per user per day. Think about what a single user does during a typical session. Behaviors vary sharply by app type: a social feed user might make 30 read requests (scrolling, refreshing) and 2 write requests (posting, liking) per day. A URL-shortener user might make 5 redirect lookups and 0.1 URL creations per day (creating a URL every 10 days on average). Separate reads from writes — they scale differently.
- Step 3 — Average QPS. avg_QPS = (DAU × requests_per_user_per_day) ÷ 86,400. 86,400 is the number of seconds in a day, so this converts a daily total to a per-second rate. Round aggressively; this is your baseline.
- Step 4 — Peak QPS. peak_QPS = avg_QPS × peak_multiplier. Traffic is never evenly spread. Social apps see a 3× evening prime-time spike. Flash-sale platforms see 10× during sale windows. E-commerce during Black Friday: 5–8×. Use 3× as a default for social, 2× for internal tools, and 10× only if you have evidence of flash-load events.
- Step 5 — Resources from peak QPS. Derive each resource category independently:
  - Servers: ceil(peak_QPS / per_server_capacity), then add 20–50% headroom.
  - Storage: DAU × uploads_per_user_per_day × avg_object_size × retention_days.
  - Bandwidth (egress): peak_QPS × avg_response_payload_size.
  - Cache size: estimate the hot data set; if it is too large for RAM, cache references and metadata, not raw objects.
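Steps 1 through 4 reduce to a few lines of arithmetic. A minimal Python sketch, assuming reads and writes share one peak multiplier; `peak_qps` and its parameter names are illustrative, not from any library:

```python
SECONDS_PER_DAY = 86_400  # 60 * 60 * 24


def peak_qps(registered_users: int, dau_rate: float,
             reads_per_user: float, writes_per_user: float,
             peak_multiplier: float) -> dict:
    """Steps 1-4: registered users -> DAU -> daily requests -> average QPS -> peak QPS."""
    dau = registered_users * dau_rate                     # Step 1
    avg_read = dau * reads_per_user / SECONDS_PER_DAY     # Steps 2-3, reads
    avg_write = dau * writes_per_user / SECONDS_PER_DAY   # Steps 2-3, writes
    return {
        "dau": dau,
        "avg_read_qps": avg_read,
        "avg_write_qps": avg_write,
        "peak_read_qps": avg_read * peak_multiplier,      # Step 4
        "peak_write_qps": avg_write * peak_multiplier,
    }


# Step 1's example (10M registered, 10% DAU) with Step 2's social-feed user
# (30 reads, 2 writes per day) and a 3x evening peak.
est = peak_qps(10_000_000, 0.10, reads_per_user=30, writes_per_user=2,
               peak_multiplier=3)
for name, value in est.items():
    print(f"{name:>15}: {value:,.0f}")
```

That works out to roughly 347 average and 1,042 peak read QPS, with writes under 70 QPS at peak. Keeping reads and writes as separate outputs preserves the read:write ratio for later steps.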
[Figure: the derivation chain, visualised]
Numbers worth knowing — a reference table
Estimation requires ballpark figures for what each component can handle. These are rough orders of magnitude based on typical cloud hardware and widely cited benchmarks — actual numbers vary significantly with workload type, tuning, and hardware generation.
| Resource | Typical ballpark |
|---|---|
| Commodity app server (CPU-bound API) | 500–2,000 QPS per instance |
| DB primary (Postgres / MySQL) | 1,000–5,000 QPS (reads + writes combined) |
| Redis / Memcached node | 50,000–200,000 QPS per node |
| CDN edge node | Millions of QPS (highly parallel, cacheable content) |
| 1 Gbps NIC | ~125 MB/s ≈ 1,000 req/s at 125 KB average payload |
| SSD random read | ~100K IOPS (NVMe), ~10K IOPS (SATA) |
| RAM sequential read | ~10 GB/s throughput |
| 1 TB SSD (cloud block storage) | ~$80–170 /month (varies by provider and volume type) |
| 1 TB object storage (S3-class, standard tier) | ~$20–25 /month; infrequent-access and archive tiers cost less |
| Network egress | ~$0.08–0.12 /GB (varies by provider and region) |
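The NIC row is a unit conversion: divide bits by 8 to get bytes, then divide bytes per second by the payload size. A small sketch, assuming decimal units (1 Gbps = 10^9 bit/s, 1 KB = 1,000 bytes) and ignoring protocol overhead, which lowers the usable figure in practice; the function name is hypothetical:

```python
def nic_requests_per_second(link_gbps: float, payload_kb: float) -> float:
    """Requests/s a link can carry at a given average payload (decimal units, no overhead)."""
    bytes_per_second = link_gbps * 1e9 / 8       # 1 Gbps -> 125,000,000 bytes/s
    return bytes_per_second / (payload_kb * 1_000)


print(nic_requests_per_second(1, 125))   # the table's row
print(nic_requests_per_second(10, 2))    # hypothetical 10 Gbps link, 2 KB responses
```

The first call reproduces the table's ~1,000 req/s. The second gives 625,000 req/s, far beyond the 500–2,000 QPS an app server handles, so with small payloads the CPU, not the NIC, is usually the limit.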
Fully worked example: photo-feed app at 1,000,000 DAU
Let's walk through every arithmetic step for a simplified photo-feed service (think Instagram Lite). State your assumptions first, then compute each quantity in turn.
Stated assumptions
- 1,000,000 DAU
- Each user: 20 feed reads/day, 1 photo upload every 5 days (= 0.2 uploads/day), 10 likes/day, 5 comments/day
- Average compressed photo size: 500 KB
- Feed response payload (one page of thumbnails): 2 KB
- Cache hit rate on reads: 80% (cache handles 80% of read QPS)
- Data retention: 5 years
- Peak multiplier: 3× (evening prime time for a social app)
Summary table
| Resource | Computed value | Solution |
|---|---|---|
| Peak QPS (total) | ~1,220 QPS (694 read + 528 write) | n/a |
| App servers | 3 needed at ~500 QPS/server → 4 with headroom | 4 × commodity instances (2 per AZ) |
| Storage (5-year raw) | 183 TB | Object storage (S3); ~366 TB with one extra copy (2×) |
| DB primary | 528 write QPS | 1 primary + 2 read replicas |
| Cache | 14.6 GB metadata | 2 × 32 GB Redis nodes |
| Peak upload ingress | 3.5 MB/s | Within a 1 Gbps NIC; no issue |
| Origin egress (reads) | 280 KB/s | Trivial; CDN absorbs 80% of reads |
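The summary figures follow mechanically from the stated assumptions. A sketch that recomputes them, assuming decimal units (1 KB = 1,000 bytes), uploads, likes, and comments all counted as writes, ~500 QPS per app server (the low end of the reference table), and 30% headroom; the constant names are illustrative:

```python
import math

SECONDS_PER_DAY = 86_400
DAU = 1_000_000
PEAK = 3                                # evening prime-time multiplier
READS_PER_USER = 20                     # feed reads per day
UPLOADS, LIKES, COMMENTS = 0.2, 10, 5   # writes per user per day
PHOTO_BYTES = 500_000                   # 500 KB compressed photo
FEED_PAYLOAD_BYTES = 2_000              # 2 KB feed page
CACHE_HIT = 0.80
RETENTION_YEARS = 5
PER_SERVER_QPS = 500                    # assumption: low end of the reference table
HEADROOM = 0.30                         # assumption: inside the 20-50% band

peak_read = DAU * READS_PER_USER / SECONDS_PER_DAY * PEAK
peak_write = DAU * (UPLOADS + LIKES + COMMENTS) / SECONDS_PER_DAY * PEAK
peak_total = peak_read + peak_write

servers_min = math.ceil(peak_total / PER_SERVER_QPS)
servers = math.ceil(servers_min * (1 + HEADROOM))

storage_tb = DAU * UPLOADS * PHOTO_BYTES * 365 * RETENTION_YEARS / 1e12
ingress_mb_s = DAU * UPLOADS / SECONDS_PER_DAY * PEAK * PHOTO_BYTES / 1e6
origin_egress_kb_s = peak_read * (1 - CACHE_HIT) * FEED_PAYLOAD_BYTES / 1e3

print(f"peak QPS       ~{peak_total:,.0f} ({peak_read:,.0f} read + {peak_write:,.0f} write)")
print(f"app servers    {servers_min} -> {servers} with headroom")
print(f"storage (5 y)  ~{storage_tb:,.1f} TB raw")
print(f"peak ingress   ~{ingress_mb_s:.1f} MB/s")
print(f"origin egress  ~{origin_egress_kb_s:,.0f} KB/s")
```

The printed values agree with the summary table to within rounding. The cache row is left out because it depends on the hot-set analysis later in the lesson.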
[Figure: resource derivation, visualised]
Under the hood: why the math works
The derivation chain is not arbitrary — each step has a mechanical justification rooted in queueing theory and traffic engineering. Understanding the why lets you adapt the formula when your system doesn't fit the defaults.
86,400: the normalization constant
A day has 86,400 seconds (60 × 60 × 24). Dividing a daily total by 86,400 converts it to a per-second rate, which is what QPS (queries per second) means. The insight is that users distribute their activity over a 24-hour window, and the time window for averaging is exactly that: one day. This works because user behavior at the level of a whole platform tends to average out across time zones, even though any individual user is bursty.
The peak multiplier: accounting for traffic shape
Real traffic follows a diurnal pattern: it peaks in the evening local time of your dominant user base and troughs overnight. If you plot hourly traffic over a week, you get a wave with peaks roughly 3–4× the overnight minimum and 2–3× the daily average. Prime time concentrates a disproportionate share of requests into a few hours; note that a strict Pareto split (80% of daily volume in 20% of the day, about 5 hours) would put the peak rate at 0.8 ÷ 0.2 = 4× the average, above that typical 2–3× range.
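Given an hourly traffic profile, the peak multiplier is the busiest hour's rate divided by the 24-hour average rate. A sketch with a made-up diurnal profile (the hourly counts and the function name are hypothetical, for illustration only):

```python
def peak_multiplier(hourly_requests: list) -> float:
    """Busiest hour's request rate divided by the 24-hour average rate."""
    if len(hourly_requests) != 24:
        raise ValueError("expected 24 hourly request counts")
    average_per_hour = sum(hourly_requests) / 24
    return max(hourly_requests) / average_per_hour


# Hypothetical hourly volumes (thousands of requests), midnight to 11 pm.
profile = [3, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6,
           7, 7, 6, 6, 7, 9, 12, 14, 14, 12, 8, 5]
print(f"peak multiplier: {peak_multiplier(profile):.2f}")
```

A flat profile returns exactly 1.0. Measuring over shorter windows (per minute rather than per hour) can only raise the busiest-window rate, so a per-minute multiplier is at least as large as the hourly one.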
For flash-sale events such as concert ticket drops, the peak-to-average ratio can reach 10–50×, far above the 5–8× typical of Black Friday e-commerce. Designing only for average load in these cases means your system will fail the moment it matters most.
Read:write ratio as architectural signal
Once you have separate read QPS and write QPS, their ratio tells you where to invest in scaling:
- High read:write (100:1 or higher): most requests are reads. Scale with caching (Redis, CDN), read replicas, and denormalized read models. The write path is rarely the bottleneck.
- Moderate read:write (10:1 to 100:1): typical for most consumer apps. Cache aggressively, add a few read replicas, and your single primary should hold.
- Low read:write (below 5:1): write-heavy — uncommon, but signals you need sharding, async write queues, or a write-optimized store (Cassandra, DynamoDB with write partitioning). Question your assumptions first: many "low ratio" estimates are miscounts.
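The three bands map directly to a lookup. A sketch that encodes the lesson's thresholds; the text leaves 5:1 to 10:1 unassigned, and this sketch folds that range into the moderate band (an assumption). `scaling_hint` is a hypothetical helper:

```python
def scaling_hint(read_qps: float, write_qps: float) -> str:
    """Map a read:write ratio onto the lesson's three bands."""
    if write_qps <= 0:
        return "read-only: cache and replicate"
    ratio = read_qps / write_qps
    if ratio >= 100:
        return f"{ratio:.0f}:1 high: caching, CDN, read replicas, denormalized read models"
    if ratio >= 5:  # assumption: 5:1 to 10:1 treated as moderate
        return f"{ratio:.0f}:1 moderate: cache aggressively, a few read replicas, single primary"
    return f"{ratio:.1f}:1 low: verify the counts, then consider sharding or async write queues"


print(scaling_hint(200, 1))
print(scaling_hint(58_000, 810))
print(scaling_hint(694, 528))
```

The last call uses the photo-feed example's peaks (20 feed reads/day at 1M DAU and 3× gives ~694 read QPS against 528 write QPS) and lands in the low band. That is worth questioning: the example counts likes and comments as writes but only feed loads as reads.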
Server count: ceiling, not floor
The formula ceil(peak_QPS / per_server_capacity) gives you the bare minimum to survive peak load at 100% CPU utilization — which is not survivable in practice, because a single slow request or GC pause will cascade. Add 20–50% headroom: if the math says 2.5 servers → round to 3 → add 30% headroom → 4. Deploy in pairs across availability zones for redundancy.
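The same rounding sequence as a function. A sketch, assuming headroom is applied as a percentage on the ceiling and that the result is rounded up to a multiple of the zone count so instances split evenly across availability zones (`servers_needed`, `headroom_pct`, and the even-split rule are illustrative assumptions):

```python
import math


def servers_needed(peak_qps: float, per_server_qps: float,
                   headroom_pct: int = 30, zones: int = 2) -> int:
    """ceil(peak / capacity), plus headroom, rounded up to a multiple of `zones`."""
    bare_minimum = math.ceil(peak_qps / per_server_qps)          # 100% CPU at peak
    # Integer ceiling avoids float artefacts such as 100 * 1.1 == 110.00000000000001.
    with_headroom = (bare_minimum * (100 + headroom_pct) + 99) // 100
    return (with_headroom + zones - 1) // zones * zones          # even split per zone


print(servers_needed(2_500, 1_000))                      # 2.5 -> 3 -> 3.9 -> 4
print(servers_needed(100_000, 1_000, headroom_pct=10))   # 100 -> 110
```

The first call reproduces the 2.5 → 3 → 4 sequence from the text. Doing the headroom step in integers keeps exact cases exact: 100 servers with 10% headroom gives 110, not 111.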
Storage compounding
Storage is not a snapshot — it accumulates over time. Objects uploaded in Year 1 are still stored in Year 5 unless you have an explicit deletion or tiering policy. Always compute Year-1 storage, then multiply by retention years. Add 20% overhead for metadata (database rows, indexes, thumbnails) and 2–3× for replication copies (3× for geo-redundant durability).
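The compounding rule as a sketch, assuming a flat upload rate (a growing user base makes this a lower bound), the 20% metadata overhead, and a chosen replication factor; decimal units throughout, and `stored_bytes` is a hypothetical helper:

```python
TB = 1e12


def stored_bytes(dau: float, uploads_per_user_per_day: float, avg_object_bytes: float,
                 retention_years: int, metadata_overhead: float = 0.20,
                 replication: int = 3) -> float:
    """Year-1 volume x retention years, plus metadata overhead, times replica copies."""
    year_one = dau * uploads_per_user_per_day * avg_object_bytes * 365
    raw = year_one * retention_years          # flat upload rate assumed
    return raw * (1 + metadata_overhead) * replication


raw_only = stored_bytes(1_000_000, 0.2, 500_000, 5, metadata_overhead=0, replication=1)
provisioned = stored_bytes(1_000_000, 0.2, 500_000, 5)
print(f"raw {raw_only / TB:.1f} TB, provisioned {provisioned / TB:.0f} TB")
```

With the photo-feed inputs this gives 182.5 TB raw (the summary's 183 TB) and about 657 TB once 20% metadata overhead and three replicas are included.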
Checking whether our cache size is right. We said the hot-photo metadata cache is 14.6 GB. Let's verify the reasoning step by step:
Year-1 total photos: 1,000,000 users × 0.2 uploads/day × 365 days = 73,000,000 photos.
Assume the hot set is the top 20% of those photos: 0.2 × 73,000,000 = 14,600,000 photos. Caching the actual photo objects would take 14,600,000 × 500 KB = 7.3 TB. That's out of the question for an in-memory cache: a 64 GB Redis node holds less than 1% of it.
Instead, cache only the metadata row: photo_id, storage_url, timestamp, like_count, comment_count ≈ 200 bytes. 14,600,000 × 200 bytes = 2.9 GB, which comfortably fits in a single Redis node. Adding user profile data, feed ranking scores, and friendship lists might bring the total to 10–15 GB, consistent with the 14.6 GB figure (about 1 KB per hot photo) and still fine for a 32 GB node.
The lesson: raw objects go to object storage (S3, GCS). The cache holds references and lightweight metadata, not the objects themselves. This is the "cache-what-is-small" principle: a cache pays off only when most of your hot working set fits in RAM.
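The cache check above as a reusable sketch: size the hot set, then compare it with node RAM. The 50% fill ceiling mirrors the sanity-check table's "50%+ headroom" guideline; the function names are hypothetical, and real Redis memory use adds per-key overhead on top of the raw entry size:

```python
import math

GB = 1e9
TB = 1e12


def hot_set_bytes(total_items: float, hot_fraction: float, bytes_per_item: float) -> float:
    """Size of the hot working set if every hot item is cached at `bytes_per_item`."""
    return total_items * hot_fraction * bytes_per_item


def cache_nodes(hot_bytes: float, node_ram_bytes: float, max_fill: float = 0.5) -> int:
    """Nodes needed so that no node is more than `max_fill` full (0.5 = 50% headroom)."""
    return math.ceil(hot_bytes / (node_ram_bytes * max_fill))


photos_year_one = 1_000_000 * 0.2 * 365                  # 73M photos
blobs = hot_set_bytes(photos_year_one, 0.20, 500_000)    # cache raw 500 KB photos
meta = hot_set_bytes(photos_year_one, 0.20, 200)         # cache 200-byte metadata rows
print(f"blobs: {blobs / TB:.1f} TB -> {cache_nodes(blobs, 64 * GB)} x 64 GB nodes")
print(f"meta:  {meta / GB:.1f} GB -> {cache_nodes(meta, 32 * GB)} x 32 GB node(s)")
```

Caching blobs would need 229 nodes of 64 GB; caching metadata needs one 32 GB node. That gap is the whole argument for caching references rather than objects.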
Sanity-check table: does this estimate pass the smell test?
After running the numbers, ask yourself whether each result falls in a plausible range. The table below captures the most common failure modes.
| Check | Red flag | Green flag |
|---|---|---|
| App server count | <1 (math is wrong) or >1,000 for a small system | 2–20 for a startup-scale system |
| Storage/year growth | >10 PB/year for <1M users — revisit object sizes or upload rate | Scales linearly with upload rate; 10s of TB/year is normal for a media app |
| Peak QPS / avg QPS ratio | >20× (unusual spike; did you model a flash sale?) | 2–5× for most social apps |
| Cache size vs. available RAM | Cache size exceeds RAM budget — you're trying to cache blobs | Fits in a single node with 50%+ headroom; or use metadata-only cache |
| DB write QPS | >10K write QPS on a single primary — imminent bottleneck | Under 5K write QPS; a single primary handles it with headroom |
| Read:write ratio | <2:1 — are you counting all read types? Most apps are read-heavy | 10:1 to 100:1 is the typical range for consumer apps |
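The red-flag column can be turned into automated checks on an estimate. A sketch that encodes a subset of the table's thresholds (it skips the server-count upper bound and storage growth rows); the `Estimate` fields and both example estimates are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    app_servers: float
    peak_qps: float
    avg_qps: float
    cache_bytes: float
    cache_ram_bytes: float
    read_qps: float
    write_qps: float


def red_flags(e: Estimate) -> list:
    """Return the sanity-check table's red flags that this estimate trips."""
    flags = []
    if e.app_servers < 1:
        flags.append("under 1 app server: the math is probably wrong")
    if e.avg_qps > 0 and e.peak_qps / e.avg_qps > 20:
        flags.append("peak/avg above 20x: did you model a flash sale?")
    if e.cache_bytes > e.cache_ram_bytes:
        flags.append("cache exceeds RAM budget: are you caching blobs?")
    if e.write_qps > 10_000:
        flags.append("over 10K write QPS on one primary: imminent bottleneck")
    if e.write_qps > 0 and e.read_qps / e.write_qps < 2:
        flags.append("read:write below 2:1: are all read types counted?")
    return flags


# Hypothetical estimates: one deliberately flawed, one plausible.
flawed = Estimate(app_servers=0.4, peak_qps=30_000, avg_qps=1_000,
                  cache_bytes=2e12, cache_ram_bytes=64e9,
                  read_qps=60_000, write_qps=12_000)
plausible = Estimate(app_servers=4, peak_qps=1_200, avg_qps=400,
                     cache_bytes=15e9, cache_ram_bytes=64e9,
                     read_qps=1_150, write_qps=50)
for flag in red_flags(flawed):
    print(flag)
print(red_flags(plausible))
```

The flawed estimate trips four checks; the plausible one returns an empty list.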
Always state your assumptions out loud before you start computing. An interviewer who hears "I'll assume 10M registered users with a 10% DAU rate, giving 1M DAU" can correct a wrong assumption early — maybe the product is B2B and DAU rates are 60%, or the dataset is write-heavy because it's a logging system. An interviewer who hears nothing but numbers cannot tell whether you understand the domain or are just doing arithmetic. Assumptions stated out loud are an invitation to collaborate, not a sign of uncertainty.
Forgetting the peak multiplier is the single most common estimation mistake. Sizing servers from average QPS ÷ server capacity gives a fleet that already runs at 100% CPU at average load, so any traffic spike, background job, or slow query can cascade into a complete outage. Computing only average load is like sizing a bridge for the average number of cars, with no margin for rush hour. Always compute peak, then add 20–50% headroom on top of that.
Round to a power of ten or a friendly step in the 1–2–5 series (1, 2, 5, 10, 20, 50, 100…). Estimation is about order of magnitude. An answer of "693 QPS" adds false precision and costs you time; "700 QPS, call it 1,000 with headroom" communicates reasoning, confidence, and engineering judgment. Interviewers reward judgment, not arithmetic accuracy, and in a real design you'll provision in multiples of standard instance sizes anyway.
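Snapping an estimate to the 1–2–5 series is easy to mechanise. A minimal sketch, assuming positive inputs and always rounding up (the conservative direction when provisioning); `round_up_125` is a hypothetical helper, not a standard function:

```python
import math


def round_up_125(x: float) -> float:
    """Round a positive estimate up to the next value in 1, 2, 5, 10, 20, 50, ..."""
    if x <= 0:
        raise ValueError("expected a positive estimate")
    decade = 10 ** math.floor(math.log10(x))     # e.g. 693 -> 100
    return next(step * decade for step in (1, 2, 5, 10) if x <= step * decade)


for raw in (693, 232, 1_222, 183):
    print(raw, "->", round_up_125(raw))
```

For a number you are reporting rather than provisioning, rounding to the nearest step instead of up is an equally reasonable choice.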
🧠 Quick check
A social app has 500,000 DAU. Each user makes 40 requests per day. What is the average QPS?
500,000 × 40 = 20,000,000 requests per day. Divide by 86,400 seconds: 20,000,000 ÷ 86,400 ≈ 231.5, so about 232 QPS. The most common errors are dropping a zero from the constant (dividing by 8,640 gives 2,315 QPS, 10× too high), forgetting to divide at all, or dividing by the wrong constant entirely (e.g. 3,600, the seconds in an hour).
Why do capacity estimates multiply average QPS by a peak factor of 2–10×?
Traffic is diurnal: it concentrates in evening prime time for the dominant user time zone. A 3× peak multiplier reflects this non-uniform distribution. It is not a vendor requirement or a convention — it is an empirical property of real user behavior. Flash-sale apps see 10× because all users flood in simultaneously.
You compute that your photo metadata cache needs 14 GB of RAM. You have two Redis nodes with 16 GB each. Which statement is most accurate?
If the 14 GB is sharded across the 2 × 16 GB nodes, each holds about 7 GB, leaving roughly 9 GB of headroom per node: entirely comfortable. If the two nodes are instead a primary and a replica, each holds the full 14 GB, leaving only 2 GB of headroom; that is tight, so monitor memory and plan an upgrade before it fills. Redis has no hard 10 GB limit: it uses available RAM up to the configured maxmemory setting, which can be set to any value.
A system has a read:write ratio of 200:1. Which scaling direction does this most strongly suggest?
A 200:1 read:write ratio means reads dominate overwhelmingly. Adding caching (Redis, CDN edge caching) and read replicas cheaply absorbs read load without touching the write path. Write QPS is low by definition, so the primary database is not under pressure. Scaling both evenly wastes money; scaling writes first solves the wrong problem.
✍️ Exercise: estimate capacity for a URL shortener with 100M DAU
Given:
- 100,000,000 DAU
- Each user: 1 URL creation per week (≈ 0.14/day), 10 redirect lookups/day
- Average URL record size: 500 bytes
- Redirect response: 302 with ~200 bytes payload
- Retention: 3 years
- Peak multiplier: 5×
Compute: average and peak QPS (reads and writes separately), read:write ratio, 3-year storage, app server count (assume 1,000 QPS/server), and cache sizing for hot URLs. State one architectural recommendation that follows directly from the read:write ratio.
Model answer:
- Write QPS: 100M × 0.14 / 86,400 = 162 avg → × 5 = 810 peak write QPS
- Read QPS: 100M × 10 / 86,400 = 11,574 avg → × 5 = 57,870 ≈ 58,000 peak read QPS
- Read:write ratio: 58,000 / 810 ≈ 72:1, clearly read-heavy; redirect caching is the primary scaling lever
- 3-year storage: 100M × 0.14/day × 365 × 3 × 500 bytes = 100M × 153.3 × 500 bytes ≈ 7.7 TB
- App servers: 58,810 total QPS / 1,000 = 58.8 → ceil to 59 at 100% utilization → with 20–50% headroom, roughly 71–89 servers (call it 80). CDN edge caching of redirects can shrink this substantially.
- Cache: total URLs at Year 1 ≈ 100M × 0.14 × 365 ≈ 5.1B URLs × 500 bytes ≈ 2.6 TB total. Top 20% hot = 0.2 × 5.1B × 500 bytes ≈ 510 GB, too large for a single node's RAM. Cache the URL record only (short code → target URL ≈ 100 bytes): 0.2 × 5.1B × 100 bytes ≈ 102 GB → a small Redis cluster. Four 32 GB nodes hold the raw 102 GB, but Redis's per-key overhead and the 50% headroom guideline push this toward 7 × 32 GB nodes.
- Architectural recommendation: with a 72:1 read:write ratio, the dominant cost driver is redirect reads. Cache redirect lookups at the CDN edge (e.g., Cache-Control on the 302 response). This eliminates most of the 58,000 read QPS from the origin entirely — the actual origin may only need to handle 5–10% of read QPS once the CDN warms up.
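- DB: 810 write QPS fits within a single primary's capacity (1,000–5,000 QPS in the reference table). Serving all 58K read QPS from replicas at that per-node rate would take roughly 12–58 replicas, so lean on the Redis cache and CDN edge caching: only cache misses should reach the database, and the replica count can be sized from that residual load.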
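The model answer's arithmetic can be rechecked in a few lines. A sketch using decimal units and the exercise's givens; the 20% hot fraction and 100-byte cache entry come from the model answer, and the constant names are illustrative:

```python
SECONDS_PER_DAY = 86_400
DAU = 100_000_000
CREATES_PER_DAY = 0.14      # ~1 URL per week
REDIRECTS_PER_DAY = 10
RECORD_BYTES = 500
CACHE_ENTRY_BYTES = 100     # short code -> target URL
PEAK = 5

peak_write = DAU * CREATES_PER_DAY / SECONDS_PER_DAY * PEAK
peak_read = DAU * REDIRECTS_PER_DAY / SECONDS_PER_DAY * PEAK
urls_year_one = DAU * CREATES_PER_DAY * 365
storage_3y_tb = urls_year_one * 3 * RECORD_BYTES / 1e12
hot_cache_gb = 0.20 * urls_year_one * CACHE_ENTRY_BYTES / 1e9

print(f"peak write {peak_write:,.0f} QPS, peak read {peak_read:,.0f} QPS")
print(f"read:write {peak_read / peak_write:.0f}:1")
print(f"3-year storage {storage_3y_tb:.1f} TB, hot cache {hot_cache_gb:.0f} GB")
```

The exact ratio is 10 ÷ 0.14 ≈ 71:1; the model answer's 72:1 comes from rounding peak reads to 58,000 before dividing. Either figure leads to the same recommendation.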
Rubric: Full marks for correct order-of-magnitude answers (within ±50%), correct identification of the 72:1 read:write ratio, and a recommendation to cache redirects. Partial marks for correct read/write QPS but missing storage or cache sizing. Bonus for explicitly noting that CDN edge caching of the 302 response can absorb the majority of read QPS before it reaches the origin.
Key takeaways
- The five-step chain: Total users → DAU → Req/user/day → Avg QPS (÷ 86,400) → Peak QPS (× 2–10) → derive each resource independently.
- 86,400 is your normalization constant; peak = avg × (2–10) depending on traffic shape. Forgetting the multiplier is the single most common estimation mistake.
- Derive each resource from the appropriate input: storage = objects × size × retention × replication, bandwidth = QPS × payload, servers = ceil(peak QPS ÷ per-server capacity) + 20–50% headroom.
- Read:write ratio is your architectural signal: high read:write (10:1 or above) → add caching and read replicas before worrying about writes; high write QPS (above ~10K) → sharding, async queues, write-optimized stores.
- In interviews: state your assumptions out loud before computing, round aggressively to friendly numbers, and sanity-check each result against known hardware limits.