API Design

Production at Scale · Walkthrough 03

Read-heavy feed at 100 M+ (Twitter-style, modeled)

The defining asymmetry of a social feed is that reads dwarf writes by two orders of magnitude. Routing each write the wrong way turns that imbalance into a system-threatening wall — and a single celebrity post can trigger more cache writes than your entire infrastructure can absorb.

⏱ 20 min advanced Prereq: Caching, Load balancing

By the end you'll be able to

Why reads dominate: the 100-to-1 asymmetry

On a mature social platform, writes (new posts) and reads (timeline loads) are not symmetric. A platform with 100 million daily active users might receive 5 million new posts per day while serving 500 million timeline reads per day — a 100:1 read-to-write ratio. That gap matters architecturally because the engineering cost of a write is paid once, while the cost of making that write readable is paid by every follower who fetches their feed.

The central decision is: when do you pay that fan-out cost? At write time (pre-compute every follower's timeline) or at read time (assemble it on demand)? Both extremes break under different conditions. A hybrid pays each cost only where it is tolerable.

Fan-out on WRITE Author posts fan-out service cache[A] cache[B] cache[C] Read = 1 cache lookup Write = N cache inserts (N = follower count) Fan-out on READ Author posts post store (1 row) At read time: merge followed users' posts shard A shard B shard C Read = fan-out to M shards; Write = 1
Fan-out-on-write pre-computes every follower's timeline at post time (cheap reads, expensive writes). Fan-out-on-read stores the post once and assembles the timeline at read time (cheap writes, expensive reads).

Fan-out-on-write: the write-amplification wall

In fan-out-on-write, each new post triggers a cache insert for every follower of the author. For a typical user with 1,000 followers that is manageable. The wall appears when a celebrity posts.

Author typeFollower count (F)Cache writes per postAt 1 post/min rate
Regular user1,0001,0001,000 writes/min
Micro-influencer100,000100,000100 K writes/min
Celebrity10,000,00010,000,00010 M writes/min
Top-tier celebrity100,000,000100,000,000100 M writes/min
⚠️ The write-amplification wall

A single post from a 100-million-follower account triggers 100 million cache inserts. If each insert takes 0.1 ms, completing the fan-out serially would take 10,000 seconds — nearly three hours. Even with 100 parallel workers each handling 1 M inserts, completion takes 100 seconds. By then, hundreds of millions of followers have already requested the timeline and gotten stale data or cache misses. This is not a theoretical edge case; it is the concrete failure mode that mandates the hybrid strategy.

The ceiling is not just latency — it is also the raw throughput of the cache tier. A Redis cluster handling 500 K writes/second can sustain fan-out for users with up to roughly 500 K followers before a single post saturates it. Beyond that, concurrent celebrity posts create a write storm.

⚠️ Modeled, not measured — illustrative arithmetic

# Fan-out-on-write amplification factor
writes_per_post   = follower_count                   # F
total_writes/day  = posts_per_day × avg_follower_count  # P × F̄

# Modeled scenario: 100 M DAU platform
posts_per_day      = 5_000_000   # 5 M new posts/day (illustrative)
avg_follower_count = 200         # average follows per author (illustrative)
total_writes/day  = 5_000_000 × 200 = 1_000_000_000  # 1 B writes/day
writes/sec (avg)  = 1_000_000_000 / 86_40011_574

# One celebrity post (100 M followers) spikes this by:
spike_writes       = 100_000_000   # 100 M inserts in seconds
# That is ~8,600× the average per-second write rate, absorbed instantly
🎯 Interview angle

Interviewers test whether you recognise the asymmetry: "Why not just pre-compute all timelines?" Answer: it works for normal users, but a single celebrity post produces O(follower_count) writes. Name the number — 100 M cache writes for a top-tier account — then pivot to the hybrid as the solution. Quantifying the problem before proposing the fix is what distinguishes a senior answer.

Fan-out-on-read: the read-time assembly cost

Fan-out-on-read stores each post once in a chronological post store. When a user requests their timeline, the system fetches the M accounts they follow and merges the N most recent posts from each — a scatter-gather over M database or cache shards.

ParameterIllustrative valueNote
Accounts followed per user (M)300Typical median for an active social account
Recent posts fetched per followee10Retrieve to merge; show top 20
DB/cache round-trips per timeline read1 (scatter-gather)Sharded; parallelised
Post rows read per timeline fetch300 × 10 = 3,000Merge overhead at read time
Timeline reads/day (100 M DAU, 5 loads/day)500,000,000Illustrative
Total post-store reads/day500 M × 3,000 = 1.5 trillionEntirely impractical without caching

Pure fan-out-on-read is clearly unworkable at scale: the post-store read amplification is enormous, and the scatter-gather latency compounds with the number of followed accounts. But it has one critical advantage: a celebrity's post costs exactly one write, regardless of how many people follow them.

The hybrid strategy: push for normals, pull for celebrities

The hybrid cuts both amplification problems by segmenting authors:

  1. Normal users (follower count below threshold T): fan-out-on-write. Their posts are pushed into each follower's timeline cache immediately. Timeline reads are a single cache lookup — fast and cheap.
  2. Celebrity users (follower count ≥ T): fan-out-on-read, but only for celebrity posts. The celebrity's recent posts live in a dedicated "celebrity post cache" keyed by author id. At timeline read time, the system stitches the pre-built normal timeline (from the cache) with a lightweight fetch of any celebrities the user follows.
Post Router checks author follower count normal (F < T) Fan-out service TL cache[A] TL cache[B] TL cache[C] celebrity (F ≥ T) Celebrity post cache keyed by author_id; one row Stitch at read time
The hybrid: normal posts fan-out on write into per-user timeline caches (cheap reads). Celebrity posts write once to a celebrity post cache and are merged into the timeline only when the user actually reads it — the fetch is lightweight because there are few celebrities followed, not thousands.
⚠️ Modeled, not measured — illustrative threshold arithmetic

# Choosing the celebrity threshold T
# Goal: a single post must not saturate the write tier
# Assume: cache write throughput = 500 K writes/sec
#         acceptable fan-out duration = 10 seconds
max_writes = 500_000 × 10 = 5_000_000  # 5 M writes in 10 s
# Therefore T ≈ 1 M followers (fan-out workers × throughput)
# Accounts above T get the celebrity treatment; below T get fan-out-on-write

# Read-time stitch cost for celebrity hybrid
celebs_followed      = 5      # typical user follows ~5 celebrities (illustrative)
fetch_per_celeb      = 1      # one cache lookup per celeb
stitch_latency_ms    = 5 × 1 × 0.5 = 2.5  # <3 ms total — acceptable

Timeline cache sizing from first principles

A timeline cache stores a list of post identifiers (or lightweight post summaries) per user. The goal is to keep the hot users' timelines in memory to serve reads without hitting the post store.

ParameterIllustrative valueJustification
Active users (with cached timeline)50,000,00050 M of 100 M DAU have a warm cache entry
Timeline entries per user500~500 recent post ids; user rarely scrolls further
Bytes per entry (post id + timestamp)16 bytes8-byte int64 post id + 8-byte int64 timestamp
Raw timeline data per user500 × 16 = 8,000 bytes = 8 KB
Total raw data50 M × 8 KB = 400 GB
Redis overhead (hash + pointer)~2×Redis per-key overhead; sorted-set nodes
Total cache memory needed≈ 800 GBIllustrative
Cache nodes at 64 GB RAM each800 / 64 ≈ 13 nodesWith replication: ~26–39 nodes
✅ Store ids, not full post content

A timeline cache should store post identifiers (and perhaps scores/timestamps for ordering), not full post objects. Full post content lives in a separate post cache. This separation means updating a post only invalidates one object, not 50 M timeline cache entries. It also lets the post cache use a different eviction policy tuned to post popularity rather than recency.

The hot-key problem: celebrities as cache thundering herds

Even after adopting the hybrid strategy, a celebrity's post cache entry becomes a hot key: millions of read-time timeline stitches all hit the same cache key for the celebrity's recent posts. A single Redis node owns that key and must handle every read — this is the hot-key problem, and it is the read-side analog of write amplification. See load balancing strategies and the load balancer simulation for techniques including key sharding and local in-process caching.

⚠️ Modeled, not measured — hot-key read estimate

# Suppose celebrity X has 10 M followers, all active at the same time
# Each timeline read fetches the celebrity's post cache entry
timeline_reads_per_min  = 10_000_000 / 60167_000
# Equivalent to 167 K Redis GET ops/min on ONE key
# A single Redis node handles ~100 K–1 M simple GETs/sec — so it is manageable,
# but leaves no headroom when multiple celebrities post simultaneously.

# Mitigation 1: local in-process cache (e.g., LRU in each app server)
# Suppose 100 app servers each cache the celebrity post list for 5 seconds
redis_reads = 167_000 / 100 = 1_670  # 100× reduction — well within one node

# Mitigation 2: replicate the hot key to multiple Redis replicas and
# distribute reads across them (read replicas or key sharding)
⚠️ Cache stampede on celebrity post

When a celebrity cache entry expires (or is first populated), millions of requests simultaneously find a cache miss and all race to the post store — a thundering herd. Mitigate with: (1) probabilistic early re-population (start refreshing before expiry with probability proportional to remaining TTL); (2) a distributed lock so only one request populates the cache while others wait; (3) never fully expiring hot keys — refresh in the background on a separate worker thread.

Worked example: a single timeline read, end to end

User Alice (follows 280 normals + 3 celebrities) requests her home feed. The system executes these steps:

  1. Cache hit check. Look up timeline:{alice_id} in the Redis sorted set. The set holds 500 post-id/score pairs. HIT: read the top 20 ids. MISS: rebuild from post store (cold path, not shown).
  2. Celebrity stitch. For each of Alice's 3 celebrity follows, fetch celeb_posts:{celeb_id} from the celebrity post cache — typically 3 round-trips parallelised in a Redis pipeline, returning the last 20 posts per celebrity.
  3. Merge and rank. Merge the normal timeline (20 ids) with up to 60 celebrity post ids; sort all by timestamp descending; take the top 20.
  4. Post hydration. Batch-fetch the 20 post objects from the post content cache (or post store on miss).
  5. Return. Serialise and return the 20 posts.
⚠️ Modeled, not measured — latency budget (illustrative)

Step 1  TL cache lookup           ≈  0.5 ms   (local Redis, pipeline)
Step 2  3 celeb cache fetches      ≈  1.5 ms   (parallel, same Redis cluster)
Step 3  Merge + sort (in memory)   ≈  0.1 ms   (20+60 items, trivial)
Step 4  20 post hydrations         ≈  2.0 ms   (batch GET, cache hit assumed)
Step 5  Serialisation + response   ≈  0.5 ms
──────────────────────────────────────────────
Total (p50 warm cache)4.6 ms   (well within a 50 ms SLO)
Cold path (cache miss on TL)50–200 ms (DB fan-in + rebuild)
✅ Pipeline Redis calls; never serial-loop

The celebrity stitch makes N Redis calls (one per celebrity followed). Issuing them serially multiplies latency by N. Use a Redis PIPELINE or MGET to batch all lookups in a single round-trip. At 3 celebrities this saves ~1 ms; at 10 celebrities it saves ~5 ms — the difference between meeting and missing a 10 ms SLO.

🧠 Quick check

1. A user with 1,000,000 followers posts a tweet on a pure fan-out-on-write system. How many timeline cache writes does that trigger?

Fan-out-on-write copies the post reference into each follower's timeline cache. With 1 M followers, that is exactly 1 M cache inserts per post — the write amplification factor equals the follower count.

2. The hybrid strategy marks an author as "celebrity" (fan-out-on-read) primarily to:

The celebrity threshold exists to cap write amplification. A celebrity's post writes once to a celebrity post cache instead of fanning out to potentially hundreds of millions of follower caches. The stitch cost at read time is small because any given user follows only a handful of celebrities.

3. A timeline cache stores per-user sorted sets of 500 post ids at 16 bytes each. For 50 M active users, the raw data volume before Redis overhead is:

50 M users × 500 entries × 16 bytes = 400 GB raw. Redis overhead (hash structures, pointers, encoding) roughly doubles this to ~800 GB, requiring a modest cluster of ~13 nodes at 64 GB RAM each, plus replicas for availability.

4. Which technique most directly addresses the hot-key problem on a celebrity post cache entry?

An in-process cache on each app server means each server fetches the celebrity post list from Redis at most once per TTL interval (e.g., every few seconds), multiplying available read throughput by the number of app servers. Switching to fan-out-on-write would recreate the write amplification problem the celebrity strategy was designed to avoid.

✍️ Exercise: design the timeline API for the hybrid system

Design the GET /v1/timeline endpoint for the hybrid feed system described in this lesson. Specify: (a) the request parameters needed for cursor pagination, (b) the response schema (showing how normal posts and celebrity posts appear in the same list), (c) the steps your backend takes to build the response, and (d) what changes when the user has no cached timeline (cold path). Identify any consistency trade-offs.

Model answer

(a) Request: GET /v1/timeline?cursor=<opaque_token>&limit=20. The cursor encodes the timestamp and post-id of the last seen item, allowing the server to page forward without offset-based skew. No page number parameter — live feeds use cursor pagination because new posts shift row positions between requests.

(b) Response schema:

{
  "posts": [
    { "id": "p_1234", "author_id": "u_9", "body": "...", "created_at": "2025-06-20T10:00:00Z" }
    // celebrity and normal posts interleaved, sorted by created_at desc
  ],
  "next_cursor": "eyJ0IjoxNzUwMDE...",   // null if end of feed
  "freshness": "cached"              // "cached" | "live" — transparency hint
}

(c) Warm-cache steps: (1) fetch timeline:{user_id} sorted set (Redis ZREVRANGEBYSCORE with cursor score); (2) for each celebrity the user follows, pipeline-fetch celeb_posts:{celeb_id}; (3) merge and rank by score/timestamp; (4) batch-hydrate post objects; (5) set next_cursor from the lowest timestamp in the result set.

(d) Cold path: On cache miss, fall back to the post store: query followed users' posts from the DB (scatter-gather), rank, write the result back to the timeline cache for subsequent requests, and respond. This path has higher latency (~50–200 ms vs. ~5 ms warm).

Consistency trade-off: The pre-built normal timeline cache can be up to a few seconds stale — a post from a followed account may not appear in the timeline immediately after posting. This is acceptable (eventually consistent) for a social feed where near-real-time is sufficient. If exact ordering or freshness is required, the cache TTL should be shortened or the endpoint should always stitch from the post store.

Rubric: cursor pagination named and justified (vs offset) — 1 pt; celebrity + normal interleaved in same schema — 1 pt; cache lookup → celeb stitch → merge → hydrate steps listed — 2 pts; cold path identified with latency impact — 1 pt; consistency trade-off stated — 1 pt.

Key takeaways

Sources & further reading