Design Case Studies · Lesson 15
Design: Gaming & Leaderboard API
A leaderboard is the simplest API that every engineer underestimates. The naïve version — SELECT * ORDER BY score LIMIT 10 — collapses at 50 million players. The right version fits in 2.5 GB of RAM, answers rank queries in microseconds, and rejects a cheat submission in one multiplication. This lesson builds the whole thing from requirements to latency budget.
By the end you'll be able to
- Explain why a sorted set (Redis ZSET) is the canonical leaderboard structure and what its complexity guarantees are.
- Design a server-authoritative anti-cheat check and name the HTTP status code it should return.
- Choose between WebSocket and UDP for different game transport requirements and justify the choice.
- Sketch a region-sharded leaderboard merge strategy and state the consistency trade-off it introduces.
- Size a latency budget for real-time match state at 50 ms end-to-end.
1 — Requirements
Start with what the system must do, then add the constraints that make it hard.
Functional requirements
- Submit a score: when a player finishes a game session, the client posts their score with a session token. The server validates and records it.
- Global leaderboard: return the top N players across all regions for a given game.
- Player rank lookup: given a player ID, return their exact rank among all ranked players.
- Nearby rank (contextual): return the K players ranked immediately above and below a given player — the view a player sees in their profile.
- Matchmaking: find N players of similar skill (rank bracket) to form a match.
- Real-time game state sync: during an active match, broadcast game state updates to all participants at low latency.
Non-functional requirements
- Read latency: p99 < 10 ms for top-N and rank queries.
- Idempotent score submission: one session token, one score. The same session cannot submit twice.
- Anti-cheat: the server is authoritative. Submitted scores that are physically impossible given the session duration are rejected before they touch the leaderboard.
- Match state latency: < 50 ms end-to-end for state updates during an active match.
- Scale: 50 million ranked players; millions of score submissions per day; peak load around game launches.
- Region sharding: the global leaderboard is sharded by region (NA, EU, APAC, …) and merged at read time. This caps per-shard write throughput.
2 — Design decisions
Decision 1: Sorted set as the leaderboard data structure
Think of the leaderboard as a library card catalog sorted by score — you can find your exact position without reading every card because the catalog maintains sort order as cards are inserted. The skip list under the hood is that catalog's internal index tabs.
Concretely, Redis's sorted set (ZSET) stores (player_id, score) pairs in a skip list with an auxiliary hash. Every operation that matters for a leaderboard is covered:
| Operation | Command | Complexity | Use |
|---|---|---|---|
| Insert or update a player's score | ZADD | O(log N) | Score submission |
| Get a player's rank (0-indexed from top) | ZREVRANK | O(log N) | Rank lookup |
| Get the top N players | ZREVRANGE | O(log N + M) | Global leaderboard |
| Get players in a score range | ZRANGEBYSCORE | O(log N + M) | Matchmaking |
| Get neighbors around a rank | ZREVRANGE offset K | O(log N + K) | Nearby rank view |
Memory sizing for 50 million players: each sorted set entry is approximately 50 bytes (player ID string + score float + skip list pointers). 50 M × 50 B = 2.5 GB — fits on a single Redis node with room to spare, and fits entirely in RAM. The entire leaderboard is one ZSET.
A SELECT … ORDER BY score DESC LIMIT 10 with a B-tree index on score also runs in O(log N + 10). The difference is working set size. A SQL row for a player with profile columns might be 200–500 bytes; the full scan for a rank requires touching index pages for all 50 M rows. Redis holds the entire sorted set in RAM and touches nothing else. At 50 M rows the SQL approach is fine for a batch report; it will not meet a p99 < 10 ms SLA for live queries under concurrent load.
Decision 2: WebSocket for match communication, UDP optional for shooters
Two transports are viable for real-time game state:
(a) WebSocket — full-duplex TCP connection. Works through almost every proxy and corporate firewall. Reliable delivery. Per-message overhead is a 2–14 byte frame header on top of TCP. Reconnect logic is simple. Good for: turn-based games, card games, chess, strategy, lobby and matchmaking, leaderboard push updates.
(b) UDP — connectionless, no delivery guarantee, no ordering. Per-packet overhead is 8 bytes (UDP header) vs. 20+ bytes (TCP header + WebSocket frame). When a packet is dropped, the game state corrects itself on the next frame rather than stalling waiting for a retransmit. Good for: fast-paced shooters, racing games, any game where a stale frame is better than a late frame.
Decision for this design: WebSocket for matchmaking, lobby, leaderboard real-time updates, and turn-based in-match state (e.g. chess moves). UDP for high-frequency positional state in fast-paced titles. The two are not mutually exclusive — a shooter can use WebSocket for reliable game events (player joined, match ended) and UDP for per-frame position.
Decision 3: Server-authoritative anti-cheat
The client sends actions or inputs. The server applies game logic and produces the authoritative game state. The client cannot submit a score the server did not compute.
For games where the score is computed client-side (e.g. a puzzle game with local scoring), the server applies a plausibility check before accepting the submission:
// Server-side validation pseudo-code
function validateScore(session, submittedScore) {
const maxPossible = session.durationSeconds * game.maxPointsPerSecond;
if (submittedScore > maxPossible) {
throw HttpError(422, "Score exceeds session maximum");
}
// Additional: check session token hasn't been used before
if (session.scoreSubmitted) {
throw HttpError(409, "Session already has a score");
}
}
A 30-second session with a maximum theoretical rate of 100 points/sec can produce at most 3,000 points. A submitted score of 99,999 is rejected without touching the sorted set. This is a single multiplication — effectively free.
Decision 4: Region sharding to cap write throughput
A single global sorted set at 50 M entries handles reads trivially. The bottleneck is write throughput: every score submission is a ZADD to the same Redis key. Redis is single-threaded for writes; at millions of submissions/day during a launch, a single shard becomes a write bottleneck.
The solution: maintain one sorted set per region (NA, EU, APAC, SA, …). Score submissions go to the player's home region shard. To serve a global top-N, the API fans out to all shards in parallel, takes the top N from each, and merges them. For top-10 from 5 shards, that's 50 candidates merged in O(50 log 50) — negligible.
A score submitted to the EU shard a millisecond ago may not appear in the global merged result until the next merge cycle. The global leaderboard is eventually consistent across shards. For a public leaderboard display this is fine — near-real-time is sufficient. It is not acceptable for: anti-cheat cross-region checks, financial ledgers, or anything where stale data causes a wrong action.
3 — The API model
Submit a score
POST /v1/scores
Authorization: Bearer <session_token>
// Request body
{
"player_id": "plr_42",
"game_id": "game_chess_blitz",
"score": 1847,
"session_id": "sess_xyz"
}
// 201 Created — score accepted, rank returned immediately
{
"id": "score_88",
"player_id": "plr_42",
"score": 1847,
"rank": 14203,
"submitted_at": "2024-04-12T18:30:00Z"
}
// 422 Unprocessable Entity — anti-cheat rejected
{ "error": "score_exceeds_session_maximum",
"detail": "session_duration=30s max_rate=100pts/s max_possible=3000 submitted=99999" }
// 409 Conflict — session already used
{ "error": "session_already_scored" }
Global leaderboard
GET /v1/leaderboard?game_id=game_chess_blitz&top=10
// 200 OK
{
"entries": [
{ "rank": 1, "player_id": "plr_7", "score": 9821 },
{ "rank": 2, "player_id": "plr_19", "score": 9755 },
// … 8 more entries
],
"total_players": 50000000
}
Contextual rank (nearby players)
GET /v1/leaderboard/rank?game_id=game_chess_blitz&player_id=plr_42&context=5
// context=5 returns 5 players above and 5 below
// 200 OK
{
"player": { "rank": 14203, "score": 1847 },
"above": [ /* 5 players ranked 14198–14202 */ ],
"below": [ /* 5 players ranked 14204–14208 */ ]
}
Real-time match state via WebSocket
// Connect
wss://match.example.com/v1/matches/{match_id}/state
// Client → Server: player sends a move
{ "type": "ACTION", "payload": { "move": "e2e4" } }
// Server → all clients in match: authoritative state
{
"type": "STATE",
"seq": 88,
"board": "rnbqkbnr/pppppppp/...",
"clock": { "white": 240, "black": 237 }
}
The seq field is a monotonically increasing sequence number. Clients that receive an out-of-order message (e.g. after a brief reconnect) detect the gap and can request a state resync via a separate SYNC_REQUEST message, rather than silently displaying stale state.
Strong candidates distinguish between sending state (the full board position) and sending events (the move that changed it). Events are smaller; state is self-healing after a dropped message. The right answer for a chess game is state — a dropped move means you need the full board anyway. The right answer for a real-time shooter is often delta state (only what changed). Name the trade-off.
4 — Evaluation & latency budget
Sorted set operation costs at N = 50 million
log₂(50,000,000) ≈ 25. Each ZREVRANK traverses at most ~25 skip-list levels. At Redis speeds (~0.1 µs per operation), that's ~2.5 µs. The p99 leaderboard latency is dominated by network, not the data structure. The 10 ms SLA is essentially all network budget.
ZREVRANGE for top-10: O(log N + 10) ≈ 35 operations at ~3.5 µs. Sub-millisecond at the data layer, well within the 10 ms network budget.
Anti-cheat is a single multiplication
const maxPossible = 30 * 100; // 30-second session × 100 pts/sec
// → 3000. Score 3001 is rejected. Cost: ~0 ms.
Region shard merge cost
Global top-10 from 5 region shards:
- Fan out 5
ZREVRANGE … LIMIT 10queries in parallel: ~2–5 ms (network round-trip to each shard) - Merge 50 candidates with a min-heap: O(50 log 50) ≈ 280 comparisons, < 0.1 ms
- Return top-10
Total: ~5–6 ms. Easily within the 10 ms target.
Match state latency budget (50 ms target)
| Stage | Budget | Notes |
|---|---|---|
| Client → server network | ~10 ms | Regional matchmaking keeps players geo-close |
| Server validates action | ~2 ms | Legal move check, game logic |
| Server computes new state | ~1 ms | Board evaluation, clock update |
| Broadcast to all participants | ~10 ms | WebSocket fan-out to 2–16 players |
| Server → client network (recipient) | ~10 ms | Same region |
| Total | ~33 ms | 7 ms headroom against 50 ms target |
Rate limiting (1 submission per session, 429 if exceeded) prevents replay attacks. Anti-cheat validation (score ≤ duration × max_rate, 422 if violated) prevents impossible scores. They defend different attack vectors and both are needed. A player who submits a plausible score 50 times needs rate limiting. A player who submits one impossible score needs the anti-cheat check. Confusing 422 and 429 here is a common interview error.
Mark the session token as consumed atomically with the score write — in the same Redis transaction, or in the same database row update. If the server accepts the score and then separately marks the token used, a concurrent retry arriving in the gap between those two operations will accept a second submission. Use SET key NX (set if not exists) on the session token key before accepting the score; if the key already exists, return 409 without touching the sorted set.
Exercise — Friends leaderboard
Prompt: You need to add a "friends leaderboard" — showing only the top scores among a player's 200 friends. How does your sorted set design change, and what's the new complexity?
The problem with the global ZSET: a single global sorted set cannot answer "what rank is plr_42 among just their 200 friends?" You'd have to extract all 200 friends' scores and sort client-side — 200 lookups per read. That's fine at this scale, but there are three cleaner approaches:
- (a) ZINTERSTORE at query time: maintain a sorted set of the player's friend list as keys, then use
ZINTERSTOREto compute the intersection with the global score set. Complexity: O(N × log N) where N = 200 friends. Fast enough for 200 friends; scales poorly to 10,000. - (b) Per-player friend sorted set: maintain a separate ZSET per player containing only their friends' scores. Updated on every score submission (fan-out: write to the submitter's score + to each of the submitter's friends' leaderboard ZSETs). Write cost: O(F × log F) per submission where F = friends ≤ 200. Read cost: O(log 200 + M) — very fast. Tradeoff: F×M writes per submission event. For 200 friends, manageable.
- (c) Relational query at read time:
SELECT player_id, score FROM scores WHERE player_id IN (<200 ids>) ORDER BY score DESC LIMIT 10. Simple. At 200 IDs with an index on player_id, this is fast. The tradeoff is that it bypasses the sorted-set latency advantage and requires a database round-trip per read.
Rubric: must acknowledge that the global ZSET doesn't directly solve this. Must propose at least one specific alternative with its complexity. Bonus: distinguish read-heavy vs. write-heavy workloads when choosing between (a) and (b).
Under the hood: the core mechanism
The leaderboard rests entirely on one Redis data structure: the sorted set (ZSET). A ZSET stores (member, score) pairs in two structures simultaneously: a skip list ordered by score (for range queries and rank lookups) and an auxiliary hash table (for O(1) score lookup by member). Every leaderboard operation maps to a single ZSET command.
Skip list internals — why O(log N) not O(N)
A skip list is a probabilistic data structure with multiple forward-pointer levels. Level 0 is a linked list of all entries; higher levels are express lanes that skip over many entries. At N = 50 million, the expected number of levels is log₂(50 000 000) ≈ 25. A rank lookup traverses at most ~25 pointer hops instead of scanning all 50 million entries:
# Conceptual skip list for 5 entries (compressed — real list has 25+ levels):
#
# Level 3: [head] ─────────────────────────────────── [9821/plr_7] → NIL
# Level 2: [head] ─────────────── [9430/plr_3] ────── [9821/plr_7] → NIL
# Level 1: [head] ── [1847/plr_42] [9430/plr_3] ───── [9821/plr_7] → NIL
# Level 0: [head] ── [1847/plr_42] [9430/plr_3] [9755/plr_19] [9821/plr_7] → NIL
#
# ZREVRANK plr_42: start at top level, traverse right skipping larger entries,
# drop down when next entry is smaller than target, count skipped nodes = rank.
Worked example — 6 players
Starting state: 4 players in the sorted set. Two score updates arrive. Trace each operation:
── Initial state: empty ZSET "leaderboard:game_chess_blitz" ─────
ZADD leaderboard:game_chess_blitz 9821 "plr_7" # O(log 1) → rank 1
ZADD leaderboard:game_chess_blitz 9430 "plr_3" # O(log 2) → rank 2
ZADD leaderboard:game_chess_blitz 9755 "plr_19" # O(log 3) → rank 2 (plr_3 drops to 3)
ZADD leaderboard:game_chess_blitz 1847 "plr_42" # O(log 4) → rank 4
── Current ZSET (sorted by score DESC internally for ZREV* commands): ──
# score member ZREVRANK
# 9821 plr_7 0 (rank 1 in 1-indexed display: ZREVRANK + 1)
# 9755 plr_19 1 (rank 2)
# 9430 plr_3 2 (rank 3)
# 1847 plr_42 3 (rank 4)
ZADD leaderboard:game_chess_blitz 9900 "plr_7" # plr_7 improves score; ZADD updates in place
# 9900 plr_7 0 (still rank 1, score updated)
# 9755 plr_19 1
# 9430 plr_3 2
# 1847 plr_42 3
ZADD leaderboard:game_chess_blitz 9810 "plr_42" # plr_42 jumps from 1847 to 9810
# 9900 plr_7 0 (rank 1)
# 9810 plr_42 1 (rank 2 — jumped from rank 4!)
# 9755 plr_19 2 (rank 3)
# 9430 plr_3 3 (rank 4)
── Queries ──────────────────────────────────────────────────────
ZREVRANK leaderboard:game_chess_blitz "plr_42" # → 1 (0-indexed; add 1 for display)
ZREVRANGE leaderboard:game_chess_blitz 0 2 WITHSCORES
# → ["plr_7", "9900", "plr_42", "9810", "plr_19", "9755"] (top 3)
ZREVRANGE leaderboard:game_chess_blitz 0 1 WITHSCORES # 2 above plr_42 (ranks 1–2)
ZREVRANGE leaderboard:game_chess_blitz 2 4 WITHSCORES # plr_42 and 2 below (ranks 3–5)
# Combined: contextual "nearby rank" response
Region sharding and the global rank challenge
Each region shard holds its own ZSET. Getting a player's global rank (their rank across all regions) is harder than getting their regional rank:
| Query | Single-region | Global (5 shards) |
|---|---|---|
| Top-10 leaderboard | ZREVRANGE … 0 9 — one command, O(log N + 10) | Fan out to 5 shards → top 10 each → merge 50 candidates with a min-heap, O(50 log 50). ~5 ms total. |
| A player's rank | ZREVRANK player_id — one command, O(log N) | Player is only in their home shard. Count players with higher scores in all other shards: 5× ZCOUNT … (player_score +inf, sum the counts + home shard rank. ~5 ms total. |
| Nearby rank | ZREVRANGE with offset — one command | No clean equivalent; approximate by fetching global top-N at the score boundary. Exact neighbor lookup cross-shard is not supported efficiently. |
The global rank computation uses ZCOUNT — counts members with scores in a range — which is also O(log N). Summing counts from 5 shards gives the number of players globally ranked above the target player, which is their 0-indexed global rank.
# Global rank for plr_42 (score 9810) across 5 region shards:
ZREVRANK leaderboard:NA "plr_42" # → 1 (rank 2 in NA; plr_42 is in NA shard)
ZCOUNT leaderboard:EU 9811 +inf # → 3 (3 EU players beat 9810)
ZCOUNT leaderboard:APAC 9811 +inf # → 1
ZCOUNT leaderboard:SA 9811 +inf # → 0
ZCOUNT leaderboard:ME 9811 +inf # → 0
# Global rank (0-indexed) = NA_rank + EU_count + APAC_count + SA_count + ME_count
# = 1 + 3 + 1 + 0 + 0 = 5 → global rank 6 (1-indexed)
Operating & debugging it
Leaderboard operations are rarely the bottleneck — the skip list is sub-millisecond. What goes wrong in production is: score submissions silently rejected by anti-cheat with no logging, session tokens not marked consumed (double submission), or shard divergence where regional ranks are stale relative to the global merge.
Inspecting the session token deduplication store (assuming Redis SET NX approach):
| Symptom | Likely cause | Fix |
|---|---|---|
| Score submitted, rank never updates (player still sees old rank) | ZADD succeeded but client cached the old rank; or score submission hit a different shard than the one serving reads | Verify ZSCORE shows the new score; confirm read and write both target the same shard; add cache-busting on score submission response |
| Score rejected with 422 but player believes score is legitimate | Anti-cheat rate (max_rate) set too low; session duration recorded shorter than actual due to clock skew | Log submitted, max_possible, session_duration on every 422; audit if duration is being measured from session create vs. session start |
| Player can submit score twice (409 not triggered) | Session token SET NX not atomic with the score write; or session key TTL expired before second submission | Use a Redis pipeline or Lua script to atomically SET NX + ZADD; extend session token TTL to match score window |
| Global leaderboard top-10 is stale or shows regional players above known global top | One regional shard fell behind; merge service is caching the merged result too aggressively | Check ZCARD on each shard; reduce merge cache TTL; add shard health checks to the merge fan-out |
| ZREVRANK returns (nil) for a player who did submit a score | Score was submitted to a different key (wrong game_id or region); or ZADD used wrong member string format | Confirm the exact key name used in ZADD matches ZREVRANK; check for case sensitivity or trailing whitespace in player ID |
| Leaderboard read latency spikes above 10 ms p99 | Redis is swapping to disk (maxmemory exceeded); hot key contention; slow network to Redis | Check redis-cli INFO stats | grep rdb_last for swap indicators; add a read replica for leaderboard reads; verify the sorted set is still entirely in RAM |
- Start with
ZCARDandZSCOREto confirm the data is there and correct before investigating application logic. - For 422 false positives: add structured logging of all anti-cheat inputs (session duration, max rate, submitted score) so you can audit individual rejections.
- For double-submission bugs: the session token SET NX and the ZADD must be atomic — use a Lua script or a Redis transaction (
MULTI/EXEC). - For global rank inconsistencies: query each shard individually to confirm scores are present; then re-run the ZCOUNT fan-out manually to verify the merge math.
- For memory issues:
MEMORY USAGE leaderboard:keygives exact bytes; if it exceeds 3 GB on a 4 GB instance, plan horizontal sharding or player expiry (remove inactive players' entries with a TTL or a separate cleanup job).
🧠 Check your understanding
What Redis data structure gives O(log N) rank queries and O(log N + M) range queries for a leaderboard with 50 million entries?
Redis sorted sets maintain elements ordered by score using a skip list, giving O(log N) for insertions, rank lookups, and score updates. A hash map has no inherent ordering; a list has O(N) rank lookup.
A player submits a score of 85,000 points for a 60-second session where the maximum possible score rate is 500 points/sec. What should the server return?
60 s × 500 pts/s = 30,000 max possible. A score of 85,000 is physically impossible in that session — the server should reject it as invalid (422), not rate-limit it (429). Rate limiting applies when the request count is too high; 422 applies when the data in the request is invalid.
During an active chess match, which transport is most appropriate for real-time move exchange?
Chess moves are bidirectional (both players send and receive) and latency-sensitive. WebSocket provides full-duplex communication with lower overhead than polling. SSE is unidirectional — the client cannot send moves over it without a separate HTTP channel.
The global leaderboard is built by merging top-N from 5 regional shards. What consistency trade-off does this introduce?
Shards are eventually consistent with each other. A score submitted to the EU shard a millisecond ago won't instantly appear in the global merge. For a leaderboard this is acceptable — near-real-time is fine. The trade-off is drastically reduced write contention.
Key takeaways
- A Redis sorted set stores 50 M players in ~2.5 GB of RAM and answers rank queries in microseconds — the entire leaderboard fits in a single data structure.
- Server-authoritative anti-cheat is one multiplication: if
score > session_duration × max_rate, return 422. It costs essentially zero compute. - Use WebSocket for reliable, bidirectional match communication. Consider UDP only when a dropped frame is genuinely better than a late frame (fast-paced shooters).
- Region sharding caps write throughput and introduces eventual consistency in the global view — an acceptable trade-off for a display leaderboard, not for financial or anti-cheat systems.
- A friends leaderboard is a different problem from a global leaderboard: the global ZSET doesn't solve it directly. Choose between query-time intersection, per-player fan-out writes, or relational query based on your read/write ratio.
Sources & further reading
- Redis sorted sets — the canonical leaderboard data structure, official documentation
- Redis leaderboard patterns — engineering deep-dive on production leaderboard design
- WebSocket API — MDN reference for the transport used in match state sync
- 1500 Archers on a 28.8: Game Networking in Age of Empires — the classic paper on UDP-based game networking, still worth reading
- Skip list — the data structure underlying Redis sorted sets; understanding it makes the O(log N) complexity intuitive