Design Case Studies · Lesson 00
The case-study framework
Before you draw a single box or write a single endpoint, you need a method. This lesson gives you a repeatable eight-step sequence that turns any open-ended "design the API for X" prompt into a structured answer — the same method the three case studies that follow will use.
By the end you'll be able to
- Name the eight steps and explain why each must come before the next.
- Distinguish functional requirements, non-functional requirements, and scale assumptions — and know which questions expose each.
- Explain what the four-section structure of each following case study tests, and use it to evaluate any API design you encounter.
Why a method matters more than memorising endpoints
An interviewer who asks "design the Search API for an e-commerce platform" is not testing whether you know the exact path /v1/search. They're testing whether you can navigate ambiguity under time pressure. The prompt is deliberately open-ended. Two candidates who both pick GET /v1/search?q=shoes are not equivalent if one arrived there by asking clarifying questions and the other just guessed.
A method gives you something to fall back on when you don't know the answer yet — which is almost always at the start of a design question. It also signals, loudly, that you understand how real systems get built: requirements first, endpoints later.
The fastest way to differentiate yourself in a system design interview is to pause and ask questions before drawing anything. Most candidates go straight to boxes and arrows. Saying "Before I start, I'd like to clarify scale and a few requirements" takes thirty seconds and immediately signals senior thinking. Many interviewers treat it as one of the clearest differentiators.
The eight steps
Think of these as a pipeline: each step narrows the space of possible designs so the next step has fewer wrong turns to explore.
- Clarify requirements and scope. Ask what the system must do (functional) and what constraints it must satisfy (non-functional). Don't assume anything the problem doesn't state explicitly. Five minutes of clarification prevents thirty minutes of designing the wrong thing. Typical questions: "Is this read-heavy or write-heavy?", "Do we need real-time updates or is eventual consistency fine?", "Is this internal or a public API?"
- Nail the scale assumption. Get a concrete QPS (queries per second) or DAU (daily active users) number, even an order-of-magnitude estimate. Scale shapes every choice that follows: whether you need a cache, whether a single database suffices, whether synchronous responses are feasible. A read API at 100 QPS and one at 100 000 QPS have almost nothing in common.
- Identify the entities. Name the nouns the API operates on — users, orders, files, messages. These become your resources. Resist jumping to endpoints; getting the entity model right first means your endpoints will have natural, stable paths.
- Map operations to HTTP methods and paths. For each entity, ask which of the four core operations apply: create, read, update, delete. Then choose the right HTTP method and a consistent path. This is where RESTful maturity levels apply.
- Sketch request and response shapes. Write out representative bodies with real field names, types, and formats. This is the actual contract, so it should reveal every design decision you've made: what's optional, what's required, what the error envelopes look like. Use the design method cheatsheet as a reference here.
- Define errors and edge cases. What happens when the resource doesn't exist? When the caller is unauthorised? When input is malformed? Every production API has error semantics — name them explicitly rather than hoping the happy path covers everything.
- Apply cross-cutting concerns. Six concerns touch almost every API regardless of domain. Work through them in order:
  - Auth: who can call which endpoint?
  - Rate limiting: what's the per-caller ceiling? (Lesson rel-03)
  - Idempotency: which mutations are safe to retry? (Lesson rel-02)
  - Pagination: how are large collections returned? (Lesson perf-04)
  - Versioning: how do you ship breaking changes without breaking callers? (Lesson rel-01)
  - Caching: what can be cached, where, and for how long? (Lesson rel-07)
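- Build the latency budget and evaluate. Walk one key request end-to-end: network round-trip, auth overhead, cache lookup, database query, serialisation. Estimate each segment, then check whether the sum fits the p99 latency target you set as a non-functional requirement in step 1. If it doesn't, revisit the cross-cutting choices in step 7 (for example, add or move a cache) and re-run the budget.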
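Step 2 usually starts from a daily figure, and turning it into per-second rates is the first piece of arithmetic in most designs. A minimal Python sketch, assuming a uniform 86 400-second day and a peak-to-average factor of 10 (the same ratio the URL shortener example below uses for its reads); `daily_to_qps` is a hypothetical helper, not part of any library.

```python
from typing import Tuple

SECONDS_PER_DAY = 86_400


def daily_to_qps(events_per_day: float, peak_factor: float = 10.0) -> Tuple[float, float]:
    """Convert a daily volume into (average, peak) events per second.

    peak_factor is an assumed peak-to-average ratio, not a measurement;
    replace it with observed traffic data when you have it.
    """
    average = events_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


if __name__ == "__main__":
    avg, peak = daily_to_qps(500_000)
    print(f"{avg:.1f}/s average, {peak:.0f}/s peak")
```

With 500 000 requests/day this gives roughly 5.8/s on average and about 58/s at peak, which rounds to the shortener's "≈ 6 reads/s average, 60 reads/s peak". The peak factor is the assumption worth questioning out loud in an interview.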
How to ask the right clarifying questions
Step 1 is the step most candidates skip. Here is a concrete taxonomy of what to ask and why each category matters:
| Category | Example question | Why it matters |
|---|---|---|
| Functional scope | "Does the search need to support filters, or just free-text?" | Determines entities, endpoints, and query complexity. |
| Non-functional / SLA | "What's the acceptable p99 latency?" | Drives caching strategy and architecture choices. |
| Scale | "Roughly how many requests per second at peak?" | Single DB vs. read replicas vs. cache tier vs. CDN. |
| Consistency | "Does a write have to be immediately readable everywhere?" | Sync vs. async, eventual vs. strong consistency. |
| Caller identity | "Internal service or third-party developers?" | Auth model, rate limits, versioning strategy. |
| Existing constraints | "Are there upstream systems we must not change?" | May force protocol or data shape choices. |
A common mistake is treating non-functional requirements as a formality you state once and forget. They are the constraints that force you to make interesting trade-offs: every time you consider adding a cache, sharding a database, or making an operation async, you are resolving a non-functional requirement. Refer back to them explicitly when explaining why you made a particular decision; it shows your reasoning, not just your conclusion.
The four-part structure every following case study uses
Each of the three case studies in this module is written in the same four sections. Understanding that structure now means you can use it as a checklist when you encounter any design question — in an interview, in a code review, or when reading someone else's API spec.
| Section | What it answers | Maps to step(s) |
|---|---|---|
| Requirements | Functional capabilities, non-functional constraints, and a concrete scale / QPS assumption. | ① ② |
| Design decisions | The key trade-offs and why — protocol, data shape, sync vs. async, consistency model. Explicitly names what was rejected and why. | ③ ④ ⑦ |
| The API model | Concrete endpoints with real request and response examples. The contract, not the architecture. | ⑤ ⑥ |
| Evaluation & latency budget | Walk one key request end-to-end. Apply caching, pagination, idempotency, rate limiting. Check whether the result meets the requirements from section 1. | ⑧ |
The four sections are a scaffold, not a rigid script. In a real interview you'll interleave them, stating a requirement and then immediately making a design decision to address it. The value of the structure is that it ensures you cover all four areas, even if you don't cover them in strict order. A frequent mistake is spending all available time on "The API model" and never reaching "Evaluation".
A worked micro-example: URL shortener
To ground the method before you meet the full case studies, here is a compressed run of the eight steps on a deliberately simple problem: a URL shortener. The point is to see the method run, not to study the domain in depth.
-- ①② REQUIREMENTS + SCALE --
Functional: shorten a URL, redirect via short code, optional custom alias
Non-functional: p99 redirect < 50 ms, 99.9% availability
Scale: 5 000 writes/day, 500 000 reads/day ≈ 6 reads/s average, 60 reads/s peak
-- ③ ENTITIES --
Link { id, short_code, target_url, created_at, expires_at }
-- ④ ENDPOINTS --
POST /v1/links → 201 { short_code, short_url }
GET /v1/links/:code → 301 Location: target_url
DELETE /v1/links/:code → 204 (idempotent)
-- ⑦ CROSS-CUTTING --
Auth: API key on writes; none on reads (public redirect)
Rate limit: 100 writes/min per key
Idempotency: POST /v1/links is idempotent on target_url (same URL → same code)
Caching: GET /v1/links/:code cached at CDN edge, TTL = expires_at - now
-- ⑧ LATENCY BUDGET (redirect path) --
CDN cache hit: ~5 ms total, done
CDN miss + DB: CDN(5) + network(10) + DB read(5) + redirect(2) = 22 ms ✓ < 50 ms
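The idempotency rule above (same URL → same code) fits in a few lines of Python. This is a minimal in-memory stand-in for the Link table, assuming random 7-character base62 codes and leaving out custom aliases, expiry, and persistence; `LinkStore` and its methods are hypothetical names.

```python
import secrets
import string
from typing import Dict, Optional, Tuple

BASE62 = string.ascii_letters + string.digits  # 62 URL-safe characters


class LinkStore:
    """In-memory stand-in for the Link table (a real service would use a database)."""

    def __init__(self, code_length: int = 7) -> None:
        self.code_length = code_length
        self.target_by_code: Dict[str, str] = {}  # short_code -> target_url
        self.code_by_target: Dict[str, str] = {}  # target_url -> short_code (idempotency lookup)

    def _new_code(self) -> str:
        # Random base62 code; loop until it does not collide with an existing one.
        while True:
            code = "".join(secrets.choice(BASE62) for _ in range(self.code_length))
            if code not in self.target_by_code:
                return code

    def create(self, target_url: str) -> Tuple[bool, str]:
        """POST /v1/links: return (created, short_code); a repeated URL gets its existing code."""
        existing = self.code_by_target.get(target_url)
        if existing is not None:
            return False, existing
        code = self._new_code()
        self.target_by_code[code] = target_url
        self.code_by_target[target_url] = code
        return True, code

    def resolve(self, code: str) -> Optional[str]:
        """GET /v1/links/:code: the redirect target, or None (a 404 in this sketch)."""
        return self.target_by_code.get(code)

    def delete(self, code: str) -> int:
        """DELETE /v1/links/:code: always 204, since the link is absent afterwards either way."""
        target = self.target_by_code.pop(code, None)
        if target is not None:
            self.code_by_target.pop(target, None)
        return 204
```

Seven base62 characters allow 62^7 ≈ 3.5 × 10^12 codes, far more than 5 000 writes/day will consume, so collision retries stay rare. A production version would enforce both lookups with unique database constraints rather than dictionaries.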
What the three following case studies cover
Each case study picks a domain where the interesting design problems come from a different part of the method:
- cs-01 — Search API: the problem is scale and read latency. The key moves are cursor pagination, cache-on-popular-query, and eventual consistency of the index.
- cs-02 — File upload/download API: the problem is large payload handling. The key move is offloading bytes to object storage via presigned URLs so the API server never touches the data.
- cs-03 — Comment & rating API: the problem is write contention and count consistency. The key moves are denormalized counters, idempotent like/unlike, and separating the read and write paths.
After each case study, practice closing the loop: restate the original requirements, then show that your final design satisfies each one. This "requirements traceability" step takes one minute and leaves the interviewer with a clear sense that you didn't drift from the problem. Candidates skip it often, so doing it reliably makes you stand out.
Under the hood: a fully worked design — bookmarking API
The eight steps become intuitive through repetition. Here they are applied completely to a second example — a bookmarking API — so you can see every decision made explicitly. The product brief: "Let users save URLs and retrieve them, 1 M users, ~50 bookmarks each on average, reads far outweigh writes."
-- ① CLARIFYING QUESTIONS --
Are bookmarks user-scoped only, or can they be shared/public?
→ User-scoped. Shapes auth model and index design.
Any tagging, full-text search, or folder hierarchy?
→ Tags only (no folders). Avoids recursive structures.
Mobile-heavy? Offline sync needed?
→ Mobile-heavy. Keep payloads lean; no offline sync in v1.
Acceptable latency for the list endpoint?
→ p99 < 100 ms at the client.
-- ② SCALE ASSUMPTION --
1 M users × 50 bookmarks avg = 50 M rows total
Writes: ~5 bookmarks/day/active user × 100k DAU = 500 000 writes/day ≈ 6 writes/s average, ~60 writes/s peak (assuming a 10x peak-to-average ratio)
Reads: ~10x write rate ≈ 600 reads/s peak (read-heavy, not read-dominated)
Single-user list is bounded: max ~50 k bookmarks (power user) → pagination required
-- ③ ENTITIES --
Bookmark {
id: string (opaque, e.g. bkm_7f3k),
url: string (unique per user — prevents duplicates),
title: string (user-editable or auto-fetched),
tags: string[] (max 10 tags),
created_at: timestamp
}
Ownership: every bookmark row has a user_id FK; no cross-user access.
-- ④ OPERATIONS → METHODS + PATHS --
POST /v1/bookmarks → 201 | 200 (idempotent on url)
GET /v1/bookmarks → 200 paginated list
GET /v1/bookmarks/:id → 200 single bookmark
PATCH /v1/bookmarks/:id → 200 update title / tags
DELETE /v1/bookmarks/:id → 204 (idempotent)
-- ⑤ REQUEST / RESPONSE SHAPES --
POST /v1/bookmarks
Request: { "url": "https://example.com/article", "title": "...", "tags": ["api","design"] }
Response: { "id": "bkm_7f3k", "url": "...", "title": "...", "tags": [...], "created_at": "..." }
GET /v1/bookmarks?tag=api&cursor=eyJjcmVhdGVk...&limit=25
Response: {
"data": [ { ...bookmark... }, ... ],
"next_cursor": "eyJjcmVhdGVk..." | null,
"total_estimate": 312
}
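The opaque cursor in the list response can be as simple as the last row's sort key and tie-breaker, JSON-encoded and then base64url-encoded. A minimal sketch, assuming this JSON layout (the lesson doesn't specify one); `encode_cursor` and `decode_cursor` are hypothetical helpers.

```python
import base64
import json
from typing import Tuple


def encode_cursor(created_at: str, bookmark_id: str) -> str:
    """Pack the last row's sort key (created_at) and tie-breaker (id) into an opaque token."""
    payload = json.dumps({"created_at": created_at, "id": bookmark_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> Tuple[str, str]:
    """Reverse encode_cursor; raise ValueError for any malformed token."""
    padded = cursor + "=" * (-len(cursor) % 4)  # restore the stripped base64 padding
    try:
        data = json.loads(base64.urlsafe_b64decode(padded.encode("ascii")))
        return data["created_at"], data["id"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc
```

Because the JSON starts with `{"created`, these tokens begin with `eyJjcmVhdGVk`, like the truncated example above. A malformed cursor is a client error, so mapping the `ValueError` to a 400 seems the natural choice; signing the payload would additionally stop clients from editing it, if that matters for your API.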
-- ⑥ ERRORS + EDGE CASES --
POST with duplicate url → 200 (return existing bookmark, not 409)
POST with url > 2048 chars → 422 Unprocessable Content
GET /v1/bookmarks/:id where id belongs to another user → 404 (not 403; don't reveal existence)
DELETE /v1/bookmarks/:id already deleted → 204 (idempotent, not 404)
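The error and edge-case rules above translate directly into handler logic. A minimal in-memory Python sketch, assuming `user_id` has already been extracted from a verified token; the class, the `(status, body)` return convention, and the sequential `bkm_<n>` ids (a real API would use opaque random ids) are illustrative only.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_URL_LENGTH = 2048


@dataclass
class Bookmark:
    id: str
    user_id: str
    url: str
    title: str = ""
    tags: List[str] = field(default_factory=list)


class BookmarkService:
    """Step 6 error semantics; every method returns (http_status, body)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Bookmark] = {}
        self._by_user_url: Dict[Tuple[str, str], str] = {}  # stands in for UNIQUE(user_id, url)
        self._ids = itertools.count(1)

    def create(self, user_id: str, url: str, title: str = "") -> Tuple[int, Optional[Bookmark]]:
        if len(url) > MAX_URL_LENGTH:
            return 422, None
        existing_id = self._by_user_url.get((user_id, url))
        if existing_id is not None:
            return 200, self._rows[existing_id]  # duplicate URL: return the existing bookmark
        bm = Bookmark(id=f"bkm_{next(self._ids)}", user_id=user_id, url=url, title=title)
        self._rows[bm.id] = bm
        self._by_user_url[(user_id, url)] = bm.id
        return 201, bm

    def get(self, user_id: str, bookmark_id: str) -> Tuple[int, Optional[Bookmark]]:
        bm = self._rows.get(bookmark_id)
        if bm is None or bm.user_id != user_id:
            return 404, None  # same answer for "missing" and "someone else's"
        return 200, bm

    def delete(self, user_id: str, bookmark_id: str) -> Tuple[int, None]:
        bm = self._rows.get(bookmark_id)
        if bm is not None and bm.user_id == user_id:
            del self._rows[bookmark_id]
            del self._by_user_url[(user_id, bm.url)]
        # 204 whether or not anything was deleted, so the caller cannot probe other users' ids.
        return 204, None
```

Returning 204 for another user's id on DELETE is a design choice in this sketch: it keeps DELETE consistent with the 404-not-403 rule, because the response never reveals whether the id exists.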
-- ⑦ CROSS-CUTTING CONCERNS --
Auth: Bearer token required on all endpoints. user_id from token, never from body.
Idempotency: POST is idempotent on (user_id, url). Same URL → return existing bookmark.
DELETE is unconditionally idempotent (204 even if already gone).
Pagination: Cursor on (created_at DESC, id). Tag filter is a B-tree index condition.
Offset rejected: the DB reads and discards every skipped row (O(offset)), so deep pages such as page 1000 get slower and slower.
Rate limit: 100 writes/min per user (anti-abuse). 600 reads/min per user.
Caching: List endpoint: Cache-Control: private, max-age=30. (Private — user-specific data.)
Single bookmark: Cache-Control: private, max-age=300.
A CDN is a shared cache and must not store private responses; caching is client-side, or a custom per-user cache keyed on the caller's identity.
Versioning: /v1/ prefix. Breaking changes → /v2/ with a migration path.
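The per-user limits above need an enforcement algorithm, and the lesson doesn't prescribe one. A token bucket is one common choice, sketched here in Python under the assumption of a single process with in-memory state (several API instances would need shared state). `TokenBucketLimiter` is a hypothetical class; `remaining` is the value you might expose as `X-RateLimit-Remaining`.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: `capacity` requests, refilled evenly over `period_s` seconds."""

    def __init__(self, capacity: int, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_s = capacity / period_s
        self.clock = clock  # injectable so tests can control time
        self._buckets: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def allow(self, key: str) -> Tuple[bool, int]:
        """Try to consume one token for `key`; return (allowed, remaining)."""
        now = self.clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Refill for the time elapsed since this key was last seen, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[key] = (tokens, now)
        return allowed, int(tokens)
```

`TokenBucketLimiter(capacity=100, period_s=60)` models 100 writes/min per user, and a second instance with `capacity=600` covers reads. Unlike a fixed per-minute window, which resets all at once, a token bucket refills continuously while still allowing bursts up to `capacity`.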
-- ⑧ LATENCY BUDGET: GET /v1/bookmarks (page 1, tag=api) --
Scenario A — client cache hit (private max-age=30 not expired):
Client reads from memory: ~0 ms Total: ~0 ms ✓
Scenario B — cache miss, warm read replica:
Network, client → API (one way): ~15 ms
Auth token verification: ~3 ms (cached public key)
DB query (index on user_id, tag, created_at DESC): ~8 ms
Serialise 25 bookmarks: ~2 ms
Network, API → client (one way): ~15 ms
———————
Estimated total: ~43 ms ✓ < 100 ms p99 target
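Index required: CREATE INDEX ON bookmarks(user_id, tag, created_at DESC, id);
(This assumes a scalar tag column, i.e. one row per (bookmark, tag). The Bookmark entity stores tags as an array, which a B-tree cannot index element by element; in PostgreSQL that would need a GIN index or a separate tag table.)
Without this index, the DB scans every row for the user, or the whole table → p99 can spike to >500 ms at 50 M rows.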
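Scenario B's arithmetic can be scripted so it is easy to re-run when an estimate changes. A minimal Python sketch; the segment values are the estimates from the budget above, and `latency_budget` is a hypothetical helper. Summing typical per-segment latencies gives a rough budget, not a measured p99, because the percentiles of individual segments don't simply add.

```python
from typing import Dict, Tuple


def latency_budget(segments_ms: Dict[str, float], target_ms: float) -> Tuple[float, bool]:
    """Sum per-segment estimates (ms) and report whether the total fits the target."""
    total = sum(segments_ms.values())
    return total, total <= target_ms


if __name__ == "__main__":
    scenario_b = {
        "network client -> API": 15.0,
        "auth token verification": 3.0,
        "db query (composite index)": 8.0,
        "serialise 25 bookmarks": 2.0,
        "network API -> client": 15.0,
    }
    print(latency_budget(scenario_b, target_ms=100.0))
    # Same request if the composite index is missing (DB segment above 500 ms).
    no_index = {**scenario_b, "db query (composite index)": 500.0}
    print(latency_budget(no_index, target_ms=100.0))
```

The first call reports 43 ms within budget; swapping in a 500 ms query pushes the total to 535 ms, far past the 100 ms target, which is why the index is listed as required rather than optional.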
The most common sequencing mistake is choosing a pagination strategy before deciding what the list is sorted by. If you add cursor pagination on created_at DESC but later change the default sort to "most recently visited" (a different field), the entire index is wrong and every cursor ever issued is invalid. Decide the sort order as part of step ④ when you map operations — write it into the contract — then design the cursor and index to match. Changing the sort after pagination is in production is a breaking change.
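Keyset pagination ties the sort order, the cursor, and the index together. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `bookmark_tags` table with one row per (bookmark, tag) and ordering by `(created_at DESC, id DESC)`. The lesson writes the order as `(created_at DESC, id)`; ordering `id` descending as well is what makes the row-value comparison `(created_at, id) < (?, ?)` correct. Row values need SQLite 3.15 or newer.

```python
import sqlite3
from typing import List, Optional, Tuple

Key = Tuple[str, str]  # (created_at, id) of a row


def list_page(conn: sqlite3.Connection, user_id: str, tag: str, limit: int,
              after: Optional[Key] = None) -> Tuple[List[Key], Optional[Key]]:
    """Return one page ordered by (created_at DESC, id DESC) and the key for the next page.

    `after` is the (created_at, id) of the last row already shown. Comparing the
    pair as a row value means rows sharing a created_at are neither skipped nor
    repeated across page boundaries.
    """
    sql = "SELECT created_at, id FROM bookmark_tags WHERE user_id = ? AND tag = ?"
    params: list = [user_id, tag]
    if after is not None:
        sql += " AND (created_at, id) < (?, ?)"
        params.extend(after)
    sql += " ORDER BY created_at DESC, id DESC LIMIT ?"
    params.append(limit + 1)  # one extra row reveals whether another page exists
    rows = conn.execute(sql, params).fetchall()
    page = rows[:limit]
    next_key = page[-1] if len(rows) > limit else None
    return page, next_key


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookmark_tags (user_id TEXT, tag TEXT, created_at TEXT, id TEXT)")
    conn.execute("CREATE INDEX idx_user_tag_time "
                 "ON bookmark_tags (user_id, tag, created_at DESC, id DESC)")
    # Three rows share a timestamp so the tie-breaker matters.
    rows = [("u1", "api", "2024-05-01T10:00:00", f"bkm_{i}") for i in range(3)]
    rows.append(("u1", "api", "2024-05-02T10:00:00", "bkm_9"))
    conn.executemany("INSERT INTO bookmark_tags VALUES (?, ?, ?, ?)", rows)

    page, key = list_page(conn, "u1", "api", limit=2)
    print(page)
    while key is not None:
        page, key = list_page(conn, "u1", "api", limit=2, after=key)
        print(page)
```

In a real handler the returned key would be encoded into the opaque `next_cursor`; fetching `limit + 1` rows decides between a cursor and `null` without a separate count query.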
Operating & debugging it
A running bookmark API leaves observable traces at every layer. Here is how to read them.
| Symptom | Likely cause | Fix |
|---|---|---|
| List endpoint returns same page of results regardless of cursor | Cursor is being ignored — parameter name mismatch or server not reading it | Log the raw cursor param on the server; confirm the DB query includes WHERE (created_at, id) < (:cursor_ts, :cursor_id) |
| Same bookmark appears on two consecutive pages | Sort order has ties (two bookmarks created at the same second) and cursor encodes only created_at, not the ID tie-breaker | Add id as the second sort key and encode both in the cursor |
| POST /v1/bookmarks creates duplicates for the same URL | Idempotency check on (user_id, url) is missing or the unique index doesn't exist | Add UNIQUE(user_id, url) to the table and return the existing bookmark on conflict |
| p99 list latency spikes above 500 ms for users with many bookmarks | Query is doing a full-table scan because the index doesn't cover the tag filter alongside user_id | Add a composite index: CREATE INDEX ON bookmarks(user_id, tag, created_at DESC, id) |
| DELETE /bookmarks/:id returns 404 on retry | Server returns 404 for already-deleted resources instead of the idempotent 204 | Handle the "not found" case in DELETE as a 204 — the desired state (resource absent) is already achieved |
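The fix in the duplicate-creation row lets the database, not application code, decide whether a bookmark already exists. A minimal sketch using Python's built-in `sqlite3`, assuming a trimmed-down `bookmarks` schema; SQLite's `INSERT OR IGNORE` skips the row on a UNIQUE conflict, and the PostgreSQL equivalent is `INSERT ... ON CONFLICT (user_id, url) DO NOTHING`. `create_bookmark` is a hypothetical helper.

```python
import sqlite3
from typing import Tuple

SCHEMA = """
CREATE TABLE bookmarks (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    url TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, url)
)
"""


def create_bookmark(conn: sqlite3.Connection, user_id: str, url: str) -> Tuple[int, int]:
    """Insert at most one row per (user_id, url); return (http_status, bookmark_id)."""
    before = conn.total_changes
    # OR IGNORE skips the insert when UNIQUE(user_id, url) is violated
    # (it would also swallow NOT NULL/CHECK violations, which string inputs avoid here).
    conn.execute("INSERT OR IGNORE INTO bookmarks (user_id, url) VALUES (?, ?)", (user_id, url))
    created = conn.total_changes > before
    (bookmark_id,) = conn.execute(
        "SELECT id FROM bookmarks WHERE user_id = ? AND url = ?", (user_id, url)
    ).fetchone()
    conn.commit()
    return (201 if created else 200), bookmark_id
```

Because the constraint lives in the table, two concurrent retries of the same POST cannot both insert; the second one simply finds the existing row and returns it with a 200.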
Debug checklist:
- Confirm the auth token is valid and that the user_id it encodes matches the expected user; mismatched IDs silently return empty lists.
- Decode the cursor (base64) and verify it encodes both the sort field and the ID tie-breaker; a cursor missing the tie-breaker skips or repeats rows at page boundaries.
- Run EXPLAIN ANALYZE on the list query and confirm the plan shows an index scan (Index Scan or Index Only Scan) on the composite index; a Seq Scan indicates a missing or unused index.
- Check X-RateLimit-Remaining in the response; if it is 0 before the first write in a session, the rate-limiter bucket may not be refilling or may be misconfigured.
- For duplicate-creation bugs, query the DB directly with SELECT id, url, created_at FROM bookmarks WHERE user_id = :uid AND url = :url and count the rows; more than one row confirms the unique constraint is missing.
🧠 Quick check
1. You're asked to "design the notification API." Your first move should be:
Jumping to endpoints before clarifying requirements is one of the most common mistakes. The first move is always to ask questions that narrow the design space.
2. Which section of the four-part case-study structure explicitly names rejected alternatives?
The "Design decisions" section is where trade-offs live — what you chose, what you explicitly rejected, and why. Naming rejected alternatives is how you demonstrate that you considered the full space rather than arriving at the answer by habit.
3. A candidate spends the entire interview sketching endpoints and never discusses latency. Which step are they missing?
The latency budget (step ⑧) is where you validate that your design actually meets the non-functional requirements you stated. Skipping it means you've designed something without checking whether it works at the required scale.
4. "The API must handle 50 000 reads per second with p99 under 100 ms" is an example of a:
Non-functional requirements define constraints — latency, throughput, availability, consistency — rather than capabilities. They're the requirements that drive architectural decisions like caching and horizontal scaling.
✍️ Exercise: run the method on a bookmark API
A product team asks you to "design an API that lets users save and retrieve bookmarks." Work through all eight steps before reading the model answer. Focus on step ① (clarifying questions you'd ask) and step ⑦ (which cross-cutting concerns apply and why).
Model answer — highlights:
-- ① CLARIFYING QUESTIONS --
- User-scoped or shared? (changes auth model and entity ownership)
- Any tagging, folders, or full-text search on saved pages?
- Mobile-heavy? (pagination strategy, payload size)
- Expected scale: 1M users × 50 bookmarks avg = 50M rows
-- ③ ENTITIES --
Bookmark { id, url, title, tags[], created_at } (owned by user)
-- ④ ENDPOINTS --
POST /v1/bookmarks → 201 Bookmark
GET /v1/bookmarks → 200 { data: [...], next_cursor }
GET /v1/bookmarks/:id → 200 Bookmark
DELETE /v1/bookmarks/:id → 204 (idempotent)
-- ⑦ CROSS-CUTTING --
Auth: Bearer token; only owner can read/write own bookmarks
Idempotency: POST with same url → return existing (no duplicates)
Pagination: Cursor on created_at DESC for list endpoint
Rate limit: 100 writes/min per user (anti-abuse)
Caching: Per-user lists: short TTL (stale if user adds on another device)
Versioning: /v1/ prefix; breaking changes → /v2/ with a migration path
Rubric: ✓ asked at least three clarifying questions ✓ identified a per-user ownership model ✓ applied all six cross-cutting concerns ✓ chose cursor pagination and explained why (not offset; large offsets are slow on 50M rows) ✓ noted idempotency on POST (same URL → same bookmark, no duplicates). Full marks = five checkmarks.
Key takeaways
- Drive every API design question through the same eight steps: clarify → scale → entities → operations → shapes → errors → cross-cutting → evaluate.
- Functional requirements say what the system does; non-functional requirements say how well it does it. Both are required before any endpoint is written.
- The six cross-cutting concerns (auth, rate limiting, idempotency, pagination, versioning, caching) apply to almost every API. Work through them explicitly even if the answer is "not needed here."
- The latency budget (step ⑧) is the validation step — it's where you show that your design actually satisfies the non-functional requirements you stated.
- The four-part case-study structure (Requirements → Design decisions → The API model → Evaluation) is the same structure you should use in any interview or design document.
Sources & further reading
- API Design course — Design method cheatsheet (companion reference; use alongside this lesson)
- RFC 9110 — HTTP Semantics (authoritative definitions of safe and idempotent methods)
- Stripe — Idempotent Requests (production example of Idempotency-Key in an API contract)
- Google — API Design Guide (resource-oriented design principles used at scale)
- Google SRE Book — Service Level Objectives (how non-functional requirements become measurable targets)