API Design

Interview Prep · Lesson 04

Mock interview prompts

The best preparation for a design interview is repeating the format under time pressure. Here are eight prompts — five design problems and three debugging scenarios — each with a model approach so you can check your own reasoning.

⏱ self-paced · Difficulty: advanced · Prereq: prep-01, prep-02, prep-03

By the end you'll be able to

How to practise

Passive reading kills the value of a question bank. Do this instead:

  1. Time-box it. Set a timer for the limit shown on each prompt (20 minutes for most). Design interviews are roughly 45 minutes including context-setting and Q&A, leaving about 25–30 minutes for the actual design. Practise with a shorter limit to build urgency.
  2. Narrate out loud. Say your reasoning aloud, even if you're alone. The interviewer scores your thinking process, not just the final diagram. If you stumble while speaking, that's the gap to fix — not the diagram.
  3. Write the key endpoints first. Method, path, request/response shape — before you touch scaling or cross-cutting concerns. An accurate API model is the foundation everything else rests on.
  4. Score yourself after. Use the rubric from Lesson 03 — did you cover all seven dimensions at mid-level or above?
  5. Re-do weak prompts. One good repetition is worth five passive reads of the answer.
Pick prompt → 20-min timer → Narrate aloud → Check answer → Score rubric → Repeat
The loop that makes practice effective: time pressure → narration → comparison → gap identification → repeat.

Design prompts

MEDIUM Design a rate limiter API (20 min)

Context: See Lesson rel-03 for the full reliability deep-dive.

Clarifying questions to ask first: Per-user or per-IP? Sliding window or fixed window? What response when limited — 429 with Retry-After, or silent queue? Should clients be able to inspect their quota?

Core API model:

# Check quota (lightweight — called per request)
GET  /v1/rate-limits/{key}
← 200 { "limit": 1000, "remaining": 412, "reset_at": "2026-01-01T12:01:00Z" }

# When limit exceeded (returned by the proxied resource, not this API)
← 429 Too Many Requests
Retry-After: 38
{ "error": "rate_limit_exceeded", "limit": 1000, "reset_at": "…" }

Key trade-off: Fixed-window counters are O(1) in Redis but allow a 2× burst at window boundaries. Sliding-window log is accurate but O(n) per request. Token-bucket sits in between — smooth, O(1) with two stored values — and is the right default for most APIs. Name the algorithm and its cost; picking any one with a reason is the mid/senior signal.

Cross-cutting: The 429 response must include Retry-After and reset_at; without them, clients can't back off intelligently. Auth scoping: per API key is better than per IP (proxies and NATs share IPs).
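To make the token-bucket option concrete, here is a minimal in-process sketch. It assumes one bucket per API key held in memory; the class name `TokenBucket`, its parameters, and the injectable `now` argument are illustrative choices, not part of the API model above. A production limiter would more likely keep the same two values (token count and last-refill time) in a shared store such as Redis.

```python
import math
import time
from typing import Optional, Tuple


class TokenBucket:
    """Token bucket that stores two values: current tokens and last refill time."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 now: Optional[float] = None) -> None:
        self.capacity = float(capacity)          # maximum burst size
        self.refill_per_sec = refill_per_sec     # steady-state rate
        self.tokens = float(capacity)            # start full
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one whole token exists, rounded up for a Retry-After header.
        wait = (1.0 - self.tokens) / self.refill_per_sec
        return False, math.ceil(wait)
```

The second element of the return value is what a 429 response would carry in Retry-After, so the client knows how long to back off.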

MEDIUM Design a URL shortener API (20 min)

Context: See Lesson cs-05 for the full case study.

Clarifying questions: Vanity slugs or random only? Link expiry? Per-user analytics? QPS on redirect vs. creation — typically 1 000:1.

Core API model:

POST /v1/links
Body: { "url": "https://example.com/very/long/path",
         "slug": "docs24",  // optional vanity
         "expires_at": "2027-01-01T00:00:00Z" }
← 201 { "id": "lnk_9a2", "short_url": "https://sho.rt/docs24" }

GET /v1/links/{id}/stats
← 200 { "clicks": 14830, "unique_ips": 9210 }

# Redirect path — not an API endpoint, but must be stated
GET https://sho.rt/docs24
← 301/302 Location: https://example.com/very/long/path

Key trade-off: 301 (permanent) lets browsers cache the redirect — great for bandwidth, bad for analytics and link rotation. 302 (temporary) keeps every click hitting your server — costlier but necessary if you track clicks or need expiry. Name both and justify the choice.

Scale note: Redirect is read-heavy and hot-cacheable. The slug-to-URL mapping fits in Redis; creation is the low-volume write path, where slug uniqueness matters more than throughput. A collision on a random slug requires a retry loop; mention it.
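The collision retry loop can be sketched in a few lines. This is an illustration under assumptions: a plain dict stands in for the slug store, and `random_slug`, `create_link`, the 7-character length, and the attempt limit are hypothetical names and values chosen for the example.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_slug(length: int = 7) -> str:
    """Generate a random slug from a 62-character alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_link(store: dict, url: str, generate=random_slug, max_attempts: int = 5) -> str:
    """Allocate an unused slug for url, retrying on collision."""
    for _ in range(max_attempts):
        slug = generate()
        # In a real store this check-then-write would be one atomic
        # "insert if absent" (for example a unique-key constraint).
        if slug not in store:
            store[slug] = url
            return slug
    raise RuntimeError("could not allocate a unique slug")
```

Bounding the number of attempts matters: if the slug space is nearly full, an unbounded loop turns a capacity problem into a latency problem.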

HARD Design an idempotent payments API (25 min)

Context: See Lesson cs-12 for the Stripe case study and Lesson rel-02 for the idempotency deep-dive.

Clarifying questions: Card present or card-not-present? Synchronous vs. async settlement? Partial captures? Refund life-cycle?

Core API model:

POST /v1/payment-intents
Idempotency-Key: order_8c3a-attempt-1
Body: { "amount": 4999, "currency": "usd",
         "payment_method": "pm_xyz",
         "confirm": true }
← 201 { "id": "pi_01", "status": "requires_capture" }

# Retry with same key → same response, no double charge
POST /v1/payment-intents   // same Idempotency-Key
← 200 { "id": "pi_01", "status": "requires_capture" }

POST /v1/payment-intents/pi_01/capture
← 200 { "status": "succeeded" }

Key trade-off: Idempotency key scope — per-client-generated UUID vs. per order/attempt. A key per payment-method + amount lets you deduplicate accidental retries; a key per order allows one charge per order regardless of retries. The right answer depends on the business rule. Naming this distinction is the senior signal here.

Failure scenario to address: Network drops after the card network confirms but before your 201 returns. The client retries with the same key — your backend finds the stored result, returns it, no double charge. This is the whole point of idempotency key storage on the server side.
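A single-threaded sketch of server-side key storage shows the replay behaviour described above. The names `IdempotentPayments` and `charge` are hypothetical, and the dict stands in for durable storage; a real service would also need to record the key atomically and handle two requests with the same key arriving concurrently, which this sketch does not attempt.

```python
from typing import Callable, Dict, Tuple


class IdempotentPayments:
    """Replays the stored response when an idempotency key is seen again."""

    def __init__(self, charge: Callable[[dict], dict]) -> None:
        self._charge = charge                  # hypothetical call to the card network
        self._results: Dict[str, dict] = {}    # idempotency key -> stored response

    def create_payment_intent(self, idempotency_key: str, body: dict) -> Tuple[int, dict]:
        stored = self._results.get(idempotency_key)
        if stored is not None:
            # Retry with the same key: return the saved result, charge nothing.
            return 200, stored
        response = self._charge(body)
        self._results[idempotency_key] = response
        return 201, response
```

In the failure scenario, the first call reaches the card network and stores its result even though the client never sees the 201; the retry finds the stored entry and gets the same body back.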

MEDIUM Design a chat API (20 min)

Context: See Lesson cs-08 for the full Messenger case study.

Clarifying questions: 1:1 only or group chats? Delivery receipts? Message editing/deletion? Media attachments? How many concurrent connections?

Core API model (REST + WebSocket):

# REST — durable storage operations
POST /v1/conversations
Body: { "participant_ids": ["usr_a", "usr_b"] }
← 201 { "id": "conv_1" }

POST /v1/conversations/conv_1/messages
Body: { "text": "hello", "idempotency_key": "msg_k1" }
← 201 { "id": "msg_7", "sent_at": "…" }

GET /v1/conversations/conv_1/messages?before=msg_7&limit=50
← 200 { "data": […], "cursor": "msg_3" }

# WebSocket — real-time delivery
wss://api.example.com/v1/ws?token=bearer_xyz
// server pushes: { "type": "new_message", "message": { … } }

Key trade-off: Long-polling vs. WebSocket vs. SSE. Long-polling is the simplest — no persistent connection state — but adds latency and load. WebSocket is low-latency but stateful; you need sticky sessions or a shared pub/sub layer (Redis Streams, Kafka) to fan out across app servers. SSE is unidirectional (server-to-client only), so sends still need REST. Name all three and justify WebSocket as the right choice for two-way chat.

Pagination note: Cursor-based (by message ID or timestamp) is essential — offset pagination on an append-only log shifts under the client as new messages arrive.

MEDIUM Design pagination for a huge activity feed (20 min)

Context: See Lesson cs-13 for the Twitter/feed case study.

Clarifying questions: Reverse-chronological or ranked? Write-heavy (many new posts) or read-heavy (many followers)? SLA on freshness?

Core API model:

GET /v1/feed?cursor=eyJpZCI6MTIzfQ&limit=25
← 200 {
    "data": [ { "id": "post_99", … }, … ],
    "next_cursor": "eyJpZCI6NzR9",  // null when end of feed
    "has_more": true
  }

Why not offset? LIMIT 25 OFFSET 500 on a 10 M-row feed table requires the DB to scan 525 rows and discard 500. Worse, if 3 new posts arrive while the client is paginating, every subsequent offset page is off by 3 — users see duplicates or skip posts. Cursor encodes a position (usually last-seen ID or timestamp) and the DB uses an index seek, not a scan.
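The cursor mechanics can be sketched as follows. The sample cursors in the model above happen to decode as unpadded base64url JSON of the form `{"id": ...}`, so the sketch assumes that encoding; the function names are illustrative, and an in-memory list sorted newest-first stands in for the indexed table.

```python
import base64
import json
from typing import List, Optional


def encode_cursor(last_id: int) -> str:
    """Encode the last-seen ID as unpadded base64url JSON."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def decode_cursor(cursor: str) -> int:
    """Restore padding, then decode the last-seen ID."""
    padded = cursor + "=" * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))["id"]


def feed_page(posts: List[dict], cursor: Optional[str] = None, limit: int = 25) -> dict:
    """Return one page of posts (sorted by id descending) after the cursor."""
    before_id = decode_cursor(cursor) if cursor else None
    # Keyset filter; the SQL equivalent would be roughly
    # WHERE id < :before_id ORDER BY id DESC LIMIT :limit + 1
    candidates = [p for p in posts if before_id is None or p["id"] < before_id]
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor, "has_more": has_more}
```

Because each page is defined by "IDs below the last one seen", posts that arrive at the top of the feed during pagination do not shift later pages.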

Key trade-off: Cursor-based pagination is stable and O(log n) per page but is forward-only — no "jump to page 12." If business requirements need jump-to-page (e.g., search results), offset is unavoidable; you accept the instability and add dedup logic on the client. Name both; explain why feed = cursor, search = possibly offset.

Debugging prompts

MEDIUM "We started getting 504s after last night's deploy" (15 min)

Context: See Lesson dbg-02 for systematic error reading and Lesson dbg-04 for downstream timeout patterns.

What the interviewer is testing: Do you narrow the search space methodically, or do you guess?

Key questions to ask:
1. What changed in the deploy? (new service? config? dependency version?)
2. Are all requests 504 or only some endpoints / paths?
3. Where does the timeout originate — load balancer, API gateway, upstream service?
4. Did p99 latency on downstream calls increase before the 504s started?
5. Can you roll back? If so, do the 504s resolve? (confirms the deploy is the cause)

Model approach:

  1. Check the gateway/LB timeout config — did the deploy change the timeout value or a dependency whose response time increased past it?
  2. Compare p50/p99 latency on the downstream call before and after deploy in your observability tool. A latency increase without a timeout config change is the most common root cause.
  3. Check for a missing DB index on a query that started running against a larger dataset post-deploy.
  4. Look for a synchronous call added in the new code that used to be async — e.g., a new third-party API call in the request path with a 30 s timeout.
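The before/after latency comparison in step 2 can be sketched with the standard library. This assumes you can export raw latency samples in milliseconds from your observability tool; the function names and the shape of the report are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def p50_p99(samples_ms: List[float]) -> Tuple[float, float]:
    """Return the 50th and 99th percentile of the samples."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[98]


def compare_latency(before_ms: List[float], after_ms: List[float],
                    timeout_ms: float) -> Dict[str, object]:
    """Summarise how median and tail latency moved across a deploy."""
    b50, b99 = p50_p99(before_ms)
    a50, a99 = p50_p99(after_ms)
    return {
        "p50_ratio": a50 / b50,
        "p99_ratio": a99 / b99,
        # True when the tail now reaches the gateway/LB timeout.
        "p99_over_timeout": a99 >= timeout_ms,
    }
```

A p99 that jumps while p50 stays flat suggests only some requests hit a slow path, which is consistent with (though not proof of) a new synchronous call on one code branch.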

Key trade-off statement to make: "I'd instrument the new code path before rolling back so we understand whether it's a correctness issue or a latency issue — rollback might fix symptoms but obscure root cause."

HARD "Duplicate charges are being reported by customers" (20 min)

Context: See Lesson dbg-05 for the idempotency debugging playbook.

What the interviewer is testing: Do you understand at-most-once vs. at-least-once semantics and where idempotency breaks down?

Questions to narrow scope:
1. Which customers? All or a pattern (mobile app users, users on slow connections)?
2. Are the duplicate charges milliseconds apart (double-click) or minutes apart (retry)?
3. Is the client sending an idempotency key? Is it the same key on both charges?
4. Did we recently change the retry logic on the client?
5. Is the payment processor reporting two authorisations or one? (tells us if duplication is client-side or processor-side)

Model approach:

  1. Pull the charge IDs for a duplicate pair. Check whether both carry the same idempotency key — if yes, the key lookup is broken. If no, the client is generating a new key on retry (the most common bug).
  2. Check client-side retry logic: many mobile clients generate a new UUID per request rather than per order. Fix: scope the idempotency key to the cart/order ID, not the HTTP request.
  3. If keys are identical but duplicates still occur, check your key-storage layer: is the lookup served by a read replica with replication lag? A key committed to the primary but not yet visible on a stale replica looks "not found", so the retry creates a second charge.
  4. Immediate mitigation: pause automated retries; manually refund confirmed duplicates; add an alert on charge-count-per-order > 1.
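The first step of the approach above (pull duplicate pairs and compare their keys) can be sketched as a small triage function. The record fields `order_id`, `charge_id`, and `idempotency_key` and the result labels are assumptions made for the example, not a real processor schema.

```python
from collections import defaultdict
from typing import Dict, Iterable


def classify_duplicates(charges: Iterable[dict]) -> Dict[str, str]:
    """Map each order with more than one charge to a likely failure class."""
    by_order = defaultdict(list)
    for charge in charges:
        by_order[charge["order_id"]].append(charge)

    findings: Dict[str, str] = {}
    for order_id, group in by_order.items():
        if len(group) < 2:
            continue  # one charge per order is the healthy case
        keys = {c.get("idempotency_key") for c in group}
        if None in keys:
            findings[order_id] = "missing_key"        # client sent no key at all
        elif len(keys) == 1:
            findings[order_id] = "same_key"           # server lookup or storage is broken
        else:
            findings[order_id] = "new_key_per_retry"  # client regenerates the key
    return findings
```

The same grouping doubles as the alert condition: any order that appears in the result has a charge count above one.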

Key trade-off: Idempotency key per HTTP request vs. per business operation. A per-request key is safe only if the client never retries with a fresh key; a per-operation key stays safe across retries. Prefer a key per business operation (order, quote, cart).

MEDIUM "A partner is suddenly getting 429s from our API" (15 min)

Context: See Lesson dbg-06 for the 429 debugging playbook.

What the interviewer is testing: Can you distinguish a configuration problem from a genuine abuse/scale problem, and can you collaborate with a partner rather than just blocking them?

First questions:
1. Did our rate limit config change? (new deployment, quota reduction?)
2. Did the partner's traffic pattern change? (new feature, new batch job, clock-sync burst?)
3. Is the 429 per-key, per-IP, or per-endpoint? (determines scope)
4. Are they respecting Retry-After in the 429 response, or retrying immediately?

Model approach:

  1. Pull the partner's request-count-per-minute graph. Look for a sudden step up — new batch job or polling loop that started at a fixed time is the most common cause.
  2. Check whether they're sending requests in a burst at the start of each minute (clock-synchronised cron) rather than spread evenly. A burst like this can trip limits enforced over windows shorter than a minute (for example per-second limits) even if their per-minute total is under quota.
  3. Recommend jitter on their retry/polling interval as an immediate fix. Longer term: offer a webhook/push model so they don't need to poll.
  4. If their legitimate traffic genuinely outgrew their quota: raise the quota with SLA agreement, or provide a bulk endpoint so 1 call replaces N polling calls.
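The jitter recommendation in step 3 might look like this on the partner's side. It is a sketch under assumptions: the function names, the 20% spread, and the backoff base and cap are illustrative defaults, and the `rng` parameter exists only so the behaviour can be tested deterministically.

```python
import random
from typing import Optional


def jittered_interval(base_seconds: float, spread: float = 0.2, rng=random) -> float:
    """Polling interval of base +/- spread, so synchronised clients drift apart."""
    return base_seconds * (1 + rng.uniform(-spread, spread))


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Delay before retrying a 429: honour Retry-After, else jittered backoff."""
    if retry_after is not None:
        # Wait at least as long as the server asked, plus up to 1 s of jitter.
        return retry_after + rng.uniform(0, 1.0)
    # Exponential backoff with full jitter, capped.
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```

With a 60-second base and 20% spread, each poll lands somewhere between 48 and 72 seconds after the previous one, which breaks up the top-of-the-minute burst without changing the average rate.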

Key trade-off statement: "Hard-blocking a partner with no communication erodes trust. The right response is to surface the rate-limit headers so they can self-diagnose, then offer a path to a higher quota or a more efficient API shape."

🎯 Interview angle

Debugging prompts are as common as design prompts in senior API interviews. The interviewer wants to see systematic narrowing — not a list of every possible cause, but a prioritised, evidence-driven elimination sequence. The strongest answers start with "what changed?" and "which customers / which endpoints?" before proposing any hypothesis. Jumping straight to a fix is the junior signal; building a minimal reproducible scope is the senior one.

⚠️ Common trap

Treating the model answers as scripts to memorise. The point of this bank is to stress-test your process, not load you with pre-baked answers. An interviewer who has heard your exact words before will probe past them immediately. Understand the reasoning chain so you can reconstruct the answer under novel variants.

✅ Do this, not that

Do attempt each prompt under the time limit before reading the answer. Don't skim the prompt, immediately read the answer, and count that as practice — it builds false confidence while teaching you nothing about working under pressure.

Key takeaways

Sources & further reading