API Design

Design Case Studies · Lesson 04

Design: Pub/Sub API

A publish/subscribe system lets producers fire events without knowing who's listening — but the moment you promise delivery guarantees, fan-out, and ordered processing at scale, every easy decision becomes a trade-off.

⏱ 18 min Difficulty: advanced Prereq: idempotency, webhooks, event-driven

By the end you'll be able to

1 · Requirements

An engineering team comes to you with this brief: "We need a messaging layer so services can broadcast events to whoever cares, without the sender knowing the recipients." Before touching a single endpoint, translate that into concrete requirements.

RequirementWhat it implies
Named topicsLogical channels — "order.placed", "payment.failed"
SubscriptionsEach subscriber registers interest in one topic
Fan-outOne message → N independent subscribers, each gets their own copy
Delivery guaranteesAt-least-once (vs at-most-once vs exactly-once)
ScaleHigh-throughput publish, many parallel subscribers
OrderingPer-key ordering (e.g. all events for the same order arrive in sequence)
Failure handlingRetries + dead-letter queue for unprocessable messages

Notice what is not a requirement: exactly-once delivery. That is extremely hard to guarantee at the infrastructure layer without sacrificing throughput. Practical systems instead guarantee at-least-once and ask consumers to be idempotent — the far cheaper trade-off.

✅ Nail requirements before decisions

In an interview, state requirements out loud and get agreement before naming endpoints. A candidate who asks "do we need global ordering or per-key ordering?" shows they understand the cost difference — global ordering requires a single partition and kills throughput.

2 · Design decisions

2a. Topic and subscription model

Think of a topic as a radio station — it just broadcasts. A subscription is a satellite dish pointed at that station. The broadcaster (producer) never knows how many dishes are listening, and each dish receives an independent copy of every broadcast.

Producer POST /topics/t/messages Topic order.placed log of messages Sub A — billing webhook push Sub B — inventory streaming pull Sub C — analytics webhook push
Fan-out: one published message is independently tracked per subscription. Sub B being slow does not delay Sub A.

2b. Delivery mode: webhook push vs streaming pull

How does the system get messages to subscribers? Two broad strategies:

StrategyHow it worksBest forRisk
Webhook pushSystem POSTs each message to the subscriber's HTTPS endpointLow-volume, real-time reactions (billing, notifications)Subscriber downtime causes retry storms; requires public endpoint
Streaming pullSubscriber opens a long-lived connection (SSE/gRPC stream) and messages are streamed downHigh-throughput consumers inside a datacenterConnection management complexity; back-pressure must be handled

This design supports both modes. When creating a subscription the caller chooses. See Debugging Webhooks (dbg-04) for the full webhook failure taxonomy.

2c. At-least-once delivery and idempotent consumers

Imagine you're mailing a parcel and the post office promises: "We'll keep re-sending until you confirm receipt." That's at-least-once delivery — the consumer might get the same parcel twice if the first acknowledgement was lost in transit. The parcel itself is fine; the consumer just has to be smart enough not to unbox it twice.

At-least-once is the practical default because it requires no distributed coordination. The alternative, exactly-once, demands two-phase commit semantics across producer, broker, and consumer — expensive latency overhead. See Idempotency (rel-02) for how to build idempotent consumers, and Event-driven & Pub/Sub (rel-10) for broker-level patterns.

2d. Per-key ordering

Consider a shopping cart: the sequence "add item → remove item → checkout" must be processed in that order. Per-key ordering (routing all events for the same cart ID to the same partition) achieves this while still allowing massive parallelism — all other cart IDs proceed independently. Global ordering forces a single partition and is almost never worth the throughput cost.

2e. Retries and dead-letter queue

When a subscriber's webhook returns a 5xx or times out, the message must be retried — but naively retrying immediately will hammer a struggling service. The right pattern is exponential back-off with jitter. After N failed attempts, the message is moved to a dead-letter queue (DLQ): a separate topic that operators can inspect, replay, or discard without losing the original message. See Event-driven & Pub/Sub (rel-10) for detailed DLQ design.

3 · The API model

Core endpoints

# 1. Create a topic
POST /v1/topics
# Request
{
  "name": "order.placed",
  "retention_hours": 168,
  "ordering_key": "orderId"
}
# Response 201
{
  "id": "top_8xKq",
  "name": "order.placed",
  "created_at": "2026-06-20T09:00:00Z"
}
# 2. Publish a message
POST /v1/topics/top_8xKq/messages
# Request
{
  "data": {
    "orderId": "ord_123",
    "total": 49.95,
    "currency": "USD"
  },
  "ordering_key": "ord_123",
  "idempotency_key": "pub_abc_20260620T090000"
}
# Response 202 — accepted for fan-out
{
  "message_id": "msg_7tNv",
  "status": "accepted"
}
# 3. Create a webhook subscription
POST /v1/subscriptions
# Request
{
  "topic_id": "top_8xKq",
  "name": "billing-handler",
  "delivery": {
    "type": "webhook",
    "url": "https://billing.internal/hooks/orders",
    "signing_secret": "whsec_..."
  },
  "retry_policy": {
    "max_attempts": 5,
    "backoff": "exponential",
    "initial_interval_ms": 1000
  },
  "dead_letter_topic_id": "top_dlq_billing"
}
# Response 201
{
  "id": "sub_4pLm",
  "status": "active"
}
# 4. Signed webhook delivery (system → subscriber)
POST https://billing.internal/hooks/orders  # called by the pub/sub system
X-PubSub-Signature: sha256=<hmac>
X-PubSub-Message-Id: msg_7tNv
X-PubSub-Attempt: 1
Content-Type: application/json

{
  "topic": "order.placed",
  "message_id": "msg_7tNv",
  "ordering_key": "ord_123",
  "published_at": "2026-06-20T09:00:01Z",
  "data": { "orderId": "ord_123", "total": 49.95 }
}

# Subscriber must respond 200/204 to acknowledge.
# Any other response or timeout triggers retry with back-off.
# 5. Streaming pull subscription (SSE)
GET /v1/subscriptions/sub_4pLm/messages?max_messages=10
Accept: text/event-stream

# Server streams events:
data: {"message_id":"msg_7tNv","data":{...},"ack_token":"ack_zZp9"}

# Consumer must acknowledge:
POST /v1/subscriptions/sub_4pLm/acknowledge
{ "ack_tokens": ["ack_zZp9"] }
# → 204 No Content
Broker message queue Subscriber webhook endpoint attempt 1 200 OK → ack 5xx / timeout retry with back-off Dead-Letter Queue after N failures give up
Retry flow: on 5xx or timeout the broker retries with exponential back-off. After max_attempts, the message lands in the dead-letter queue for manual inspection or replay.

4 · Evaluation & latency budget

Delivery guarantees

At-least-once is the correct default. The table below shows why exactly-once is expensive:

GuaranteeMechanismTypical overhead
At-most-onceFire and forget — no retry on failureLowest; risk of message loss
At-least-onceRetry until ack; consumer must be idempotentLow; ~10–50 ms broker overhead
Exactly-once2PC or transactional outbox; idempotent brokerHigh; 2–10× overhead; complex failure modes

Consumer idempotency

The message_id field is the natural idempotency key. Consumers should record "I have processed msg_7tNv" in their own datastore (a deduplicated events table works well) and skip re-processing. See Idempotency (rel-02) for the full implementation pattern.

Webhook latency budget

SegmentBudgetNote
Publish to broker< 5 ms p99Async enqueue; durably written before 202 returned
Broker fan-out to subscription queue< 10 ms p99In-memory copy per subscription
Webhook HTTP call (happy path)< 200 msSubscriber SLA; system times out at 30 s
Total e2e (producer → subscriber ack)< 300 ms p95Excludes retry attempts
🎯 Interview angle

Interviewers love the question: "how do you guarantee exactly-once?" The right answer is "I don't — I guarantee at-least-once delivery and make consumers idempotent, because the cost of exactly-once across a distributed system is rarely worth it." Then explain the message_id deduplication pattern. That answer distinguishes you from candidates who say "just use Kafka exactly-once semantics" without understanding the overhead.

⚠️ Common trap

Forgetting that fan-out means independent delivery state per subscription. A slow subscriber should not block a fast one. If you model this as a single queue all subscribers read from, you've lost fan-out semantics — one stuck reader holds the cursor and starves the others. Each subscription needs its own cursor or queue.

✍️ Exercise: design the DLQ replay endpoint

An operator wants to replay all messages from the dead-letter queue for subscription sub_4pLm that failed between 2026-06-01 and 2026-06-20. Design the endpoint — HTTP method, path, request body, response, and the semantic questions you need to answer before implementing.

Model answer:

POST /v1/subscriptions/sub_4pLm/dlq/replay
{
  "from": "2026-06-01T00:00:00Z",
  "to":   "2026-06-20T23:59:59Z",
  "destination_subscription_id": "sub_4pLm"  # replay to same or different sub
}
# Response 202 — async, potentially thousands of messages
{
  "replay_id": "rpl_9Xk2",
  "message_count": 47,
  "status": "in_progress"
}

Semantic questions to answer before implementing: (1) Should replayed messages get a new message_id or preserve the original? (Original is better for idempotency.) (2) Should replay re-use the existing retry policy, or succeed/fail silently? (3) Does replaying clear the DLQ entry on success?

Rubric: ✓ async 202 (never replay synchronously — too slow) ✓ preserves original message_id for idempotency ✓ time-range filter ✓ raises semantic questions about DLQ clearing and retry policy.

Under the hood: the core mechanism

A pub/sub broker is, at its core, a partitioned append-only log. Understanding what this means at the data structure level explains every delivery guarantee and ordering property in the system.

The topic as a partitioned log

A topic is divided into one or more partitions. Each partition is a durable append-only sequence of messages numbered from 0. Publishing a message appends it to the tail of one partition — the one selected by hashing the message's ordering_key. The sequence number within a partition is called the offset.

Partition 0 Partition 1 Partition 2 msg·0 msg·1 msg·2 msg·3 msg·4 appending… msg·0 msg·1 msg·2 msg·0 msg·1 Sub A (billing) — offset per partition: P0 offset=3 (delivered msg·0..2, next: msg·3) P1 offset=2 (delivered msg·0..1, next: msg·2) P2 offset=1 (delivered msg·0, next: msg·1) Sub B (analytics) — independent offsets: P0 offset=1 (delivered msg·0, next: msg·1) P1 offset=0 (nothing yet) P2 offset=0 offset = position of the NEXT undelivered message per partition per subscription
Each subscription independently tracks its offset per partition. Sub B being slow (offset 1 on P0) does not prevent Sub A (offset 3) from advancing — they share the log but never share a cursor.

Partition assignment by ordering key

When a producer publishes with "ordering_key": "ord_123", the broker computes partition = hash(ordering_key) mod num_partitions. All messages for ord_123 land in the same partition, so they are appended and delivered in order. Messages for ord_456 may land in a different partition and proceed independently — that is how per-key ordering coexists with parallel throughput.

Worked offset / redelivery trace

Follow one message through the complete lifecycle — publish, fan-out, ack, and at-least-once redelivery:

# T=0: Producer publishes
POST /v1/topics/top_8xKq/messages
{
  "data": { "orderId": "ord_123" },
  "ordering_key": "ord_123"
}
broker receives → hash("ord_123") % 4 = partition_2
append to partition_2: offset 7   # now the durable tail
→ 202 { "message_id": "msg_7tNv" }

# T=1ms: Broker fans out to each subscription's cursor queue
# Sub A (billing): P2 offset was 7 → enqueue delivery job for offset 7
# Sub B (analytics): P2 offset was 4 → Sub B is behind; msg_7tNv will be
#                    delivered when Sub B reaches offset 7

# T=10ms: Worker delivers to Sub A via webhook
POST https://billing.internal/hooks/orders
X-PubSub-Message-Id: msg_7tNv
{ "data": { "orderId": "ord_123" } }

# Case A — subscriber returns 200 OK:
broker receives ack → advance Sub A cursor: P2 offset 7 → 8
# msg_7tNv is never redelivered to Sub A

# Case B — subscriber returns 500 / times out:
broker does NOT advance cursor (offset stays at 7)
retry attempt 2 after 1 s  (initial_interval_ms=1000)
retry attempt 3 after 2 s  (exponential back-off, base=2)
retry attempt 4 after 4 s
retry attempt 5 after 8 s
# attempt 5 still fails → message_id msg_7tNv moved to DLQ topic
# Sub A cursor advances past offset 7 (message quarantined, not lost)
# Sub A continues delivering offset 8, 9, … unblocked

# Sub B is independent: still at offset 4, will eventually reach 7
# and make its own delivery attempt for msg_7tNv

The key insight: the broker never deletes the message until the retention window expires. An offset is just a pointer. Redelivery means re-reading the same log position. The subscriber's acknowledgement is what advances the pointer — not the delivery itself.

⚠️ At-least-once means offset advance must be atomic with processing

If a consumer reads a message, processes it (e.g. inserts a row in the database), and then the process crashes before it sends the ack, the broker re-delivers the same message. The consumer sees it again and processes it twice. The fix is idempotent processing: use message_id as a deduplication key and record "I have processed msg_7tNv" atomically alongside the business mutation — not in a separate step afterward. See Idempotency (rel-02).

Operating & debugging it

Most pub/sub problems are visible in two signals: subscription lag (the gap between the latest offset and the subscription's committed offset) and DLQ depth (messages that exhausted all retries). Both should be monitored as gauges with alerts.

Inspect a subscription's health

$ curl -s https://api.example.com/v1/subscriptions/sub_4pLm \ -H "Authorization: Bearer $TOKEN" | jq '{status,lag,dlq_count}' { "status": "active", "lag": { "partition_0": 0, "partition_1": 0, "partition_2": 142, ← 142 undelivered messages behind on P2 "partition_3": 0 }, "dlq_count": 3 ← 3 permanently failed messages in DLQ } # P2 lag of 142 usually means the webhook endpoint on P2's ordering-key # set is returning errors or timing out. Check retry history. $ curl -s https://api.example.com/v1/subscriptions/sub_4pLm/dlq \ -H "Authorization: Bearer $TOKEN" | jq '.messages[0]' { "message_id": "msg_7tNv", "ordering_key": "ord_123", "attempts": 5, "last_error": "POST https://billing.internal: connect timeout 30s", "moved_to_dlq_at": "2026-06-20T09:12:34Z" }

Symptom → cause → fix

SymptomLikely causeFix
Subscription lag grows monotonicallyWebhook endpoint is down, returning 5xx, or timing out consistentlyFix the subscriber endpoint; check /dlq for the last error; the lag will drain once the endpoint recovers (messages are still in the log)
DLQ depth keeps growingConsumer has a persistent bug (parse error, missing field) that never succeeds regardless of retriesInspect a DLQ message's payload and last_error; fix the consumer; replay the DLQ after fix
Consumer processes the same message twiceConsumer crashed between processing and ack, or webhook returned 200 but the broker didn't receive it (network drop)Make the consumer idempotent using message_id as a dedup key
Messages for one ordering key arrive out of orderConsumer is pulling from multiple partitions concurrently without respecting partition boundaries, or a DLQ replay bypassed orderingEnsure all messages sharing an ordering key are on the same partition; replay DLQ in offset order
Fast subscriber starved — other slow subscriber appears to block fan-outArchitecture error: subscriptions share a single queue/cursor instead of independent per-subscription stateEach subscription must maintain its own offset; a slow Sub B cannot hold back Sub A's offset
Webhook delivery rate collapses after restartRetry storm: a backlog of retries from the offline period all fire simultaneously at restartUse exponential back-off with jitter on retry scheduling; add a max-concurrency limit per subscription on the delivery workers
  1. Check subscription lag per partition — a lag spike on a single partition points to an ordering-key group, not the whole subscription.
  2. Pull the most recent entry from the DLQ; read last_error — connection refused vs 5xx vs 200-but-no-ack are three different problems.
  3. Verify the webhook endpoint manually: curl -X POST <webhook_url> with a sample payload and the signing header.
  4. Check that your consumer code commits the ack after processing completes, not before.
  5. After fixing the subscriber, use the DLQ replay endpoint to reprocess quarantined messages — preserve the original message_id for idempotent consumers.

🧠 Quick check

1. A subscriber processes the same event twice. Given pub/sub is typically at-least-once delivery, the correct fix is:

At-least-once means duplicates are expected. You can't make the network perfect, so the consumer must be idempotent — dedupe on the event id. (See the idempotency lesson.)

2. Why deliver events via webhook with retries and a dead-letter queue, instead of assuming a single delivery succeeds?

Subscribers go down and networks drop requests. Retries with backoff recover transient failures; a dead-letter queue captures what still can't be delivered for later inspection.

3. How does a pub/sub topic differ from a point-to-point queue?

A queue delivers each message to one consumer; a topic fans the same message out to all subscribers — the core of decoupled, one-to-many event delivery.

Key takeaways

Sources & further reading