API Design

Reliability & Scale · Lesson 10

Event-driven & Pub/Sub

Not every service call needs an immediate answer. Asynchronous messaging — queues and publish/subscribe topics — lets producers and consumers evolve, scale, and fail independently, trading synchronous convenience for resilience and decoupling at scale.

⏱ 14 min Difficulty: advanced Prereq: Idempotency, REST basics

By the end you'll be able to

The problem with always waiting for an answer

When a user places an order on an e-commerce site, at least five things need to happen: deduct inventory, charge the payment, send a confirmation email, update a search index, and log for fraud analysis. If the HTTP handler does all five inline and waits for each to finish before responding, two bad things follow: the user stares at a spinner for several seconds, and one slow or unavailable downstream service (say, the email provider is rate-limiting you) blocks the entire transaction.

The alternative is to do the one thing only the user is waiting for — create the order record — respond immediately with 202 Accepted, and publish an event that other services can consume in their own time. The services are now decoupled: they don't need to be online at the same instant, and a spike of orders buffers in the queue rather than overwhelming the email service.

Queue vs pub/sub topic

Two patterns for asynchronous delivery differ in how many consumers receive each message:

A queue (point-to-point) delivers each message to exactly one consumer. Workers compete to pull jobs — when Consumer A picks up a message, B and C never see it. This is ideal for work distribution: resize an image, send one email, process one payment. The message is gone once processed.

A pub/sub topic fans a message out to all subscribers simultaneously. When an "order.placed" event is published, the inventory service, the email service, the fraud service, and the analytics pipeline all receive their own copy independently. Adding a new subscriber requires zero changes to the producer — pure decoupling. This is the model Kafka topics, AWS SNS, and Google Pub/Sub use.

Queue (point-to-point) Producer Queue [msg][msg][msg] Consumer A ← picks up Consumer B idle Pub/Sub topic (fan-out) Producer Topic order.placed Inventory Email Svc Analytics all 3 get their own copy
Queue: one consumer wins each message. Topic: every subscriber gets its own independent copy of the event.

Benefits of async messaging

Delivery guarantees

How hard does the messaging system try to deliver each message? Three levels exist, and they involve painful trade-offs:

GuaranteeWhat it meansRisk
At-most-once Fire and forget — the broker sends once and moves on Messages can be silently lost on failure
At-least-once Broker retries until the consumer acknowledges Consumer may receive the same message more than once
Exactly-once Delivered precisely one time, no duplicates Extremely hard; requires distributed transactions; only some systems (Kafka with idempotent producers + transactions) offer it, and at real throughput cost

In practice, at-least-once is the sweet spot for most systems: it's reliable without the complexity cost of exactly-once. The catch is that your consumer must handle duplicate messages safely. That's what idempotent consumers (covered in Lesson 02) solve: design the handler so that processing the same message twice produces the same result as processing it once.

# Pattern: idempotent SQS consumer in pseudo-Python

def handle_order_placed(message):
    order_id = message["order_id"]

    # Guard: skip if already processed
    if db.exists("processed_events", order_id):
        logger.info("duplicate, skipping", order_id=order_id)
        return                         # idempotent — safe to ack

    with db.transaction():
        update_inventory(order_id)
        db.insert("processed_events", order_id, ttl=7_days)

    message.ack()                       # tell broker: delivered successfully

Ordering and the difficulty of global order

A common assumption: "messages will arrive in the order I sent them." With most distributed queues, that's not guaranteed across partitions. Kafka guarantees order within a partition for the same key — so if you partition by user ID, all events for one user arrive in order to a single consumer. AWS SQS standard queues offer best-effort ordering; SQS FIFO queues guarantee order but at lower throughput.

Global ordering (all messages across all producers, globally in sequence) is nearly impossible at scale without sacrificing throughput — avoid designing systems that require it.

Backpressure

If a producer sends messages faster than consumers can process them, the queue grows without bound. Backpressure is a mechanism to signal this: the consumer or broker tells the producer "slow down." In Kafka this happens naturally — the producer can be configured to block or throw when the broker's internal buffer fills. In HTTP-based systems, a 429 Too Many Requests response is a form of backpressure. Without it, an unbounded queue is a time-delayed crash.

Real systems: Kafka, RabbitMQ, SNS/SQS

Scenario A social network needs to send push notifications, update follower timelines, and increment analytics counters every time a user posts. The post endpoint currently does all three synchronously and takes 3 seconds. Rewrite it using pub/sub.
# Before: synchronous, blocking
POST /posts
  → save post to DB
  → call notification service  # 800 ms
  → call timeline service      # 1200 ms
  → call analytics service     # 400 ms
  ← 200 after ~2.5 s

# After: async fan-out
POST /posts
  → save post to DB
  → publish "post.created" event to SNS topic  # <10 ms202 Accepted { "post_id": "p_xyz" }

# SNS fans to three SQS queues:
#   notifications-queue → push-notification worker
#   timelines-queue     → timeline-fanout worker
#   analytics-queue     → analytics worker
# Each drains independently; failures retry independently
🎯 Interview angle

Two common prompts: "when would you choose async over sync?" and "design a notification fan-out system." For the first: async wins when the caller doesn't need an immediate result, when the work is slow or flaky, or when multiple services need the same event. For the fan-out design: producer publishes to an SNS topic → SNS pushes to per-subscriber SQS queues → each subscriber drains its queue with idempotent consumers. Mention delivery guarantees (at-least-once), idempotency, and dead-letter queues for messages that fail repeatedly.

⚠️ Common trap

Two in one: assuming exactly-once delivery and relying on global message ordering. Neither is available cheaply at scale. Exactly-once requires coordination overhead that most systems cannot afford; build idempotent consumers instead. Global ordering requires a single sequencer — a scalability bottleneck. Partition by the right key (user ID, entity ID) to get local ordering within a consumer group.

✅ Do this, not that

Do include a unique event_id in every message payload and record processed IDs to achieve idempotent consumption under at-least-once delivery. Don't rely on the messaging system to guarantee exactly-once for you — most do not, and those that claim to do so require careful producer + consumer configuration that is easy to misconfigure.

Under the hood: how a broker actually delivers

The phrase "message is delivered to a consumer" glosses over a surprisingly concrete mechanism. Here is how Kafka — the most widely used event-streaming broker — actually works, followed by a worked trace. The same ideas apply (with different vocabulary) to RabbitMQ and SQS.

The append-only partition log

A Kafka topic is divided into one or more partitions. A partition is nothing more than an append-only, immutable file on disk — like a growing list. Producers append records to the end of a partition. Each record gets a monotonically increasing integer index called an offset. Offset 0 is the first record ever written; offset 1 is the second; and so on. Records are never overwritten or reordered — only appended.

This design is what gives Kafka its speed: sequential disk writes are far faster than random-access writes, and the log can be served to consumers via sendfile(2) (zero-copy) from the OS page cache. The broker is not a smart router — it is essentially a durable, indexed file server for a log.

Consumer groups and offset tracking

A consumer group is a named set of consumer instances that together consume a topic. Kafka assigns each partition to exactly one consumer in the group at a time — so work is distributed but never duplicated within a group. Multiple groups can independently consume the same topic, each with its own position in the log (this is the pub/sub fan-out: the inventory group and the email group each have their own offsets).

Each consumer group tracks its position with a committed offset: the offset of the last message the group has successfully processed. Kafka stores this in an internal topic (__consumer_offsets). On each poll cycle the consumer:

  1. Calls poll() — fetches a batch of records at offsets above the last committed offset.
  2. Processes each record in the batch (runs your handler code).
  3. Commits the offset of the last successfully processed record back to Kafka.

Step 3 is the ack. Only after commit does Kafka consider those records "consumed" by that group. If the consumer crashes between steps 2 and 3, Kafka will redeliver those records on the next poll — this is the mechanism behind at-least-once delivery.

Worked trace: offset, processing, commit, and redelivery

# Topic: orders | Partition 0 | Consumer group: email-service

# State of the partition log
Offset │ Record
───────┼───────────────────────────────────────
  0    │ {"order_id": "ord_10", "event": "order.placed"}
  1    │ {"order_id": "ord_11", "event": "order.placed"}
  2    │ {"order_id": "ord_12", "event": "order.placed"}
  3    │ {"order_id": "ord_13", "event": "order.placed"}
  ...

# Committed offset for email-service group = 1
# (records at offsets 0 and 1 have been acked)

# Consumer polls — fetches offsets 2 and 3
poll() → [record@2, record@3]

# Consumer processes record@2 (sends email for ord_12) ✓
# Consumer processes record@3 (sends email for ord_13) ✓
# Consumer crashes HERE — before committing

# On restart, committed offset is still 1.
# Consumer polls again — gets offsets 2 and 3 AGAIN (redelivery)
poll() → [record@2, record@3]   # redelivered!

# Without idempotency guard: ord_12 and ord_13 get duplicate emails.
# With idempotency guard (seen-events table):
if db.exists("processed_events", "ord_12"):
    skip()   # safe — no duplicate email
else:
    send_email("ord_12")
    db.insert("processed_events", "ord_12")

# After successful processing, commit offset 3:
# Committed offset for email-service = 3

This trace explains why exactly-once is hard: to guarantee the email is sent exactly once, you would need to atomically send the email and commit the offset in a single transaction — but those are two different systems (SMTP and Kafka). Kafka's transactional API allows atomic writes to Kafka itself (producer-to-Kafka exactly-once), but it cannot make the downstream side-effect (sending an email, charging a card) atomic. This is why the industry default is at-least-once + idempotent consumers, not exactly-once.

⚠️ The partition key determines both ordering and consumer assignment

Kafka routes records with the same partition key to the same partition, which is always assigned to the same consumer within a group. This is both a feature (order is preserved per key) and a trap. If you use a single hot key — say, the ID of a single very-active entity — all its records pile onto one partition and one consumer, even if you have twenty consumer instances. The other nineteen sit idle. Choose partition keys that distribute load evenly (user ID works; status="pending" does not).

Consumer lag: the operational heartbeat

The gap between the latest offset produced and the committed offset of a consumer group is called consumer lag. Lag = 0 means the consumer is caught up; lag growing means the consumer is falling behind the producer. Monitoring lag is the single most important operational signal for an async system — a growing lag is the earliest warning of a slow consumer, a resource bottleneck, or a silent processing failure.

How to debug & inspect it

Most async messaging bugs fall into one of four categories: the consumer is not running (lag grows), the consumer is crashing on a bad message (lag stalls at one offset), the consumer is processing but not committing (redelivery loop), or messages are being duplicated in the output (missing idempotency guard). Here is how to surface each one.

Inspect consumer lag (Kafka)

$ kafka-consumer-groups.sh --bootstrap-server kafka:9092 \ --group email-service --describe GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG email-service orders 0 1042 1042 0 ← caught up email-service orders 1 998 1150 152 ← falling behind email-service orders 2 500 800 300 ← stuck? # Partition 2 lag is large and unchanging — likely a consumer crash # or a poison-pill message the consumer keeps failing to process.

Inspect queue depth (SQS)

$ aws sqs get-queue-attributes \ --queue-url https://sqs.us-east-1.amazonaws.com/123/orders-queue \ --attribute-names ApproximateNumberOfMessages,ApproximateNumberOfMessagesNotVisible { "ApproximateNumberOfMessages": "1523", ← waiting to be picked up "ApproximateNumberOfMessagesNotVisible": "12" ← in-flight (being processed) } # High NotVisible with no drain = consumers crashing before acking # (messages become visible again after the visibility timeout)

Check DLQ depth (the alarm you always want)

$ aws sqs get-queue-attributes \ --queue-url https://sqs.us-east-1.amazonaws.com/123/orders-dlq \ --attribute-names ApproximateNumberOfMessages { "ApproximateNumberOfMessages": "47" ← non-zero DLQ depth = messages gave up } # Investigate: fetch a DLQ message body, look at the error, fix the # consumer, then replay from DLQ to the main queue.

Trace a redelivery

Include a delivery_attempt counter or use the broker's native attribute (SQS: ApproximateReceiveCount; Kafka: there is no built-in counter — add it to the record headers at publish time) to distinguish first delivery from retries. Log it on every processing attempt. A message with ApproximateReceiveCount > 1 that your handler processed without a dedup guard is a double-processing event.

SymptomLikely causeFix
Consumer lag grows monotonically and never decreasesConsumer is down, underscaled, or blocked on a slow downstream callCheck consumer pod/process health; add more consumer instances; add a timeout to the downstream call
Lag stalls at one specific offset for minutesPoison-pill message: the consumer crashes or throws on every attemptInspect the message at that offset; fix the consumer or route the message to a DLQ via a max-retry policy
Messages appear in DLQ unexpectedlyConsumer throws an unhandled exception or exceeds the visibility/processing timeoutLog the exception; extend timeout if slow; fix the handler bug
Duplicate side-effects (double emails, double charges)At-least-once redelivery without an idempotency guard in the consumerAdd a seen-events dedup table keyed on the message/event ID; check before processing
Out-of-order processing for the same entityMultiple partitions / consumers processing different events for the same entity keyEnsure the partition key is the entity ID so all events for one entity go to one partition/consumer
Consumer group shows 0 lag but work isn't happeningAuto-commit is advancing the offset even when processing failsSwitch to manual offset commit (commit only after successful processing)

Debug checklist:

  1. Check consumer lag / queue depth — is there a backlog? Is it growing or stable?
  2. Check DLQ depth — any non-zero count demands investigation.
  3. Look at consumer logs for exceptions or timeouts on processing.
  4. Inspect a sample message in the DLQ to see the actual payload that failed.
  5. Verify offset commit strategy — is the consumer committing manually after success, or relying on auto-commit?
  6. Check partition key distribution — are all records landing on one hot partition?
  7. If duplicate output is observed, verify the idempotency guard is in place and the dedup store is being checked before (not after) processing.
▶ See it live

Play with the queue & backpressure simulator — drag the load toward 100M req/s and watch this behaviour in real time.

🧠 Quick check

1. You publish an "invoice.generated" event and need the accounting service, the tax service, and the email service each to process it independently. Which pattern fits?

Pub/sub topics fan a single message out to all subscribers simultaneously. Each service receives an independent copy without any service needing to know about the others — that's the decoupling value of the pattern.

2. Your queue uses at-least-once delivery. The email consumer crashes after sending an email but before acknowledging the message. What happens?

With at-least-once delivery, an unacknowledged message is redelivered. The consumer must be idempotent — in this case, it should check whether the email was already sent before sending again.

3. Why is global ordering across all partitions nearly impossible at scale?

Global ordering means all messages must pass through one sequencer to be numbered. That single point cannot be partitioned, so it limits throughput to a single machine — the opposite of horizontal scale.

4. A consumer cannot keep up with the message rate and the queue grows unboundedly. What is the correct mechanism to address this before the system crashes?

Backpressure propagates the signal "you're producing faster than I can consume" back to the producer. The real fix is to slow the producer or scale the consumers — not to drop messages or extend TTL.

✍️ Exercise: design a notification fan-out system for 10 million users

A social platform sends notifications whenever a celebrity (with up to 10 million followers) posts. Design the messaging architecture from "celebrity posts" to "followers receive a push notification." Address: throughput, delivery guarantee, idempotency, and what happens if the push-notification provider is temporarily down.

Model answer:

  1. Publish event. When the celebrity posts, the posts service publishes a single post.created event to a Kafka topic (or SNS), keyed by celebrity user ID. The HTTP handler returns 202 immediately.
  2. Fan-out worker. A fan-out service reads the event and looks up the 10M follower list in batches (e.g. 1,000 at a time). For each batch it publishes individual notify_user:{user_id} messages to an SQS queue (or a Kafka topic partitioned by user ID). This spreading is the slow step and happens asynchronously.
  3. Notification workers. A pool of workers drains the notification queue. Each worker calls the push provider (APNs/FCM) and records the notification_id + user_id + post_id in a deduplication store (Redis with 24 h TTL) to make processing idempotent.
  4. Provider outage. SQS retries with exponential back-off. After N retries, messages land in a Dead Letter Queue (DLQ). An alarm on DLQ depth pages on-call; once the provider recovers, the DLQ is replayed.
  5. Delivery guarantee. At-least-once throughout (SQS default). Idempotency guards prevent duplicate pushes. Exactly-once is not required — receiving a duplicate push once in a while is acceptable.

Rubric: ✓ async publish (202 Accepted) ✓ fan-out decoupled from the post handler ✓ idempotent consumer with dedup store ✓ DLQ for push provider outage ✓ at-least-once + idempotency instead of exactly-once ✓ partitioning for ordering within a user's events. Five or more = strong answer.

Key takeaways

Sources & further reading