Design Case Studies · Lesson 06

Design: Notification / Fan-out Service

A notification service looks like a simple "send message to user" endpoint — until it needs to reach millions of users across push, email, SMS, and in-app channels simultaneously, honour per-user preferences, and never send the same message twice. This is an original design created for this course.

⏱ 20 min · Difficulty: advanced · Prereq: pub/sub, idempotency, rate limiting

By the end you'll be able to

Explain why fan-out to millions must be async and sketch the queue-based architecture.
Design idempotent send using a dedup key and describe how to honour per-user channel preferences.
Identify where per-user rate limiting, retries, and dead-letter queues fit in the delivery pipeline.

1 · Requirements

The brief: "Build a notification service that sends messages to users across push, email, SMS, and in-app channels, respects their preferences, and scales to millions of recipients without duplicates."

Requirement	What it implies
Multi-channel delivery	Push (APNs/FCM), email (SMTP provider), SMS (Twilio/SNS), in-app (WebSocket/DB)
Per-user preferences	User can opt out of SMS; can set quiet hours; can choose channel priority
Templates	Notifications are parameterised ("Hi {name}, your order {id} shipped") not raw strings
Deduplication	Caller-supplied dedup key prevents sending the same notification twice on retry
Fan-out at scale	A "send to segment" operation may target 10M users; must be async
Per-user rate limiting	No user should receive more than N notifications per hour across all channels
Retries + DLQ	Provider failures must be retried; permanently failed deliveries land in DLQ

2 · Design decisions

2a. Async decoupling via queue/pub-sub

Imagine a concert venue that needs to mail tickets to ten million fans simultaneously. The box office doesn't hand-deliver — it drops all the envelopes in a postal system and the postal workers handle the actual delivery in parallel. If one post office is down, the envelopes wait; they don't vanish.

The notification service works the same way. The caller (the API) is the box office: it accepts a notification request, validates it, and immediately enqueues it. The actual multi-channel send happens asynchronously via channel-specific workers. This separation means:

The API responds in < 50 ms regardless of how many recipients there are.
A channel provider going down (e.g. Twilio outage) doesn't affect other channels or the API response time.
Fan-out can be parallelised across thousands of workers.

See Event-driven & Pub/Sub (rel-10) for the pub-sub architecture underpinning this queue approach.
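The accept-and-enqueue step can be sketched as follows. This is a minimal illustration, not the course's reference implementation: `main_queue`, `accept_notification`, and `VALID_CHANNELS` are hypothetical names, and Python's in-process `queue.Queue` stands in for a durable broker.

```python
import queue
import uuid

# Hypothetical in-process stand-in for a durable broker (assumption).
main_queue: "queue.Queue[dict]" = queue.Queue()

VALID_CHANNELS = {"push", "email", "sms", "in_app"}


def accept_notification(request: dict) -> dict:
    """Validate the request, enqueue exactly one job, return a 202-style body.

    The segment is not expanded here; that is left to the fan-out workers,
    so the cost of this call does not depend on the recipient count.
    """
    channels = request.get("channels", [])
    if not channels or not set(channels) <= VALID_CHANNELS:
        raise ValueError("channels must be a non-empty subset of known channels")
    if not request.get("dedup_key"):
        raise ValueError("dedup_key is required")

    notification_id = f"notif_{uuid.uuid4().hex[:8]}"
    main_queue.put({"notification_id": notification_id, **request})
    return {"notification_id": notification_id, "status": "accepted"}


response = accept_notification({
    "target": {"type": "segment", "id": "seg_pro_users"},
    "template_id": "feature_announcement_v1",
    "channels": ["push", "in_app"],
    "dedup_key": "feature_ann:dark_mode:2026-06-20",
})
```

The handler does the same amount of work for one recipient as for a 10M-user segment, which is what keeps the response time flat.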

2b. Per-channel providers

Each channel has a different external API and failure mode. The architecture uses a provider abstraction layer: a single internal send interface, with pluggable backends per channel. This means swapping from SendGrid to Mailgun for email requires no changes to the notification API surface.

Channel	Provider examples	Failure characteristics
Push (mobile)	APNs, Firebase FCM	Token expiry (silent fail), rate limits per app
Email	SendGrid, Mailgun, SES	Bounce, spam filter; async delivery receipts
SMS	Twilio, AWS SNS	Cost-per-message; carrier filtering; latency spikes
In-app	Internal WebSocket / DB	User offline (message persisted to DB for later)
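One way to picture the provider abstraction is a single interface with interchangeable backends. The sketch below is illustrative only: `ChannelProvider`, `FakeSendGrid`, `FakeMailgun`, `PROVIDERS`, and `dispatch` are hypothetical names, and the fake backends return made-up ids instead of calling any real provider API.

```python
from typing import Dict, Protocol


class ChannelProvider(Protocol):
    """The single internal send interface every backend implements."""

    def send(self, recipient: str, body: str) -> str:
        """Deliver one message and return a provider message id."""
        ...


class FakeSendGrid:
    # Stand-in backend; a real one would call the provider's HTTP API.
    def send(self, recipient: str, body: str) -> str:
        return f"sg:{recipient}"


class FakeMailgun:
    def send(self, recipient: str, body: str) -> str:
        return f"mg:{recipient}"


# Registry mapping each channel to its currently configured backend.
PROVIDERS: Dict[str, ChannelProvider] = {"email": FakeSendGrid()}


def dispatch(channel: str, recipient: str, body: str) -> str:
    """Callers only know the channel name, never the concrete provider."""
    return PROVIDERS[channel].send(recipient, body)


first = dispatch("email", "ada@example.com", "Hi Ada")
PROVIDERS["email"] = FakeMailgun()  # swap the backend; dispatch() is unchanged
second = dispatch("email", "ada@example.com", "Hi Ada")
```

Swapping the registry entry changes where the message goes without touching `dispatch` or any caller.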

2c. Idempotent send via dedup key

If the queue worker crashes after sending the email but before acknowledging the message, the broker re-delivers it and the worker sends the email again. Without deduplication, the user receives the same email twice.

The solution: the caller provides a dedup_key (a stable, unique string for this logical send intent — e.g. order_shipped:ord_123). Before dispatching to the provider, the worker checks a deduplication store keyed on dedup_key + channel. If already sent, skip and ack. See Idempotency (rel-02) for implementation patterns.
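A minimal sketch of the dedup check, using an in-memory SQLite table as a stand-in for the dedup store. The table and function names (`notif_dedup`, `claim_send`, `handle_job`) are assumptions for illustration. Note one deliberate variation: this sketch claims the key atomically before the send rather than recording it afterwards. That closes the crash window between send and record, at the cost that a worker dying right after the claim leaves an unsent notification marked as handled.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE notif_dedup (
           dedup_key TEXT NOT NULL,
           channel   TEXT NOT NULL,
           PRIMARY KEY (dedup_key, channel)
       )"""
)


def claim_send(dedup_key: str, channel: str) -> bool:
    """Atomically claim (dedup_key, channel); False means already claimed."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO notif_dedup (dedup_key, channel) VALUES (?, ?)",
                (dedup_key, channel),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key violated: a previous attempt got here first


sent = []  # stand-in for real provider calls


def handle_job(dedup_key: str, channel: str) -> str:
    if not claim_send(dedup_key, channel):
        return "skipped_duplicate"  # ack the job without sending
    sent.append((dedup_key, channel))
    return "sent"


results = [
    handle_job("order_shipped:ord_123", "push"),
    handle_job("order_shipped:ord_123", "push"),   # broker re-delivery
    handle_job("order_shipped:ord_123", "email"),  # same key, other channel
]
```

Because the uniqueness constraint does the check and the write in one statement, two workers racing on the same job cannot both win.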

2d. Per-user rate limiting

Without rate limits, a runaway service could send a user 500 emails in an hour. Apply a sliding-window rate limit per user, counted across all channels as the requirements state; an additional per-channel limit is optional. When the limit is exceeded, the delivery is either deferred to the next window (for non-urgent notifications) or dropped with a DLQ entry (for time-sensitive ones that would be stale if deferred). See Rate Limiting (rel-03).
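A sliding-window limiter can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `SlidingWindowLimiter` is a hypothetical class, timestamps are passed in explicitly so the behaviour is deterministic, and a production version would keep the window in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # Drop timestamps that have slid out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # caller decides: defer or send to the DLQ
        events.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=3600)
# Three sends fit, the fourth is rejected, and one more fits after the
# earliest send has aged out of the one-hour window.
decisions = [limiter.allow("usr_9kLm", now=t) for t in (0, 10, 20, 30, 3601)]
```

The key is just a string, so the same class works for a per-user limit (`usr_9kLm`) or a per-user-per-channel limit (`usr_9kLm:email`).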

2e. Retries, DLQ, and preference enforcement

The delivery pipeline enforces preferences at dispatch time (not at receipt), for two reasons: (1) preferences can change between enqueue and dispatch — checking at dispatch time catches changes; (2) checking at enqueue time would require loading preferences for millions of users upfront during fan-out.

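Pipeline: the API enqueues immediately. Fan-out workers expand segments into per-user, per-channel delivery jobs; they may pre-filter on cached preferences to avoid enqueuing obviously unwanted jobs, but that filter is only an optimisation. Channel workers perform the authoritative preference check immediately before dispatch, send via the provider, and handle retries.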
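The retry and dead-letter behaviour of a channel worker might look like the sketch below. Everything here is an assumption for illustration: the two exception classes, `deliver_with_retries`, the list used as a DLQ, and the doubling back-off starting at 2 seconds. A real worker would classify provider responses into transient and permanent failures itself.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Retryable failure, for example a provider returning HTTP 429 or 503."""


class PermanentError(Exception):
    """Non-retryable failure, for example an invalid device token."""


dead_letter_queue: list = []  # stand-in for a real DLQ


def deliver_with_retries(
    job: dict,
    send: Callable[[dict], str],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the provider message id, or None if the job was dead-lettered."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(job)
        except PermanentError as exc:
            dead_letter_queue.append({**job, "error": str(exc), "attempts": attempt})
            return None  # retrying cannot help
        except TransientError as exc:
            if attempt == max_attempts:
                dead_letter_queue.append({**job, "error": str(exc), "attempts": attempt})
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ...
    return None


calls = {"n": 0}


def flaky_send(job: dict) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("HTTP 429")
    return "sg_xyz"


delays: list = []  # record back-off delays instead of really sleeping
result = deliver_with_retries({"channel": "email"}, flaky_send, sleep=delays.append)
```

Injecting `sleep` keeps the example fast to run; in a queue-based worker the delay would more likely be expressed as a re-queue with a visibility timeout than as an in-process sleep.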

3 · The API model

# 1. Send a notification (single user or segment)
POST /v1/notifications
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "target": {
    "type": "user",       # "user" | "segment" | "device_token"
    "id": "usr_9kLm"
  },
  "template_id": "order_shipped_v2",
  "template_data": {
    "name": "Ada",
    "order_id": "ord_123",
    "tracking_url": "https://track.example/T9X2"
  },
  "channels": ["push", "email"],   # requested channels; preferences filter further
  "dedup_key": "order_shipped:ord_123"
}

# Response 202 — accepted for async delivery
{
  "notification_id": "notif_4QrZ",
  "status": "accepted",
  "channels_requested": ["push", "email"],
  "estimated_recipients": 1
}

# 2. Send to a segment (fan-out)
POST /v1/notifications
{
  "target": {
    "type": "segment",
    "id": "seg_pro_users"    # pre-defined user segment
  },
  "template_id": "feature_announcement_v1",
  "template_data": { "feature_name": "Dark Mode" },
  "channels": ["push", "in_app"],
  "dedup_key": "feature_ann:dark_mode:2026-06-20"
}

# Response 202
{
  "notification_id": "notif_7TqW",
  "status": "accepted",
  "estimated_recipients": 1400000
}

# 3. Get notification status
GET /v1/notifications/notif_7TqW
Authorization: Bearer <api_key>

# Response 200
{
  "id": "notif_7TqW",
  "status": "in_progress",
  "channels": {
    "push":   { "sent": 820000, "failed": 1204, "pending": 578796 },
    "in_app": { "sent": 900000, "failed": 0,    "pending": 500000 }
  },
  "created_at": "2026-06-20T11:00:00Z"
}

# 4. Get user notification preferences
GET /v1/users/usr_9kLm/notification-preferences
Authorization: Bearer <api_key>

# Response 200
{
  "user_id": "usr_9kLm",
  "channels": {
    "push":   { "enabled": true },
    "email":  { "enabled": true },
    "sms":    { "enabled": false, "opted_out_at": "2026-01-15T00:00:00Z" },
    "in_app": { "enabled": true }
  },
  "quiet_hours": { "from": "22:00", "to": "08:00", "timezone": "America/New_York" }
}

# 5. Update preferences
PUT /v1/users/usr_9kLm/notification-preferences
Authorization: Bearer <user_token>    # user-scoped token, not service token
Content-Type: application/json

{
  "channels": {
    "sms": { "enabled": true }     # re-subscribe to SMS
  }
}
# Response 200 — returns full updated preferences object

Each channel worker checks preferences (skip if opted out or in quiet hours), then checks the dedup store (skip if dedup_key already processed). Only new, opted-in sends reach the provider.
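The order of checks in the channel worker can be sketched as a single gate function. This is a hedged illustration: `gate`, `in_quiet_hours`, and the `already_sent` set are hypothetical, the preferences dict mirrors the response shape shown above, and the caller is assumed to have already converted the current time into the user's local time of day.

```python
from datetime import time


def in_quiet_hours(local: time, start: time, end: time) -> bool:
    """True if `local` falls in [start, end), including windows past midnight."""
    if start <= end:
        return start <= local < end
    return local >= start or local < end  # e.g. 22:00 to 08:00


def gate(job: dict, prefs: dict, already_sent: set, local_time: time) -> str:
    """Decide what the channel worker does with one delivery job."""
    channel = job["channel"]
    if not prefs["channels"].get(channel, {}).get("enabled", False):
        return "skipped_opted_out"
    quiet = prefs.get("quiet_hours")
    if quiet and in_quiet_hours(
        local_time, time.fromisoformat(quiet["from"]), time.fromisoformat(quiet["to"])
    ):
        return "deferred_quiet_hours"
    if (job["dedup_key"], channel) in already_sent:
        return "skipped_duplicate"
    return "send"


prefs = {
    "channels": {
        "push": {"enabled": True},
        "email": {"enabled": True},
        "sms": {"enabled": False},
    },
    "quiet_hours": {"from": "22:00", "to": "08:00", "timezone": "America/New_York"},
}
key = "order_shipped:ord_123"
outcomes = [
    gate({"channel": "sms", "dedup_key": key}, prefs, set(), time(12, 0)),
    gate({"channel": "push", "dedup_key": key}, prefs, set(), time(23, 30)),
    gate({"channel": "push", "dedup_key": key}, prefs, {(key, "push")}, time(12, 0)),
    gate({"channel": "email", "dedup_key": key}, prefs, set(), time(12, 0)),
]
```

Only the last job reaches the provider; the other three are resolved without an external call.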

4 · Evaluation & latency budget

Async decoupling wins

The API responds in <50 ms for a 1.4M-user fan-out because it only validates the request and enqueues one job. The fan-out work is done by workers running in parallel, horizontally scaled. Critically, a provider outage (Twilio is down) only affects the SMS worker queue; push and email workers continue normally. Without async decoupling, a synchronous fan-out would block the calling service for minutes and cascade failures.

Idempotency guarantee

The dedup store (Redis or a unique constraint in a relational DB) records (dedup_key, channel, notification_id) when a send succeeds. On re-delivery, the worker checks this record first. This is the standard idempotent-consumer pattern from Idempotency (rel-02) applied to the notification domain.

Fan-out throughput

Stage	Throughput target	Scaling
API accept	< 50 ms p99 response	Stateless; scale horizontally
Fan-out worker	~100k users/sec per worker	Horizontally scale workers
Push channel worker	~50k sends/sec per worker	Scale to provider rate limits
Email channel worker	~10k sends/sec per worker	Provider limits dominate
Total time for 1M push	~2 seconds	With 10 workers at 50k/s each (500k sends/sec aggregate), provider limits permitting

Preference check overhead

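Loading preferences at dispatch time adds one cache read per user per channel job. Cache user preferences in Redis with a TTL of ~5 minutes, and invalidate the entry whenever the user updates their preferences so that an opt-out is not masked by a stale cached copy until the TTL expires. Cache miss rate at steady state is low for active users; the preference store handles the cold-start burst on worker scale-out.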
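A read-through cache with a TTL can be sketched as below. This is an in-memory stand-in for the Redis cache, with assumed names (`PreferenceCache`, `load_from_store`); the explicit `invalidate` method is an addition for illustration, intended to be called when a user changes preferences, and the injected clock only exists to make the example deterministic.

```python
import time
from typing import Callable, Dict, Tuple


class PreferenceCache:
    """Read-through cache: serve from memory until the entry's TTL expires."""

    def __init__(
        self,
        load: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}  # user -> (expiry, prefs)

    def get(self, user_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        prefs = self._load(user_id)  # miss or expired: go to the store
        self._entries[user_id] = (now + self._ttl, prefs)
        return prefs

    def invalidate(self, user_id: str) -> None:
        """Drop the entry so the next read sees fresh preferences."""
        self._entries.pop(user_id, None)


loads = []  # records every trip to the backing store


def load_from_store(user_id: str) -> dict:
    loads.append(user_id)
    return {"sms": {"enabled": False}}


fake_now = [0.0]
cache = PreferenceCache(load_from_store, ttl_seconds=300, clock=lambda: fake_now[0])
cache.get("usr_9kLm")  # miss: loads from the store
cache.get("usr_9kLm")  # hit: no store access
fake_now[0] = 301.0
cache.get("usr_9kLm")  # TTL expired: loads again
```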

🎯 Interview angle

A common follow-up: "what if we need to send 10 million push notifications in under 60 seconds?" Horizontal scaling of workers is part of the answer, but the real constraint is the provider: services such as APNs and FCM can throttle or reject traffic above the rate they are willing to accept, and adding workers does not raise that ceiling. The right answer: pre-negotiate higher rate limits with the provider, use multiple provider accounts/regions, and accept that 10M in <60s requires significant provider-side capacity planning, not just more workers.
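The arithmetic behind that answer is worth doing explicitly. The sketch below uses the 50k sends/sec per-worker figure from the throughput table; the provider ceiling of 100k sends/sec is a purely hypothetical number chosen to show the effect, not a published limit of any provider.

```python
import math


def required_rate(recipients: int, deadline_seconds: float) -> float:
    """Aggregate sends/sec needed to finish within the deadline."""
    return recipients / deadline_seconds


def workers_needed(recipients: int, deadline_seconds: float, per_worker_rate: float) -> int:
    """Workers required if the provider imposed no limit at all."""
    return math.ceil(required_rate(recipients, deadline_seconds) / per_worker_rate)


def achievable_seconds(
    recipients: int, workers: int, per_worker_rate: float, provider_limit: float
) -> float:
    """Completion time once the provider ceiling is taken into account."""
    effective_rate = min(workers * per_worker_rate, provider_limit)
    return recipients / effective_rate


rate = required_rate(10_000_000, 60)               # about 166,667 sends/sec
workers = workers_needed(10_000_000, 60, 50_000)   # 4 workers on paper
# Hypothetical provider ceiling of 100k sends/sec (assumption):
capped = achievable_seconds(10_000_000, workers, 50_000, provider_limit=100_000)
```

Four workers cover the target on paper, but with the assumed ceiling the send takes 100 seconds regardless of how many workers are added, which is the point of the interview answer.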

⚠️ Common trap

Enforcing preferences at enqueue time, not at dispatch time. It feels efficient — filter early, enqueue fewer jobs — but it creates a race condition: the user opts out after the job is enqueued but before it's sent. The notification is sent despite the user's explicit opt-out. Always enforce preferences at the last moment before delivery — the channel worker — not at enqueue time.

✅ Make dedup_key the caller's responsibility

Require callers to supply a dedup_key that is stable across retries rather than relying on a server-generated one. A good key is derived deterministically from the logical event, for example order_shipped:ord_123. If a caller omits it, the service can fall back to a generated UUID, but that key differs on every call, so a retried request will produce a duplicate. Document this clearly so callers don't inadvertently omit the key on triggers that may be retried.

✍️ Exercise: design quiet-hours deferral

A user has quiet hours 22:00–08:00 in America/New_York. A notification arrives at 23:30 New York time. Design the full behaviour: what does the worker do, what state must be stored, and how does the deferred message eventually send?

Model answer:

Worker loads preferences: quiet hours active. Mark the delivery job as deferred; store deliver_after = 2026-06-21T08:00:00-04:00.
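Re-queue the job with a visibility delay: the message is invisible to workers until deliver_after. (A delayed-job scheduler achieves this. SQS delay queues alone do not, because their maximum delay is 15 minutes; a multi-hour deferral needs a scheduler or repeated re-queueing.)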
At 08:00, worker picks up the job, re-checks preferences (user may have updated them overnight), and sends if still opted in.
Dedup check prevents a duplicate if the same notification was also attempted via another path.

# Deferred job record
{
  "notification_id": "notif_4QrZ",
  "user_id": "usr_9kLm",
  "channel": "push",
  "status": "deferred_quiet_hours",
  "deliver_after": "2026-06-21T08:00:00-04:00",
  "dedup_key": "order_shipped:ord_123"
}

Rubric: ✓ deferred status stored ✓ deliver_after in user's timezone ✓ re-check preferences at delivery, not just at deferral ✓ dedup key preserved to prevent duplicate on eventual send.
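Computing deliver_after in the user's timezone can be sketched with the standard-library `zoneinfo` module. The function name `next_delivery_time` is an assumption; the inputs are the quiet-hours values from the exercise. Note that the result is only a schedule: preferences still have to be re-checked when the job is picked up.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def next_delivery_time(now: datetime, quiet_from: time, quiet_to: time, tz_name: str) -> datetime:
    """Return `now` (in the user's zone) if outside quiet hours, otherwise
    the moment the current quiet window ends in the user's timezone."""
    tz = ZoneInfo(tz_name)
    local = now.astimezone(tz)
    t = local.time()
    if quiet_from <= quiet_to:
        in_quiet = quiet_from <= t < quiet_to
        end_date = local.date()
    else:
        # Window wraps past midnight, e.g. 22:00 to 08:00.
        in_quiet = t >= quiet_from or t < quiet_to
        # Before midnight the window ends tomorrow; after midnight, today.
        end_date = local.date() + timedelta(days=1) if t >= quiet_from else local.date()
    if not in_quiet:
        return local
    return datetime.combine(end_date, quiet_to, tzinfo=tz)


new_york = ZoneInfo("America/New_York")
arrival = datetime(2026, 6, 20, 23, 30, tzinfo=new_york)
deliver_after = next_delivery_time(arrival, time(22, 0), time(8, 0), "America/New_York")
```

Building the result from the local date and the zone, rather than adding a fixed number of hours, lets the zone database supply the correct UTC offset for that date.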

Under the hood: the core mechanism

The notification service is a multi-stage pipeline. Understanding each stage's function and data contract is what separates a design that "sounds right" in an interview from one that would actually survive a production incident.

The five-stage pipeline

Every notification — whether to one user or ten million — moves through the same five stages. The stages are explicitly decoupled by queues so that each one can fail and recover independently.

Stage ① accepts in <50 ms. Stages ③–⑤ are workers that scale independently. A failure in the SMS channel queue has no effect on push or email queues.

Worked trace: one notification fanning to 3 channels

Trace notif_4QrZ — a single-user "order shipped" notification requested on push + email + SMS — through every check and decision in the pipeline:

# Stage 1 — API receives POST /v1/notifications
dedup_key = "order_shipped:ord_123"
check dedup store: SELECT 1 FROM notif_dedup WHERE key = "order_shipped:ord_123"
→ NOT FOUND: proceed  (if FOUND → return 202 with original notif_id, skip enqueue)
write notif record: INSERT INTO notifications (id="notif_4QrZ", status="accepted", …)
enqueue to main queue: { notif_id:"notif_4QrZ", user:"usr_9kLm",
                           channels:["push","email","sms"],
                           template:"order_shipped_v2",
                           data:{name:"Ada", order_id:"ord_123"} }
→ 202 { notification_id:"notif_4QrZ", status:"accepted" }  ← returned in <50 ms

# Stage 2 — Fan-out worker picks up notif_4QrZ
load user preferences: GET from Redis cache (key "prefs:usr_9kLm")
→ { push:true, email:true, sms:false }   ← user opted out of SMS
rate limit check: sliding window "ratelimit:usr_9kLm": 2 notifs this hour, limit 10 → OK
SMS pruned early: enqueue push job + email job; SMS job NOT enqueued

# Stage 3 — Push channel worker picks up push job
re-check preferences at dispatch: push enabled, outside quiet hours → proceed
dedup check: SELECT 1 FROM notif_dedup WHERE key="order_shipped:ord_123" AND channel="push"
→ NOT FOUND → proceed
render template: "Hi Ada, your order ord_123 has shipped!"
call FCM API: POST https://fcm.googleapis.com/v1/projects/…/messages:send
→ HTTP 200 { message_id: "projects/.../messages/fcm_abc" }
mark dedup: INSERT INTO notif_dedup (key="order_shipped:ord_123", channel="push", sent_at=now())
ack job: message removed from push queue

# Stage 4 — Email channel worker picks up email job
dedup check: channel="email" → NOT FOUND → proceed
call SendGrid API: POST https://api.sendgrid.com/v3/mail/send
→ HTTP 429 Too Many Requests   ← rate limit hit
retry scheduled: attempt 1 failed; retry in 2 s (exponential back-off)
→ retry attempt 2: HTTP 200 { message_id: "sg_xyz" }
mark dedup: channel="email", sent_at=now()
ack job

# SMS: never enqueued (preference=false) → no dedup entry, no provider call

# Final state
notifications.channels = {
  push:  { status:"sent",    provider_id:"fcm_abc", sent_at:"2026-06-20T11:00:02Z" },
  email: { status:"sent",    provider_id:"sg_xyz",  sent_at:"2026-06-20T11:00:05Z" },
  sms:   { status:"skipped", reason:"user_opted_out" }
}

Dedup key scoping

The dedup check happens at two levels. At the notification level (Stage 1), the check is on dedup_key alone — this prevents re-enqueuing the same logical event if the caller retries the API call. At the channel level (Stage 3/4), the check is on (dedup_key, channel) — this prevents the channel worker from re-sending after a crash-between-send-and-ack scenario. Both checks are necessary; neither alone is sufficient.

Operating & debugging it

Notification pipelines fail silently in characteristic ways: a channel worker crashes after sending but before acking (duplicates), dedup records expire too early (duplicates after TTL expiry), or preferences are stale (sends to opted-out users). All three are observable from worker logs and the notification status endpoint.

Inspect a notification's delivery state

$ curl -s https://api.example.com/v1/notifications/notif_4QrZ \
    -H "Authorization: Bearer $TOKEN" | jq '.channels'
{
  "push":  { "status": "sent",    "provider_id": "fcm_abc", "attempts": 1 },
  "email": { "status": "sent",    "provider_id": "sg_xyz",  "attempts": 2 },
  "sms":   { "status": "skipped", "reason": "user_opted_out" }
}

$ curl -s https://api.example.com/v1/notifications/notif_7TqW \
    -H "Authorization: Bearer $TOKEN" | jq '.channels.push'
{ "status": "in_progress", "sent": 820000, "failed": 1204, "pending": 578796 }

# 1204 failures on push — check the DLQ
$ curl -s "https://api.example.com/v1/notifications/notif_7TqW/dlq?channel=push&limit=5" \
    -H "Authorization: Bearer $TOKEN" | jq '.[0]'
{
  "user_id": "usr_bad_token",
  "channel": "push",
  "error": "FCM: InvalidRegistration — device token no longer valid",
  "attempts": 5
}
# InvalidRegistration = stale device token; remove it from your token store

Symptom → cause → fix

Symptom	Likely cause	Fix
User reports receiving the same notification twice	Channel worker sent successfully but crashed before acking; broker re-delivered the job	Confirm dedup check is on `(dedup_key, channel)` with an atomic insert; ensure the dedup record TTL is longer than the broker's maximum redelivery and retry window
User received notification despite opting out	Preferences were checked only at enqueue or fan-out time, not at dispatch time; the opt-out happened between enqueue and dispatch	Re-check preferences inside the channel worker immediately before the send, even if the fan-out worker already filtered; add a preference version field to detect stale reads
Push channel DLQ growing for specific users	Stale FCM/APNs device tokens: the app was uninstalled but the token was never removed	On FCM `InvalidRegistration` / `NotRegistered` (HTTP v1 API: `UNREGISTERED`) or APNs `410 Unregistered`, delete the token from your store immediately; do not retry
Email delivery rate drops; SendGrid returning 429	Worker concurrency is too high; sending faster than the provider rate limit allows	Add a token-bucket rate limiter per provider account inside the email worker; auto-scale workers up to the negotiated rate, not beyond
API response time spikes on large fan-out requests	Fan-out worker is running synchronously or the main queue is blocking the API path	API must only validate + enqueue one job; never expand the segment in the API handler; use a proper async queue with O(1) enqueue
Notification status endpoint shows "accepted" forever	Fan-out worker is not running (crashed) or the main queue is not draining	Alert on main-queue depth; add a heartbeat check that verifies fan-out worker is consuming; page on lag > N minutes
Duplicate dedup_key accepted on retry — second API call creates a new notif_id	Notification-level dedup store entry has already expired (short TTL) or was never written (API crash before INSERT)	Write the dedup entry and the notification record in the same DB transaction; set dedup TTL to at least the retention window of the notification

Start with GET /v1/notifications/{id} — read each channel's status, attempts, and any error code.
For failures, check the channel DLQ: provider error codes (FCM, APNs, Twilio) directly name the failure cause.
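For suspected duplicates, query the dedup store: SELECT * FROM notif_dedup WHERE key = '...' AND channel = '...'. If two rows exist, the uniqueness constraint on (key, channel) is missing and two workers raced; if no row exists, the record expired or was never written.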
For preference violations, compare sent_at on the notification with opted_out_at on the user preference — if opt-out is earlier, preferences were checked at the wrong stage.
Monitor queue depth and worker lag as primary health metrics; alert before the user-visible SLA is breached.
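The token-bucket limiter suggested in the table above for provider 429s can be sketched as follows. `TokenBucket` is a hypothetical class, the rate and capacity values are illustrative rather than any provider's real quota, and time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Permit bursts up to `capacity`, refilling at `rate_per_sec` tokens/sec."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = now

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should wait or re-queue instead of calling the provider


bucket = TokenBucket(rate_per_sec=2, capacity=2)
burst = [bucket.try_acquire(now=0.0) for _ in range(3)]  # third call finds the bucket empty
later = bucket.try_acquire(now=0.5)  # half a second refills one token
```

One bucket per provider account, shared by all workers for that channel, keeps the aggregate send rate under the negotiated limit even as the worker count scales.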

🧠 Quick check

1. Sending notifications through an async queue instead of synchronously inside the API request mainly buys you:

Fan-out to email/SMS/push providers is slow and failure-prone. A queue lets the API return immediately, absorbs spikes, and retries failed sends without blocking the caller.

2. To avoid sending the same push twice when a send is retried after a timeout, you:

A timeout doesn't tell you whether the send happened. Keying each notification by a dedup id lets the worker recognise and skip a duplicate — the same idempotency pattern as payments.

3. Before dispatching, the notification service must check:

Respecting preferences/opt-outs is both a product and compliance requirement — sending to a channel a user disabled (or after they unsubscribed) is a real failure, independent of system health.

Key takeaways

Async queue decoupling is non-negotiable for fan-out at scale — the API must respond in <50 ms regardless of recipient count; workers scale independently per channel.
Per-channel provider abstraction insulates the API from provider changes and lets channels fail independently.
Idempotent send via a caller-supplied dedup key is the correct pattern: never rely on server-generated IDs for dedup, because each retry generates a new ID.
Enforce preferences at dispatch time, not enqueue time — preference changes after enqueue would otherwise be ignored.
Per-user rate limits + DLQ complete the pipeline: no user is spammed, and undeliverable messages are preserved for inspection rather than silently dropped.