Design Case Studies · Lesson 06
Design: Notification / Fan-out Service
A notification service looks like a simple "send message to user" endpoint — until it needs to reach millions of users across push, email, SMS, and in-app channels simultaneously, honour per-user preferences, and never send the same message twice. This is an original design created for this course.
By the end you'll be able to
- Explain why fan-out to millions must be async and sketch the queue-based architecture.
- Design idempotent send using a dedup key and describe how to honour per-user channel preferences.
- Identify where per-user rate limiting, retries, and dead-letter queues fit in the delivery pipeline.
1 · Requirements
The brief: "Build a notification service that sends messages to users across push, email, SMS, and in-app channels, respects their preferences, and scales to millions of recipients without duplicates."
| Requirement | What it implies |
|---|---|
| Multi-channel delivery | Push (APNs/FCM), email (SMTP provider), SMS (Twilio/SNS), in-app (WebSocket/DB) |
| Per-user preferences | User can opt out of SMS; can set quiet hours; can choose channel priority |
| Templates | Notifications are parameterised ("Hi {name}, your order {id} shipped") not raw strings |
| Deduplication | Caller-supplied dedup key prevents sending the same notification twice on retry |
| Fan-out at scale | A "send to segment" operation may target 10M users; must be async |
| Per-user rate limiting | No user should receive more than N notifications per hour across all channels |
| Retries + DLQ | Provider failures must be retried; permanently failed deliveries land in DLQ |
2 · Design decisions
2a. Async decoupling via queue/pub-sub
Imagine a concert venue that needs to mail tickets to ten million fans simultaneously. The box office doesn't hand-deliver — it drops all the envelopes in a postal system and the postal workers handle the actual delivery in parallel. If one post office is down, the envelopes wait; they don't vanish.
The notification service works the same way. The API is the box office: it accepts a notification request, validates it, and immediately enqueues it. The actual multi-channel send happens asynchronously via channel-specific workers. This separation means:
- The API responds in < 50 ms regardless of how many recipients there are.
- A channel provider going down (e.g. Twilio outage) doesn't affect other channels or the API response time.
- Fan-out can be parallelised across thousands of workers.
See Event-driven & Pub/Sub (rel-10) for the pub-sub architecture underpinning this queue approach.
2b. Per-channel providers
Each channel has a different external API and failure mode. The architecture uses a provider abstraction layer: a single internal send interface, with pluggable backends per channel. This means swapping from SendGrid to Mailgun for email requires no changes to the notification API surface.
| Channel | Provider examples | Failure characteristics |
|---|---|---|
| Push (mobile) | APNs, Firebase FCM | Token expiry (silent fail), rate limits per app |
| Email | SendGrid, Mailgun, SES | Bounce, spam filter; async delivery receipts |
| SMS | Twilio, AWS SNS | Cost-per-message; carrier filtering; latency spikes |
| In-app | Internal WebSocket / DB | User offline (message persisted to DB for later) |
2c. Idempotent send via dedup key
If the queue worker crashes after sending the email but before acknowledging the message, the broker re-delivers it and the worker sends the email again. Without deduplication, the user receives the same email twice.
The solution: the caller provides a dedup_key (a stable, unique string for this logical send intent — e.g. order_shipped:ord_123). Before dispatching to the provider, the worker checks a deduplication store keyed on dedup_key + channel. If already sent, skip and ack. See Idempotency (rel-02) for implementation patterns.
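The check-then-send-then-record flow can be sketched with Python's built-in sqlite3 standing in for the dedup store. This is a minimal sketch under stated assumptions:

- The table name notif_dedup mirrors the trace later in this lesson.
- The composite primary key on (key, channel) is an assumed schema.
- deliver_once and send are hypothetical names, not part of any library.

```python
import sqlite3
from typing import Callable


def open_dedup_store() -> sqlite3.Connection:
    """In-memory stand-in for the shared dedup store (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE notif_dedup (
               key     TEXT NOT NULL,
               channel TEXT NOT NULL,
               sent_at TEXT NOT NULL,
               PRIMARY KEY (key, channel)
           )"""
    )
    return db


def deliver_once(db: sqlite3.Connection, dedup_key: str, channel: str,
                 send: Callable[[], None]) -> bool:
    """Send at most once per (dedup_key, channel); return False if skipped."""
    already = db.execute(
        "SELECT 1 FROM notif_dedup WHERE key = ? AND channel = ?",
        (dedup_key, channel),
    ).fetchone()
    if already is not None:
        return False  # re-delivered job: skip the provider call, then ack
    send()  # hypothetical provider call; if it raises, nothing is recorded
    with db:  # commits the insert on success
        db.execute(
            "INSERT OR IGNORE INTO notif_dedup (key, channel, sent_at) "
            "VALUES (?, ?, datetime('now'))",
            (dedup_key, channel),
        )
    return True
```

The record is written only after send() returns. A provider failure therefore leaves no entry, and the broker's redelivery retries the send. A successful send that is redelivered is skipped.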
2d. Per-user rate limiting
Without rate limits, a runaway service could send a user 500 emails in an hour. Apply a sliding-window rate limit per user across all channels (matching the requirement above), optionally with tighter per-channel caps. When the limit is exceeded, the delivery is either deferred to the next window (for non-urgent notifications) or dropped with a DLQ entry (for time-sensitive ones that would be stale if deferred). See Rate Limiting (rel-03).
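A sliding-window log limiter plus the defer-or-DLQ decision might look like the sketch below. It is single-process and in-memory; a real deployment would keep the window in a shared store such as Redis. The class and function names are hypothetical. Key it by user_id for an all-channels cap, or by f"{user_id}:{channel}" for a per-channel cap.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log: remember each send time per key and count those in the window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._sends: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        sends = self._sends[key]
        # Drop timestamps that have slid out of the window.
        while sends and sends[0] <= now - self.window:
            sends.popleft()
        if len(sends) >= self.limit:
            return False
        sends.append(now)  # only admitted sends count toward the limit
        return True


def admit(limiter: SlidingWindowLimiter, user_id: str, now: float,
          time_sensitive: bool) -> str:
    """Return "send", "defer" (non-urgent, over limit) or "dlq" (stale if deferred)."""
    if limiter.allow(user_id, now):
        return "send"
    return "dlq" if time_sensitive else "defer"
```

A sliding-window log is exact but stores one timestamp per admitted send. That cost is acceptable at per-user limits in the tens per hour.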
2e. Retries, DLQ, and preference enforcement
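The delivery pipeline enforces preferences at dispatch time (not at receipt), for two reasons:

1. Preferences can change between enqueue and dispatch, and checking at dispatch time catches those changes.
2. Checking at receipt would force the API to load preferences for millions of users up front during a segment fan-out, instead of spreading those reads across the channel workers.

Retries and the DLQ live in the same channel workers. Transient provider failures (timeouts, HTTP 429, 5xx) are retried with exponential back-off. Deliveries that fail permanently or exhaust their retries land in the channel's DLQ for inspection.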
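One channel worker's dispatch step can be sketched as follows. It checks preferences at dispatch, retries transient errors with exponential back-off, and dead-letters everything else. The names ChannelWorker, TransientError, PermanentError, load_prefs and send are hypothetical. The DLQ is a plain list standing in for a real dead-letter queue.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class TransientError(Exception):
    """Retryable provider failure, e.g. HTTP 429, a 5xx, or a timeout."""


class PermanentError(Exception):
    """Non-retryable failure, e.g. an unregistered device token."""


@dataclass
class ChannelWorker:
    channel: str
    load_prefs: Callable[[str], Dict[str, bool]]  # hypothetical preference lookup
    send: Callable[[dict], None]                  # hypothetical provider adapter
    max_attempts: int = 4
    base_delay: float = 1.0                       # seconds before the first retry
    sleep: Callable[[float], None] = time.sleep   # injectable for tests
    dlq: List[dict] = field(default_factory=list)

    def _dead_letter(self, job: dict, exc: Exception, attempts: int) -> str:
        self.dlq.append({**job, "error": str(exc), "attempts": attempts})
        return "dlq"

    def handle(self, job: dict) -> str:
        # Preferences are read here, at dispatch, so a late opt-out is honoured.
        if not self.load_prefs(job["user_id"]).get(self.channel, False):
            return "skipped"
        attempt = 0
        while True:
            attempt += 1
            try:
                self.send(job)
                return "sent"
            except PermanentError as exc:
                return self._dead_letter(job, exc, attempt)  # retrying cannot help
            except TransientError as exc:
                if attempt >= self.max_attempts:
                    return self._dead_letter(job, exc, attempt)  # retries exhausted
                # Exponential back-off: base, 2*base, 4*base, ...
                self.sleep(self.base_delay * 2 ** (attempt - 1))
```

Production systems usually add random jitter to the back-off so that many workers hitting the same 429 do not retry in lockstep. The sketch keeps delays deterministic for readability.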
3 · The API model
# 1. Send a notification (single user or segment)
POST /v1/notifications
Authorization: Bearer <api_key>
Content-Type: application/json
{
"target": {
"type": "user", # "user" | "segment" | "device_token"
"id": "usr_9kLm"
},
"template_id": "order_shipped_v2",
"template_data": {
"name": "Ada",
"order_id": "ord_123",
"tracking_url": "https://track.example/T9X2"
},
"channels": ["push", "email"], # requested channels; preferences filter further
"dedup_key": "order_shipped:ord_123"
}
# Response 202 — accepted for async delivery
{
"notification_id": "notif_4QrZ",
"status": "accepted",
"channels_requested": ["push", "email"],
"estimated_recipients": 1
}
# 2. Send to a segment (fan-out)
POST /v1/notifications
{
"target": {
"type": "segment",
"id": "seg_pro_users" # pre-defined user segment
},
"template_id": "feature_announcement_v1",
"template_data": { "feature_name": "Dark Mode" },
"channels": ["push", "in_app"],
"dedup_key": "feature_ann:dark_mode:2026-06-20"
}
# Response 202
{
"notification_id": "notif_7TqW",
"status": "accepted",
"estimated_recipients": 1400000
}
# 3. Get notification status
GET /v1/notifications/notif_7TqW
Authorization: Bearer <api_key>
# Response 200
{
"id": "notif_7TqW",
"status": "in_progress",
"channels": {
"push": { "sent": 820000, "failed": 1204, "pending": 578796 },
"in_app": { "sent": 900000, "failed": 0, "pending": 500000 }
},
"created_at": "2026-06-20T11:00:00Z"
}
# 4. Get user notification preferences
GET /v1/users/usr_9kLm/notification-preferences
Authorization: Bearer <api_key>
# Response 200
{
"user_id": "usr_9kLm",
"channels": {
"push": { "enabled": true },
"email": { "enabled": true },
"sms": { "enabled": false, "opted_out_at": "2026-01-15T00:00:00Z" },
"in_app": { "enabled": true }
},
"quiet_hours": { "from": "22:00", "to": "08:00", "timezone": "America/New_York" }
}
# 5. Update preferences
PUT /v1/users/usr_9kLm/notification-preferences
Authorization: Bearer <user_token> # user-scoped token, not service token
Content-Type: application/json
{
"channels": {
"sms": { "enabled": true } # re-subscribe to SMS
}
}
# Response 200 — returns full updated preferences object
4 · Evaluation & latency budget
Async decoupling wins
The API responds in <50 ms for a 1.4M-user fan-out because it only validates the request and enqueues one job. The fan-out work is done by workers running in parallel, horizontally scaled. Critically, a provider outage (Twilio is down) only affects the SMS worker queue; push and email workers continue normally. Without async decoupling, a synchronous fan-out would block the calling service for minutes and cascade failures.
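The accept path stays O(1) because it only validates and enqueues. It might be sketched like this, with a few assumptions:

- queue.Queue stands in for the real broker.
- The notif_ ID format and the auto: fallback key prefix are hypothetical.
- Notification-level dedup and the estimated_recipients lookup are left out to keep the focus on validate-and-enqueue.

```python
import json
import queue
import uuid
from typing import Any, Dict, Tuple

VALID_TARGET_TYPES = {"user", "segment", "device_token"}
VALID_CHANNELS = {"push", "email", "sms", "in_app"}


def accept_notification(body: Dict[str, Any],
                        main_queue: "queue.Queue[str]") -> Tuple[int, Dict[str, Any]]:
    """Validate, enqueue exactly one job, and return (status, response body)."""
    target = body.get("target") or {}
    if target.get("type") not in VALID_TARGET_TYPES or not target.get("id"):
        return 400, {"error": "invalid target"}
    channels = body.get("channels") or []
    if not channels or not set(channels) <= VALID_CHANNELS:
        return 400, {"error": "invalid channels"}
    if not body.get("template_id"):
        return 400, {"error": "template_id is required"}
    # Fallback key when the caller omits one: a retried request gets a NEW key,
    # so it will not be deduplicated.
    dedup_key = body.get("dedup_key") or f"auto:{uuid.uuid4()}"
    notif_id = "notif_" + uuid.uuid4().hex[:8]  # hypothetical ID format
    # O(1) regardless of recipient count: segment expansion is the fan-out worker's job.
    main_queue.put(json.dumps({**body, "dedup_key": dedup_key, "notification_id": notif_id}))
    return 202, {"notification_id": notif_id, "status": "accepted",
                 "channels_requested": channels}
```

A 1.4M-user segment request and a single-user request cost the same here: one validation pass and one enqueue.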
Idempotency guarantee
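The dedup store (Redis or a unique constraint in a relational DB) records (dedup_key, channel, notification_id) when a send succeeds. On re-delivery, the worker checks this record first. This is the standard idempotent-consumer pattern from Idempotency (rel-02) applied to the notification domain.

One window remains. If a worker crashes after the provider accepts the send but before the dedup record is written, the redelivered job sends again. In practice this gives effectively-once delivery rather than a strict exactly-once guarantee.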
Fan-out throughput
| Stage | Throughput target | Scaling |
|---|---|---|
| API accept | < 50 ms p99 response | Stateless; scale horizontally |
| Fan-out worker | ~100k users/sec per worker | Horizontally scale workers |
| Push channel worker | ~50k sends/sec per worker | Scale to provider rate limits |
| Email channel worker | ~10k sends/sec per worker | Provider limits dominate |
| Total time for 1M push | ~20 s with 1 worker; ~2 s with 10 workers | 50k/s per worker, so 10 workers give 500k/s aggregate (provider limits permitting) |
Preference check overhead
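Loading preferences at dispatch time adds one cache read per user per channel job. Cache user preferences in Redis with a TTL of ~5 minutes, and invalidate the cached entry whenever the user updates preferences. Otherwise an opt-out can be ignored for up to the TTL, which defeats the point of checking at dispatch. Cache miss rate at steady state is low for active users; the preference store handles the cold-start burst on worker scale-out.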
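A cache-aside wrapper with a TTL and explicit invalidation can be sketched in-process as below. A dict stands in for Redis; in Redis the equivalent would be SET prefs:<user_id> <json> EX 300 on a miss and DEL prefs:<user_id> on update. PreferenceCache and load_from_store are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple

Prefs = Dict[str, bool]


class PreferenceCache:
    """Cache-aside over the preference store, with a TTL and explicit invalidation."""

    def __init__(self, load_from_store: Callable[[str], Prefs],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Prefs]] = {}  # user_id -> (expires_at, prefs)

    def get(self, user_id: str) -> Prefs:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit
        prefs = self._load(user_id)  # miss or expired: read through to the store
        self._entries[user_id] = (now + self._ttl, prefs)
        return prefs

    def invalidate(self, user_id: str) -> None:
        """Call on every preference update so an opt-out takes effect immediately."""
        self._entries.pop(user_id, None)
```

The TTL only bounds staleness for changes that bypass invalidate(), such as a direct database edit. Routine updates through the preferences API should always invalidate.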
A common follow-up: "what if we need to send 10 million push notifications in under 60 seconds?" The obvious answer is horizontal scaling of workers, but the real constraint is provider throughput. Providers such as APNs and FCM throttle senders that exceed their limits, no matter how many workers you run. The right answer: pre-negotiate higher rate limits with the provider, use multiple provider accounts/regions, and accept that 10M in <60s requires significant provider-side capacity planning, not just more workers.
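The arithmetic behind this answer fits in two functions. Effective throughput is the smaller of aggregate worker capacity and the provider's cap. The 50k/s per-worker figure comes from the table above. The 100k/s provider cap used in the usage note is a made-up number for illustration, not a documented APNs or FCM limit.

```python
import math


def fanout_seconds(recipients: int, workers: int, per_worker_rate: float,
                   provider_cap: float) -> float:
    """Lower bound on send time: whichever is slower, workers or provider, wins."""
    effective_rate = min(workers * per_worker_rate, provider_cap)
    return recipients / effective_rate


def workers_needed(recipients: int, deadline_s: float, per_worker_rate: float) -> int:
    """Workers required to hit the deadline if the provider imposed no limit."""
    return math.ceil(recipients / (deadline_s * per_worker_rate))
```

For 10M sends in 60 s at 50k/s per worker, workers_needed returns 4, so the worker count is easy to reach. Under a hypothetical 100k/s provider cap, fanout_seconds(10_000_000, 100, 50_000, 100_000) is still 100 s. Adding workers does nothing; only raising the provider limit helps.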
Common mistake: enforcing preferences at enqueue time instead of dispatch time. It feels efficient (filter early, enqueue fewer jobs), but it creates a race condition. The user opts out after the job is enqueued but before it's sent, and the notification goes out despite the explicit opt-out. Always enforce preferences at the last moment before delivery, in the channel worker, not at enqueue time.
Don't generate dedup keys server-side. Require callers to supply a dedup_key that is stable across retries. A good key names the logical event, such as order_shipped:ord_123 (or a hash of the fields that identify it). If a caller omits it, the server can fall back to a generated UUID, but each retry then carries a fresh key and produces a duplicate. Document this clearly so callers don't omit it on triggers that may be retried.
✍️ Exercise: design quiet-hours deferral
A user has quiet hours 22:00–08:00 in America/New_York. A notification arrives at 23:30 New York time. Design the full behaviour: what does the worker do, what state must be stored, and how does the deferred message eventually send?
Model answer:
- Worker loads preferences: quiet hours are active. Mark the delivery job as deferred and store deliver_after = 2026-06-21T08:00:00-04:00.
- Re-queue the job with a delay so the message stays invisible to workers until deliver_after. A delayed-job scheduler achieves this. SQS delay queues cap the delay at 15 minutes, so an 8.5-hour deferral needs a scheduler (or repeated re-delays) rather than a single SQS delay.
- At 08:00, a worker picks up the job, re-checks preferences (the user may have updated them overnight), and sends if still opted in.
- The dedup check prevents a duplicate if the same notification was also attempted via another path.
# Deferred job record
{
"notification_id": "notif_4QrZ",
"user_id": "usr_9kLm",
"channel": "push",
"status": "deferred_quiet_hours",
"deliver_after": "2026-06-21T08:00:00-04:00",
"dedup_key": "order_shipped:ord_123"
}
Rubric: ✓ deferred status stored ✓ deliver_after in user's timezone ✓ re-check preferences at delivery, not just at deferral ✓ dedup key preserved to prevent duplicate on eventual send.
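Computing deliver_after is the fiddly part because quiet hours usually wrap midnight. Here is a minimal sketch, assuming the user's zone is passed as a tzinfo, such as zoneinfo.ZoneInfo("America/New_York") in production. The function name is hypothetical. The demo uses a fixed UTC-4 offset to stand in for New York in June.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional


def quiet_hours_deliver_after(now: datetime, start: time, end: time,
                              tz: tzinfo) -> Optional[datetime]:
    """Return when a deferred delivery may be sent, or None if quiet hours are not active.

    `now` must be timezone-aware; `tz` is the user's zone.
    """
    local = now.astimezone(tz)
    t = local.time()  # naive wall-clock time in the user's zone
    if start <= end:
        # Window within a single day, e.g. 13:00-15:00.
        if not (start <= t < end):
            return None
        end_date = local.date()
    else:
        # Window wraps midnight, e.g. 22:00-08:00.
        if start <= t:
            end_date = local.date() + timedelta(days=1)  # evening: ends tomorrow morning
        elif t < end:
            end_date = local.date()                      # early morning: ends today
        else:
            return None
    return datetime.combine(end_date, end, tzinfo=tz)


if __name__ == "__main__":
    edt = timezone(timedelta(hours=-4))  # fixed offset standing in for America/New_York in June
    arrival = datetime(2026, 6, 20, 23, 30, tzinfo=edt)
    print(quiet_hours_deliver_after(arrival, time(22, 0), time(8, 0), edt))
    # 2026-06-21 08:00:00-04:00
```

Storing the result with its offset (as in the deferred job record) keeps the delivery moment unambiguous even if the scheduler runs in UTC.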
Under the hood: the core mechanism
The notification service is a multi-stage pipeline. Understanding each stage's function and data contract is what separates a design that "sounds right" in an interview from one that would actually survive a production incident.
The five-stage pipeline
Every notification — whether to one user or ten million — moves through the same five stages. The stages are explicitly decoupled by queues so that each one can fail and recover independently.
Worked trace: one notification fanning to 3 channels
Trace notif_4QrZ — a single-user "order shipped" notification requested on push + email + SMS — through every check and decision in the pipeline:
# Stage 1 — API receives POST /v1/notifications
dedup_key = "order_shipped:ord_123"
check dedup store: SELECT 1 FROM notif_dedup WHERE key = 'order_shipped:ord_123'
→ NOT FOUND: proceed (if FOUND → return 202 with original notif_id, skip enqueue)
write notif record: INSERT INTO notifications (id="notif_4QrZ", status="accepted", …)
enqueue to main queue: { notif_id:"notif_4QrZ", user:"usr_9kLm",
channels:["push","email","sms"],
template:"order_shipped_v2",
data:{name:"Ada", order_id:"ord_123"} }
→ 202 { notification_id:"notif_4QrZ", status:"accepted" } ← returned in <50 ms
# Stage 2 — Fan-out worker picks up notif_4QrZ
load user preferences: GET from Redis cache (key "prefs:usr_9kLm")
→ { push:true, email:true, sms:false } ← user opted out of SMS
SMS pruned early: enqueue push job + email job; SMS job NOT enqueued (channel workers still re-check)
rate limit check: sliding window "ratelimit:usr_9kLm" — 2 notifs this hour, limit 10 → OK
# Stage 3 — Push channel worker picks up push job
re-check preferences: GET "prefs:usr_9kLm" → push:true → proceed ← authoritative check at dispatch
dedup check: SELECT 1 FROM notif_dedup WHERE key='order_shipped:ord_123' AND channel='push'
→ NOT FOUND → proceed
render template: "Hi Ada, your order ord_123 has shipped!"
call FCM API: POST https://fcm.googleapis.com/v1/projects/…/messages:send
→ HTTP 200 { name: "projects/.../messages/fcm_abc" }
mark dedup: INSERT INTO notif_dedup (key="order_shipped:ord_123", channel="push", sent_at=now())
ack job: message removed from push queue
# Stage 4 — Email channel worker picks up email job
dedup check: channel="email" → NOT FOUND → proceed
call SendGrid API: POST https://api.sendgrid.com/v3/mail/send
→ HTTP 429 Too Many Requests ← rate limit hit
retry scheduled: attempt 1 failed; retry in 2 s (exponential back-off)
→ retry attempt 2: HTTP 202 Accepted (X-Message-Id header: "sg_xyz")
mark dedup: channel="email", sent_at=now()
ack job
# SMS: never enqueued (preference=false) → no dedup entry, no provider call
# Final state
notifications.channels = {
push: { status:"sent", provider_id:"fcm_abc", sent_at:"2026-06-20T11:00:02Z" },
email: { status:"sent", provider_id:"sg_xyz", sent_at:"2026-06-20T11:00:05Z" },
sms: { status:"skipped", reason:"user_opted_out" }
}
Dedup key scoping
The dedup check happens at two levels. At the notification level (Stage 1), the check is on dedup_key alone — this prevents re-enqueuing the same logical event if the caller retries the API call. At the channel level (Stage 3/4), the check is on (dedup_key, channel) — this prevents the channel worker from re-sending after a crash-between-send-and-ack scenario. Both checks are necessary; neither alone is sufficient.
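The notification-level check can be made atomic by putting a UNIQUE constraint on dedup_key in the notifications table itself. The record and its dedup entry then commit in one transaction. Below is a minimal sketch with sqlite3 standing in for the production database. The schema, open_notifications_db, accept_once, and the ID format are assumptions for illustration. The channel-level check works the same way on a (dedup_key, channel) key.

```python
import sqlite3
import uuid
from typing import Tuple


def open_notifications_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # UNIQUE(dedup_key) makes the notification-level check atomic.
    db.execute(
        "CREATE TABLE notifications ("
        " id TEXT PRIMARY KEY,"
        " dedup_key TEXT NOT NULL UNIQUE,"
        " status TEXT NOT NULL)"
    )
    return db


def accept_once(db: sqlite3.Connection, dedup_key: str) -> Tuple[str, bool]:
    """Stage 1 dedup: return (notification_id, is_new); a retried call gets the original id."""
    notif_id = "notif_" + uuid.uuid4().hex[:8]  # hypothetical ID format
    try:
        with db:  # single transaction: the record and its dedup key commit together
            db.execute(
                "INSERT INTO notifications (id, dedup_key, status) VALUES (?, ?, 'accepted')",
                (notif_id, dedup_key),
            )
    except sqlite3.IntegrityError:
        # Duplicate dedup_key: return the original notification instead of re-enqueuing.
        row = db.execute(
            "SELECT id FROM notifications WHERE dedup_key = ?", (dedup_key,)
        ).fetchone()
        return row[0], False
    return notif_id, True  # caller enqueues only when is_new is True
```

Relying on the constraint, rather than on a SELECT followed by an INSERT, closes the race where two concurrent API retries both see "not found".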
Operating & debugging it
Notification pipelines fail silently in characteristic ways: a channel worker crashes after sending but before acking (duplicates), the dedup store is too short-lived (duplicates after TTL expiry), or preferences are stale (sends to opted-out users). All three are observable from worker logs and the notification status endpoint.
Inspect a notification's delivery state
Symptom → cause → fix
| Symptom | Likely cause | Fix |
|---|---|---|
| User reports receiving the same notification twice | Channel worker sent successfully but crashed before acking; broker re-delivered the job | Confirm dedup check is on (dedup_key, channel) with an atomic insert; ensure the dedup record TTL is longer than the longest window in which the broker can redeliver the job or the caller can retry |
| User received notification despite opting out | Preferences were checked at enqueue time, not at dispatch time; the opt-out happened between enqueue and dispatch | Always load preferences inside the channel worker, not in the fan-out worker that enqueues; add a preference version field to detect stale reads |
| Push channel DLQ growing for specific users | Stale FCM/APNs device tokens: the app was uninstalled but the token was never removed | On FCM UNREGISTERED (legacy API: NotRegistered / InvalidRegistration) or APNs 410 Unregistered, delete the token from your store immediately; do not retry |
| Email delivery rate drops; SendGrid returning 429 | Worker concurrency is too high; sending faster than the provider rate limit allows | Add a token-bucket rate limiter per provider account inside the email worker; auto-scale workers up to the negotiated rate, not beyond |
| API response time spikes on large fan-out requests | Fan-out worker is running synchronously or the main queue is blocking the API path | API must only validate + enqueue one job; never expand the segment in the API handler; use a proper async queue with O(1) enqueue |
| Notification status endpoint shows "accepted" forever | Fan-out worker is not running (crashed) or the main queue is not draining | Alert on main-queue depth; add a heartbeat check that verifies fan-out worker is consuming; page on lag > N minutes |
| Duplicate dedup_key accepted on retry — second API call creates a new notif_id | Notification-level dedup store entry has already expired (short TTL) or was never written (API crash before INSERT) | Write the dedup entry and the notification record in the same DB transaction; set dedup TTL to at least the retention window of the notification |
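The token-bucket fix for SendGrid 429s can be sketched as below, with one bucket per provider account. The class is hypothetical and in-process. With several email workers, either share one bucket in a central store such as Redis or give each worker rate/N. The rate and capacity values in the test are arbitrary.

```python
import time
from typing import Callable


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each send spends one token."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle worker can burst
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller waits (or requeues) instead of calling the provider
```

Set rate to the negotiated provider limit, and capacity to the largest burst the provider tolerates.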
- Start with GET /v1/notifications/{id}: read each channel's status, attempts, and any error code.
- For failures, check the channel DLQ: provider error codes (FCM, APNs, Twilio) directly name the failure cause.
- For suspected duplicates, query the dedup store: SELECT * FROM notif_dedup WHERE key = '...' AND channel = '...'. If two rows exist for the same (key, channel), the uniqueness constraint is missing and a race let both sends through.
- For preference violations, compare sent_at on the notification with opted_out_at on the user preference: if the opt-out is earlier, preferences were checked at the wrong stage.
- Monitor queue depth and worker lag as primary health metrics; alert before the user-visible SLA is breached.
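To sweep for duplicates across the whole store rather than one key at a time, a GROUP BY query works. This sketch assumes the notif_dedup table and column names used in this lesson's trace, with sqlite3 standing in for the real database. find_duplicate_sends is a hypothetical helper.

```python
import sqlite3
from typing import List, Tuple


def find_duplicate_sends(db: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """List (key, channel, row_count) for every pair recorded more than once.

    Any result means the (key, channel) uniqueness constraint is missing.
    """
    return db.execute(
        "SELECT key, channel, COUNT(*) FROM notif_dedup "
        "GROUP BY key, channel HAVING COUNT(*) > 1 "
        "ORDER BY key, channel"
    ).fetchall()
```

Clean up the rows this returns before adding the missing unique index. Creating a unique index on a table that already holds duplicates fails.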
🧠 Quick check
1. Sending notifications through an async queue instead of synchronously inside the API request mainly buys you:
Fan-out to email/SMS/push providers is slow and failure-prone. A queue lets the API return immediately, absorbs spikes, and retries failed sends without blocking the caller.
2. To avoid sending the same push twice when a send is retried after a timeout, you:
A timeout doesn't tell you whether the send happened. Keying each notification by a dedup id lets the worker recognise and skip a duplicate — the same idempotency pattern as payments.
3. Before dispatching, the notification service must check:
Respecting preferences/opt-outs is both a product and compliance requirement — sending to a channel a user disabled (or after they unsubscribed) is a real failure, independent of system health.
Key takeaways
- Async queue decoupling is non-negotiable for fan-out at scale — the API must respond in <50 ms regardless of recipient count; workers scale independently per channel.
- Per-channel provider abstraction insulates the API from provider changes and lets channels fail independently.
- Idempotent send via caller-supplied dedup key is the correct pattern — never rely on server-generated IDs for dedup because retry generates a new ID.
- Enforce preferences at dispatch time, not enqueue time — preference changes after enqueue would otherwise be ignored.
- Per-user rate limits + DLQ complete the pipeline: no user is spammed, and undeliverable messages are preserved for inspection rather than silently dropped.