Design Case Studies · Lesson 05
Design: URL Shortener
A URL shortener looks deceptively simple — until you need to serve a billion redirects per day, count every click accurately, and avoid generating the same short code twice. This is an original design created for this course.
By the end you'll be able to
- Compare base62-encoded counter vs hash-based key generation and articulate the collision trade-off.
- Explain why 301 and 302 produce different analytics outcomes and choose the right status for a given goal.
- Sketch a read-heavy architecture with CDN, in-process cache, and a separated write path.
1 · Requirements
Start with a concrete brief. A URL shortener must:
| Requirement | Implication |
|---|---|
| Convert a long URL to a short code | Key generation strategy needed |
Redirect /:code to the original URL | Read path is the hot path |
Support custom aliases (/my-brand) | Uniqueness check on write; collision handling |
| Basic analytics per link | Click counting, referrer, country; see 301 vs 302 discussion |
| Read-heavy at scale (100:1 read/write ratio) | Cache aggressively; CDN for redirects |
| Abuse prevention | Rate-limit link creation by IP or API key |
Non-requirements for this design: real-time analytics dashboards, link expiry (can be added later), geographic redirect rules. Stating these out loud in an interview earns points — it shows scope control.
2 · Design decisions
2a. Key generation: counter vs hash
Think of the short code as a ticket number at a deli. Strategy 1: the deli has a ticket machine — you pull the next sequential number and encode it. Strategy 2: you hash the order itself and hope two orders never hash alike. The ticket machine is deterministic; the hash approach is probabilistic.
| Strategy | How | Pros | Cons |
|---|---|---|---|
| Base62 of atomic counter | Increment a global counter; encode in base62 (a-z A-Z 0-9) | No collisions; short (6 chars → ~56 billion IDs); predictable growth | Counter is a hot write bottleneck; sequential codes are guessable |
| Hash + collision check | SHA-256 the URL; take first 7 chars; check DB; re-hash on collision | Distributed — no central counter | Collision probability grows with scale; repeated DB round-trips |
| Pre-generated pool | Batch-generate random codes offline; workers claim from the pool | No write-path bottleneck; unpredictable codes | Pool management complexity; cold-start emptiness |
This design uses base62 of a counter with a distributed counter (e.g. Redis INCR or a Snowflake-style ID) to avoid the single-writer bottleneck while keeping code length short and collisions impossible.
Counter 125 in base62 is cb — two characters. Counter 3521614606207 is zzzzzz — six characters. Base62 gives you an enormous ID space in a short string. Never expose the raw integer; it reveals your traffic volume to competitors.
2b. 301 vs 302 — the cache vs analytics trade-off
Both status codes redirect the browser, but with very different long-term behaviour:
| Status | Meaning | Browser behaviour | Analytics impact |
|---|---|---|---|
| 301 Moved Permanently | The destination is permanent; cache it forever | Browser caches the redirect and never asks again | After first visit, clicks bypass your server entirely — you can't count them |
| 302 Found (or 307) | Temporary redirect; always re-check | Browser asks your server every time | Every click hits your server; you can count, log, and geo-route |
301 lets the CDN and browser cache the redirect, so at massive scale the server load approaches zero for popular links. The price is losing click counting after the first request. 302 guarantees every click is counted but every click costs a server round-trip. Most commercial URL shorteners use 302 by default because analytics are the product, and add 301 as an opt-in "performance mode" for users who don't need tracking.
2c. Cache hot links and CDN
With 302 redirects, the redirect server is in the critical path of every click. Mitigate this with two caching layers:
- CDN edge: cache
/:code → 302 + Location headerat the edge. Even with 302, the CDN can serve the redirect header if Cache-Control allows short TTLs (e.g.Cache-Control: public, max-age=60). The click still hits the CDN PoP — far faster than origin — and the analytics counter is updated asynchronously by a stream of CDN access logs. - In-process LRU cache: each redirect server holds the top ~100k hot links in memory. Lookup is sub-millisecond. Evict on a TTL of a few minutes to stay consistent with updates.
See Caching (rel-07) for cache invalidation strategies and TTL tuning.
2d. Rate-limit link creation
The read path (redirect) is naturally self-throttling — you can only request links that exist. The write path (create link) is the abuse vector: spammers create millions of phishing links. Rate-limit by API key (for authenticated users) and by IP (for unauthenticated). See Rate Limiting (rel-03) for token-bucket and sliding-window algorithms. A reasonable default: 100 link creations per hour per API key, 10 per hour per unauthenticated IP.
3 · The API model
# 1. Create a short link
POST /v1/links
Authorization: Bearer <api_key>
Content-Type: application/json
{
"url": "https://example.com/very/long/path?with=many&query=params",
"alias": "launch2026" # optional custom alias
}
# Response 201
{
"code": "launch2026",
"short_url": "https://shr.example/launch2026",
"original_url": "https://example.com/very/long/path?with=many&query=params",
"created_at": "2026-06-20T10:00:00Z"
}
# Conflict — alias already taken
# Response 409
{
"error": "alias_conflict",
"message": "The alias 'launch2026' is already in use."
}
# Rate limit exceeded
# Response 429 with Retry-After header
# 2. Redirect (the hot path)
GET /launch2026 # or /:code at the redirect domain
# Response 302 (tracking) or 301 (performance mode)
HTTP/1.1 302 Found
Location: https://example.com/very/long/path?with=many&query=params
Cache-Control: public, max-age=60
X-Link-Code: launch2026
# Code not found → 404
HTTP/1.1 404 Not Found
{ "error": "link_not_found" }
# 3. Link analytics
GET /v1/links/launch2026/stats?from=2026-06-01&to=2026-06-20
Authorization: Bearer <api_key>
# Response 200
{
"code": "launch2026",
"total_clicks": 142837,
"unique_ips": 98204,
"top_countries": [
{ "country": "US", "clicks": 54012 },
{ "country": "IN", "clicks": 31440 }
],
"top_referrers": [
{ "referrer": "twitter.com", "clicks": 28900 }
]
}
4 · Evaluation & latency budget
Read-heavy scaling
With a 100:1 read/write ratio, the redirect path is the system's heartbeat. The CDN absorbs the majority of traffic — popular links like a viral tweet hit hundreds of millions of times will be cached at CDN PoPs globally. The redirect server only sees CDN cache misses (cold start, low-traffic links, TTL expiry). The in-process LRU cache eliminates most database reads even after CDN misses.
Write path
Link creation is rare but critical — it must not fail silently. The counter service (Redis INCR) is the only synchronous external call on the write path; it's fast (~1 ms) and Redis replication provides durability. The database write is synchronous to return the short URL to the caller. Analytics writes are always async — a dropped analytics event is far less severe than a dropped link creation.
Latency budget
| Operation | Target p99 | How |
|---|---|---|
| Redirect — CDN hit | < 10 ms | Edge PoP, no origin contact |
| Redirect — CDN miss, cache hit | < 30 ms | In-process LRU, no DB |
| Redirect — cold (DB read) | < 80 ms | Single indexed lookup by code |
| Link creation | < 200 ms | Counter + 1 DB write; rare path |
| Stats query | < 500 ms | Pre-aggregated in analytics store; separate from redirect DB |
The 301 vs 302 question is a classic trap. If you say "use 301 for performance," the follow-up is: "so you can't count clicks after the first visit — is that acceptable?" Always frame it as a trade-off and let the requirements drive the answer. For most URL-shortening products, analytics are the core value proposition, so 302 wins despite the extra server load.
Designing the analytics counter as a synchronous DB increment on the redirect path. That makes the DB a write bottleneck for every click — a viral link would hammer a single counter row with millions of concurrent UPDATE statements. Instead, batch click events in memory or a queue and flush them periodically. Accepting approximate counts (±0.1%) is far better than making the redirect path depend on analytics writes.
✍️ Exercise: handle custom alias collision gracefully
A user requests the alias sale but it already exists. Design the full API behaviour: which status code, what response body, and what options do you give the user? What if the existing alias was created by the same user?
Model answer:
# Alias taken by another user → 409 Conflict
{
"error": "alias_conflict",
"message": "'sale' is already in use. Try 'sale-2026' or omit alias for auto-generation.",
"suggestions": ["sale-2026", "mysale"]
}
# Alias taken by the SAME user, same target URL → 200 (idempotent)
{
"code": "sale",
"short_url": "https://shr.example/sale",
"reused": true
}
# Alias taken by the same user, DIFFERENT target URL → 409
# (changing the destination silently would break existing links)
Rubric: ✓ 409 for genuine conflict ✓ actionable suggestions in error body ✓ idempotency: same user + same URL = 200 ✓ same user + different URL = 409 to prevent silent link hijack.
Under the hood: the core mechanism
The two mechanisms that make a URL shortener work are the base62 encoding of a counter (write path) and the three-layer cache lookup (read path). Both are worth tracing numerically so you can answer "how short is the short code?" and "how fast is the redirect?"
Base62 encoding: worked numeric trace
Base62 uses the alphabet 0–9 a–z A–Z (62 characters). To encode a counter integer, repeatedly divide by 62 and collect the remainders — the remainders in reverse order give the encoded string. The alphabet index maps remainder → character.
# Alphabet: index 0='0', 1='1', …9='9', 10='a', 11='b', …35='z', 36='A', …61='Z'
# Example 1: counter id = 125
125 ÷ 62 = 2 remainder 1 → character '1'
2 ÷ 62 = 0 remainder 2 → character '2'
# remainders collected in reverse: '2', '1'
short code = "21"
# Example 2: counter id = 3844 (62²)
3844 ÷ 62 = 62 remainder 0 → character '0'
62 ÷ 62 = 1 remainder 0 → character '0'
1 ÷ 62 = 0 remainder 1 → character '1'
short code = "100" # 3-char code starts at id 3844
# Example 3: counter id = 56,800,235,584 (62⁶ = first 7-char code)
# IDs 0 – 62^6-1 (= 56,800,235,583) fit in exactly 6 base62 chars
# That is 56 billion unique short codes from 6 characters
6 chars → 56 billion codes # ample for any URL shortener
# Decode is the reverse: treat each character as a digit in base-62
"21" → 2 × 62¹ + 1 × 62⁰ = 124 + 1 = 125 ✓
A monotonically increasing Redis INCR counter guarantees no two clients ever get the same integer — so no two codes collide. The base62 encoding of that integer is computed locally (no additional round-trip). This is why the counter approach is strictly better than hashing for collision avoidance: hash-based codes require a "does this code exist?" DB read before every write.
Collision handling for custom aliases
Custom aliases bypass the counter — the caller supplies an arbitrary string. The uniqueness check is a simple SELECT on the alias column (which has a unique index). The race condition where two concurrent requests try to claim the same alias is handled by letting the DB unique constraint be the final arbiter: the first writer's INSERT succeeds; the second gets a unique-constraint violation, which the API layer catches and returns as a 409.
# Write path for a custom alias (pseudocode)
function create_link(url, alias):
try:
code = alias if alias else base62(redis.INCR("link_counter"))
db.execute("INSERT INTO links (code, url) VALUES (?, ?)", code, url)
return 201, { code, short_url: "https://shr.example/" + code }
except UniqueConstraintViolation:
return 409, { error: "alias_conflict" }
Redirect read path: worked lookup trace
The redirect is the hot path. Follow a request for GET /abc123 through all three layers:
# Step 1: Request arrives at CDN edge PoP (e.g. Cloudflare London)
GET /abc123
→ CDN checks its cache for key "abc123"
CDN HIT: serve cached 302 Location header immediately — ~5 ms
CDN MISS: forward to origin redirect server
# Step 2: Origin redirect server (CDN miss)
→ check in-process LRU cache (hot_links["abc123"])
LRU HIT: return 302 Location — ~1 ms
LRU MISS: query the links DB
# Step 3: DB lookup (both caches missed)
SELECT url FROM links WHERE code = 'abc123' -- indexed; ~5 ms
FOUND: populate LRU cache, respond:
HTTP/1.1 302 Found
Location: https://example.com/very/long/path
Cache-Control: public, max-age=60
NOT FOUND: HTTP/1.1 404 Not Found {"error":"link_not_found"}
# CDN caches the 302 response for 60 s at the edge
# Next request for /abc123 from any user near that PoP hits the CDN HIT path
The analytics counter is intentionally not on this synchronous path. Each redirect server writes a click event to an in-memory ring buffer, which is flushed to a stream (Kafka or Kinesis) every second. An analytics consumer aggregates these into pre-computed hourly counts. Approximate counts with a ~1-second lag are far better than making every redirect block on a DB UPDATE counter = counter + 1.
Operating & debugging it
The two most common problems are cache invalidation mistakes (a link is updated but old redirects keep serving) and counter skew (the Redis counter diverges, producing gaps or — worse — duplicate codes on failover). Both are diagnosable with standard tools.
Inspect the redirect path live
Symptom → cause → fix
| Symptom | Likely cause | Fix |
|---|---|---|
| Redirect returns 404 for a code that was just created | CDN or LRU cache served a stale 404 from before the link existed (negative caching) | Do not cache 404 responses, or set a very short TTL (e.g. Cache-Control: max-age=5 on 404s); ensure new links invalidate CDN entries on write |
| Updated link destination still redirects to old URL | CDN cached the old 302 and hasn't expired yet (TTL can be up to 60 s) | Purge the CDN cache entry for the specific code on every link update: cf.purge_cache(["https://shr.example/abc123"]) |
| Two different codes collide — same DB row claimed by two concurrent creates | Redis failover caused the counter to reset or replay; two workers got the same counter value | Use Redis Sentinel or Cluster with WAIT 1 0 after INCR to ensure replication before returning; add the unique index on code as a safety net |
| Analytics undercounting — click totals lower than CDN logs show | Analytics ring buffer is being dropped on server restart, or CDN HIT clicks never reach the origin | Flush the ring buffer on SIGTERM; consume CDN access logs as the source of truth for CDN-hit clicks; merge with origin-side counts |
| Link creation returning 500 intermittently | Redis INCR timeout — counter service is the single synchronous dependency; a slow Redis adds to write latency | Set a tight socket timeout (e.g. 50 ms) on the Redis INCR call; on timeout, fall back to a pre-allocated batch counter range the server holds in memory |
| Custom alias returns 409 even though the alias looks free | Deleted links are not removed from the DB — soft-delete keeps the code reserved | Reclaim deleted alias codes by either hard-deleting or by allowing the same owner to reclaim their own deleted alias (check owner_id + deleted_at) |
- Test a redirect with
curl -ifirst; readx-cacheto know whether origin was involved. - Force an origin hit with
-H "Cache-Control: no-cache"and readx-redirect-latency-ms. - Verify the DB row directly:
SELECT code, url, created_at FROM links WHERE code = 'abc123'. - Check Redis counter health: compare the current value against expected links-per-hour rate.
- After any link update, confirm CDN purge happened; re-run
curl -ito verifyx-cache: MISSon the next request.
🧠 Quick check
1. For a read-heavy URL shortener, the single biggest latency win for redirects is:
Redirects are overwhelmingly reads of popular links. Caching them (and serving from the edge) turns a database lookup + cross-region hop into a near-instant response.
2. Choosing a 302 (temporary) over a 301 (permanent) redirect trades:
A 301 lets browsers/CDNs cache the redirect and skip you entirely (fast, but you lose the click signal). A 302 keeps requests flowing through you so you can record analytics.
3. Generating short codes by base62-encoding an incrementing counter (vs random hashing) mainly avoids:
A monotonic counter is collision-free, so you skip the read-before-write collision check that random codes require. The trade-off is guessable/sequential codes, often mitigated by encoding an obfuscated id.
Key takeaways
- Base62 of an atomic counter is the pragmatic key strategy — no collisions, short codes, distributed-safe with Redis INCR.
- 301 caches in browsers and CDN; 302 counts every click. Most products choose 302 because analytics are the product; 301 is an opt-in performance mode.
- The redirect path is read-heavy: CDN edge → in-process LRU → DB. Most requests never reach the database.
- Analytics writes must be async. Synchronous click counting on the redirect path collapses under viral traffic.
- Rate-limit link creation, not redirects — creation is the abuse vector; redirects self-limit because only valid codes exist.