Design Case Studies · Lesson 05

Design: URL Shortener

A URL shortener looks deceptively simple — until you need to serve a billion redirects per day, count every click accurately, and avoid generating the same short code twice. This is an original design created for this course.

⏱ 16 min Difficulty: core Prereq: HTTP, caching, rate limiting

By the end you'll be able to

Compare base62-encoded counter vs hash-based key generation and articulate the collision trade-off.
Explain why 301 and 302 produce different analytics outcomes and choose the right status for a given goal.
Sketch a read-heavy architecture with CDN, in-process cache, and a separated write path.

1 · Requirements

Start with a concrete brief. A URL shortener must:

Requirement	Implication
Convert a long URL to a short code	Key generation strategy needed
Redirect `/:code` to the original URL	Read path is the hot path
Support custom aliases (`/my-brand`)	Uniqueness check on write; collision handling
Basic analytics per link	Click counting, referrer, country; see 301 vs 302 discussion
Read-heavy at scale (100:1 read/write ratio)	Cache aggressively; CDN for redirects
Abuse prevention	Rate-limit link creation by IP or API key

Non-requirements for this design: real-time analytics dashboards, link expiry (can be added later), geographic redirect rules. Stating these out loud in an interview earns points — it shows scope control.

2 · Design decisions

2a. Key generation: counter vs hash

Think of the short code as a ticket number at a deli. Strategy 1: the deli has a ticket machine — you pull the next sequential number and encode it. Strategy 2: you hash the order itself and hope two orders never hash alike. The ticket machine is deterministic; the hash approach is probabilistic.

Strategy	How	Pros	Cons
Base62 of atomic counter	Increment a global counter; encode in base62 (a-z A-Z 0-9)	No collisions; short (6 chars → ~56 billion IDs); predictable growth	Counter is a hot write bottleneck; sequential codes are guessable
Hash + collision check	SHA-256 the URL; take first 7 chars; check DB; re-hash on collision	Distributed — no central counter	Collision probability grows with scale; repeated DB round-trips
Pre-generated pool	Batch-generate random codes offline; workers claim from the pool	No write-path bottleneck; unpredictable codes	Pool management complexity; cold-start emptiness

This design uses base62 of a counter with a distributed counter (e.g. Redis INCR or a Snowflake-style ID) to avoid the single-writer bottleneck while keeping code length short and collisions impossible.

✅ Encode, don't just increment

Counter 125 in base62 is cb — two characters. Counter 3521614606207 is zzzzzz — six characters. Base62 gives you an enormous ID space in a short string. Never expose the raw integer; it reveals your traffic volume to competitors.

2b. 301 vs 302 — the cache vs analytics trade-off

Both status codes redirect the browser, but with very different long-term behaviour:

Status	Meaning	Browser behaviour	Analytics impact
301 Moved Permanently	The destination is permanent; cache it forever	Browser caches the redirect and never asks again	After first visit, clicks bypass your server entirely — you can't count them
302 Found (or 307)	Temporary redirect; always re-check	Browser asks your server every time	Every click hits your server; you can count, log, and geo-route

301 lets the CDN and browser cache the redirect, so at massive scale the server load approaches zero for popular links. The price is losing click counting after the first request. 302 guarantees every click is counted but every click costs a server round-trip. Most commercial URL shorteners use 302 by default because analytics are the product, and add 301 as an opt-in "performance mode" for users who don't need tracking.

2c. Cache hot links and CDN

With 302 redirects, the redirect server is in the critical path of every click. Mitigate this with two caching layers:

CDN edge: cache /:code → 302 + Location header at the edge. Even with 302, the CDN can serve the redirect header if Cache-Control allows short TTLs (e.g. Cache-Control: public, max-age=60). The click still hits the CDN PoP — far faster than origin — and the analytics counter is updated asynchronously by a stream of CDN access logs.
In-process LRU cache: each redirect server holds the top ~100k hot links in memory. Lookup is sub-millisecond. Evict on a TTL of a few minutes to stay consistent with updates.

See Caching (rel-07) for cache invalidation strategies and TTL tuning.

2d. Rate-limit link creation

The read path (redirect) is naturally self-throttling — you can only request links that exist. The write path (create link) is the abuse vector: spammers create millions of phishing links. Rate-limit by API key (for authenticated users) and by IP (for unauthenticated). See Rate Limiting (rel-03) for token-bucket and sliding-window algorithms. A reasonable default: 100 link creations per hour per API key, 10 per hour per unauthenticated IP.

3 · The API model

# 1. Create a short link
POST /v1/links
Authorization: Bearer <api_key>
Content-Type: application/json

{
  "url": "https://example.com/very/long/path?with=many&query=params",
  "alias": "launch2026"  # optional custom alias
}

# Response 201
{
  "code": "launch2026",
  "short_url": "https://shr.example/launch2026",
  "original_url": "https://example.com/very/long/path?with=many&query=params",
  "created_at": "2026-06-20T10:00:00Z"
}

# Conflict — alias already taken
# Response 409
{
  "error": "alias_conflict",
  "message": "The alias 'launch2026' is already in use."
}

# Rate limit exceeded
# Response 429 with Retry-After header

# 2. Redirect (the hot path)
GET /launch2026   # or /:code at the redirect domain

# Response 302 (tracking) or 301 (performance mode)
HTTP/1.1 302 Found
Location: https://example.com/very/long/path?with=many&query=params
Cache-Control: public, max-age=60
X-Link-Code: launch2026

# Code not found → 404
HTTP/1.1 404 Not Found
{ "error": "link_not_found" }

# 3. Link analytics
GET /v1/links/launch2026/stats?from=2026-06-01&to=2026-06-20
Authorization: Bearer <api_key>

# Response 200
{
  "code": "launch2026",
  "total_clicks": 142837,
  "unique_ips": 98204,
  "top_countries": [
    { "country": "US", "clicks": 54012 },
    { "country": "IN", "clicks": 31440 }
  ],
  "top_referrers": [
    { "referrer": "twitter.com", "clicks": 28900 }
  ]
}

Read path: CDN serves hot links at the edge. Cache miss falls through to the redirect server's in-process LRU cache. Only a cold-start or cache-expired request reaches the database.

Write path: the API server validates the request, claims a code from the counter service, writes to the links DB, and fans the click event to an analytics queue asynchronously.

4 · Evaluation & latency budget

Read-heavy scaling

With a 100:1 read/write ratio, the redirect path is the system's heartbeat. The CDN absorbs the majority of traffic — popular links like a viral tweet hit hundreds of millions of times will be cached at CDN PoPs globally. The redirect server only sees CDN cache misses (cold start, low-traffic links, TTL expiry). The in-process LRU cache eliminates most database reads even after CDN misses.

Write path

Link creation is rare but critical — it must not fail silently. The counter service (Redis INCR) is the only synchronous external call on the write path; it's fast (~1 ms) and Redis replication provides durability. The database write is synchronous to return the short URL to the caller. Analytics writes are always async — a dropped analytics event is far less severe than a dropped link creation.

Latency budget

Operation	Target p99	How
Redirect — CDN hit	< 10 ms	Edge PoP, no origin contact
Redirect — CDN miss, cache hit	< 30 ms	In-process LRU, no DB
Redirect — cold (DB read)	< 80 ms	Single indexed lookup by code
Link creation	< 200 ms	Counter + 1 DB write; rare path
Stats query	< 500 ms	Pre-aggregated in analytics store; separate from redirect DB

🎯 Interview angle

The 301 vs 302 question is a classic trap. If you say "use 301 for performance," the follow-up is: "so you can't count clicks after the first visit — is that acceptable?" Always frame it as a trade-off and let the requirements drive the answer. For most URL-shortening products, analytics are the core value proposition, so 302 wins despite the extra server load.

⚠️ Common trap

Designing the analytics counter as a synchronous DB increment on the redirect path. That makes the DB a write bottleneck for every click — a viral link would hammer a single counter row with millions of concurrent UPDATE statements. Instead, batch click events in memory or a queue and flush them periodically. Accepting approximate counts (±0.1%) is far better than making the redirect path depend on analytics writes.

✍️ Exercise: handle custom alias collision gracefully

A user requests the alias sale but it already exists. Design the full API behaviour: which status code, what response body, and what options do you give the user? What if the existing alias was created by the same user?

Model answer:

# Alias taken by another user → 409 Conflict
{
  "error": "alias_conflict",
  "message": "'sale' is already in use. Try 'sale-2026' or omit alias for auto-generation.",
  "suggestions": ["sale-2026", "mysale"]
}

# Alias taken by the SAME user, same target URL → 200 (idempotent)
{
  "code": "sale",
  "short_url": "https://shr.example/sale",
  "reused": true
}

# Alias taken by the same user, DIFFERENT target URL → 409
# (changing the destination silently would break existing links)

Rubric: ✓ 409 for genuine conflict ✓ actionable suggestions in error body ✓ idempotency: same user + same URL = 200 ✓ same user + different URL = 409 to prevent silent link hijack.

Under the hood: the core mechanism

The two mechanisms that make a URL shortener work are the base62 encoding of a counter (write path) and the three-layer cache lookup (read path). Both are worth tracing numerically so you can answer "how short is the short code?" and "how fast is the redirect?"

Base62 encoding: worked numeric trace

Base62 uses the alphabet 0–9 a–z A–Z (62 characters). To encode a counter integer, repeatedly divide by 62 and collect the remainders — the remainders in reverse order give the encoded string. The alphabet index maps remainder → character.

# Alphabet: index 0='0', 1='1', …9='9', 10='a', 11='b', …35='z', 36='A', …61='Z'

# Example 1: counter id = 125
125 ÷ 62 = 2 remainder 1   → character '1'
  2 ÷ 62 = 0 remainder 2   → character '2'
# remainders collected in reverse: '2', '1'
short code = "21"

# Example 2: counter id = 3844  (62²)
3844 ÷ 62 = 62 remainder 0  → character '0'
  62 ÷ 62 =  1 remainder 0  → character '0'
   1 ÷ 62 =  0 remainder 1  → character '1'
short code = "100"   # 3-char code starts at id 3844

# Example 3: counter id = 56,800,235,584  (62⁶ = first 7-char code)
# IDs 0 – 62^6-1 (= 56,800,235,583) fit in exactly 6 base62 chars
# That is 56 billion unique short codes from 6 characters
6 chars → 56 billion codes  # ample for any URL shortener

# Decode is the reverse: treat each character as a digit in base-62
"21" → 2 × 62¹ + 1 × 62⁰ = 124 + 1 = 125   ✓

A monotonically increasing Redis INCR counter guarantees no two clients ever get the same integer — so no two codes collide. The base62 encoding of that integer is computed locally (no additional round-trip). This is why the counter approach is strictly better than hashing for collision avoidance: hash-based codes require a "does this code exist?" DB read before every write.

Collision handling for custom aliases

Custom aliases bypass the counter — the caller supplies an arbitrary string. The uniqueness check is a simple SELECT on the alias column (which has a unique index). The race condition where two concurrent requests try to claim the same alias is handled by letting the DB unique constraint be the final arbiter: the first writer's INSERT succeeds; the second gets a unique-constraint violation, which the API layer catches and returns as a 409.

# Write path for a custom alias (pseudocode)
function create_link(url, alias):
  try:
    code = alias if alias else base62(redis.INCR("link_counter"))
    db.execute("INSERT INTO links (code, url) VALUES (?, ?)", code, url)
    return 201, { code, short_url: "https://shr.example/" + code }
  except UniqueConstraintViolation:
    return 409, { error: "alias_conflict" }

Redirect read path: worked lookup trace

The redirect is the hot path. Follow a request for GET /abc123 through all three layers:

# Step 1: Request arrives at CDN edge PoP (e.g. Cloudflare London)
GET /abc123
→ CDN checks its cache for key "abc123"
  CDN HIT:  serve cached 302 Location header immediately — ~5 ms
  CDN MISS: forward to origin redirect server

# Step 2: Origin redirect server (CDN miss)
→ check in-process LRU cache (hot_links["abc123"])
  LRU HIT:  return 302 Location — ~1 ms
  LRU MISS: query the links DB

# Step 3: DB lookup (both caches missed)
SELECT url FROM links WHERE code = 'abc123'  -- indexed; ~5 ms
  FOUND:  populate LRU cache, respond:
          HTTP/1.1 302 Found
          Location: https://example.com/very/long/path
          Cache-Control: public, max-age=60
  NOT FOUND: HTTP/1.1 404 Not Found  {"error":"link_not_found"}

# CDN caches the 302 response for 60 s at the edge
# Next request for /abc123 from any user near that PoP hits the CDN HIT path

The analytics counter is intentionally not on this synchronous path. Each redirect server writes a click event to an in-memory ring buffer, which is flushed to a stream (Kafka or Kinesis) every second. An analytics consumer aggregates these into pre-computed hourly counts. Approximate counts with a ~1-second lag are far better than making every redirect block on a DB UPDATE counter = counter + 1.

Operating & debugging it

The two most common problems are cache invalidation mistakes (a link is updated but old redirects keep serving) and counter skew (the Redis counter diverges, producing gaps or — worse — duplicate codes on failover). Both are diagnosable with standard tools.

Inspect the redirect path live

$ curl -i -L "https://shr.example/abc123" HTTP/2 302 location: https://example.com/very/long/path cache-control: public, max-age=60 x-cache: HIT ← CDN served this; never hit origin x-link-code: abc123 # -L follows the redirect; remove it to inspect the 302 headers only # x-cache: MISS means CDN forwarded to origin $ curl -i -L "https://shr.example/abc123" -H "Cache-Control: no-cache" HTTP/2 302 x-cache: MISS ← forced origin hit; use to verify DB state x-redirect-latency-ms: 8 ← time from origin server (LRU miss → DB) $ redis-cli GET link_counter "3521614606" ← current counter; base62 → 6-char code $ redis-cli GET link_counter; sleep 5; redis-cli GET link_counter "3521614606" "3521614631" ← 25 new links created in 5 s → normal write rate

Symptom → cause → fix

Symptom	Likely cause	Fix
Redirect returns 404 for a code that was just created	CDN or LRU cache served a stale 404 from before the link existed (negative caching)	Do not cache 404 responses, or set a very short TTL (e.g. `Cache-Control: max-age=5` on 404s); ensure new links invalidate CDN entries on write
Updated link destination still redirects to old URL	CDN cached the old 302 and hasn't expired yet (TTL can be up to 60 s)	Purge the CDN cache entry for the specific code on every link update: `cf.purge_cache(["https://shr.example/abc123"])`
Two different codes collide — same DB row claimed by two concurrent creates	Redis failover caused the counter to reset or replay; two workers got the same counter value	Use Redis Sentinel or Cluster with `WAIT 1 0` after INCR to ensure replication before returning; add the unique index on `code` as a safety net
Analytics undercounting — click totals lower than CDN logs show	Analytics ring buffer is being dropped on server restart, or CDN HIT clicks never reach the origin	Flush the ring buffer on SIGTERM; consume CDN access logs as the source of truth for CDN-hit clicks; merge with origin-side counts
Link creation returning 500 intermittently	Redis INCR timeout — counter service is the single synchronous dependency; a slow Redis adds to write latency	Set a tight socket timeout (e.g. 50 ms) on the Redis INCR call; on timeout, fall back to a pre-allocated batch counter range the server holds in memory
Custom alias returns 409 even though the alias looks free	Deleted links are not removed from the DB — soft-delete keeps the code reserved	Reclaim deleted alias codes by either hard-deleting or by allowing the same owner to reclaim their own deleted alias (check `owner_id` + `deleted_at`)

Test a redirect with curl -i first; read x-cache to know whether origin was involved.
Force an origin hit with -H "Cache-Control: no-cache" and read x-redirect-latency-ms.
Verify the DB row directly: SELECT code, url, created_at FROM links WHERE code = 'abc123'.
Check Redis counter health: compare the current value against expected links-per-hour rate.
After any link update, confirm CDN purge happened; re-run curl -i to verify x-cache: MISS on the next request.

🧠 Quick check

1. For a read-heavy URL shortener, the single biggest latency win for redirects is:

Redirects are overwhelmingly reads of popular links. Caching them (and serving from the edge) turns a database lookup + cross-region hop into a near-instant response.

2. Choosing a 302 (temporary) over a 301 (permanent) redirect trades:

A 301 lets browsers/CDNs cache the redirect and skip you entirely (fast, but you lose the click signal). A 302 keeps requests flowing through you so you can record analytics.

3. Generating short codes by base62-encoding an incrementing counter (vs random hashing) mainly avoids:

A monotonic counter is collision-free, so you skip the read-before-write collision check that random codes require. The trade-off is guessable/sequential codes, often mitigated by encoding an obfuscated id.

Key takeaways

Base62 of an atomic counter is the pragmatic key strategy — no collisions, short codes, distributed-safe with Redis INCR.
301 caches in browsers and CDN; 302 counts every click. Most products choose 302 because analytics are the product; 301 is an opt-in performance mode.
The redirect path is read-heavy: CDN edge → in-process LRU → DB. Most requests never reach the database.
Analytics writes must be async. Synchronous click counting on the redirect path collapses under viral traffic.
Rate-limit link creation, not redirects — creation is the abuse vector; redirects self-limit because only valid codes exist.