Design Case Studies · Lesson 07

Design: Video Streaming API

Uploading a 4K film and streaming it to millions of simultaneous viewers are two completely different problems, yet a video platform must solve both in the same system. This case study traces the full pipeline from the creator's browser through transcoding workers to the viewer's adaptive player, exposing the API design decisions that make each stage work at scale.

⏱ ~18 min Difficulty: advanced Prereq: Caching (rel-07), Pub/Sub (rel-10), File Upload (cs-02)

By the end you'll be able to

Sketch the upload → transcode → CDN delivery pipeline and name the API surface at each stage.
Explain why presigned URLs, async transcoding, and HLS/DASH manifests each exist, and what breaks if you skip them.
Estimate CDN hit-ratio math and articulate the origin-load savings it produces.

Requirements

Before drawing any boxes, pin down what the system must do. Video platforms are deceptively complex because they sit at the intersection of three very different workloads: large-file ingest, CPU-intensive batch processing, and globally distributed read-heavy delivery. Getting requirements wrong means you optimize for the wrong bottleneck.

Functional requirements

Upload large video files — creators submit raw footage up to tens of gigabytes. The upload must be resumable: if the connection drops at 95%, the creator should not restart from zero.
Transcode to multiple resolutions — the raw upload gets converted into at least four renditions: 360p, 720p, 1080p, and 4K where the source permits. Each rendition is cut into short segments (typically 2–6 seconds) for adaptive streaming.
Adaptive playback via HLS or MPEG-DASH — the player continuously monitors available bandwidth and switches renditions mid-stream without interrupting playback. This requires a manifest file that describes every available quality tier and where its segments live.
View count tracking at massive scale — billions of view events per day. Real-time exact counts are unnecessary and expensive; approximate counts updated every few minutes are fine.
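One way to picture the "approximate counts updated every few minutes" requirement is a counter that buffers view events in memory and folds them into the visible totals on a timer. The sketch below is illustrative only: the class name and the in-memory `published` store are assumptions standing in for a real aggregation service and database.

```python
from collections import Counter


class BufferedViewCounter:
    """Accumulates view events in memory and publishes them in batches."""

    def __init__(self) -> None:
        self.pending = Counter()    # views recorded since the last flush
        self.published = Counter()  # totals readers see (stand-in for a database)

    def record_view(self, video_id: str) -> None:
        # Cheap in-memory increment; no database write per view.
        self.pending[video_id] += 1

    def flush(self) -> int:
        """Fold pending counts into the published totals; return events flushed."""
        flushed = sum(self.pending.values())
        self.published.update(self.pending)
        self.pending.clear()
        return flushed


counter = BufferedViewCounter()
for _ in range(3):
    counter.record_view("vid_2wXk9mPqRn7v")

stale = counter.published["vid_2wXk9mPqRn7v"]  # readers still see the old total
flushed = counter.flush()                      # would run every few minutes
fresh = counter.published["vid_2wXk9mPqRn7v"]  # total now includes the batch
```

The trade-off is that readers see a count that lags by up to one flush interval, which the requirement above explicitly accepts.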

Non-functional requirements

Huge read-to-write ratio: a single uploaded video may be watched hundreds of millions of times, so the system must be optimized almost entirely for reads. A back-of-envelope: YouTube receives ~500 hours of uploaded video per minute (about 720,000 hours per day) but serves roughly 1 billion hours of watch time per day. By hours, that is about 1,400 hours watched for every hour uploaded, and because each playback issues many separate segment requests, the request-level read-to-write ratio is far higher.
Low start latency — the first segment must start playing within ~2 seconds on a good connection. CDN edge caching of the first few segments is not optional.
Transcode SLA — a 10-minute 1080p video should be available in all resolutions within 5 minutes of publish. A 2-hour 4K film may take 30–60 minutes. The upload response must not block on this.

Design decisions

Every major decision in this system comes with a "why" — and for each one, there is a common alternative that fails at scale. Interview panels reward candidates who explain what breaks before explaining what they chose.

Decision 1: Presigned URLs for upload (not API-proxied upload)

Naively, a creator POSTs their video to your API server, which writes it to S3. This works for 10 MB profile photos. For a 20 GB film, it means every byte crosses your API fleet twice: once inbound from the creator and once outbound to object storage. Your API servers become the bottleneck, you pay for that bandwidth on both hops, and a single slow upload ties up a connection slot for its entire duration.

The fix: the API server issues a presigned URL — a time-limited, signed S3/GCS endpoint — and returns it to the client. The client uploads directly to object storage, bypassing the API server entirely. The API server sees only the tiny metadata request and the final "upload complete" callback. See the File Upload case study (cs-02) for the full presigned URL pattern including resumable chunked uploads.
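To make "time-limited, signed" concrete, here is a minimal sketch of the idea using a plain HMAC over the method, object key, and expiry. It is not the S3 or GCS signing algorithm (those use their own schemes and should be generated with the provider's SDK); the secret, function names, and query parameter names are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; a real key lives in a secrets manager


def _signature(object_key: str, expires: int) -> str:
    # Sign the method, key, and expiry so none of them can be altered.
    message = f"PUT\n{object_key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_upload_url(base_url: str, object_key: str, ttl_s: int, now: float) -> str:
    """Issued by the API server: a URL that is valid only until `expires`."""
    expires = int(now) + ttl_s
    query = urlencode({"expires": expires, "sig": _signature(object_key, expires)})
    return f"{base_url}/{object_key}?{query}"


def verify_upload(object_key: str, expires: int, sig: str, now: float) -> bool:
    """Run by the storage side: reject expired or tampered requests."""
    if now > expires:
        return False
    return hmac.compare_digest(_signature(object_key, expires), sig)


url = sign_upload_url("https://uploads.example.com", "vid_2wXk9mPqRn7v",
                      ttl_s=3600, now=1_000_000)
params = parse_qs(urlparse(url).query)
expires, sig = int(params["expires"][0]), params["sig"][0]
```

The point of the design is that the storage side can validate the request without calling back to the API server, which is what lets the API step out of the data path.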

Decision 2: Async transcode pipeline (not synchronous in-request)

Transcoding a 4K video takes minutes of wall-clock time and gigabytes of scratch disk. Doing it synchronously — holding the HTTP connection open until the job finishes — would mean upload requests time out, clients implement complex retry logic, and a transcode backlog crashes your API. Instead, completing the upload triggers an event on a queue. Transcode workers pull jobs, process in parallel, and emit completion events when done. The API responds immediately with a 202 Accepted and a job handle. Clients poll or receive a webhook. See the Event-driven & Pub/Sub lesson (rel-10) for the queue mechanics.

Decision 3: CDN delivery for segments (not direct origin serving)

Once transcoded, video segments are immutable bytes that never change. CDN edge nodes are designed precisely for serving immutable, cacheable content to geographically distributed audiences. Serving every segment request from your origin would require network capacity and spend that rival the CDN itself. The CDN handles the read surge; the origin only exists to fill cache misses. See the Caching lesson (rel-07) for CDN cache semantics, including Cache-Control: public, max-age=31536000, immutable for segment files.

Decision 4: HLS/DASH manifests for adaptive streaming

A plain progressive MP4 is a single file at one fixed bitrate: the player cannot adapt when bandwidth changes, and fast start and seeking depend on byte-range support and on the file's index (the moov atom) sitting at the front of the file. Adaptive streaming solves three problems at once: bandwidth adaptation (drop from 1080p to 360p mid-stream when the network degrades), fast start (buffer only the first 2–4 segments before playing), and seeking (jump directly to the segment containing the target timestamp). HLS uses a .m3u8 playlist; MPEG-DASH uses an XML .mpd manifest. Both can index the same underlying segments when those are packaged as fragmented MP4 (CMAF).

Decision 5: Webhook + polling for transcode status

Clients need to know when a video is ready. Two mechanisms work together: polling for creator dashboards that can tolerate a GET request every 5 seconds, and webhooks for server-to-server integrations that want push notification. Never make the client wait on a long-poll — transcode jobs take minutes and connection timeouts make long-polling unreliable for this duration.
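The polling half of this decision can be sketched as a small client loop. The function below is an assumption-laden illustration: `fetch_status` stands in for an HTTP GET against the transcode-status endpoint, and the terminal status names follow the sample responses in this lesson.

```python
import time
from typing import Callable


def wait_until_done(fetch_status: Callable[[], dict],
                    interval_s: float = 5.0,
                    timeout_s: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status["status"] in ("complete", "failed"):
            return status
        if waited >= timeout_s:
            raise TimeoutError("transcode did not finish in time")
        sleep(interval_s)  # each poll is a short request, never a held connection
        waited += interval_s


# Demo with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "transcoding", "progress_pct": 10},
    {"status": "transcoding", "progress_pct": 62},
    {"status": "complete", "progress_pct": 100},
])
result = wait_until_done(lambda: next(responses), sleep=lambda s: None)
```

Injecting `sleep` keeps the loop testable; a production client would likely also add jitter or backoff so many dashboards do not poll in lockstep.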

The API model

Six endpoints cover the full lifecycle. Note that the HLS manifest and segments live on the CDN domain — they are not routes on your API server.

POST /v1/videos — initiate upload

The creator sends metadata; the API returns a video ID and a presigned upload URL. No video bytes travel through the API server.

// Request
POST /v1/videos
Authorization: Bearer {token}
Content-Type: application/json

{
  "title":       "Climbing the Eiger: North Face",
  "description": "Solo ascent, summer 2025.",
  "content_type": "video/mp4",
  "file_size":    8472983040  // bytes; used to configure multipart upload
}

// Response 201 Created
{
  "video_id":     "vid_2wXk9mPqRn7v",
  "status":       "awaiting_upload",
  "upload_url":   "https://uploads.example-cdn.com/vid_2wXk9mPqRn7v?X-Amz-Signature=...",
  "upload_method": "PUT",
  "upload_expires_at": "2026-06-20T18:00:00Z"  // presigned URL TTL: 1 hour
}

PUT {upload_url} — chunked upload direct to object storage

The client PUTs to the presigned URL. For large files, S3 multipart upload allows up to 10,000 parts, each 5 MB–5 GB. The API server sees nothing.

// Multipart part (repeated for each 50 MB chunk)
PUT https://uploads.example-cdn.com/vid_2wXk9mPqRn7v?partNumber=3&uploadId=xKFdH...
Content-Length: 52428800
Content-Type: video/mp4

[binary chunk body]

// Response 200 from object storage
ETag: "d8e8fca2dc0f896fd7cb4cb0031ba249"  // save for CompleteMultipartUpload

POST /v1/videos/:id/publish — trigger transcode

After all parts are uploaded and CompleteMultipartUpload succeeds, the client calls publish. This moves the video out of draft and enqueues the transcode job. Returns immediately with 202.

// Request
POST /v1/videos/vid_2wXk9mPqRn7v/publish
Authorization: Bearer {token}

// Response 202 Accepted
{
  "video_id":       "vid_2wXk9mPqRn7v",
  "status":         "transcoding",
  "transcode_job_id": "tjob_8RpQn3LvMz1w",
  "estimated_completion_at": "2026-06-20T17:25:00Z"
}

GET /v1/videos/:id — fetch video metadata

Returns the canonical video record: manifest URL, metadata, current transcode status, and available renditions once processing completes.

// Request
GET /v1/videos/vid_2wXk9mPqRn7v
Authorization: Bearer {token}

// Response 200 OK (video ready)
{
  "video_id":      "vid_2wXk9mPqRn7v",
  "title":         "Climbing the Eiger: North Face",
  "status":        "ready",
  "duration_s":    4320,
  "view_count":    142873,
  "manifest_url":  "https://cdn.example.com/vid_2wXk9mPqRn7v/manifest.m3u8",
  "thumbnail_url": "https://cdn.example.com/vid_2wXk9mPqRn7v/thumb.jpg",
  "renditions": [
    { "quality": "360p",  "bitrate_kbps": 400  },
    { "quality": "720p",  "bitrate_kbps": 2500 },
    { "quality": "1080p", "bitrate_kbps": 5000 },
    { "quality": "4K",    "bitrate_kbps": 18000 }
  ]
}

GET /v1/videos/:id/transcode-status — poll async job

Fine-grained progress for creator dashboards. This is kept separate from GET /v1/videos/:id to avoid cache-busting the main resource: the main endpoint can be cached aggressively once the video is ready.

// Response 200 OK (in-progress)
{
  "job_id":       "tjob_8RpQn3LvMz1w",
  "status":       "processing",
  "progress_pct": 62,
  "current_pass": "1080p",
  "passes_done":  2,
  "passes_total": 4,
  "eta_seconds":  183
}

// Response 200 OK (complete)
{
  "job_id":       "tjob_8RpQn3LvMz1w",
  "status":       "complete",
  "progress_pct": 100,
  "completed_at": "2026-06-20T17:22:47Z"
}

GET /{id}/manifest.m3u8 — HLS manifest (CDN, not API server)

This is not an API route — it is a file served directly from the CDN origin bucket. The URL is returned in the video metadata; the player fetches it independently.

# HLS master playlist  —  served from cdn.example.com, not api.example.com
#EXTM3U
#EXT-X-VERSION:3

# Each variant stream points to a sub-playlist of 4-second segments
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360,CODECS="avc1.42c01e,mp4a.40.2"
360p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/playlist.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=18000000,RESOLUTION=3840x2160,CODECS="avc1.640033,mp4a.40.2"
4k/playlist.m3u8

Interview angle

When asked "design YouTube" in a system design interview, examiners want to hear three pipeline stages with specific justifications: (1) upload via presigned URL so your API fleet never touches video bytes; (2) async transcode via a job queue so the upload response is immediate and transcode workers scale independently; (3) CDN delivery with a target hit ratio above 95% so origin load is a rounding error. Bonus points: mention that the manifest file and segments use separate Cache-Control TTLs — manifests are short-lived (30 s) while segment files are immutable (max-age=31536000). Tying CDN hit ratio back to a latency budget (1st byte <50 ms from edge vs. ~200 ms from origin) demonstrates systems thinking beyond "add a CDN."
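The split between manifest and segment TTLs can be expressed as a tiny header-selection rule applied when objects are written to the origin bucket. This is a sketch under assumptions: the function name is hypothetical, the file extensions follow the examples in this lesson, and the `no-store` fallback is an arbitrary conservative default.

```python
def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on the kind of object being stored."""
    if path.endswith(".m3u8"):
        # Manifests: short TTL so changes become visible within ~30 s.
        return "public, max-age=30"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Segments never change once written, so cache them for a year.
        return "public, max-age=31536000, immutable"
    # Assumption: anything unrecognised is not cached at all.
    return "no-store"


manifest_header = cache_control_for("vid_2wXk9mPqRn7v/manifest.m3u8")
segment_header = cache_control_for("vid_2wXk9mPqRn7v/1080p/seg_0001.ts")
```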

Diagrams

Upload pipeline: the API server issues a presigned URL and immediately steps aside. Object storage fires an event on upload completion; transcode workers consume jobs independently of the HTTP path. CDN cache-fills from object storage only on misses.

Playback pipeline: the player's ABR algorithm continuously measures download throughput and picks the highest quality tier that fits. CDN edge nodes serve 98%+ of segment requests without touching the origin. A cache miss to origin adds ~130 ms but is invisible to viewers due to buffer pre-fetching.

Pitfall: serving video from your API servers

Two mistakes appear so often in system design interviews that they are worth calling out explicitly.

Proxying video bytes through your API server is the first. Your API fleet has finite network bandwidth — typically a few Gbps per instance. A single 4K stream at 18 Mbps times 10,000 concurrent viewers = 180 Gbps. No API fleet handles that without becoming the single most expensive line item on your AWS bill. Object storage + CDN costs a fraction of the equivalent API fleet bandwidth.

Synchronous transcoding inside the upload handler is the second. Transcoding a 1-hour 4K video can take 20–40 minutes of CPU time. Holding the HTTP connection open for that duration means: every client needs to implement timeout/retry logic; a transcode backlog ties up API worker threads; a single slow job degrades every concurrent upload. Async queues exist precisely for workloads with unbounded wall-clock duration.

Tip: separate domains for API, upload, and CDN

Use three distinct hostnames for the three traffic types: api.example.com for your REST API, uploads.example.com (or a direct S3/GCS URL) for object storage ingest, and cdn.example.com for media delivery. This separation buys you several things: you can apply different rate limits and autoscaling policies per domain; CDN caching rules apply cleanly to cdn.example.com without accidentally caching API responses; and your TLS certificates, WAF rules, and CORS headers stay untangled. Never route media bytes through api.example.com — the second you do, your CDN configuration becomes impossible to reason about.

Evaluation & latency budget

CDN offload math

Video segments are the definition of cacheable: they are immutable (the bytes for the 00:02:00–00:02:04 window of a given rendition never change), they are large (a 4-second 1080p segment at 5 Mbps is ~2.5 MB), and they are popular at the head of the view distribution. Once a segment is in a CDN edge node's cache, every subsequent viewer in that region gets it without touching the origin.

Suppose a video has 1,000,000 views. Each view fetches an average of 30 segments (2 minutes of a 72-minute film). That is 30,000,000 segment requests. With a 98% CDN hit ratio:

Origin serves: 30,000,000 × 0.02 = 600,000 requests
CDN serves: 29,400,000 requests at edge latency (<20 ms)
Without CDN: 30,000,000 requests at origin — 50× the origin fleet required

The first viewer per region per segment pays the origin-fetch cost (~150 ms). All subsequent viewers pay edge cost (~15 ms). Popular videos have effective hit ratios above 99.5% because a video with 100M views fills each edge cache quickly, after which nearly every request in that region is a hit.
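The offload arithmetic above is easy to re-run for other hit ratios. The helper below is a minimal sketch; the function name is made up for illustration and the inputs are the figures from this example.

```python
def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that miss the CDN and must be served by the origin."""
    return round(total_requests * (1.0 - hit_ratio))


views = 1_000_000
segments_per_view = 30
total = views * segments_per_view        # every segment request, CDN or not

origin = origin_requests(total, 0.98)    # misses that reach the origin
edge = total - origin                    # requests absorbed by edge caches
reduction = total / origin               # how much smaller the origin load is
```

Trying `hit_ratio=0.90` or `0.995` in the same function shows how sensitive origin load is to a few percentage points of hit ratio.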

Async transcode: why synchronous would fail

Consider a 2-hour 4K film. A single-pass transcode running at ~6× real time takes ~20 minutes per quality tier × 4 tiers = ~80 minutes of sequential work, or ~20 minutes in parallel. The idle timeout on most load balancers is 60–300 seconds, so the upload handler would time out before the first rendition finishes. Even with long-polling or WebSockets, holding a server connection for 20 minutes pins a worker thread (in a thread-per-request server) that cannot serve other requests. At 10,000 concurrent uploads, that is 10,000 idle-but-blocked threads, which is exactly the scenario job queues were designed to avoid.

The async pattern trades a single complex HTTP response for a simple 202 + a job-status poll. The complexity moves out of the HTTP layer and into the queue worker, where it belongs.
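The shape of the 202-plus-job-handle pattern can be shown with an in-process queue and one worker thread. This is a toy sketch, not a production design: `queue.Queue` stands in for SQS or Pub/Sub, the `jobs` dict stands in for the job table, and the handler and worker names are hypothetical.

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}                  # stand-in for the job table
job_queue: "queue.Queue[str]" = queue.Queue()  # stand-in for SQS / Pub/Sub


def handle_publish(video_id: str) -> tuple[int, dict]:
    """HTTP-handler stand-in: record the job, enqueue it, return at once."""
    job_id = f"tjob_{uuid.uuid4().hex[:12]}"
    jobs[job_id] = {"video_id": video_id, "status": "queued"}
    job_queue.put(job_id)
    return 202, {"video_id": video_id, "transcode_job_id": job_id}


def worker() -> None:
    """Runs outside the request path; scales independently of the API."""
    while True:
        job_id = job_queue.get()
        jobs[job_id]["status"] = "processing"
        # ... the slow transcode would run here ...
        jobs[job_id]["status"] = "complete"
        job_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

status_code, body = handle_publish("vid_2wXk9mPqRn7v")  # returns immediately
job_queue.join()  # demo only: a real handler never waits for the worker
final_status = jobs[body["transcode_job_id"]]["status"]
```

The handler's latency is the cost of one record write and one enqueue, regardless of how long the transcode itself takes.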

Latency breakdown table

Request type	Served from	Typical latency (P50)	Typical latency (P99)
HLS master manifest (first load)	CDN edge (short TTL — 30 s)	18 ms	55 ms
Video segment (CDN hit)	CDN edge (immutable, long TTL)	12 ms	35 ms
Video segment (CDN miss, origin fill)	Object storage via CDN	155 ms	420 ms
GET /v1/videos/:id (metadata)	API server + Redis cache	28 ms	90 ms
POST /v1/videos/:id/publish	API server (writes DB + enqueues)	45 ms	140 ms

Back-of-envelope: view scale

YouTube serves roughly 1 billion hours of video per day. That is approximately 3.6 × 10¹² seconds of playback per day, or ~41.7 million concurrent streams on average (3.6 × 10¹² s ÷ 86,400 s/day). At an average bitrate of 2 Mbps (mix of mobile 360p and desktop 1080p), that is ~83 Tbps of sustained CDN egress. No single origin data center can deliver that. CDN edge distribution is not an optimization; it is the architecture.

For a mid-scale platform with 10 million daily active viewers averaging 30 minutes of watch time per day:

Daily playback seconds: 10M × 30 × 60 = 18 billion seconds
Avg bitrate 1.5 Mbps: 27 × 10⁹ megabits ≈ 3,375 TB/day (~3.4 PB/day) egress
CDN cost at $0.01/GB: ~$33,750/day (roughly $1M/month)
Origin-only cost at $0.09/GB: ~$303,750/day, 9× more and still slower for users
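Estimates like this are easy to get wrong by a factor of 8 or 1,000 when converting between megabits, bytes, and gigabytes, so it can help to script the conversion. The helper names below are illustrative, the prices are the per-GB figures quoted above, and decimal units (1 GB = 1,000 MB) are assumed.

```python
def daily_egress_gb(viewers: int, minutes_per_day: float,
                    avg_bitrate_mbps: float) -> float:
    """Daily egress in decimal gigabytes for a given audience and bitrate."""
    seconds = viewers * minutes_per_day * 60
    megabits = seconds * avg_bitrate_mbps
    megabytes = megabits / 8          # 8 bits per byte
    return megabytes / 1000           # 1,000 MB per GB


def daily_cost_usd(egress_gb: float, price_per_gb: float) -> float:
    return egress_gb * price_per_gb


egress_gb = daily_egress_gb(viewers=10_000_000, minutes_per_day=30,
                            avg_bitrate_mbps=1.5)
cdn_cost = daily_cost_usd(egress_gb, 0.01)
origin_cost = daily_cost_usd(egress_gb, 0.09)
```

Whatever the absolute numbers, the ratio between the two costs depends only on the per-GB prices, which is why the 9× gap holds at any audience size.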

Under the hood: the transcode pipeline

The words "upload → transcode → CDN" hide a multi-stage pipeline with specific data structures at every handoff. Here is how each stage actually works.

Stage 1: chunked upload to object storage

The client uses S3 multipart upload. It calls CreateMultipartUpload to receive an uploadId, then PUTs each 50–500 MB chunk as a numbered part. The object storage returns an ETag (MD5 of the part bytes) for each part. When all parts are uploaded, the client calls CompleteMultipartUpload with the ordered list of (partNumber, ETag) pairs. Object storage atomically assembles the parts into a single object. If the connection drops mid-upload, the client resumes from the last successful part number — only that part's bytes need to be retransmitted.

# Multipart state machine
CreateMultipartUpload  →  uploadId = "xKFdH..."
UploadPart part 1      →  ETag: "abc123"
UploadPart part 2      →  ETag: "def456"
UploadPart part N      →  ETag: "xyz789"
CompleteMultipartUpload(uploadId, [(1,"abc123"), (2,"def456"), ...(N,"xyz789")])
  → raw video object assembled atomically in object storage
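On the client side, resumability comes down to planning the parts up front and re-sending only the ones that have not been acknowledged. The sketch below is a planning helper only (no network calls); the function names are hypothetical, and the limits are the S3 multipart limits quoted earlier in this lesson.

```python
MIN_PART = 5 * 1024**2    # 5 MiB minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024**3    # 5 GiB maximum part size
MAX_PARTS = 10_000


def plan_parts(file_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, byte_offset, length) tuples, numbered from 1."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size outside the 5 MiB to 5 GiB range")
    count = (file_size + part_size - 1) // part_size  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return [(n + 1, n * part_size, min(part_size, file_size - n * part_size))
            for n in range(count)]


def parts_to_resume(plan: list[tuple[int, int, int]],
                    uploaded: set[int]) -> list[tuple[int, int, int]]:
    """Parts that still need sending after a dropped connection."""
    return [part for part in plan if part[0] not in uploaded]


# The 8,472,983,040-byte file from the POST /v1/videos example, in 50 MiB parts.
plan = plan_parts(8_472_983_040, 50 * 1024**2)
remaining = parts_to_resume(plan, uploaded=set(range(1, 101)))
```

The client only needs to persist the uploadId and the set of acknowledged part numbers (with their ETags) to resume after a crash or network drop.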

Stage 2: job queue and transcode workers

Object storage fires an s3:ObjectCreated event (or equivalent) when CompleteMultipartUpload succeeds. This event is published to a queue (SQS, Cloud Pub/Sub). A transcode job record is created in a database table with this schema:

-- Transcode job record (simplified, PostgreSQL types); the table name is illustrative
CREATE TABLE transcode_jobs (
  job_id        TEXT PRIMARY KEY,    -- "tjob_8RpQn3LvMz1w"
  video_id      TEXT NOT NULL,
  status        TEXT NOT NULL,       -- queued | processing | complete | failed
  renditions    JSONB,               -- per-rendition progress: [{quality:"1080p", status:"done", pct:100}, ...]
  attempt       INT DEFAULT 0,
  max_attempts  INT DEFAULT 3,
  worker_id     TEXT,                -- which machine owns this job
  locked_until  TIMESTAMPTZ,         -- distributed lock; worker must renew or job is re-queued
  created_at    TIMESTAMPTZ,
  started_at    TIMESTAMPTZ,
  completed_at  TIMESTAMPTZ
);

A transcode worker pulls the job, sets worker_id and locked_until = now() + 10min, and begins FFmpeg transcoding. If the worker dies without renewing the lock, a supervisor re-queues the job for another worker. This prevents lost jobs without a single coordinator.
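The claim-and-renew behaviour described above is essentially a lease. The following in-memory sketch shows the rules only; in a real system each function would need to be a single atomic conditional update against the job table, and the class and function names here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_S = 600  # 10-minute lease, matching locked_until = now() + 10min


@dataclass
class Job:
    job_id: str
    status: str = "queued"
    attempt: int = 0
    max_attempts: int = 3
    worker_id: Optional[str] = None
    locked_until: float = 0.0


def claim(job: Job, worker_id: str, now: float) -> bool:
    """Take the job if it is queued, or if a previous owner's lease lapsed."""
    lease_expired = job.status == "processing" and now >= job.locked_until
    if job.status != "queued" and not lease_expired:
        return False
    if job.attempt >= job.max_attempts:
        job.status = "failed"  # retries exhausted; stop handing the job out
        return False
    job.status, job.worker_id = "processing", worker_id
    job.attempt += 1
    job.locked_until = now + LEASE_S
    return True


def renew(job: Job, worker_id: str, now: float) -> bool:
    """Extend the lease; only the current owner with a live lease may renew."""
    if job.worker_id != worker_id or now >= job.locked_until:
        return False
    job.locked_until = now + LEASE_S
    return True


job = Job("tjob_demo")
first_claim = claim(job, "w1", now=0)        # w1 takes the queued job
contended = claim(job, "w2", now=100)        # w2 is refused: lease is live
renewed = renew(job, "w1", now=500)          # lease now runs until t=1100
takeover = claim(job, "w2", now=1100)        # w1 went silent; w2 takes over
stale_renew = renew(job, "w1", now=1101)     # the old owner can no longer renew
```

Counting the takeover as a new attempt is a design choice in this sketch; it ensures a job that repeatedly kills its worker eventually lands in `failed` instead of cycling forever.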

Stage 3: producing the rendition ladder

Each worker runs one or more FFmpeg passes. A typical rendition ladder for a 4K source (a 1080p source would simply omit the 4K row):

Rendition	Resolution	Video codec	Target bitrate	Segment duration
360p	640×360	H.264 baseline	400 kbps	4 s
720p	1280×720	H.264 main	2,500 kbps	4 s
1080p	1920×1080	H.264 high	5,000 kbps	4 s
4K	3840×2160	H.264 high	18,000 kbps	4 s

After FFmpeg produces each rendition as a continuous stream, a segmenter (FFmpeg with -f hls or a dedicated tool like Bento4) cuts it into fixed-duration .ts or .mp4 fragments and writes a per-rendition playlist.m3u8. A segment naming convention like 1080p/seg_0000.ts, 1080p/seg_0001.ts, ... makes each segment addressable independently, enabling the CDN to cache individual 2.5 MB chunks rather than multi-gigabyte files.
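To show what the per-rendition playlist.m3u8 contains, here is a small generator for a VOD media playlist using the seg_NNNN.ts naming convention above. In practice the segmenter writes this file itself; the function is a hypothetical illustration of the output format, using standard HLS tags.

```python
import math


def build_media_playlist(duration_s: float, segment_s: int = 4) -> str:
    """Build a VOD media playlist listing fixed-duration segments."""
    count = math.ceil(duration_s / segment_s)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_s}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i in range(count):
        # The final segment may be shorter than the target duration.
        length = min(segment_s, duration_s - i * segment_s)
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(f"seg_{i:04d}.ts")  # URI is relative to the playlist
    lines.append("#EXT-X-ENDLIST")       # marks the playlist as complete
    return "\n".join(lines) + "\n"


ten_minute = build_media_playlist(600)   # 10-minute video, 4-second segments
short_clip = build_media_playlist(10)    # 4 s + 4 s + 2 s
```

Because the segment URIs are relative, the same playlist text works for every rendition directory (360p/, 720p/, and so on).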

Stage 4: writing the HLS master manifest

Once all rendition playlists are written to object storage, the worker assembles the master manifest (manifest.m3u8) that links them together. Each #EXT-X-STREAM-INF line carries the BANDWIDTH (bits/second) and RESOLUTION that the ABR algorithm uses to choose a rendition. The manifest is written last — writing it atomically "publishes" the video: before it exists, any player request 404s; after it exists, the full ladder is reachable.
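Assembling the master manifest is mostly string formatting over the rendition ladder. The sketch below reproduces the sample playlist shown earlier in this lesson; the `LADDER` constant and function name are assumptions made for the example.

```python
# (directory, BANDWIDTH in bits/s, RESOLUTION, CODECS) from the sample playlist
LADDER = [
    ("360p", 400_000, "640x360", "avc1.42c01e,mp4a.40.2"),
    ("720p", 2_500_000, "1280x720", "avc1.4d401f,mp4a.40.2"),
    ("1080p", 5_000_000, "1920x1080", "avc1.640028,mp4a.40.2"),
    ("4k", 18_000_000, "3840x2160", "avc1.640033,mp4a.40.2"),
]


def build_master_playlist(ladder: list[tuple[str, int, str, str]]) -> str:
    """Emit one EXT-X-STREAM-INF entry per rendition, lowest bitrate first."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for directory, bandwidth, resolution, codecs in ladder:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},"
            f'RESOLUTION={resolution},CODECS="{codecs}"'
        )
        lines.append(f"{directory}/playlist.m3u8")  # relative to the master
    return "\n".join(lines) + "\n"


master = build_master_playlist(LADDER)
```

Building the text in memory and uploading it as a single object write is what makes the "publish" step atomic from the player's point of view.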

Stage 5: how adaptive bitrate picks a rendition at playback

The player implements an ABR algorithm. A simple throughput-based algorithm works as follows:

Player downloads segment N of the current rendition and measures actual download throughput (bytes received / time taken).
It applies a safety factor (e.g. 0.8×) to avoid oscillation: safe_throughput = measured × 0.8.
It scans the master manifest's BANDWIDTH values from highest to lowest and picks the first one at or below safe_throughput.
For segment N+1 it requests the chosen rendition's next segment.
If the buffer falls below a threshold (e.g. 4 s), it immediately steps down one quality tier regardless of throughput.

The player never interrupts playback during a quality switch — it finishes playing the buffered segments of the old rendition while fetching the new rendition's next segment. Because all renditions share the same GOP (group of pictures) alignment and segment duration, the switch point is seamless.
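The throughput-based steps above translate almost directly into code. This is a simplified sketch of that one algorithm, not how any particular player implements ABR; the function name and default thresholds mirror the example values in the list and are otherwise assumptions.

```python
# BANDWIDTH values (bits/s) from the master playlist in this lesson.
LADDER_BPS = [400_000, 2_500_000, 5_000_000, 18_000_000]


def pick_rendition(current_bps: int, measured_bps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 4.0) -> int:
    """Choose the BANDWIDTH of the rendition to request for the next segment."""
    ladder = sorted(LADDER_BPS)
    if buffer_s < low_buffer_s:
        # Buffer emergency: step down one tier regardless of throughput.
        idx = ladder.index(current_bps)
        return ladder[max(idx - 1, 0)]
    safe_throughput = measured_bps * safety
    for bandwidth in reversed(ladder):   # scan from highest to lowest
        if bandwidth <= safe_throughput:
            return bandwidth
    return ladder[0]                     # nothing fits: use the lowest tier


steady = pick_rendition(5_000_000, measured_bps=7_000_000, buffer_s=20)
degraded = pick_rendition(5_000_000, measured_bps=6_000_000, buffer_s=20)
emergency = pick_rendition(5_000_000, measured_bps=100_000_000, buffer_s=2)
floor = pick_rendition(400_000, measured_bps=100_000, buffer_s=20)
```

Note how 6 Mbps of measured throughput is not enough to hold 1080p once the 0.8× safety factor is applied (4.8 Mbps < 5 Mbps), which is the oscillation guard doing its job.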

Worked transcode job trace

Timeline for a 10-minute 4K upload (7 GB raw file, 4-worker parallel transcode; 600 s ÷ 4 s = 150 segments per rendition):

T+0s  Creator calls POST /v1/videos → gets video_id + presigned upload URL
T+0–180s  Client uploads 7 GB in 140 × 50 MB parts (direct to S3)
T+181s  CompleteMultipartUpload succeeds → S3 fires s3:ObjectCreated event
T+182s  API server receives event, creates job record (status=queued), enqueues job
T+183s  POST /v1/videos/vid_.../publish → 202 Accepted {transcode_job_id: "tjob_..."}
T+184s  Worker W1 claims job (status=processing, locked_until = claim time + 10 min)
T+184–280s  W1 runs FFmpeg: 360p rendition → 150 segments → writes 360p/playlist.m3u8
T+184–310s  W2 runs FFmpeg: 720p rendition → 150 segments → writes 720p/playlist.m3u8
T+184–370s  W3 runs FFmpeg: 1080p rendition → 150 segments → writes 1080p/playlist.m3u8
T+184–420s  W4 runs FFmpeg: 4K rendition → 150 segments → writes 4k/playlist.m3u8
T+421s  W4 (last to finish) writes master manifest.m3u8 → video is now "ready"
T+421s  job.status = complete; webhook fires to creator; video_id status = "ready"
Total transcode wall time: ~4 min (parallel) vs ~11 min (sequential: 96 + 126 + 186 + 236 s)

Operating & debugging it

The transcode pipeline has four independently observable stages. Most production issues fall into one of them.

Key metrics to monitor

Metric	Where to observe	Alert threshold (example)
Queue depth (jobs waiting)	SQS / Pub/Sub console; CloudWatch	>500 jobs queued for >5 min → scale workers
Job age (oldest queued job)	Custom metric: `now() - created_at` for status=queued	>10 min → worker may be stuck or under-provisioned
Transcode failure rate	Status=failed jobs / total jobs; log-based metric	>2% failure rate → investigate FFmpeg errors
Worker lock renewal failures	Application logs for "lock expired, job re-queued"	Any → worker OOM or crash
CDN hit ratio	CDN analytics dashboard; X-Cache HIT/MISS header sampling	<95% → check Cache-Control headers on segments
Origin segment request rate	Object storage access logs; CloudFront origin requests	Spike → CDN miss storm, likely new viral video or TTL misconfiguration

Inspecting a stuck or failed transcode

$ curl -s https://api.example.com/v1/videos/vid_2wXk9mPqRn7v/transcode-status \
    -H "Authorization: Bearer $TOKEN" | jq .
{
  "job_id": "tjob_8RpQn3LvMz1w",
  "status": "failed",
  "attempt": 2,
  "max_attempts": 3,
  "error": "FFmpeg non-zero exit: 1 — Invalid data found when processing input",
  "renditions": [
    {"quality": "360p", "status": "complete"},
    {"quality": "720p", "status": "complete"},
    {"quality": "1080p", "status": "failed", "error": "encoder overload"},
    {"quality": "4K", "status": "skipped"}
  ]
}
# "Invalid data found when processing input" = corrupted or truncated source file
# "encoder overload" = worker ran out of CPU/RAM mid-encode (check worker instance size)

# Verify a specific segment exists and is cacheable
$ curl -I https://cdn.example.com/vid_2wXk9mPqRn7v/1080p/seg_0001.ts
HTTP/2 200
cache-control: public, max-age=31536000, immutable
x-cache: HIT
content-type: video/MP2T
# If x-cache: MISS on a popular segment → segments not being written with correct Cache-Control

# Inspect the master manifest
$ curl -s https://cdn.example.com/vid_2wXk9mPqRn7v/manifest.m3u8
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
360p/playlist.m3u8
# If manifest.m3u8 returns 404, transcode never completed (all renditions must finish first)

Symptom	Likely cause	Fix
Job stuck in `queued` for >5 min	No available workers; queue backlog; workers crashed	Scale out worker fleet; check worker logs for OOM or crash; verify queue subscription is active
Job repeatedly fails with "lock expired"	Worker machine is too slow or OOM; lock TTL too short	Increase worker instance size; extend lock TTL; ensure lock renewal is running on a background thread
Manifest 404 after job shows "complete"	Worker wrote manifest to wrong path; object storage replication lag	Compare expected vs actual manifest key in object storage; check worker path-generation code
CDN hit ratio drops suddenly	Cache-Control header removed from segments; CDN config change; new viral video warming edge caches	curl -I a segment URL and check cache-control; new viral video miss storm resolves itself quickly
Player stalls at quality switches	GOP misalignment between renditions; segment duration mismatch	Ensure all renditions use the same keyframe interval and segment duration in FFmpeg flags
FFmpeg "Invalid data" error	Corrupted or truncated upload; incomplete CompleteMultipartUpload	Re-upload the source file; verify all part ETags before calling CompleteMultipartUpload

⚠️ Gotcha: writing the manifest before all renditions finish

If your worker writes manifest.m3u8 as soon as the first rendition completes, CDN edges worldwide will immediately cache a manifest that references rendition playlists that do not yet exist. Players will fetch those playlists and get 404s for every quality tier above the one that finished first. The master manifest must be written atomically after all rendition playlist.m3u8 files are confirmed present in object storage. In a multi-worker setup, use a distributed counter or barrier: each worker atomically decrements a "renditions remaining" counter; the worker that brings it to zero writes the manifest.
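The "renditions remaining" barrier can be sketched with a lock-protected counter. Threads stand in for separate worker machines here; across real machines the counter would live in a shared store with an atomic decrement, and the class name is hypothetical.

```python
import threading


class RenditionBarrier:
    """Counts down finished renditions; exactly one caller sees zero."""

    def __init__(self, total: int) -> None:
        self._remaining = total
        self._lock = threading.Lock()

    def rendition_done(self) -> bool:
        """Return True only for the caller that finishes the last rendition."""
        with self._lock:  # decrement and check must happen as one step
            self._remaining -= 1
            return self._remaining == 0


barrier = RenditionBarrier(total=4)
manifest_writers: list[str] = []


def finish_rendition(quality: str) -> None:
    # ... rendition playlist and segments are already in object storage ...
    if barrier.rendition_done():
        manifest_writers.append(quality)  # this worker writes manifest.m3u8


threads = [threading.Thread(target=finish_rendition, args=(q,))
           for q in ("360p", "720p", "1080p", "4k")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever rendition happens to finish last, exactly one worker ends up responsible for writing the manifest, so it is never written early and never written twice.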

Quiz

🧠 Quick check

Q1: Why use presigned URLs for video upload instead of routing the bytes through your API server?

The API server's network capacity is sized for JSON requests, not multi-gigabyte binary transfers. A presigned URL lets the client write directly to object storage — the API server issues only a small signed token, then steps completely out of the data path. Your API fleet stays available for the workload it was sized for.

Q2: Why is adaptive streaming (HLS/DASH) preferable to serving a single MP4 file for video playback?

Adaptive bitrate streaming continuously monitors download throughput and switches between renditions mid-playback — seamlessly stepping from 1080p down to 360p when a mobile viewer moves into weak signal, then back up again. A single MP4 at a fixed bitrate either buffers constantly on a slow connection or under-serves a fast viewer who could get better quality.

Q3: Your video platform achieves a CDN segment cache hit ratio of 98%. What does that mean for your origin servers?

A 98% hit ratio means 98 out of every 100 segment requests are served directly from CDN edge caches without reaching the origin at all. Your origin fleet only processes the remaining 2% — typically the first viewer per region per segment. This is what allows a platform with 10 million concurrent viewers to run on a modest origin cluster rather than a data-center-scale serving fleet.

Practice: design the transcode job schema and completion webhook

You are designing the internal job record that tracks a video transcode, and the webhook payload that fires when the job completes. A well-designed schema here makes retry logic, observability, and client integrations far simpler.

Part 1 — Transcode job record

Design the JSON schema for a transcode job stored in your database. Your schema must support:

Job identity and association to a video
Current status with valid state transitions: queued → processing → complete | failed
Per-rendition progress so the dashboard can show "720p done, 1080p 40%"
Retry tracking: attempt number and max retries
Timestamps for audit: created, started, completed/failed
Worker identity (which machine picked up the job) for debugging stuck jobs

Part 2 — Completion webhook payload

Design the JSON body sent to a registered webhook URL when the transcode job finishes (success or failure). Your payload must:

Be idempotent: include a stable event ID so receivers can deduplicate retries
Carry enough context that the receiver never needs to make a follow-up API call for the common case
Distinguish success from partial success (e.g., 4K failed, lower tiers succeeded) from total failure
Include the manifest URL if the video is playable
Be signed with an HMAC-SHA256 header so receivers can verify authenticity

Rubric

Job schema: includes job_id, video_id, status, renditions[] with per-rendition status/progress, attempt/max_attempts, worker_id, and four timestamps.
Idempotency: webhook body includes a stable event_id (e.g., evt_{job_id}_{attempt}) so re-deliveries are safe to discard.
Self-contained payload: body includes video_id, overall status, per-rendition outcome, manifest URL (if playable), and duration so the receiver avoids a round-trip.
Partial success: status field distinguishes complete (all renditions), partial (some renditions failed), and failed (none usable).
Security: X-Webhook-Signature: sha256={hmac} header described, with the signing secret documented as per-endpoint rather than global.
Bonus: the job schema uses a locked_until timestamp for distributed lock management so two workers cannot double-process the same job.
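For the security item in the rubric, signing and verifying a webhook body with HMAC-SHA256 takes only a few lines. The payload fields, header format, and secret below are one possible shape consistent with the rubric, offered as a sketch and not as a reference answer.

```python
import hashlib
import hmac
import json


def sign_webhook(secret: bytes, body: bytes) -> str:
    """Value for the X-Webhook-Signature header: sha256={hex digest}."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_webhook(secret: bytes, body: bytes, header: str) -> bool:
    """Receiver side: recompute over the raw bytes, compare in constant time."""
    return hmac.compare_digest(sign_webhook(secret, body), header)


payload = {
    "event_id": "evt_tjob_8RpQn3LvMz1w_1",  # stable across re-deliveries
    "video_id": "vid_2wXk9mPqRn7v",
    "status": "partial",                     # complete | partial | failed
    "renditions": [
        {"quality": "1080p", "status": "complete"},
        {"quality": "4K", "status": "failed"},
    ],
    "manifest_url": "https://cdn.example.com/vid_2wXk9mPqRn7v/manifest.m3u8",
}
secret = b"per-endpoint-secret"              # hypothetical; one per webhook URL
body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
header = sign_webhook(secret, body)

accepted = verify_webhook(secret, body, header)
tampered = verify_webhook(secret, body.replace(b"partial", b"complete"), header)
```

Receivers should verify against the exact bytes they received, before parsing the JSON, since re-serializing can change key order or whitespace and break the signature.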

Key takeaways

Presigned URLs keep the API server out of the data path. Issuing a time-limited signed token and stepping aside is the right architecture for any large-file ingest — the API server's job is orchestration, not bandwidth.
Async transcoding is non-negotiable for video. Any workload measured in minutes of CPU time belongs in a queue/worker, not an HTTP handler. Respond with 202 + job handle; let clients poll or receive a webhook.
HLS/DASH solves three problems simultaneously: bandwidth adaptation, fast start via segment pre-buffering, and accurate seeking — none of which plain MP4 progressive download provides reliably.
CDN hit ratio is the most important operational metric for a video platform. At 98%+ hit rate, origin load is a rounding error. At 90%, the origin sees five times the traffic it would at 98% (10% of requests instead of 2%), and your origin bill and viewer latency climb with it.
Segment files and manifest files have different optimal TTLs. Segments are immutable — use max-age=31536000, immutable. Master manifests must expire in ~30 s so newly published renditions become visible promptly.
Separate API, upload, and CDN domains from day one. Mixing them couples your caching policy, WAF rules, and autoscaling in ways that are expensive to untangle at scale.

Sources & further reading

Apple HTTP Live Streaming (HLS) specification — the authoritative reference for .m3u8 playlists, segment formats, and encryption.
MPEG-DASH Industry Forum (DASHIF) — specifications and interoperability guidelines for Dynamic Adaptive Streaming over HTTP.
AWS S3 Presigned URLs — developer guide — how to generate, scope, and expire signed upload/download URLs for S3.
Cloudflare Stream documentation — a managed video pipeline that illustrates the upload, transcode, and playback architecture described in this lesson.
Google / YouTube Engineering Blog — primary source for YouTube-scale infrastructure decisions including transcode pipelines and CDN architecture.