API Design

Foundations · Lesson 09

WebSockets & real-time

Plain HTTP is one-sided: the client asks, the server answers, done. But chat, live scores, and collaborative editing need the server to speak first — to push data the moment it changes. WebSockets keep a single connection open in both directions so it can.

⏱ 11 min · Difficulty: core · Prereq: Lessons 02, 04

By the end you'll be able to

The problem: the server can't start the conversation

In normal HTTP the client always speaks first. So how does a chat app show a message the instant a friend sends it? The naive answer is polling: ask "anything new?" every couple of seconds. It works, but it's wasteful — most requests come back empty, and there's still up-to-2-second lag. Picture phoning the post office every 30 seconds to ask if mail arrived, versus them ringing you when it does.

[Figure. POLLING (wasteful): "new?" → "no" … "no" … "yes". WEBSOCKET (push): open once (upgrade), connection stays open…, server pushes instantly.]
Polling burns requests to ask repeatedly. A WebSocket opens once and lets the server push the moment something happens.
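To put rough numbers on the waste, here is a minimal simulation sketch. The function name `polling_cost`, the 60-second window, and the two event times are illustrative assumptions, not measurements from a real system.

```python
def polling_cost(event_times, interval, duration):
    """Simulate fixed-interval polling for `duration` seconds.

    Returns (total_requests, empty_requests, worst_lag_seconds).
    """
    # The client polls at t = interval, 2*interval, ... up to duration.
    poll_times = [i * interval for i in range(1, int(duration // interval) + 1)]
    pending = sorted(event_times)
    empty = 0
    worst_lag = 0.0
    idx = 0
    for t in poll_times:
        delivered = 0
        # Every event that happened since the last poll is picked up now.
        while idx < len(pending) and pending[idx] <= t:
            worst_lag = max(worst_lag, t - pending[idx])
            idx += 1
            delivered += 1
        if delivered == 0:
            empty += 1
    return len(poll_times), empty, worst_lag


# Hypothetical minute of chat: two messages arrive, the client polls every 2 s.
requests, empty, worst_lag = polling_cost(
    event_times=[12.5, 47.1], interval=2, duration=60
)
print(requests, empty, worst_lag)  # 30 requests, 28 of them empty, worst lag 1.5 s
```

With a push connection the same minute would cost one handshake plus two messages, and the lag would be close to network latency alone.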

How a WebSocket starts: the upgrade

A WebSocket begins life as an ordinary HTTP request carrying a special header — Upgrade: websocket. If the server agrees, it replies 101 Switching Protocols and from that point the same TCP connection (Lesson 04) stops speaking HTTP and starts speaking the WebSocket protocol: a long-lived, two-way channel where either side can send a message at any time, with very little per-message overhead.

# client asks to upgrade an existing HTTP connection
GET /chat HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade

# server agrees — the pipe is now bidirectional
HTTP/1.1 101 Switching Protocols
Upgrade: websocket

Because it reuses the standard web ports (80 for ws://, 443 for wss://) and begins with an ordinary HTTP handshake, it usually passes through firewalls and proxies that already allow web traffic, which is a big reason it won over older hacks.

🎯 Interview angle

"How would you build live chat / live notifications?" Lead with the trade-off, not a buzzword: polling is simplest but laggy and wasteful; WebSockets give instant bidirectional push at the cost of holding open connections (which is real server state to manage and scale). Then mention you'd need a way to fan a message out to the right connections across many servers — a pub/sub layer. That arc shows depth.

Three tools, increasing power

Approach | Direction | Best for | Cost
Polling | Client pulls repeatedly | Rare updates, dead-simple needs | Wasted requests, lag
SSE (Server-Sent Events) | Server → client only (one-way stream) | Live feeds, notifications, dashboards | One-directional; text only
WebSocket | Both directions, anytime | Chat, multiplayer, collaborative editing | Stateful open connections to manage

The instinct: don't reach for WebSockets if you only need server→client updates — SSE is simpler and rides plain HTTP. Use WebSockets when the client must also send frequently and instantly (typing, game moves, cursor positions).
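To show how little machinery SSE needs, here is a minimal sketch of the text format a server writes on a response served as `text/event-stream`. The helper name `format_sse` and the sample event are assumptions for illustration; a real server would also keep the response open and flush after each event.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event as text.

    Each line of the payload gets its own "data:" field, and a blank
    line terminates the event.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


# Hypothetical live-score update pushed from server to browser.
print(format_sse('{"home": 2, "away": 1}', event="score"), end="")
```

There is no framing, masking, or upgrade step: the stream is plain text over a long-lived HTTP response, which is why it only flows server → client.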

⚠️ Common trap

Forgetting that open connections are state. A million chat users means a million live connections pinned to your servers (recall the file-descriptor limits from Lesson 04). You can't just put a plain stateless load balancer in front and call it done — you need sticky routing or a shared pub/sub bus so a message published on one server reaches a user connected to another. "Just use WebSockets" without this is an incomplete answer.
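The fan-out problem can be sketched in memory. Everything here is hypothetical: `Bus` stands in for a real shared pub/sub system, `ChatServer` for one WebSocket server process, and `outbox` for the frames each user's socket would receive.

```python
from collections import defaultdict


class Bus:
    """In-memory stand-in for a shared pub/sub bus (assumption, not a real broker)."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # room -> servers interested in it

    def subscribe(self, room, server):
        self._subscribers[room].add(server)

    def publish(self, room, message):
        # Fan the message out to every server holding a subscriber for the room.
        for server in list(self._subscribers[room]):
            server.deliver(room, message)


class ChatServer:
    """One server process holding some of the open connections."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.rooms = defaultdict(set)    # room -> users connected to THIS server
        self.outbox = defaultdict(list)  # user -> messages pushed down their socket

    def connect(self, user, room):
        self.rooms[room].add(user)
        self.bus.subscribe(room, self)

    def send(self, room, message):
        # Publish to the bus instead of only looping over local connections.
        self.bus.publish(room, message)

    def deliver(self, room, message):
        for user in self.rooms[room]:
            self.outbox[user].append(message)


bus = Bus()
server1, server7 = ChatServer("server-1", bus), ChatServer("server-7", bus)
server1.connect("alice", "general")
server7.connect("bob", "general")
server1.send("general", "hello from alice")
print(server7.outbox["bob"])  # ['hello from alice']
```

The key design choice is that `send` never touches local connections directly; every message takes the same path through the bus, so it does not matter which server a recipient landed on.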

✅ Do this, not that

Do match the tool to direction and frequency: polling for rare, SSE for one-way streams, WebSockets for true two-way. Don't default to WebSockets for a notifications feed — you'll take on connection-management complexity you didn't need.

Under the hood: how it actually works

The Upgrade handshake — exact bytes

A WebSocket starts as a perfectly ordinary HTTP/1.1 request. The client sends four special headers; the server's 101 response is the signal that the TCP connection has been handed off from HTTP to the WebSocket protocol.

## Client → Server (the HTTP Upgrade request)
GET /chat HTTP/1.1\r\n
Host: api.example.com\r\n
Upgrade: websocket\r\n
Connection: Upgrade\r\n
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n
Sec-WebSocket-Version: 13\r\n
\r\n

## Server → Client (101 = "I'm switching protocols")
HTTP/1.1 101 Switching Protocols\r\n
Upgrade: websocket\r\n
Connection: Upgrade\r\n
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=\r\n
\r\n
# After this blank line: no more HTTP — pure WebSocket frames both ways

The Sec-WebSocket-Accept value is not a secret — it is derived from the client's key using a fixed GUID. The server concatenates the key with the magic GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", takes SHA-1 of the result, then Base64-encodes it:

# How Sec-WebSocket-Accept is computed (pseudocode)
key     = "dGhlIHNhbXBsZSBub25jZQ=="          # from client header
magic   = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
accept  = base64( sha1( key + magic ) )
# = "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="

# Verify it yourself:
$ echo -n "dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11" \
  | openssl sha1 -binary | base64
s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The client checks this value before treating the connection as a WebSocket. A match proves the server actually understood the WebSocket handshake, rather than a cache or non-WebSocket server replaying a generic HTTP response that happens to be reused as a WebSocket.
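The same derivation, plus the header checks a server performs before answering 101, can be sketched in Python using only the standard library. The function names `compute_accept` and `build_upgrade_response` are illustrative, and the validation is deliberately minimal (no subprotocol, origin, or extension handling).

```python
import base64
import hashlib
from typing import Optional

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from the handshake


def compute_accept(key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(headers: dict) -> Optional[bytes]:
    """Return the 101 response bytes, or None if this is not a valid upgrade."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    connection_tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    if h.get("upgrade", "").lower() != "websocket":
        return None
    if "upgrade" not in connection_tokens:
        return None
    if h.get("sec-websocket-version") != "13" or "sec-websocket-key" not in h:
        return None
    accept = compute_accept(h["sec-websocket-key"])
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept}\r\n"
        "\r\n"
    ).encode("ascii")


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Returning `None` for anything that is not a well-formed upgrade lets the caller fall back to a normal HTTP error such as 400.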

The WebSocket frame format

After the handshake, data is exchanged as frames — binary structures with a compact 2–14 byte header:

## WebSocket frame layout (RFC 6455 §5.2)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |           (16/64)             |
|N|V|V|V|       |S|             |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|                  Masking-key (if MASK=1, 32 bits)             |
+---------------------------------------------------------------+
|                        Payload data                           |
+---------------------------------------------------------------+

## Key fields:
FIN (1 bit)   — 1 = this is the final fragment of the message
opcode (4 bits):
  0x0 = continuation frame
  0x1 = text frame (UTF-8)
  0x2 = binary frame
  0x8 = close frame (+ 2-byte close code + optional UTF-8 reason)
  0x9 = ping frame
  0xA = pong frame
MASK (1 bit)  — 1 = payload is XOR-masked (clients MUST mask; servers MUST NOT)
Payload len  — 0–125: actual length; 126: next 2 bytes = length; 127: next 8 bytes

The masking requirement is security-related: clients (browsers included) must mask every frame they send so that a malicious page can't craft a WebSocket message that looks like an HTTP request to an intermediary proxy, which would enable a cache-poisoning attack. Server-sent frames are never masked.
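The layout and masking rule above can be exercised with a small encoder and decoder. This is a minimal sketch for single, complete frames: the names `encode_frame`, `decode_frame`, and `mask_payload` are illustrative, and it ignores fragmentation, the RSV bits, and partial reads.

```python
import struct
from typing import Optional, Tuple


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key (same call unmasks)."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def encode_frame(payload: bytes, opcode: int = 0x1,
                 mask_key: Optional[bytes] = None, fin: bool = True) -> bytes:
    """Build one frame; pass a 4-byte mask_key for client-to-server frames."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:
        header = bytes([first, mask_bit | n])
    elif n <= 0xFFFF:
        header = bytes([first, mask_bit | 126]) + struct.pack("!H", n)
    else:
        header = bytes([first, mask_bit | 127]) + struct.pack("!Q", n)
    if mask_key is None:
        return header + payload
    return header + mask_key + mask_payload(payload, mask_key)


def decode_frame(frame: bytes) -> Tuple[bool, int, bytes]:
    """Parse one complete frame into (fin, opcode, unmasked payload)."""
    fin = bool(frame[0] & 0x80)
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    n = frame[1] & 0x7F
    offset = 2
    if n == 126:
        (n,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif n == 127:
        (n,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    key = b""
    if masked:
        key = frame[offset:offset + 4]
        offset += 4
    payload = frame[offset:offset + n]
    return fin, opcode, mask_payload(payload, key) if masked else payload


# Unmasked server text frame carrying "Hello"
print(encode_frame(b"Hello").hex())  # 810548656c6c6f
# The same message as a client would send it, masked with key 37 fa 21 3d
print(encode_frame(b"Hello", mask_key=bytes.fromhex("37fa213d")).hex())
# 818537fa213d7f9f4d5158
```

Both byte strings match the single-frame "Hello" examples given in RFC 6455 §5.7, which makes them handy fixtures when testing a parser.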

Ping / pong and the close handshake

Either side can send a ping (opcode 0x9); the receiver must reply with a pong (opcode 0xA) that echoes the ping's payload, as soon as is practical. This detects dead connections: if pings go unanswered, the side that sent them closes the connection. Servers typically send a ping every 30–60 seconds.
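The keepalive decision can be sketched as a small state machine that is driven by a clock you pass in. The class name `Heartbeat`, the 30-second interval, and the 10-second pong timeout are assumptions for illustration; actually sending the ping frame is left to the caller.

```python
class Heartbeat:
    """Decides when to ping a peer and when to give up on it."""

    def __init__(self, interval=30.0, timeout=10.0):
        self.interval = interval   # seconds of silence before sending a ping
        self.timeout = timeout     # seconds to wait for the matching pong
        self.last_seen_at = 0.0    # last time we heard a pong
        self.ping_sent_at = None   # set while a ping is outstanding

    def on_pong(self, now):
        self.last_seen_at = now
        self.ping_sent_at = None

    def tick(self, now):
        """Return 'ping', 'close', or 'wait' for the current time."""
        if self.ping_sent_at is not None:
            # A ping is outstanding: either keep waiting or declare the peer dead.
            if now - self.ping_sent_at >= self.timeout:
                return "close"
            return "wait"
        if now - self.last_seen_at >= self.interval:
            self.ping_sent_at = now
            return "ping"
        return "wait"


hb = Heartbeat(interval=30.0, timeout=10.0)
print(hb.tick(30.0))  # ping
hb.on_pong(30.2)
print(hb.tick(45.0))  # wait
```

Injecting `now` instead of calling a clock inside the class keeps the logic easy to test and independent of any particular event loop.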

To close a connection cleanly, either side sends a close frame (opcode 0x8) with a 2-byte close code. The other side must reply with its own close frame, then both sides close the TCP connection. This two-step ensures both sides know the session ended intentionally.
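The body of a close frame is simple enough to build by hand: a 2-byte big-endian close code followed by an optional UTF-8 reason. A minimal sketch, with illustrative helper names `build_close_payload` and `parse_close_payload`:

```python
import struct


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Pack a close code and optional reason into a close-frame payload."""
    data = struct.pack("!H", code) + reason.encode("utf-8")
    if len(data) > 125:
        # Close is a control frame, and control frame payloads max out at 125 bytes.
        raise ValueError("close payload must be at most 125 bytes")
    return data


def parse_close_payload(payload: bytes):
    """Return (code, reason); code is None when the peer sent an empty payload."""
    if len(payload) < 2:
        return None, ""
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


payload = build_close_payload(1000, "normal closure")
print(payload[:2].hex())             # 03e8
print(parse_close_payload(payload))  # (1000, 'normal closure')
```

This payload would travel inside a frame with opcode 0x8; the reply close frame typically echoes the same code.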

[Sequence diagram, Client ↔ Server: GET /chat with Upgrade: websocket and Sec-WebSocket-Key: … → 101 Switching Protocols with Sec-WebSocket-Accept: … → TEXT frame {"type":"msg","text":"hello"} → TEXT frame {"type":"msg","text":"world"} → PING frame (keepalive) → PONG frame (must reply immediately) → CLOSE frame code=1000 "normal closure" → CLOSE frame code=1000 → TCP close]
Full WebSocket session: HTTP upgrade, bidirectional text frames, server-initiated ping/pong keepalive, and clean close handshake with code 1000.

How to debug & inspect it

Two tools cover most WebSocket debugging: wscat (a CLI client) and Chrome DevTools (captures every frame in the browser).

# Install wscat (requires Node.js)
$ npm install -g wscat

# Connect to a WebSocket endpoint and send a message interactively
$ wscat -c wss://api.example.com/chat
Connected (press CTRL+C to quit)
> {"type":"join","room":"general"}
< {"type":"ack","room":"general","members":12}
> {"type":"msg","text":"hello"}
< {"type":"msg","user":"server-bot","text":"welcome!"}

# Non-interactive (send one message then disconnect after 3 seconds)
$ wscat -c wss://api.example.com/ws -x '{"ping":true}' --wait 3

# With custom headers (e.g. Authorization)
$ wscat -c wss://api.example.com/ws -H "Authorization: Bearer token123"

# If the server rejects the Upgrade, wscat shows the HTTP response:
error: Unexpected server response: 401
# → auth header missing or token invalid

In Chrome DevTools: open Network → filter by WS → click the connection row → open the Messages tab. You see each frame with direction (↑ = sent, ↓ = received), timestamp, length, and payload. The Headers tab shows the upgrade request/response including Sec-WebSocket-Key and Sec-WebSocket-Accept.

Symptom | Cause | Fix
HTTP 400 or 404 instead of 101 | Server doesn't handle Upgrade: websocket at that path, or WebSocket not enabled on the route | Confirm the path and that the server has a WebSocket handler registered; check for a trailing slash mismatch
HTTP 401 on the upgrade request | Auth token missing or invalid (auth is checked during the HTTP handshake, before the upgrade) | Pass credentials in the upgrade request headers (-H in wscat); cookies work automatically if sent
Connection established then immediately closes | Server rejects the first message (protocol mismatch, subprotocol negotiation failure) | Check the close frame's code and reason; verify you're sending the expected message format
Close code 1001 (Going Away) | Server is shutting down or the page navigated away | Implement reconnect with exponential backoff; drain pending messages before closing
Close code 1006 (Abnormal Closure) | TCP connection dropped without a proper close frame: network glitch, server crash, load balancer idle timeout | Add ping/pong keepalives to keep the connection alive through idle timeouts; reconnect on 1006
Close code 1009 (Message Too Big) | Frame payload exceeded server's max frame size | Fragment large messages; increase server's maxPayloadLength if appropriate
Messages appear on wrong connection (fan-out bug) | Server is broadcasting to all connections instead of the right room/user | Verify room/channel lookup logic; add connection ID logging to trace which client received what
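The "reconnect with exponential backoff" fix suggested for codes 1001 and 1006 can be sketched as follows. The 1-second base, 30-second cap, jitter scheme, and the names `backoff_delays` and `should_reconnect` are illustrative assumptions rather than values from any particular client library.

```python
import random

# Close codes from the table above where reconnecting is the suggested fix.
RECONNECT_CODES = {1001, 1006}


def should_reconnect(close_code: int) -> bool:
    return close_code in RECONNECT_CODES


def backoff_delays(attempts, base=1.0, cap=30.0, jitter=0.0, rng=random.random):
    """Return the wait (in seconds) before each reconnect attempt.

    The delay doubles each time up to `cap`. `jitter` in [0, 1] randomly
    shaves off up to that fraction so clients don't reconnect in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delay -= delay * jitter * rng()
        delays.append(delay)
    return delays


print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
print(should_reconnect(1006), should_reconnect(1000))  # True False
```

Jitter matters most after a server restart, when every client sees the drop at the same moment and would otherwise retry at the same moment too.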

Debug checklist:

  1. Does wscat -c URL connect? If not, the issue is at the HTTP upgrade layer — check the HTTP status code it prints.
  2. In DevTools → Network → WS → Headers: confirm 101 Switching Protocols and that Sec-WebSocket-Accept is present.
  3. In DevTools → Messages: do you see sent/received frames? If the connection opens but no messages arrive, the issue is application-level (routing, room assignment).
  4. Check close frame codes when a connection drops unexpectedly — 1006 (TCP drop) vs 1000 (clean close) tells you whether it was a network problem or intentional.
  5. Add server-side ping/pong (e.g. every 30s) to detect half-open connections — a client that disconnected without a close frame won't be noticed otherwise.

🧠 Quick check

1. The core limitation of plain HTTP that WebSockets address is:

Plain request/response is client-initiated. WebSockets open a persistent two-way channel so the server can push the instant data changes.

2. A WebSocket connection begins as:

It upgrades an existing HTTP connection (Upgrade: websocket → 101), then reuses that TCP connection bidirectionally — which is also why it traverses web-friendly firewalls.

3. You need a one-way live notifications feed (server → client only). The simplest fit is:

SSE is a one-directional server→client stream over plain HTTP — simpler than WebSockets when the client doesn't need to push, and far less wasteful than tight polling.

✍️ Drill: real-time design under scale

Design the real-time layer for a chat app with 2 million concurrent users across many servers. What transport, and what's the non-obvious hard part? Decide first.

Model answer: Use WebSockets (true two-way, low per-message overhead). The hard part isn't the protocol — it's that 2M open connections are spread across many servers, so a message from user A (on server 1) must reach user B (on server 7). You need a pub/sub / message bus: each server subscribes to the channels (rooms) its connected users care about; publishing a message fans it out to whichever servers hold those subscribers. Add heartbeats to detect dead connections and a plan for reconnection/missed-message backfill.

Rubric: ✓ picks WebSockets and justifies two-way ✓ identifies cross-server fan-out as the real challenge ✓ proposes pub/sub ✓ mentions connection limits/heartbeats or reconnection. This is a direct on-ramp to the Pub/Sub case study later.
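For the "missed-message backfill" part of the model answer, one common approach is to number messages per room so a reconnecting client can ask for everything after the last one it saw. This is a hypothetical in-memory sketch: `RoomLog` and its methods are invented for illustration, and a real system would persist and trim the log.

```python
class RoomLog:
    """Append-only message log for one room, numbered from 1."""

    def __init__(self):
        self._messages = []

    def append(self, text):
        """Store a message and return its sequence number."""
        self._messages.append(text)
        return len(self._messages)

    def since(self, last_seq):
        """Return (seq, text) pairs the client has not seen yet."""
        return list(enumerate(self._messages[last_seq:], start=last_seq + 1))


log = RoomLog()
for text in ["hi", "anyone here?", "back in 5"]:
    log.append(text)

# A client that last saw message 1 reconnects and asks for the rest.
print(log.since(1))  # [(2, 'anyone here?'), (3, 'back in 5')]
```

The client only needs to remember one integer per room, which it can send as part of its first message after reconnecting.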

Key takeaways

Sources & further reading