Foundations · Lesson 09
WebSockets & real-time
Plain HTTP is one-sided: the client asks, the server answers, done. But chat, live scores, and collaborative editing need the server to speak first — to push data the moment it changes. WebSockets keep a single connection open in both directions so it can.
By the end you'll be able to
- Explain why request/response struggles with server-initiated updates.
- Describe how a WebSocket upgrades from HTTP and stays open bidirectionally.
- Choose between polling, SSE, and WebSockets for a given real-time need.
The problem: the server can't start the conversation
In normal HTTP the client always speaks first. So how does a chat app show a message the instant a friend sends it? The naive answer is polling: ask "anything new?" every couple of seconds. It works, but it's wasteful: most requests come back empty, and a new message can still sit unseen for up to 2 seconds until the next poll. Picture phoning the post office every 30 seconds to ask if mail arrived, versus them ringing you when it does.
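To put a number on "wasteful", here is a back-of-envelope sketch in Python. The function name and every input (user count, poll interval, updates per hour) are illustrative assumptions, not measurements.

```python
def polling_load(users: int, interval_s: float, updates_per_user_per_hour: float) -> dict:
    """Back-of-envelope cost of short polling; all inputs are illustrative."""
    requests_per_s = users / interval_s
    polls_per_hour = 3600 / interval_s  # polls each user makes per hour
    # Best case: each real update lands in a different interval, so at most
    # that many polls per hour return something; every other poll is empty.
    useful_polls = min(updates_per_user_per_hour, polls_per_hour)
    return {
        "requests_per_s": requests_per_s,
        "empty_fraction": 1 - useful_polls / polls_per_hour,
        "worst_case_lag_s": interval_s,  # an update can just miss a poll
    }


print(polling_load(users=100_000, interval_s=2, updates_per_user_per_hour=30))
```

With 100,000 users polling every 2 seconds and about 30 real updates per user per hour, that is 50,000 requests per second, roughly 98% of them empty, and any single update can still wait up to 2 seconds.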
How a WebSocket starts: the upgrade
A WebSocket begins life as an ordinary HTTP request carrying a special header — Upgrade: websocket. If the server agrees, it replies 101 Switching Protocols and from that point the same TCP connection (Lesson 04) stops speaking HTTP and starts speaking the WebSocket protocol: a long-lived, two-way channel where either side can send a message at any time, with very little per-message overhead.
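# client asks to upgrade an existing HTTP connection (simplified: the Sec-WebSocket-* headers are covered under the hood later in this lesson)
GET /chat HTTP/1.1
Host: api.example.com
Upgrade: websocket
Connection: Upgrade
# server agrees; from here on the pipe is bidirectional
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade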
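On the server, deciding whether a request is asking for this upgrade comes down to a few header checks. A minimal Python sketch, assuming the headers have already been parsed into a dict (is_websocket_upgrade is a hypothetical helper name). It only covers the simplified request shown here; a real server must also validate Sec-WebSocket-Key and Sec-WebSocket-Version.

```python
def is_websocket_upgrade(method: str, headers: dict) -> bool:
    """Return True if a parsed HTTP request asks to switch to WebSocket.

    `headers` maps header names to values. Names are lowercased first because
    HTTP header names are case-insensitive.
    """
    h = {name.lower(): value for name, value in headers.items()}

    def tokens(name: str) -> set:
        # Upgrade and Connection are comma-separated token lists,
        # e.g. "Connection: keep-alive, Upgrade".
        return {t.strip().lower() for t in h.get(name, "").split(",")}

    return (
        method == "GET"
        and "websocket" in tokens("upgrade")
        and "upgrade" in tokens("connection")
    )
```

Treating Connection as a token list matters in practice: some clients send "keep-alive, Upgrade" rather than a bare "Upgrade".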
Because it reuses the standard web ports (80 for ws://, 443 for wss://) and begins with an ordinary HTTP handshake, it gets through most firewalls and proxies that already allow web traffic, a big reason it won over older hacks.
"How would you build live chat / live notifications?" Lead with the trade-off, not a buzzword: polling is simplest but laggy and wasteful; WebSockets give instant bidirectional push at the cost of holding open connections (which is real server state to manage and scale). Then mention you'd need a way to fan a message out to the right connections across many servers — a pub/sub layer. That arc shows depth.
Three tools, increasing power
| Approach | Direction | Best for | Cost |
|---|---|---|---|
| Polling | Client pulls repeatedly | Rare updates, dead-simple needs | Wasted requests, lag |
| SSE (Server-Sent Events) | Server → client only (one-way stream) | Live feeds, notifications, dashboards | One-directional; text only |
| WebSocket | Both directions, anytime | Chat, multiplayer, collaborative editing | Stateful open connections to manage |
The instinct: don't reach for WebSockets if you only need server→client updates — SSE is simpler and rides plain HTTP. Use WebSockets when the client must also send frequently and instantly (typing, game moves, cursor positions).
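Part of what makes SSE simpler is its wire format: the response is an ordinary HTTP body with Content-Type: text/event-stream, made of plain text lines. A minimal Python sketch of encoding one event (format_sse is a hypothetical helper; streaming it out over a held-open response is left to whatever web framework you use):

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Events message (text/event-stream format).

    Assumes `data` uses plain newline line endings.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")  # lets the client listen for a named event type
    if event_id is not None:
        lines.append(f"id: {event_id}")  # browser resends it as Last-Event-ID on reconnect
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")  # multi-line data becomes several data: lines
    return "\n".join(lines) + "\n\n"  # a blank line ends the event
```

In the browser, the built-in EventSource API parses this stream and reconnects automatically, so there is no handshake or framing code to write on the client.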
Forgetting that open connections are state. A million chat users means a million live connections pinned to your servers (recall the file-descriptor limits from Lesson 04). You can't just put a plain stateless load balancer in front and call it done — you need sticky routing or a shared pub/sub bus so a message published on one server reaches a user connected to another. "Just use WebSockets" without this is an incomplete answer.
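A toy, single-process Python sketch of that shared pub/sub bus. Bus stands in for a real broker and ChatServer for one WebSocket server; both names are hypothetical, and there is no networking, persistence, or unsubscribe logic. What it does show is the key move: a server doesn't broadcast only to its own sockets, it publishes to the bus, and the bus fans out to every server holding members of that room.

```python
from __future__ import annotations

from collections import defaultdict


class Bus:
    """In-memory stand-in for a shared pub/sub layer between servers."""

    def __init__(self) -> None:
        self.subscribers: dict[str, set[ChatServer]] = defaultdict(set)

    def subscribe(self, room: str, server: ChatServer) -> None:
        self.subscribers[room].add(server)

    def publish(self, room: str, message: str) -> None:
        # Fan out only to servers that currently hold a member of this room.
        for server in self.subscribers[room]:
            server.deliver_local(room, message)


class ChatServer:
    """One WebSocket server; `inbox` stands in for each open socket's send queue."""

    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        self.rooms: dict[str, set[str]] = defaultdict(set)    # room -> users connected here
        self.inbox: dict[str, list[str]] = defaultdict(list)  # user -> messages "sent" to them

    def connect(self, user: str, room: str) -> None:
        self.rooms[room].add(user)
        self.bus.subscribe(room, self)

    def on_client_message(self, room: str, message: str) -> None:
        # Publish instead of broadcasting locally, so other servers see it too.
        self.bus.publish(room, message)

    def deliver_local(self, room: str, message: str) -> None:
        for user in self.rooms[room]:
            self.inbox[user].append(message)


bus = Bus()
s1, s7 = ChatServer("s1", bus), ChatServer("s7", bus)
s1.connect("alice", "general")
s7.connect("bob", "general")
s1.on_client_message("general", "hi bob")
```

Here alice (on s1) reaches bob (on s7) even though the two servers never talk to each other directly.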
Do match the tool to direction and frequency: polling for rare, SSE for one-way streams, WebSockets for true two-way. Don't default to WebSockets for a notifications feed — you'll take on connection-management complexity you didn't need.
Under the hood: how it actually works
The Upgrade handshake — exact bytes
A WebSocket starts as a perfectly ordinary HTTP/1.1 request. The client sends four special headers; the server's 101 response is the signal that the TCP connection has been handed off from HTTP to the WebSocket protocol.
## Client → Server (the HTTP Upgrade request)
GET /chat HTTP/1.1\r\n
Host: api.example.com\r\n
Upgrade: websocket\r\n
Connection: Upgrade\r\n
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n
Sec-WebSocket-Version: 13\r\n
\r\n
## Server → Client (101 = "I'm switching protocols")
HTTP/1.1 101 Switching Protocols\r\n
Upgrade: websocket\r\n
Connection: Upgrade\r\n
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=\r\n
\r\n
# After this blank line: no more HTTP — pure WebSocket frames both ways
The Sec-WebSocket-Accept value is not a secret — it is derived from the client's key using a fixed GUID. The server concatenates the key with the magic GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", takes SHA-1 of the result, then Base64-encodes it:
# How Sec-WebSocket-Accept is computed (pseudocode)
key = "dGhlIHNhbXBsZSBub25jZQ==" # from client header
magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
accept = base64( sha1( key + magic ) )
# = "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
# Verify it yourself:
$ echo -n "dGhlIHNhbXBsZSBub25jZQ==258EAFA5-E914-47DA-95CA-C5AB0DC85B11" \
| openssl sha1 -binary | base64
s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
The client validates this value before trusting the upgrade. The purpose is to prove the server intended a WebSocket upgrade (not a misrouted HTTP cache response accidentally being reused as a WebSocket).
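The same derivation in Python using only the standard library, plus the client-side check just described. The helper names are hypothetical; the GUID and the sample key/accept pair are the ones from RFC 6455.

```python
import base64
import hashlib
import os

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def make_accept(key: str) -> str:
    """Server side: Sec-WebSocket-Accept = base64(sha1(key + GUID))."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def make_client_key() -> str:
    """Client side: a fresh random 16-byte nonce, Base64-encoded (24 characters)."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def accept_is_valid(sent_key: str, received_accept: str) -> bool:
    """Client side: only trust the 101 if the server echoed the expected value."""
    return received_accept.strip() == make_accept(sent_key)


print(make_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```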
The WebSocket frame format
After the handshake, data is exchanged as frames — binary structures with a compact 2–14 byte header:
## WebSocket frame layout (RFC 6455 §5.2)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |            (16/64)            |
|N|V|V|V|       |S|             |                               |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|               Masking-key (if MASK=1, 32 bits)               |
+---------------------------------------------------------------+
|                         Payload data                         |
+---------------------------------------------------------------+
## Key fields:
FIN (1 bit) — 1 = this is the final fragment of the message
opcode (4 bits):
0x0 = continuation frame
0x1 = text frame (UTF-8)
0x2 = binary frame
0x8 = close frame (+ 2-byte close code + optional UTF-8 reason)
0x9 = ping frame
0xA = pong frame
MASK (1 bit) — 1 = payload is XOR-masked (clients MUST mask; servers MUST NOT)
Payload len — 0–125: actual length; 126: next 2 bytes = length; 127: next 8 bytes
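To make the length rules concrete, here is a minimal Python sketch that decodes a frame header from raw bytes (parse_frame_header is a hypothetical name; it ignores the RSV bits and does no validation beyond length checks). Note how the header grows from 2 bytes to at most 2 + 8 + 4 = 14.

```python
import struct


def parse_frame_header(buf: bytes) -> dict:
    """Decode the header of one WebSocket frame.

    Returns the fields plus `header_len`, the offset where the payload starts.
    Raises ValueError if `buf` is too short to hold the whole header.
    """
    if len(buf) < 2:
        raise ValueError("need at least 2 bytes")
    b0, b1 = buf[0], buf[1]
    fin = bool(b0 & 0x80)        # top bit of byte 0
    opcode = b0 & 0x0F           # low 4 bits of byte 0 (RSV1-3 ignored here)
    masked = bool(b1 & 0x80)     # top bit of byte 1
    length = b1 & 0x7F           # 7-bit length, or 126/127 as an escape
    offset = 2
    if length == 126:            # next 2 bytes hold the real length
        if len(buf) < offset + 2:
            raise ValueError("truncated 16-bit length")
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:          # next 8 bytes hold the real length
        if len(buf) < offset + 8:
            raise ValueError("truncated 64-bit length")
        (length,) = struct.unpack_from("!Q", buf, offset)
        offset += 8
    mask_key = None
    if masked:
        if len(buf) < offset + 4:
            raise ValueError("truncated masking key")
        mask_key = bytes(buf[offset:offset + 4])
        offset += 4
    return {"fin": fin, "opcode": opcode, "masked": masked,
            "payload_len": length, "mask_key": mask_key, "header_len": offset}
```

Fed the RFC 6455 example bytes 0x81 0x05 followed by "Hello", it reports fin=True, opcode=0x1 (text), no mask, and a 5-byte payload starting at offset 2.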
The masking requirement is security-related: browsers must mask every frame they send with a fresh, unpredictable 4-byte key, so a malicious page can't control the exact bytes on the wire and craft a WebSocket message that a confused proxy would parse as an HTTP request (a cache-poisoning attack). Server-sent frames are never masked.
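The encoding side, including masking, as a Python sketch (encode_frame and client_text_frame are hypothetical names; they always send a single unfragmented frame). Masking itself is just XOR with the 4-byte key repeated across the payload, so applying it twice gives back the original bytes.

```python
import os
import struct
from typing import Optional


def encode_frame(payload: bytes, opcode: int = 0x1, mask_key: Optional[bytes] = None) -> bytes:
    """Build one final (FIN=1) frame. Pass mask_key for client-to-server frames.

    Clients should use a fresh os.urandom(4) key per frame; a fixed key is
    accepted here only so the output can be checked against known bytes.
    """
    first = 0x80 | (opcode & 0x0F)  # FIN=1, RSV1-3=0
    mask_bit = 0x80 if mask_key is not None else 0
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask_key is None:
        return header + payload  # server-to-client: never masked
    if len(mask_key) != 4:
        raise ValueError("mask_key must be 4 bytes")
    # XOR payload byte i with key byte i % 4 (the same operation unmasks).
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


def client_text_frame(text: str) -> bytes:
    """A browser-style text frame: UTF-8 payload, random per-frame mask."""
    return encode_frame(text.encode("utf-8"), 0x1, os.urandom(4))
```

With the key 37 fa 21 3d, the masked "Hello" frame comes out as 81 85 37 fa 21 3d 7f 9f 4d 51 58, the same bytes as the masked example in RFC 6455 §5.7.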
Ping / pong and the close handshake
Either side can send a ping (opcode 0x9); the receiver must reply with a pong (opcode 0xA) as soon as practical, echoing the ping's payload. This detects dead connections: if pings go unanswered, the side that sent them closes the connection. Servers commonly send a ping every 30–60 seconds.
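Your WebSocket library normally answers pings for you; the pinging side's bookkeeping can be sketched as a tiny state machine. Heartbeat, tick(), and the 30 s / 10 s defaults are illustrative assumptions, not taken from any particular library. A real server would call tick() from a timer and send an actual ping frame (opcode 0x9) when it returns "ping".

```python
import time
from typing import Callable


class Heartbeat:
    """Tracks one connection's liveness; `clock` is injectable for testing.

    interval: seconds of quiet before sending a ping.
    timeout: seconds to wait for the matching pong before giving up.
    """

    def __init__(self, interval: float = 30.0, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.timeout = timeout
        self.clock = clock
        self.awaiting_pong = False
        self.last_ping_sent = 0.0
        self.last_activity = clock()

    def on_pong(self) -> None:
        self.awaiting_pong = False
        self.last_activity = self.clock()

    def tick(self) -> str:
        """Return what the caller should do now: 'ping', 'close', or 'idle'."""
        now = self.clock()
        if self.awaiting_pong:
            if now - self.last_ping_sent >= self.timeout:
                return "close"  # no pong in time: treat as dead or half-open
            return "idle"
        if now - self.last_activity >= self.interval:
            self.awaiting_pong = True
            self.last_ping_sent = now
            return "ping"
        return "idle"
```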
To close a connection cleanly, either side sends a close frame (opcode 0x8), normally carrying a 2-byte close code such as 1000 (normal closure). The other side must reply with its own close frame, and then the TCP connection is closed (RFC 6455 recommends that the server close it first). This two-step ensures both sides know the session ended intentionally.
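Building and reading that close-frame body in Python (hypothetical helper names). The RFC also allows an empty close body, which the decoder reports as "no code", and caps every control frame's payload at 125 bytes.

```python
import struct
from typing import Optional, Tuple


def encode_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Body of a close frame: 2-byte big-endian code + optional UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:  # control frames carry at most 125 payload bytes
        raise ValueError("close reason too long")
    return body


def decode_close_payload(body: bytes) -> Tuple[Optional[int], str]:
    """Inverse of the above; an empty body means no status code was given."""
    if not body:
        return None, ""
    if len(body) == 1:
        raise ValueError("a 1-byte close body is invalid")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")
```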
How to debug & inspect it
Two tools cover most WebSocket debugging: wscat (a CLI client) and Chrome DevTools (captures every frame in the browser).
In Chrome DevTools: open Network → filter by WS → click the connection row → open the Messages tab. You see each frame with direction (↑ = sent, ↓ = received), timestamp, length, and payload. The Headers tab shows the upgrade request/response including Sec-WebSocket-Key and Sec-WebSocket-Accept.
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 400 or 404 instead of 101 | Server doesn't handle Upgrade: websocket at that path, or WebSocket not enabled on the route | Confirm the path and that the server has a WebSocket handler registered; check for a trailing slash mismatch |
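| HTTP 401 on the upgrade request | Auth token missing or invalid; auth is checked during the HTTP handshake, before the upgrade | Pass credentials in the upgrade request headers (-H in wscat). The browser WebSocket API can't set custom headers, so browser clients usually rely on cookies (sent automatically when they match the host) or a token in the query string |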
| Connection established then immediately closes | Server rejects the first message (protocol mismatch, subprotocol negotiation failure) | Check the close frame's code and reason; verify you're sending the expected message format |
| Close code 1001 (Going Away) | Server is shutting down or the page navigated away | Implement reconnect with exponential backoff; drain pending messages before closing |
| Close code 1006 (Abnormal Closure) | TCP connection dropped without a proper close frame — network glitch, server crash, load balancer idle timeout | Add ping/pong keepalives to keep the connection alive through idle timeouts; reconnect on 1006 |
| Close code 1009 (Message Too Big) | Message exceeded the server's maximum payload size | Split large payloads into smaller application-level messages; raise the server's payload limit (e.g. maxPayloadLength; the option name varies by library) if appropriate |
| Messages appear on wrong connection (fan-out bug) | Server is broadcasting to all connections instead of the right room/user | Verify room/channel lookup logic; add connection ID logging to trace which client received what |
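Several fixes in the table say "reconnect with exponential backoff". A minimal Python sketch of the delay calculation with full jitter; the function names, base, cap, and the choice of which close codes to retry are all assumptions to tune for your app.

```python
import random
from typing import Callable

RETRY_CLOSE_CODES = {1001, 1006}  # Going Away, Abnormal Closure


def should_reconnect(close_code: int) -> bool:
    """Retry only on codes that suggest a transient problem.

    1000 (normal closure) means a side hung up on purpose, so it is not retried.
    """
    return close_code in RETRY_CLOSE_CODES


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                    rand: Callable[[], float] = random.random) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Full jitter: a uniform pick in [0, min(cap, base * 2**attempt)), so many
    clients dropped at once don't all reconnect in lockstep.
    """
    ceiling = min(cap, base * 2 ** min(attempt, 32))  # clamp exponent to avoid float overflow
    return rand() * ceiling
```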
Debug checklist:
- Does wscat -c URL connect? If not, the issue is at the HTTP upgrade layer; check the HTTP status code it prints.
- In DevTools → Network → WS → Headers: confirm 101 Switching Protocols and that Sec-WebSocket-Accept is present.
- In DevTools → Messages: do you see sent/received frames? If the connection opens but no messages arrive, the issue is application-level (routing, room assignment).
- Check close frame codes when a connection drops unexpectedly: 1006 (TCP drop) vs 1000 (clean close) tells you whether it was a network problem or intentional.
- Add server-side ping/pong (e.g. every 30s) to detect half-open connections; a client that disconnected without a close frame won't be noticed otherwise.
🧠 Quick check
1. The core limitation of plain HTTP that WebSockets address is:
Plain request/response is client-initiated. WebSockets open a persistent two-way channel so the server can push the instant data changes.
2. A WebSocket connection begins as:
It upgrades an existing HTTP connection (Upgrade: websocket → 101), then reuses that TCP connection bidirectionally — which is also why it traverses web-friendly firewalls.
3. You need a one-way live notifications feed (server → client only). The simplest fit is:
SSE is a one-directional server→client stream over plain HTTP — simpler than WebSockets when the client doesn't need to push, and far less wasteful than tight polling.
✍️ Drill: real-time design under scale
Design the real-time layer for a chat app with 2 million concurrent users across many servers. What transport, and what's the non-obvious hard part? Decide first.
Model answer: Use WebSockets (true two-way, low per-message overhead). The hard part isn't the protocol — it's that 2M open connections are spread across many servers, so a message from user A (on server 1) must reach user B (on server 7). You need a pub/sub / message bus: each server subscribes to the channels (rooms) its connected users care about; publishing a message fans it out to whichever servers hold those subscribers. Add heartbeats to detect dead connections and a plan for reconnection/missed-message backfill.
Rubric: ✓ picks WebSockets and justifies two-way ✓ identifies cross-server fan-out as the real challenge ✓ proposes pub/sub ✓ mentions connection limits/heartbeats or reconnection. This is a direct on-ramp to the Pub/Sub case study later.
Key takeaways
- Plain HTTP is client-initiated; real-time needs the server to push.
- WebSockets upgrade an HTTP connection (101) into a persistent two-way channel.
- Pick by direction/frequency: polling (rare) → SSE (one-way stream) → WebSockets (full duplex).
- Open connections are state: scaling them needs sticky routing + a pub/sub fan-out layer.