API Design

Foundations · Lesson 05

Network sockets

Under every HTTP request is something humbler: a socket — the actual two-way pipe between two machines. Understanding it explains why connections cost something, why "keep-alive" matters, and what a server really means when it says "too many open connections."

⏱ 11 min · Difficulty: core · Prereq: Lesson 02

By the end you'll be able to

An address plus a door number

To reach a specific program on a specific machine you need two things. The IP address finds the machine — like a building's street address. The port finds the program on it — like an apartment number. A web server typically listens on port 443 (HTTPS); a database might listen on 5432. The pair IP : port is a socket — one specific endpoint of a connection.

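A live connection is actually two sockets joined: (your-IP:random-port) ↔ (server-IP:443). The kernel tells connections apart by the full four-tuple (client IP, client port, server IP, server port). That's how one server on port 443 can hold thousands of simultaneous connections: different clients have different IPs, and each connection from the same client gets a different port on the client side, so every four-tuple is unique.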

[Figure: Client 203.0.113.5:51000 ↔ Server 198.51.100.9:443; one TCP connection = the two sockets joined]
Each side has its own IP:port. The unique pair is what lets one port serve thousands of clients at once.
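To see the four-tuple in action, here is a minimal sketch, assuming Linux/POSIX sockets and the loopback interface (127.0.0.1). It opens one listening socket and connects to it twice from the same process, then asks the kernel (getsockname) which client port each connection got. The helper names listen_loopback, connect_loopback and demo_distinct_tuples are illustrative, not from any library. Port 0 asks the kernel to pick a free port. The connections are never accept()ed, which is fine: the kernel completes the handshake and queues them.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on 127.0.0.1 with a kernel-chosen port (port 0).
   Returns the listening fd, or -1; *port receives the chosen port. */
int listen_loopback(unsigned short *port) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                                   /* 0 = any free port */
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(a.sin_port);
    return fd;
}

/* Connect to 127.0.0.1:port. Returns the fd, or -1; *local_port receives
   the client-side (ephemeral) port the kernel picked for this connection. */
int connect_loopback(unsigned short port, unsigned short *local_port) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&a, sizeof a) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {   /* our own IP:port */
        close(fd);
        return -1;
    }
    *local_port = ntohs(a.sin_port);
    return fd;
}

/* Two connections from one client to ONE server IP:port. Returns 1 if they got
   different client ports (distinct four-tuples), 0 if not, -1 on error. */
int demo_distinct_tuples(void) {
    unsigned short server_port = 0, p1 = 0, p2 = 0;
    int lfd = listen_loopback(&server_port);
    if (lfd < 0) return -1;
    int c1 = connect_loopback(server_port, &p1);
    int c2 = connect_loopback(server_port, &p2);
    int result = (c1 >= 0 && c2 >= 0) ? (p1 != p2) : -1;
    if (c1 >= 0) close(c1);
    if (c2 >= 0) close(c2);
    close(lfd);
    return result;
}
```

demo_distinct_tuples() should return 1. Both connections share the client IP, the server IP and the server port, so the kernel must give each one a different client port.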

TCP: the reliable handshake

Most APIs run over TCP, which guarantees your bytes arrive in order and intact. To earn that guarantee, TCP opens with a three-way handshake — a quick "can you hear me?" exchange before any real data flows:

[Figure: Client → Server: 1. SYN ("let's talk?"); Server → Client: 2. SYN-ACK ("sure, you?"); Client → Server: 3. ACK ("yes, go"); then the actual request flows]
Three messages before any data, but the client can send its request right behind the ACK, so setup costs roughly one network round trip. Over HTTPS, the TLS handshake adds more (covered in a later lesson on TLS).

That setup is "free" on a fast local network but expensive across the world: a 150 ms round trip means ~150 ms before your request even starts. Now you can see why reopening a connection per request — HTTP/1.0's flaw from Lesson 02 — was so costly.

🎯 Interview angle

"Why is the first request to a service slower than the rest?" Because the first pays for connection setup — TCP handshake plus (over HTTPS) the TLS handshake. Later requests on a kept-alive connection skip all of it. This is also why connection pooling matters server-to-server: you amortise the handshake across thousands of calls.

TCP vs UDP, in one breath

| | TCP | UDP |
|---|---|---|
| Guarantees | In-order, no loss, with handshake | Best-effort, may drop/reorder, no handshake |
| Cost | Setup round trip + bookkeeping | Almost none: fire and forget |
| Use it for | APIs, web pages, anything that must be exact | Live video/voice, games, DNS: where fresh beats complete |

Most APIs want TCP's reliability. UDP shines when a late packet is worthless anyway (a video frame from two seconds ago helps no one). Recall from Lesson 02 that HTTP/3's QUIC is built on UDP and re-adds reliability itself to dodge TCP's head-of-line blocking — a deliberate "rebuild it our way" choice.
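UDP's missing handshake shows up directly in code. Here is a minimal sketch, assuming POSIX sockets and the loopback interface. udp_loopback_once is a hypothetical helper. Compared with TCP there is no listen(), accept() or connect(): the receiver just binds a port, and the sender calls sendto() with a destination address.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one datagram to a UDP socket on 127.0.0.1 and read it back.
   Returns the number of bytes received (and NUL-terminates out), or -1. */
ssize_t udp_loopback_once(const char *msg, char *out, size_t out_len) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);       /* SOCK_DGRAM = UDP */
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0 || out_len == 0) goto done;

    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                                /* kernel picks a free port */
    if (bind(rx, (struct sockaddr *)&a, sizeof a) < 0 ||
        getsockname(rx, (struct sockaddr *)&a, &len) < 0) goto done;

    /* No connect(), no handshake: sendto() hands the datagram to the kernel
       and returns; nothing confirms that it arrived. */
    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&a, sizeof a) < 0) goto done;

    /* Blocks until a datagram arrives; one recvfrom() = one whole datagram. */
    n = recvfrom(rx, out, out_len - 1, 0, NULL, NULL);
    if (n >= 0) out[n] = '\0';
done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

On loopback the datagram is effectively always delivered. Across a real network it could be dropped, and recvfrom() would simply keep waiting, so a real program sets a receive timeout.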

⚠️ Common trap

Treating connections as free and unlimited. Each open socket consumes server memory and a file descriptor; OSes cap how many you can have. "Too many open connections" / "running out of file descriptors" is a real outage cause — often from clients that never reuse or close connections. Connection limits are a capacity dimension, just like CPU and RAM.
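The per-process cap can be read from inside the program. Here is a small sketch, assuming a POSIX system. getrlimit(RLIMIT_NOFILE) returns the soft limit that `ulimit -n` reports, along with the hard ceiling an unprivileged process may raise it to. Both function names are illustrative.

```c
#include <limits.h>
#include <sys/resource.h>

/* Per-process soft limit on open fds (what `ulimit -n` prints); every open
   socket counts against it. Returns -1 on error, LONG_MAX if unlimited. */
long fd_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return LONG_MAX;
    return (long)rl.rlim_cur;
}

/* Raise the soft limit to the hard limit: the most an unprivileged process
   may do on its own (going higher needs privileges, limits.conf, or
   systemd's LimitNOFILE). Returns 0 on success, -1 on error. */
int raise_fd_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```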

✅ Do this, not that

Do reuse connections (keep-alive, pooling) so the handshake is paid once. Don't open a fresh connection per call in a tight loop — you'll drown in setup latency and may exhaust ports/descriptors on both ends.
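What "pay the handshake once" means can be sketched with a deliberately tiny pool. This is a hypothetical, single-threaded illustration with no locking, health checks or idle timeouts. `dial` stands in for whatever opens a real connection (socket, connect, TLS), and counting_dial is a fake used only to count how often setup would be paid.

```c
#include <stddef.h>

#define POOL_MAX 8

/* Hypothetical: opens a brand-new connection and returns its fd, or -1.
   Setup (TCP + TLS handshake) is paid on every call. */
typedef int (*dial_fn)(void *ctx);

struct pool {
    int fd[POOL_MAX];
    int busy[POOL_MAX];
    size_t open;          /* slots [0, open) hold live connections */
    dial_fn dial;
    void *ctx;
};

void pool_init(struct pool *p, dial_fn dial, void *ctx) {
    p->open = 0;
    p->dial = dial;
    p->ctx = ctx;
}

/* Prefer an idle connection; dial a new one only if none is free.
   Returns a slot index, or -1 if the pool is full or dialing failed. */
int pool_acquire(struct pool *p) {
    for (size_t i = 0; i < p->open; i++) {
        if (!p->busy[i]) {
            p->busy[i] = 1;               /* reuse: no handshake */
            return (int)i;
        }
    }
    if (p->open == POOL_MAX) return -1;   /* a real pool would make the caller wait */
    int fd = p->dial(p->ctx);             /* setup paid here, once per slot */
    if (fd < 0) return -1;
    p->fd[p->open] = fd;
    p->busy[p->open] = 1;
    return (int)p->open++;
}

/* Give the connection back instead of close()-ing it. */
void pool_release(struct pool *p, int slot) {
    p->busy[slot] = 0;
}

/* Fake dial for trying the pool out: counts calls, returns a made-up fd. */
int counting_dial(void *ctx) {
    int *dials = ctx;
    return 100 + (*dials)++;
}
```

Acquiring and releasing a connection 1,000 times in a row dials exactly once. Closing and re-dialing on every call would pay setup 1,000 times.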

Under the hood: the actual socket syscalls and the SYN/SYN-ACK/ACK packets

A "socket" is exposed to your program through a small set of OS system calls. The sequence differs between the server and the client, and knowing it explains the most common connection errors, from "connection refused" to "address already in use."

Server-side syscall sequence:

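// Headers for this sketch; error checks are omitted for brevity
// (every call below returns -1 and sets errno on failure).
#include <netinet/in.h>   // struct sockaddr_in, htons, htonl, INADDR_ANY
#include <string.h>       // memset
#include <sys/socket.h>   // socket, bind, listen, accept, send, recv
#include <unistd.h>       // close

// 1. socket(): allocate a socket; returns a file descriptor (fd)
int fd = socket(AF_INET, SOCK_STREAM, 0);
//   AF_INET = IPv4, SOCK_STREAM = TCP  (SOCK_DGRAM for UDP)
//   fd is now just a number: no address, no port yet

// 2. bind(): assign a local IP:port to this socket
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);  // listen on all interfaces
addr.sin_port = htons(443);                // htons: host → network byte order
bind(fd, (struct sockaddr *)&addr, sizeof(addr));
//   "address already in use" (EADDRINUSE) fires here if port 443 is taken
//   by default, ports below 1024 need root (or CAP_NET_BIND_SERVICE) on Linux

// 3. listen(): mark the socket as passive (willing to accept)
listen(fd, 128);
//   128 = backlog: on Linux, max fully handshaken connections waiting for accept()
//   (capped by net.core.somaxconn); half-open connections (SYN received, handshake
//   unfinished) wait in a separate SYN queue, which "SYN flood" attacks saturate

// 4. accept(): block until a client connects; returns a NEW fd for that connection
struct sockaddr_in client_addr;
socklen_t client_len = sizeof(client_addr);
int client_fd = accept(fd, (struct sockaddr *)&client_addr, &client_len);
//   fd keeps listening for the next accept(); client_fd is the actual pipe to this client
//   each accepted connection consumes one file descriptor from the OS limit

// 5. send() / recv(): exchange data (response/len: your serialized reply)
char buf[4096];
recv(client_fd, buf, sizeof(buf), 0);  // read the request (on 443, TLS records carrying it)
send(client_fd, response, len, 0);     // write the response

// 6. close(): release the file descriptor; triggers FIN teardown
close(client_fd);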
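The EADDRINUSE case from step 2 can be reproduced without root. Here is a sketch, assuming Linux/POSIX sockets on loopback. bind_loopback and second_bind_errno are illustrative names, and a kernel-chosen port (port 0) stands in for 443, which would need privileges.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:port (0 = kernel picks a free port).
   Returns the fd, or -1 with errno preserved from the failing call. */
int bind_loopback(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0) {
        int saved = errno;            /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* A listening socket owns a port; a second socket then tries to bind() it.
   Returns the errno from the second bind() (expected: EADDRINUSE),
   0 if it unexpectedly succeeded, or -1 if the setup itself failed. */
int second_bind_errno(void) {
    int first = bind_loopback(0);
    if (first < 0) return -1;
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    if (listen(first, 16) < 0 ||
        getsockname(first, (struct sockaddr *)&a, &len) < 0) {
        close(first);
        return -1;
    }
    int second = bind_loopback(ntohs(a.sin_port));
    int err = (second < 0) ? errno : 0;
    if (second >= 0) close(second);
    close(first);
    return err;
}
```

On Linux, SO_REUSEADDR does not change this outcome while the first socket is listening. Its main job is letting a restarted server bind a port whose old connections are still in TIME_WAIT.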

Client-side syscall sequence:

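#include <arpa/inet.h>    // inet_pton (plus the server sketch's headers)

// 1. socket(): same as server, allocate an fd
int fd = socket(AF_INET, SOCK_STREAM, 0);
//   client usually does NOT call bind(); the OS picks a free ephemeral port
//   (range is OS-specific: Linux defaults to 32768-60999, IANA suggests 49152-65535)

// 2. connect(): initiate the TCP three-way handshake
struct sockaddr_in srv;
memset(&srv, 0, sizeof(srv));
srv.sin_family = AF_INET;
srv.sin_port = htons(443);
inet_pton(AF_INET, "198.51.100.9", &srv.sin_addr);  // example server IP
connect(fd, (struct sockaddr *)&srv, sizeof(srv));
//   kernel sends SYN, waits for SYN-ACK, sends ACK, then connect() returns
//   "connection refused" (ECONNREFUSED) = server host replied with RST (nothing listening)
//   "connection timed out" (ETIMEDOUT) = no reply at all (e.g. a firewall drops the SYN)

// 3. send() / recv(): same as server side (request/len: your serialized request)
char buf[4096];
send(fd, request, len, 0);
recv(fd, buf, sizeof(buf), 0);

// 4. close(): sends FIN; peer ACKs, later sends its own FIN, which we ACK (4-way teardown)
close(fd);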
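When the two sequences run against each other, they fit in one short function. Here is a sketch, assuming Linux/POSIX sockets and loopback. loopback_exchange is an illustrative name. It echoes bytes back rather than speaking real HTTP, and it uses a kernel-chosen port instead of 443. It works without threads because the kernel completes the handshake on its own: connect() returns once the connection is waiting in the listen backlog, and accept() then hands it over.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Client sends `request`, server echoes it, client reads it into `reply`.
   Returns bytes received by the client, or -1 on any failure. */
ssize_t loopback_exchange(const char *request, char *reply, size_t reply_len) {
    ssize_t n = -1, got;
    int srv = -1, cli = -1, conn = -1;
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    char buf[256];

    if (reply_len == 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                    /* unprivileged demo: any free port, not 443 */

    /* server: socket -> bind -> listen */
    if ((srv = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (bind(srv, (struct sockaddr *)&a, sizeof a) < 0) goto out;
    if (listen(srv, 16) < 0) goto out;
    if (getsockname(srv, (struct sockaddr *)&a, &len) < 0) goto out;  /* learn the port */

    /* client: socket -> connect (kernel runs SYN / SYN-ACK / ACK) */
    if ((cli = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto out;
    if (connect(cli, (struct sockaddr *)&a, sizeof a) < 0) goto out;

    /* server: accept -> a NEW fd for this one connection */
    if ((conn = accept(srv, NULL, NULL)) < 0) goto out;

    /* client send -> server recv -> server send (echo) -> client recv */
    if (send(cli, request, strlen(request), 0) < 0) goto out;
    if ((got = recv(conn, buf, sizeof buf, 0)) <= 0) goto out;
    if (send(conn, buf, (size_t)got, 0) < 0) goto out;
    n = recv(cli, reply, reply_len - 1, 0);
    if (n >= 0) reply[n] = '\0';
out:
    if (conn >= 0) close(conn);        /* close() starts the FIN teardown */
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return n;
}
```

A real server loops on recv() until it has a complete message. A single recv() is enough here only because the message is tiny and never leaves the machine.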

The SYN/SYN-ACK/ACK exchange at the packet level — here is what tcpdump actually shows during a handshake to port 443:

$ tcpdump -i eth0 -n "host 93.184.216.34 and port 443"
12:00:00.000000 IP 203.0.113.5.51000 > 93.184.216.34.443: Flags [S], seq 1234567890
# Flags [S] = SYN: client proposes its initial sequence number (ISN)
12:00:00.049832 IP 93.184.216.34.443 > 203.0.113.5.51000: Flags [S.], seq 3876543210, ack 1234567891
# Flags [S.] = SYN+ACK: server proposes its own ISN and ACKs the client's ISN + 1
12:00:00.049901 IP 203.0.113.5.51000 > 93.184.216.34.443: Flags [.], ack 1
# Flags [.] = ACK: handshake complete; from here tcpdump prints numbers relative to each ISN, so ack 1 = server ISN + 1
12:00:00.050100 IP 203.0.113.5.51000 > 93.184.216.34.443: Flags [P.], seq 1:220, ack 1
# Flags [P.] = PSH+ACK: the first 219 bytes of application data pushed to the server (on port 443, the TLS ClientHello)

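The sequence numbers (seq/ack) are how TCP detects lost packets and reassembles segments in order. Each side picks its own initial sequence number (ISN) at random, which keeps stray packets from an earlier connection on the same four-tuple from being mistaken for new data and makes spoofed segments hard to forge. An ACK carries the next sequence number the receiver expects: the other side's seq plus the bytes received, plus 1 for a SYN or FIN, each of which consumes one sequence number (hence ISN + 1 during the handshake).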
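Sequence numbers are 32-bit and wrap around, so the ack arithmetic is modular. Here is a small C sketch of it with illustrative function names. expected_ack counts one extra number for a SYN or FIN. relative_seq converts absolute numbers into the relative ones tcpdump prints by default.

```c
#include <stdint.h>

/* All arithmetic is in uint32_t: unsigned overflow wraps mod 2^32 in C,
   exactly like TCP sequence space. */

/* The ACK a receiver sends back: the next sequence number it expects.
   A SYN or FIN occupies one sequence number even though it carries no data. */
uint32_t expected_ack(uint32_t seq, uint32_t payload_len, int syn_or_fin) {
    return seq + payload_len + (syn_or_fin ? 1u : 0u);
}

/* tcpdump's default "relative" numbering: offset from the ISN in the SYN. */
uint32_t relative_seq(uint32_t absolute, uint32_t isn) {
    return absolute - isn;
}
```

With a client ISN of 1234567890, the SYN is acknowledged with 1234567891. A 219-byte segment shown as "seq 1:220" should be acknowledged with relative ack 220. An ISN of 4294967295 wraps, so its SYN is acknowledged with 0.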

How to debug & inspect it

Three tools cover almost every socket-level problem you will encounter: ss (socket statistics — the modern replacement for netstat), lsof (which process owns which fd), and tcpdump (the raw packet view).

# List all listening TCP sockets with the PID that owns them
$ ss -lntp
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:443         0.0.0.0:*          users:(("nginx",pid=1234,fd=6))

# List ESTABLISHED connections to or from port 443: how many live connections?
$ ss -ntp state established '( dport = :443 or sport = :443 )'
Recv-Q  Send-Q  Local           Peer               Process
0       0       10.0.0.5:51000  93.184.216.34:443  users:(("curl",pid=9999,fd=5))

# How many open sockets does process 1234 have? (Linux: socket fds link to "socket:[inode]")
$ ls -l /proc/1234/fd | grep -c 'socket:'
4821
# If the total fd count approaches the OS limit (ulimit -n), you will see "too many open files"

# Check the current fd limit for a process
$ grep "open files" /proc/1234/limits
Max open files            65536                65536                files

# Watch connection state distribution in real time (-a includes LISTEN and TIME-WAIT; tail drops the header)
$ watch -n1 'ss -ant | tail -n +2 | awk "{print \$1}" | sort | uniq -c | sort -rn'
   4821 ESTAB
    103 TIME-WAIT
      2 CLOSE-WAIT
      1 LISTEN
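The question ss and lsof answer from outside can also be asked from inside a process. Here is a Linux-only sketch. It relies on /proc, which does not exist in this form on macOS or BSD. Every entry in /proc/self/fd is a symlink, and socket fds point at "socket:[inode]". count_fds is an illustrative name.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>   /* socket(), for trying it out */
#include <unistd.h>

/* Returns the number of open fds in this process (or -1 on error) and
   stores how many of them are sockets in *sockets. */
int count_fds(int *sockets) {
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL) return -1;
    int total = 0;
    *sockets = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.') continue;                      /* "." and ".." */
        if (strtol(e->d_name, NULL, 10) == dirfd(d)) continue;  /* opendir's own fd */
        char path[64], target[64];
        snprintf(path, sizeof path, "/proc/self/fd/%s", e->d_name);
        ssize_t n = readlink(path, target, sizeof target - 1);
        if (n < 0) continue;                                    /* fd closed meanwhile */
        target[n] = '\0';
        total++;
        if (strncmp(target, "socket:", 7) == 0) (*sockets)++;
    }
    closedir(d);
    return total;
}
```

Calling it, opening one socket(), and calling it again should show both counts rise by one. In a service that leaks connections, the socket count climbs steadily toward `ulimit -n`.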

Connection error symptom-to-cause-to-fix table:

| Error / symptom | What it means at the syscall level | Cause | Fix |
|---|---|---|---|
| Connection refused (ECONNREFUSED) | The server host answered the SYN with a TCP RST, so connect() fails | Nothing is listening on that port (listen() was never called, or the process crashed) | Confirm the service is running (ss -lntp); check the port number; check for a recent crash |
| Connection timed out (ETIMEDOUT) | connect() sent the SYN (and its retries) but never received a SYN-ACK | A firewall silently drops the SYN; the host is unreachable; wrong IP | Use traceroute to find the hop that drops; check security group / iptables rules |
| Connection reset by peer (ECONNRESET) | The peer sent RST on an established connection | Server process crashed mid-request; a load balancer killed an idle connection; server sent RST on close | Check server logs for crashes; keep the client's keep-alive idle timeout shorter than the LB's; retry idempotent requests |
| Too many open files (EMFILE / ENFILE) | accept() or socket() failed: EMFILE = per-process fd limit hit, ENFILE = system-wide limit hit | Connection leak (sockets never close()d); fd limit too low; connection storm | Fix the leak (every socket() must eventually be close()d); raise ulimit -n; add connection pooling |
| Large TIME_WAIT count | Sockets held for 2 × MSL by the side that closed first (active close); normal, but can exhaust ephemeral ports | A high rate of short-lived connections without keep-alive; whichever side sends FIN first holds the TIME_WAIT | Use keep-alive to amortise connections; set SO_REUSEADDR on listening sockets so restarts can bind() despite TIME_WAIT; on Linux, net.ipv4.tcp_tw_reuse lets new outgoing connections reuse TIME_WAIT ports |
| High Recv-Q on a LISTEN socket in ss -lntp | For listening sockets, Recv-Q is the accept queue: handshakes completed but not yet accept()ed (Send-Q shows the backlog limit) | Application too slow to call accept(); listen backlog too small; CPU-bound accept loop | Increase the listen backlog (capped by net.core.somaxconn on Linux); use multiple worker threads/processes; profile the accept loop |
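The "Connection refused" row can be reproduced on loopback. Here is a sketch, assuming Linux/POSIX sockets. connect_to_unlistened_port is an illustrative name. It binds a port but never calls listen(), so the kernel answers the SYN with RST.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from connect() to a bound-but-not-listening port
   (expected: ECONNREFUSED), 0 if it unexpectedly succeeded, -1 on setup error. */
int connect_to_unlistened_port(void) {
    struct sockaddr_in a;
    socklen_t len = sizeof a;
    int bound = socket(AF_INET, SOCK_STREAM, 0);
    if (bound < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                      /* kernel picks a free port */
    /* Keep `bound` open so nothing else can grab the port meanwhile;
       note that listen() is never called on it. */
    if (bind(bound, (struct sockaddr *)&a, sizeof a) < 0 ||
        getsockname(bound, (struct sockaddr *)&a, &len) < 0) {
        close(bound);
        return -1;
    }
    int cli = socket(AF_INET, SOCK_STREAM, 0);
    if (cli < 0) {
        close(bound);
        return -1;
    }
    int err = 0;
    if (connect(cli, (struct sockaddr *)&a, sizeof a) < 0) err = errno;  /* SYN -> RST */
    close(cli);
    close(bound);
    return err;
}
```

The ETIMEDOUT row is harder to reproduce locally. It needs a SYN that is silently dropped, for example by a firewall rule, after which connect() waits through its SYN retries before giving up.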

Debug checklist:

  1. Start with ss -lntp — confirm the service is listening on the expected port and interface.
  2. If "connection refused": the service is not listening. Check if the process is running; check the port in config.
  3. If "timed out": the port is being filtered. Use traceroute and check firewall rules.
  4. If "too many open files": count fds with ls /proc/<pid>/fd | wc -l (Linux; lsof -p <pid> | wc -l gives a rough count), and look for leaked sockets with ss -ntp | grep "pid=<pid>,".
  5. For a detailed handshake trace: tcpdump -i any -n "host <ip> and port <port>" — look for RST (refused/reset) or lone SYNs with no reply (filtered).
  6. High TIME_WAIT count is usually benign but can exhaust ephemeral ports; check with ss -nt state time-wait | wc -l.

By the numbers

Concrete scenario: a Node.js service calls an upstream REST API. Round-trip time (RTT) between the two hosts is 50 ms. TLS 1.3 is in use. The service makes 2,000 req/s at peak with an average upstream latency of 50 ms per call (once connected).

What a new connection actually costs

Opening a fresh HTTPS connection requires two sequential handshakes before a single byte of application data can flow:

new_conn_cost = TCP_handshake + TLS_handshake
              = 1 RTT + 1 RTT (TLS 1.3)
              = 50 ms + 50 ms
              = 100 ms of pure setup overhead

# TLS 1.2 needs 2 RTTs for TLS, so 1 + 2 = 3 RTTs (150 ms total).
# TLS 1.3 reduced this to 1 RTT by sending key shares in the first flight.
# TLS 1.3 0-RTT resumption lets a returning client send data in its first TLS flight,
# removing the TLS round trip (the TCP handshake is still paid; replay caveats apply).

(RFC 8446 §2 — TLS 1.3 handshake overview)
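The same arithmetic works as a function for plugging in other RTTs. This sketch assumes full handshakes only, with no resumption, 0-RTT or TCP Fast Open. The enum and function names are illustrative.

```c
/* Round trips of setup before the first request byte can be sent,
   assuming a full handshake (no TLS resumption, no TCP Fast Open). */
enum transport { PLAIN_TCP, TLS_1_2, TLS_1_3 };

int setup_rtts(enum transport t) {
    int tcp = 1;                                   /* SYN, SYN-ACK, ACK (+ first data) */
    int tls = (t == TLS_1_2) ? 2 : (t == TLS_1_3) ? 1 : 0;
    return tcp + tls;
}

int setup_ms(int rtt_ms, enum transport t) {
    return rtt_ms * setup_rtts(t);
}
```

With a 50 ms RTT this gives 50 ms for plain TCP, 100 ms with TLS 1.3 and 150 ms with TLS 1.2. On a 150 ms intercontinental RTT the TLS 1.2 figure grows to 450 ms.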

Without pooling: 1,000 sequential calls pay setup 1,000 times

sequential_no_keepalive:
  total_setup_time = 1,000 calls × 100 ms/conn = 100,000 ms = 100 s
  total_work_time  = 1,000 calls × 50 ms/call  =  50,000 ms =  50 s
  grand_total      = 150,000 ms = 150 s

sequential_with_one_pooled_conn:
  total_setup_time = 1 conn × 100 ms = 100 ms
  total_work_time  = 1,000 calls × 50 ms/call = 50,000 ms = 50 s
  grand_total      = 50,100 ms ≈ 50 s

# Setup overhead falls from 67% of total time to 0.2% of total time.

Connection reuse turns 150 seconds of wall time into 50 seconds — a 3× speedup with zero code change to the business logic.
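Both totals above reduce to two one-line formulas. Here is a sketch with illustrative names, assuming strictly sequential calls and a constant setup and work time per call.

```c
/* Wall-clock time for `calls` back-to-back requests, in ms. */
long total_ms_no_reuse(long calls, long setup_ms, long work_ms) {
    return calls * (setup_ms + work_ms);      /* every call pays setup */
}

long total_ms_one_pooled_conn(long calls, long setup_ms, long work_ms) {
    return setup_ms + calls * work_ms;        /* setup paid once */
}
```

With 1,000 calls, 100 ms setup and 50 ms work, these return 150,000 ms and 50,100 ms. The gap grows linearly with the number of calls, because only the first formula multiplies setup by `calls`.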

Pool sizing via Little's Law

How many connections should the pool hold? At least the average number of calls in flight at peak, which is Little's Law (L = λ · W) applied to the upstream call:

pool_size = peak_QPS × avg_latency_seconds
          = 2,000 req/s × 0.050 s
          = 100 connections

# If the pool is smaller than 100, callers queue for a free connection,
# adding latency on top of the 50 ms upstream call.

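Little's Law is a common starting point for sizing a client-side pool, but it is not a universal rule. For database pools, HikariCP's guide argues for small pools sized to the database server's cores rather than to client demand. (HikariCP, About Pool Sizing)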
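Here is Little's Law as integer code. It is a sketch with an illustrative name, taking latency in milliseconds so the ceiling can be computed without floating point.

```c
/* Little's Law, L = lambda * W: average connections busy at once =
   arrival rate (req/s) x time each call holds a connection (s).
   Latency is in ms; the result is rounded up to whole connections. */
long littles_law_pool(long peak_rps, long latency_ms) {
    long request_ms_per_s = peak_rps * latency_ms;
    return (request_ms_per_s + 999) / 1000;   /* ceil(request_ms_per_s / 1000) */
}
```

At 2,000 req/s and 50 ms this gives 100 connections. At 51 ms it gives 102, because a fractional connection rounds up.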

Worked connection-lifecycle trace

Service starts cold at 08:00:00. Pool max = 100 connections, RTT = 50 ms, upstream latency = 50 ms.

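| Time | Event | Pool state | Setup cost paid? | Caller waits |
|---|---|---|---|---|
| 08:00:00.000 | Request #1 arrives; pool empty | 0/100 open | Yes: TCP + TLS = 100 ms | 150 ms (100 setup + 50 work) |
| 08:00:00.100 | Requests #2–50 arrive during warm-up | 1–50/100 open | Yes, once per new connection | 150 ms on first use, 50 ms on reuse |
| 08:00:01.000 | Steady state; pool full (100 conns) | 100/100 open | No: reusing | 50 ms (work only) |
| 08:01:00.000 | Traffic spike: 3,000 req/s for 5 s, above the pool's capacity of 100 conns ÷ 0.05 s = 2,000 req/s | 100/100, queue grows by ~1,000 requests per second | No new conns (at max) | 50 ms + queue wait |
| 08:01:05.000 | Spike ends with ~5,000 requests queued; the backlog drains only while traffic stays below 2,000 req/s | 100/100, queue shrinking | No | 50 ms + shrinking queue wait |
| 60 s after the last request | Idle keep-alive timeout (e.g. 60 s) expires | Conns closing | Yes, on the next request | 150 ms (cold again) |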
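The spike rows follow from a simple fluid model. This sketch ignores randomness in arrivals and latency, and the function names are illustrative. A pool of `conns` connections, each held for `latency_ms` per call, serves at most conns × 1000 / latency_ms requests per second. Anything above that rate piles up in the queue.

```c
/* Max throughput of a pool when each call holds a connection latency_ms. */
long pool_capacity_rps(long conns, long latency_ms) {
    return conns * 1000 / latency_ms;
}

/* Requests queued after `seconds` of arrivals above capacity. */
long backlog_after(long arrival_rps, long capacity_rps, long seconds) {
    long excess = arrival_rps - capacity_rps;
    return excess > 0 ? excess * seconds : 0;
}

/* Seconds to drain `backlog` once arrivals fall to `arrival_rps`;
   -1 if arrivals still match or exceed capacity (the queue never drains). */
long drain_seconds(long backlog, long arrival_rps, long capacity_rps) {
    long spare = capacity_rps - arrival_rps;
    if (spare <= 0) return -1;
    return (backlog + spare - 1) / spare;     /* ceiling division */
}
```

With 100 connections at 50 ms, capacity is 2,000 req/s, and 5 s at 3,000 req/s leaves about 5,000 requests queued. If traffic returns to exactly 2,000 req/s, that backlog never drains. If it falls to a hypothetical 1,000 req/s, the backlog clears in about 5 s. This is one reason to size the pool with headroom.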

Decision math — pool size and the "too many open files" ceiling

The pool has a hard upper bound: the OS file-descriptor limit per process. The default soft limit on many Linux distributions is only 1,024 fds per process; servers commonly raise it to 65,536 or higher. Each open socket consumes one fd. A service holding connections to K upstream services with P connections each uses K × P fds just for connection pools:

# Break-even: when does adding pool connections stop helping?
# Answer: once pool_size ≥ peak_concurrency (Little's Law), extra slots
# sit idle and waste fds without reducing wait time.

# Ceiling check:
fds_for_pools  = upstreams × pool_size = 10 × 100 = 1,000 fds
fds_for_other  = files, logs, sockets ≈ 500 fds
total_fds_used ≈ 1,500 fds   # well under 65,536, though above the 1,024 default soft limit on many distributions

# "Too many open files" appears when:
#   upstreams × pool_size + other_fds > ulimit -n
# Fix: raise the per-process limit (ulimit -n, or LimitNOFILE under systemd),
# reduce pool_max, or add more instances.

The decision rule: set pool_max = ceil(peak_QPS × avg_latency_s × 1.25) (25% headroom), verify it fits within the fd ceiling, and set the keep-alive idle timeout shorter than any upstream or load-balancer idle timeout (typically 55 s when the upstream has a 60 s idle timeout; the 5-second margin prevents surprise RSTs). (RFC 9293 §3.8.4, TCP keep-alives)
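The decision rule can be expressed as two checks. Here is a sketch with illustrative names, using integer math: latency in ms and headroom in percent, so 125 means +25%.

```c
/* pool_max = ceil(peak_rps x latency_s x headroom), in integer math. */
long pool_max_with_headroom(long peak_rps, long latency_ms, long headroom_pct) {
    long num = peak_rps * latency_ms * headroom_pct;
    long den = 1000L * 100L;                  /* ms -> s, percent -> fraction */
    return (num + den - 1) / den;             /* ceiling division */
}

/* 1 if all pools plus other descriptors fit within the per-process fd limit. */
int fits_fd_limit(long upstreams, long pool_max, long other_fds, long fd_limit) {
    return upstreams * pool_max + other_fds <= fd_limit;
}
```

For 2,000 req/s at 50 ms with 25% headroom this gives 125 connections per upstream. Ten such pools plus 500 other fds need 1,750 descriptors, which fits under 65,536 but not under a 1,024 limit.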

🧠 Quick check

1. A socket is best described as:

A socket is the IP:port endpoint. A connection joins two of them; the unique pair lets one port serve many clients.

2. The TCP three-way handshake mainly adds:

SYN / SYN-ACK / ACK is ~one round trip of pure setup. Encryption is TLS's job; header compression is an HTTP/2 feature.

3. For a live voice call, which transport is usually the better fit and why?

For real-time media, freshness beats completeness — retransmitting a two-second-old audio packet just adds delay. UDP's fire-and-forget suits it.

✍️ Drill: explain a slow cold start

Your service's first call after deploy takes 350 ms; subsequent calls take 40 ms, all to the same region. No code changed between calls. Explain the gap. Decide before opening.

Model answer: The first call pays one-time setup the others reuse: TCP handshake + TLS handshake to establish the connection (and possibly DNS resolution and lazy resource/JIT warm-up). Once a keep-alive connection and caches are warm, later calls skip all of it, dropping to ~40 ms. Fix/mitigate: connection pooling, pre-warming, and keeping connections alive.

Rubric: ✓ attributes it to one-time per-connection setup ✓ names handshake(s) ✓ proposes keep-alive/pooling/warm-up. Bonus: distinguishes connection setup from app-level cold start.

Key takeaways

Sources & further reading