API Design

Foundations · Lesson 08

Remote procedure calls (RPC)

RPC is a style of API with one goal: make calling a function on another machine feel exactly like calling one in your own code — user = getUser(42), no URLs in sight. It's a powerful illusion. Knowing where the illusion leaks is what separates a working system from a fragile one.

⏱ 11 min · Difficulty: core · Prereq: Lessons 02, 04

By the end you'll be able to

The illusion: a local call that isn't

In ordinary code you call add(2, 3) and get 5. RPC lets you write what looks like that, but the function runs on a server somewhere else. The machinery that sells the illusion is a pair of stubs — small auto-generated shims on each side:

Client machine: your code calls getUser(42) → client stub (marshal) → bytes over the network → Server machine: server stub (unmarshal) → real getUser() runs → result marshalled back
The client stub packs the arguments into bytes (marshalling), ships them, the server stub unpacks them, runs the real function, and sends the result back the same way.

Marshalling (a.k.a. serialization) is turning in-memory values into a byte stream the network can carry; unmarshalling rebuilds them on the other side. The caller writes a normal-looking function call and never touches sockets — that's the whole appeal.
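The stub pair described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real RPC framework is built: JSON stands in for the wire format, the "network" is an ordinary function that passes bytes, and `USERS`, `server_stub`, `fake_network`, and `client_stub_get_user` are hypothetical names invented for the example.

```python
import json

# Hypothetical server-side data and the "real" function.
USERS = {42: {"id": 42, "name": "Ada"}}

def get_user(user_id: int) -> dict:
    """The real implementation; it runs only on the server."""
    return USERS[user_id]

def server_stub(request_bytes: bytes) -> bytes:
    """Unmarshal the request, run the real function, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    handlers = {"getUser": get_user}
    result = handlers[request["method"]](*request["args"])
    return json.dumps(result).encode("utf-8")

def fake_network(request_bytes: bytes) -> bytes:
    """Stand-in for the network: only bytes cross this boundary."""
    return server_stub(request_bytes)

def client_stub_get_user(user_id: int) -> dict:
    """Looks like a local call; actually marshals, sends, and unmarshals."""
    request = {"method": "getUser", "args": [user_id]}
    request_bytes = json.dumps(request).encode("utf-8")
    response_bytes = fake_network(request_bytes)
    return json.loads(response_bytes.decode("utf-8"))

user = client_stub_get_user(42)
print(user["name"])  # Ada
```

The caller only ever sees `client_stub_get_user(42)`; everything that touches bytes is hidden inside the two stubs, which is the part real frameworks generate for you.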

Where the illusion leaks

A local function call is fast, always returns, and can't half-happen. A remote one can't promise any of that. These are the leaks every RPC system must face:

Local call | Remote call (the leak)
Nanoseconds | Milliseconds; the network round trip dominates (Lesson 03)
Always returns | Can time out, and you may not know if it ran
Can't partially fail | Request can arrive but the reply gets lost → did it happen?
Same memory & types | Different machines/languages → must agree on a wire format
⚠️ Common trap

Treating remote calls as if they were local — sprinkling them inside loops or trusting they "just return." A loop of 1,000 RPCs is 1,000 network round trips. And because a timeout leaves you unsure whether the call ran, blind retries can double-charge a card or double-send a message. The fix is idempotency (its own lesson) so a retry is safe — design for it from the start.
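To make the retry hazard concrete, here is a small Python sketch of one common way to get idempotency: the client attaches a unique key to each logical operation and the server remembers the result per key. `PaymentServer`, `charge`, and `call_with_lost_reply` are hypothetical names, and the in-memory dictionary is an assumption standing in for durable storage.

```python
import uuid

class PaymentServer:
    """Hypothetical server that remembers results by idempotency key."""

    def __init__(self) -> None:
        self.total_charged = 0
        self._results = {}  # idempotency key -> receipt

    def charge(self, idempotency_key: str, amount: int) -> str:
        # A repeated key means a retry: return the stored result, charge nothing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        receipt = f"receipt-for-{amount}"
        self._results[idempotency_key] = receipt
        return receipt

def call_with_lost_reply(server: PaymentServer, key: str, amount: int) -> str:
    """Simulate the leak: the server runs the call, but the reply never arrives."""
    server.charge(key, amount)
    raise TimeoutError("reply lost; the client cannot tell whether the charge ran")

server = PaymentServer()
key = str(uuid.uuid4())  # generated once per logical operation, reused on retry

try:
    call_with_lost_reply(server, key, 500)
except TimeoutError:
    receipt = server.charge(key, 500)  # retry with the SAME key

print(server.total_charged)  # 500
```

Without the key check, the same sequence would leave `total_charged` at 1000, which is the double charge the trap describes.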

🎯 Interview angle

This is the famous "fallacies of distributed computing" territory. If you can name even three leaks — the network isn't instant, calls can fail in the middle, and you can't tell a lost request from a lost reply — and then say "so I'd make the operation idempotent and set explicit timeouts," you sound like someone who's actually run services in production.

gRPC: the modern, fast RPC

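The most common RPC framework today is gRPC. Its two big ideas are a Protocol Buffers (protobuf) contract, a .proto file that both sides generate code from and that encodes messages as compact binary, and HTTP/2 as the transport, which lets many calls share one connection and supports streaming.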

// a tiny .proto contract: the source of truth for both sides
syntax = "proto3";
package users;

service Users {
  rpc GetUser (GetUserRequest) returns (User);
}
message GetUserRequest { int64 id = 1; }
message User { int64 id = 1; string name = 2; }
✅ When to reach for RPC vs REST

RPC/gRPC shines inside your system — service-to-service calls where you control both ends, want low latency, compact payloads, and streaming (e.g. a checkout service calling an inventory service). REST (next module) shines at the edge — public, browser-friendly, cacheable APIs where ubiquity and human-readability matter more than raw speed. Many real systems use both: gRPC between internal services, REST/JSON to the outside world.

Under the hood: how it actually works

Two mechanisms make gRPC work: protobuf wire encoding (how arguments become bytes) and HTTP/2 framing (how those bytes travel and return).

Protobuf wire encoding

Protobuf uses a compact binary format. Every field in a message is identified by a small integer field number (not by name, which is why field numbers must never change). Each encoded field is a tag, a varint that packs the field number and a wire type as (field_number << 3) | wire_type, followed by the value. For the proto definition:

message GetUserRequest { int64 id = 1; }
// Encoding GetUserRequest { id: 42 }:
//   field 1, wire type 0 (varint) → tag byte = (1 << 3) | 0 = 0x08
//   value 42 encoded as varint → 0x2A
//   total: 2 bytes  [0x08, 0x2A]

message User { int64 id = 1; string name = 2; }
// Encoding User { id: 42, name: "Ada" }:
//   field 1 varint 42   → 0x08 0x2A
//   field 2 length-delimited "Ada" → tag = (2<<3)|2 = 0x12,
//                                     length = 0x03, bytes = 0x41 0x64 0x61
//   total: 7 bytes vs ~22 bytes for {"id":42,"name":"Ada"} in JSON

The field name is never in the wire format — only the number. This makes the binary compact and fast to parse (no string comparisons), but it means a schema (.proto file) is required to decode arbitrary messages.
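The byte counts above can be reproduced with a hand-rolled encoder. The Python sketch below is illustrative only: it covers just the two wire types used in this lesson (varint and length-delimited), handles non-negative integers only, and the function names are made up for the example rather than taken from the protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def encode_tag(field_number: int, wire_type: int) -> bytes:
    """The tag is itself a varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)

def encode_int64_field(field_number: int, value: int) -> bytes:
    # wire type 0 = varint (non-negative values only in this sketch)
    return encode_tag(field_number, 0) + encode_varint(value)

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # wire type 2 = length-delimited: tag, then length, then the raw bytes
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data

request = encode_int64_field(1, 42)
user = encode_int64_field(1, 42) + encode_string_field(2, "Ada")

print(request.hex(" "))  # 08 2a
print(user.hex(" "))     # 08 2a 12 03 41 64 61
```

Nothing in the output mentions `id` or `name`; the only identifiers on the wire are the numbers 1 and 2 folded into the tag bytes.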

How a gRPC call maps onto HTTP/2 frames

A unary gRPC call (GetUser) runs entirely inside a single HTTP/2 stream. Here is the exact frame sequence:

Client                                                Server
──────────────────────────────────────────────────────
HEADERS frame  stream_id=1
  :method      = POST
  :path        = /users.Users/GetUser
  :scheme      = https
  content-type = application/grpc
  te           = trailers
DATA frame  stream_id=1  (5-byte grpc framing + protobuf body)
  [compressed_flag=0][message_length=2][0x08 0x2A]
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                          HEADERS frame  stream_id=1
                            :status      = 200
                            content-type = application/grpc
                          DATA frame  stream_id=1
                            [0][7 bytes = User{id:42,name:"Ada"}]
                          HEADERS frame (trailers)  END_STREAM
                            grpc-status  = 0    ← 0 = OK
                            grpc-message = ""

The gRPC-specific layer sits inside HTTP/2 DATA frames: a 5-byte framing prefix ([1-byte compressed flag][4-byte message length]) wraps each protobuf message. The grpc-status code arrives in HTTP/2 trailers (headers sent after the body), not in the initial response headers — this is why the client must read the full stream to know if the call succeeded.
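The 5-byte prefix is simple enough to build by hand. The Python sketch below assumes the length is a 4-byte big-endian unsigned integer, which is what the gRPC over HTTP/2 protocol document specifies but is not stated in this lesson; `frame_grpc_message` and `unframe_grpc_message` are hypothetical helper names.

```python
import struct

def frame_grpc_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a serialized message with the 5-byte gRPC framing header."""
    # ">BI" = big-endian, 1-byte compressed flag, 4-byte unsigned length
    return struct.pack(">BI", int(compressed), len(payload)) + payload

def unframe_grpc_message(data: bytes) -> tuple:
    """Split one framed message back into (compressed_flag, payload)."""
    compressed_flag, length = struct.unpack(">BI", data[:5])
    return bool(compressed_flag), data[5:5 + length]

# The GetUserRequest { id: 42 } bytes from the encoding section
framed = frame_grpc_message(bytes([0x08, 0x2A]))
print(framed.hex(" "))  # 00 00 00 00 02 08 2a
```

The first byte is the compressed flag, the next four are the length (2), and the last two are the protobuf body, matching the DATA frame contents shown in the sequence.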

Figure: client → server HEADERS (:method=POST, :path=/…/GetUser) and DATA ([0][len=2][0x08 0x2A], 5-byte gRPC prefix + protobuf); server → client HEADERS (:status=200, content-type=grpc), DATA ([0][7][User{id:42,name:"Ada"}]), and TRAILERS (grpc-status=0, END_STREAM).
A unary gRPC call is one HTTP/2 stream. The client sends HEADERS + DATA; the server responds with HEADERS + DATA then closes the stream with trailers carrying grpc-status.

How to debug & inspect it

gRPC's binary format can't be read with plain curl, but grpcurl (a curl-equivalent for gRPC) makes it easy — especially with server reflection enabled.

# List all services exposed by the server (requires server reflection)
$ grpcurl -plaintext localhost:50051 list
users.Users
grpc.reflection.v1alpha.ServerReflection

# Describe a service's methods and message types
$ grpcurl -plaintext localhost:50051 describe users.Users
users.Users is a service:
service Users {
  rpc GetUser ( .users.GetUserRequest ) returns ( .users.User );
}

# Make a call (pass request as JSON; grpcurl handles proto encoding)
$ grpcurl -plaintext -d '{"id": 42}' localhost:50051 users.Users/GetUser
{
  "id": "42",
  "name": "Ada Lovelace"
}

# With TLS (production servers)
$ grpcurl -d '{"id": 42}' api.example.com:443 users.Users/GetUser

# If server reflection is disabled, provide the .proto file directly
$ grpcurl -plaintext -proto users.proto -d '{"id": 42}' localhost:50051 users.Users/GetUser

# Verbose mode shows response headers and gRPC trailers
$ grpcurl -plaintext -v -d '{"id": 42}' localhost:50051 users.Users/GetUser
Resolved method descriptor:
Response headers received:
content-type: application/grpc
Response trailers received:
grpc-status: 0
Sent 1 request and received 1 response
Symptom / grpc-status | Code meaning | Likely cause | Fix
grpc-status: 0 | OK | n/a | n/a
grpc-status: 1 CANCELLED | Client cancelled the call | Client timeout elapsed or explicit cancel | Check client-side deadline; increase timeout if legitimate
grpc-status: 2 UNKNOWN | Server threw an unhandled exception | Unhandled panic/exception in handler; see server logs | Add proper error handling; return a typed gRPC status
grpc-status: 4 DEADLINE_EXCEEDED | Server didn't respond within the deadline | Slow handler, downstream dependency slow, network partition | Check server logs for slow queries; add tracing
grpc-status: 5 NOT_FOUND | Entity not found | Resource doesn't exist | Validate input before calling; handle gracefully
grpc-status: 7 PERMISSION_DENIED | AuthZ check failed | Caller lacks the required permission (missing or invalid credentials normally surface as 16 UNAUTHENTICATED) | Check token scopes; verify interceptor/auth middleware
grpc-status: 12 UNIMPLEMENTED | Method not implemented on server | Method name mismatch, server not deployed yet, wrong port | Confirm service name + method name exactly match the .proto; check the correct server is running
grpc-status: 14 UNAVAILABLE | Server is down or unreachable | Connection refused, network partition, server crashed | Check server health; implement retry with backoff for transient failures
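The "retry with backoff for transient failures" advice can be sketched as follows. This Python example does not use the gRPC library: `RpcError` is a hypothetical exception type, `flaky_get_user` is a fake call that fails twice, and treating only UNAVAILABLE as retryable is an assumption made for the sketch. The numeric codes are the ones listed in the table.

```python
import time
from enum import IntEnum

class Status(IntEnum):
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    PERMISSION_DENIED = 7
    UNIMPLEMENTED = 12
    UNAVAILABLE = 14

class RpcError(Exception):
    """Hypothetical error type carrying a grpc-status code."""

    def __init__(self, status: Status) -> None:
        super().__init__(status.name)
        self.status = status

# Assumption for this sketch: only transient unavailability is retried.
RETRYABLE = {Status.UNAVAILABLE}

def call_with_retry(call, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry `call` with exponential backoff on retryable statuses only."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RpcError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE or last_attempt:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

attempts = []

def flaky_get_user():
    """Fake RPC: UNAVAILABLE on the first two attempts, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RpcError(Status.UNAVAILABLE)
    return {"id": 42, "name": "Ada"}

delays = []  # record delays instead of actually sleeping
result = call_with_retry(flaky_get_user, sleep=delays.append)
print(len(attempts), delays)  # 3 [0.1, 0.2]
```

Codes such as NOT_FOUND or PERMISSION_DENIED are raised immediately because repeating the call cannot change the answer. Retrying a call that has side effects is only safe if the operation is idempotent.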

Debug checklist:

  1. Enable server reflection in development — it lets grpcurl list and describe work without needing the .proto file.
  2. Read grpc-status in trailers first — it's the equivalent of the HTTP status code for gRPC calls.
  3. Use grpcurl -v to see raw headers and trailers including grpc-message (human-readable error detail).
  4. If you can't connect at all, check that the server is listening on the expected port and speaking gRPC over HTTP/2 (not plain HTTP/1.1).
  5. For "works locally, fails in prod", check TLS: production gRPC servers typically require TLS with a valid certificate, so run grpcurl without -plaintext.
⚠️ Never change field numbers in a .proto

Because protobuf uses field numbers (not names) on the wire, renaming a field is safe for the binary encoding. Changing a field's number, or deleting a field and later reusing its number for something else, silently corrupts data: old clients will misinterpret bytes when talking to a server compiled from the new schema. Always mark deleted fields reserved and never reuse numbers.
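A toy decoder shows why the corruption is silent. The Python sketch below is not the protobuf library: it handles only varint and length-delimited fields, schemas are plain dictionaries from field number to name, and the `email` field in `bad_schema` is a hypothetical edit invented for the example.

```python
def read_varint(data: bytes, pos: int):
    """Read one base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7

def decode(data: bytes, schema: dict) -> dict:
    """Decode varint and length-delimited fields, naming them via schema."""
    pos, message = 0, {}
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        # Field numbers missing from the schema get a placeholder name.
        message[schema.get(field_number, f"unknown_{field_number}")] = value
    return message

wire = bytes.fromhex("082a1203416461")  # User { id: 42, name: "Ada" }

old_schema = {1: "id", 2: "name"}
# Hypothetical bad edit: "name" moved to 3 and number 2 reused for "email".
bad_schema = {1: "id", 2: "email", 3: "name"}

print(decode(wire, old_schema))  # {'id': 42, 'name': 'Ada'}
print(decode(wire, bad_schema))  # {'id': 42, 'email': 'Ada'}
```

Both decodes succeed without any error; the second one simply files the user's name under the wrong field, which is why reserving retired numbers matters.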

🧠 Quick check

1. In RPC, "marshalling" refers to:

Marshalling/serialization packs arguments into a byte stream; the receiver unmarshals them back into values. Encryption and load balancing are separate concerns.

2. Why is blindly retrying a timed-out RPC dangerous?

A timeout doesn't tell you whether the call ran. Without idempotency, retrying can charge a card twice. Idempotency makes the retry safe.

3. Two reasons gRPC is fast for internal service calls:

protobuf is smaller and quicker to parse than JSON text, and HTTP/2 lets many calls share one connection with streaming. gRPC is explicitly multi-language.

✍️ Drill: pick a protocol for two call sites

(a) A public mobile app fetches a user's profile. (b) Your order service calls your inventory service 50× per checkout. Which style for each, and why? Decide first.

Model answer: (a) REST/JSON over HTTPS — public, diverse clients, cacheable, human-debuggable; ubiquity matters more than microseconds. (b) gRPC — internal, you own both ends, high call volume, want compact binary payloads and low latency; streaming helps if inventory checks batch. Bonus: make the inventory deduction idempotent so a retried call can't double-decrement stock.

Rubric: ✓ REST at the public edge ✓ gRPC internally ✓ justifies by control/volume/latency ✓ remembers idempotency on the retried internal call.

Key takeaways

Sources & further reading