API Design

Architectural Styles · Lesson 06

Comparing REST, GraphQL, and gRPC

You've seen each style in isolation. Now lay them side by side and build the decision instinct you need for the real question every architect faces: which style goes here, and why?

⏱ 15 min · Difficulty: core · Prereq: as-01 through as-05

By the end you'll be able to

The decision table

Each cell below is a short verdict — read it as "how well does this style handle this dimension?" rather than a strict score. No style wins every dimension.

Dimension | REST | GraphQL | gRPC
Typical clients | Browsers, third-party apps, CLIs | First-party mobile/web apps needing tailored payloads | Internal services, microservice mesh
Performance | Good: JSON over HTTP/1.1 or HTTP/2, well understood | Good: one round-trip; overhead in query parsing and resolver orchestration | Excellent: binary protobuf + HTTP/2 multiplexing
Caching | Built in via HTTP (CDN, ETag, Cache-Control) | Manual: persisted queries, client normalised cache, schema hints | Not built in: must implement at the application layer
Streaming | Server-Sent Events or WebSocket (add-on) | Subscriptions (typically over WebSocket); well supported in major clients | First-class: four call types built into the protocol (unary plus server, client, and bidirectional streaming)
Type contract | Optional (OpenAPI, JSON Schema) | Mandatory: the SDL schema is the contract | Mandatory: .proto file; compile-time enforcement
Tooling | Universal: curl, Postman, every HTTP library | GraphiQL, Apollo Studio, Relay: rich but specialised | grpcurl, Buf, generated stubs: capable but niche
Debuggability | Excellent: readable JSON in any browser DevTools | Good: JSON responses; errors embedded in the body | Difficult: binary wire format; needs grpcurl + proto file
Learning curve | Low: built on HTTP vocabulary most devs know | Medium: SDL, resolvers, N+1 pitfalls | Medium-high: protobuf, codegen, HTTP/2 mental model
Versioning | URL versioning (/v1/) or header; well-established patterns | Additive schema evolution; deprecation directives | Additive field evolution; reserved field numbers for removals
Browser support | Native | Native (plain HTTP requests; works over HTTP/1.1) | Requires grpc-web proxy
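
The "persisted queries" entry in the Caching row can be made concrete with a small sketch. The idea is to identify a query by a hash so the request becomes a GET with a stable URL that a CDN can cache. The parameter layout below follows the convention used by Apollo's automatic persisted queries as far as I know; the function name, endpoint, and query are illustrative assumptions, not part of the lesson's platform.

```python
import hashlib
import json
from urllib.parse import urlencode


def persisted_query_url(endpoint: str, query: str, variables: dict) -> str:
    """Build a GET URL that identifies a GraphQL query by its SHA-256 hash.

    Hypothetical helper: the same query text and variables always produce
    the same URL, which is what lets an HTTP cache key on it.
    """
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params = {
        # Assumed layout: a persistedQuery extension carrying the hash.
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": digest}},
            separators=(",", ":"),
        ),
        # sort_keys keeps the URL stable regardless of dict ordering.
        "variables": json.dumps(variables, separators=(",", ":"), sort_keys=True),
    }
    return f"{endpoint}?{urlencode(params)}"


url = persisted_query_url(
    "https://bff.example.com/graphql",
    '{ user(id: "42") { id name avatarUrl } }',
    {},
)
print(url)
```

The server still has to recognise the hash and map it back to the full query; that registration step is outside this sketch.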

"Use X when…" guidance

Use REST when

Use GraphQL when

Use gRPC when

Start here: who are the clients?

  External / third-party → REST (stable contract, cacheable)
  First-party → Need field flexibility? (Different fields per client?)
      No → REST (stable shapes, simpler)
      Yes → GraphQL (client-declared fields)
  Internal services → gRPC (binary, typed, high perf)

These aren't mutually exclusive: many products use all three at different layers.

A simplified decision path. Real decisions involve more dimensions (team experience, existing infrastructure, browser reach) — this is a starting heuristic, not a rule.
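
The decision path above is small enough to write down as a function. This is only a sketch of the heuristic as drawn: the enum and function names are made up for illustration, and it deliberately ignores the extra dimensions the caption mentions (team experience, existing infrastructure, browser reach).

```python
from enum import Enum


class Client(Enum):
    EXTERNAL = "external / third-party"
    FIRST_PARTY = "first-party"
    INTERNAL = "internal services"


def suggest_style(client: Client, needs_field_flexibility: bool = False) -> str:
    """Return the starting-point style for one interface boundary."""
    if client is Client.EXTERNAL:
        return "REST"      # stable contract, cacheable
    if client is Client.INTERNAL:
        return "gRPC"      # binary, typed, high performance
    # First-party clients: the only branch where a second question matters.
    return "GraphQL" if needs_field_flexibility else "REST"


for client, flexible in [
    (Client.EXTERNAL, False),
    (Client.FIRST_PARTY, True),
    (Client.FIRST_PARTY, False),
    (Client.INTERNAL, False),
]:
    print(f"{client.value:24} flexible={flexible!s:5} -> {suggest_style(client, flexible)}")
```

Calling it once per boundary, rather than once per product, mirrors the point that the styles are not mutually exclusive.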

Coexistence — a realistic architecture

Mature products typically use all three styles, each at the layer it fits best:

Scenario

A fintech platform's API surface might look like this:

# Public developer API — REST
# Partners integrate via stable, versioned, curl-friendly endpoints
api.fintech.com/v2/payments
api.fintech.com/v2/accounts
api.fintech.com/v2/webhooks

# Mobile / web BFF — GraphQL
# iOS and Android declare exactly what each screen needs
bff.fintech.com/graphql

# Internal service mesh — gRPC
# Risk engine, fraud scorer, ledger — high-frequency internal calls
risk.internal:50051      # rpc Score(Transaction) returns (RiskResult)
ledger.internal:50052    # rpc Debit/Credit streaming
fraud.internal:50053     # bidi streaming for real-time signals

The styles don't compete — they occupy distinct tiers. The REST API is the public face; GraphQL is the BFF (Backend for Frontend) that absorbs client churn; gRPC is the performance backbone hidden from external callers entirely.

🎯 Interview angle

"Pick a style for each of these three scenarios: (a) a public payments API used by hundreds of partners, (b) a mobile app with dozens of heterogeneous screens, (c) a cluster of internal microservices exchanging 50,000 messages per second."

Strong answer: (a) REST — partner-friendly, cacheable, stable contracts; (b) GraphQL — each screen declares its fields, no endpoint proliferation; (c) gRPC — binary efficiency + HTTP/2 + compile-time contract enforcement across languages. Bonus: mention that (a) and (b) could be the same company using both styles at the same time.

⚠️ Common trap — picking one style for everything

The biggest architectural mistake is cargo-culting a style across the entire organisation. "We're a GraphQL shop" leads to public APIs that are hard for partners to curl and internal services that parse JSON when they could use binary. "We're a gRPC shop" leads to browser clients that can't call a gRPC-only BFF without a grpc-web proxy. Each boundary has different constraints; honour them.

✅ Do this, not that

Do document the style decision for each interface boundary in an Architecture Decision Record (ADR) — it forces you to state the trade-offs explicitly and makes future changes intentional. Don't let style choice drift into tribal preference ("we've always done REST") — the cost of the wrong style compounds over years of client integration.

Under the hood: a worked decision example

Abstract trade-off tables are useful, but the skill is applying them to a specific constraint set. Here are three concrete scenarios, each worked through in detail with numbers to back the choice.

Scenario A — partner payment API (external, stable contract, high third-party diversity)

Constraints

500 partner companies integrate. They use PHP, Ruby, Java, Python, and shell scripts. Partners want to curl endpoints to test. The vast majority of requests are read-only GETs; webhooks push state changes out. Average payload: a payment record (~1.2 KB JSON).

Dimension | REST | GraphQL | gRPC
Partner tooling cost | Zero: any HTTP client works | Medium: must learn SDL, install a GraphQL client | High: must install grpcurl or generated stubs; binary not curl-able
Payload size (1.2 KB JSON) | 1.2 KB + HTTP/1.1 headers (~400 B overhead) | Same JSON payload; POST body adds query overhead (~200 B extra) | ~300–400 B protobuf + HTTP/2 headers (~80 B HPACK compressed); about 3× smaller
HTTP caching for GET /payments/{id} | Yes: CDN caches by URL + headers | No: all POST; requires persisted queries | No: application must implement
Documentation cost | Low: OpenAPI + familiar HTTP concepts | Medium: introspection helps, but SDL is a new concept for many | High: binary format requires generating docs from .proto

Decision: REST. The payload-size advantage of gRPC is real (~3× smaller), but at 1.2 KB it saves only ~800 bytes per request, which is about 0.64 ms of transfer time at 10 Mbps (800 B × 8 = 6,400 bits). The tooling cost imposed on 500 external partners is measured in days of engineering effort, not milliseconds. HTTP caching takes much of the read load off the origin. REST wins on total cost of ecosystem.
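
The transfer-time figure in that decision is simple enough to check by hand. A minimal sketch, assuming the protobuf encoding lands at the upper end of the 300–400 B estimate and counting only serialisation time on the link (no headers, no propagation delay); the helper name is illustrative:

```python
def transfer_time_ms(num_bytes: float, link_mbps: float) -> float:
    """Time to push num_bytes through a link of link_mbps megabits per second."""
    bits = num_bytes * 8
    return bits / (link_mbps * 1_000_000) * 1_000


json_bytes = 1_200      # ~1.2 KB payment record as JSON
protobuf_bytes = 400    # assumed: upper end of the 300-400 B estimate

saved_bytes = json_bytes - protobuf_bytes
saved_ms = transfer_time_ms(saved_bytes, link_mbps=10)

print(f"bytes saved per request: {saved_bytes}")
print(f"time saved at 10 Mbps:   {saved_ms:.2f} ms")
```

Under these assumptions the saving is well under a millisecond per request, which is why the partner tooling cost dominates the decision.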

Scenario B — mobile BFF for a social feed (first-party, highly variable screen shapes)

Constraints

iOS and Android apps. List view needs 4 fields per post; detail view needs 12 fields; profile view needs 7 different fields from the same object. Apps are updated weekly; backend team is small. Average feed: 20 items.

Dimension | REST (with ?fields=) | GraphQL | gRPC
Payload for list view (4 of 20 fields) | With ?fields=: ~40% of full payload, ~480 B | Client declares exactly 4 fields: ~480 B, no server deploy needed | ~180 B protobuf, the smallest; but requires a proto change for each new field set
New screen field additions | Needs server deploy if the field doesn't exist | No server deploy: client adds the field to its query | Needs proto schema change + code regeneration + deploy on server and both app stores
Round trips for a heterogeneous screen | 3 sequential REST calls | One query, resolved server-side | 3 sequential unary calls unless you build a composite RPC
Real-time updates (new feed posts) | Polling or SSE (add-on) | Subscriptions: first-class | Server streaming: first-class, but a browser needs grpc-web

Decision: GraphQL. The weekly iteration cycle means the front-end team constantly changes which fields each screen reads. With GraphQL they can select any field already in the schema without waiting for a backend deploy. The payload savings over ?fields= are small; the latency saving from avoiding 3 sequential calls is real (3 × 60 ms RTT = 180 ms versus 60 ms for a single query, so about 120 ms eliminated). GraphQL subscriptions are a natural primitive for real-time feed updates on a browser/mobile client.
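
The round-trip arithmetic behind that decision, as a small sketch. It assumes the 60 ms RTT from the scenario, treats server processing time as zero, and assumes the three REST calls really are sequential (calls that can be issued in parallel would cost closer to one RTT); the function name is illustrative.

```python
def sequential_latency_ms(calls: int, rtt_ms: float) -> float:
    """Network latency when each call waits for the previous one to finish."""
    return calls * rtt_ms


RTT_MS = 60  # round-trip time assumed in the scenario

rest_ms = sequential_latency_ms(calls=3, rtt_ms=RTT_MS)     # three endpoints
graphql_ms = sequential_latency_ms(calls=1, rtt_ms=RTT_MS)  # one query
saved_ms = rest_ms - graphql_ms

print(f"3 sequential REST calls: {rest_ms:.0f} ms")
print(f"1 GraphQL query:         {graphql_ms:.0f} ms")
print(f"latency eliminated:      {saved_ms:.0f} ms")
```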

Scenario C — internal fraud scoring (internal, high-frequency, polyglot)

Constraints

Payment gateway (Go) calls fraud scorer (Python) on every transaction. Volume: 8,000 calls/second. Payload: a transaction record (~400 bytes). Latency budget: 20 ms. Both services owned by the same team. No browser client involved.

Dimension | REST (JSON/HTTP/1.1) | gRPC (protobuf/HTTP/2)
Payload size (400 B record) | ~400 B JSON | ~120 B protobuf, roughly 3× smaller
Connection overhead at 8k req/s | HTTP/1.1: one in-flight request per connection, so either thousands of short-lived TCP connections or a large keep-alive pool | HTTP/2 multiplexing: many concurrent streams on one connection; near-zero connection overhead
CPU at 8k req/s (parse + serialise) | JSON marshal/unmarshal ~2–5 µs per message in Go | Protobuf marshal/unmarshal ~0.3–0.8 µs; 5–10× faster
Contract enforcement | Optional: a JSON field rename is a silent runtime bug | Compile-time: generated stubs fail to compile if the interface breaks

Decision: gRPC. At 8k req/s and a 20 ms budget, the numbers matter. The per-message CPU savings of protobuf are real: ~4 µs × 8k calls/s ≈ 32 ms of CPU time freed every second, about 3% of one core. HTTP/2 multiplexing eliminates connection-pool churn. The Go ↔ Python polyglot boundary benefits from generated stubs catching contract breaks before they reach production.
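
The CPU estimate can be reproduced from the per-message ranges in the table. A sketch that takes those ranges at face value (they are the lesson's rough figures, not measurements) and reports the best and worst case for the saving; the helper name is illustrative:

```python
def cpu_ms_per_second(per_message_us: float, calls_per_second: int) -> float:
    """CPU time consumed each second, in milliseconds."""
    return per_message_us * calls_per_second / 1_000


CALLS_PER_SECOND = 8_000
JSON_US = (2.0, 5.0)       # marshal + unmarshal per message, low/high
PROTOBUF_US = (0.3, 0.8)

# Smallest saving: cheapest JSON vs most expensive protobuf, and vice versa.
low = cpu_ms_per_second(JSON_US[0] - PROTOBUF_US[1], CALLS_PER_SECOND)
high = cpu_ms_per_second(JSON_US[1] - PROTOBUF_US[0], CALLS_PER_SECOND)

print(f"CPU saved per second: {low:.1f} to {high:.1f} ms")
print(f"share of one core:    {low / 10:.2f}% to {high / 10:.2f}%")
```

Dividing milliseconds per second by 10 gives a percentage of one core, since one core provides 1,000 ms of CPU time per second.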

Quick benchmarking note — how to validate your choice with a measurement

Trade-off reasoning is a starting point; a 30-minute measurement closes the loop. Here is the minimal toolkit:

# 1. Measure payload size: compare JSON vs protobuf for the same record
$ curl -s https://api.example.com/v1/transactions/42 | wc -c
1247

# grpcurl prints the response as JSON text, so this count is NOT the wire size
$ grpcurl -plaintext -d '{"tx_id":"42"}' svc:50051 tx.TxService/GetTx 2>&1 | wc -c
394
# To measure the raw wire bytes, capture the traffic and inspect the
# DATA frame length in Wireshark (the loopback interface is lo0 on macOS, lo on Linux)
$ tcpdump -i lo0 -w capture.pcap

# 2. Measure latency + throughput with wrk (HTTP) and ghz (gRPC)
#    (outputs below are abbreviated)
$ wrk -t4 -c100 -d30s --latency https://api.example.com/v1/transactions/42
Requests/sec: 4312.88
Latency p99: 48.2ms

$ ghz --insecure --proto tx.proto --call tx.TxService.GetTx \
    -d '{"tx_id":"42"}' -c 100 -n 10000 svc:50051
Requests/sec: 18472.14
Latency p99: 11.8ms

# 3. Measure CPU during the load test (Go pprof endpoint)
$ curl -s "http://localhost:6060/debug/pprof/profile?seconds=10" -o cpu.pprof
$ go tool pprof -top cpu.pprof | grep -E "encoding|json|proto"
# Compare JSON marshal time vs protobuf marshal time in the pprof flamegraph

The three things to measure are: payload bytes on the wire (tcpdump + Wireshark DATA frame size), end-to-end latency and throughput (wrk for HTTP, ghz for gRPC, under realistic concurrency), and CPU cost per request (pprof or equivalent). If the numbers align with the trade-off reasoning, commit. If they don't, the data wins.
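
Before capturing packets, it can help to build intuition for where the byte difference comes from. The sketch below hand-encodes a tiny record in the protobuf wire format (a varint tag, then a varint or a length-prefixed value) and compares it with compact JSON. The record, its field names, and the field numbers are made up for illustration; a real service would use generated code from its .proto file rather than helpers like these.

```python
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2   # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data


def encode_uint_field(field_number: int, value: int) -> bytes:
    tag = (field_number << 3) | 0   # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


# Hypothetical record; field numbers 1-3 are assumptions for this sketch.
record = {"tx_id": "42", "amount_cents": 129900, "currency": "USD"}

wire = (
    encode_string_field(1, record["tx_id"])
    + encode_uint_field(2, record["amount_cents"])
    + encode_string_field(3, record["currency"])
)
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(f"protobuf: {len(wire)} bytes, JSON: {len(json_bytes)} bytes")
```

Most of the saving comes from replacing field names with one-byte tags and digits with varints. The string values themselves are stored as plain UTF-8 in both formats, so string-heavy records shrink far less.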

How to debug & inspect it (validating your choice)

Before committing to a style for a new boundary, a 30-minute spike can surface surprises. Here are the exact commands to measure the three dimensions that usually drive the decision.

# --- Validate REST payload size + latency ---
$ curl -o /dev/null -sw "%{size_download} bytes %{time_total}s\n" \
    https://api.example.com/v1/users/42
1247 bytes 0.041s

# Check what a sparse fieldset saves you
$ curl -o /dev/null -sw "%{size_download} bytes\n" \
    "https://api.example.com/v1/users/42?fields=id,name,avatar_url"
112 bytes

# --- Validate HTTP caching is actually working ---
$ curl -sI https://api.example.com/v1/users/42 | grep -i "cache\|etag\|age"
Cache-Control: public, max-age=60
ETag: "a3f8c"
# A CDN HIT should show Age: >0 and X-Cache: HIT on the second call

# --- Validate GraphQL field trimming reduces payload ---
$ curl -s -X POST https://bff.example.com/graphql \
    -H "Content-Type: application/json" \
    -d '{"query":"{ user(id:\"42\") { id name avatarUrl } }"}' \
    | wc -c
89
# vs full-record query: compare the two numbers

# --- Validate gRPC throughput vs REST at concurrency ---
# (-n fixes the request count; ghz ignores it if a duration is also given)
$ ghz --insecure --proto user.proto --call user.UserService.GetUser \
    -d '{"user_id":"42"}' -c 50 -n 5000 user-svc:50051
Summary:
  Count: 5000
  Requests/sec: 9841.22
  Fastest: 1.2ms
  Slowest: 18.4ms
  Mean: 5.1ms
  p99: 14.2ms
Symptom | Likely cause | Fix
REST is slower than expected despite lower concurrency | HTTP/1.1 head-of-line blocking; too many TCP connections per client | Enable HTTP/2 on the server; use connection keep-alive; or switch to gRPC for this boundary
GraphQL responses are large despite requesting few fields | Resolver fetches the full object before trimming; N+1, where many sub-resolvers each add fields | Apply DataLoader; push field selection into the database query layer
gRPC payload is not smaller than JSON | Most fields are strings (protobuf has no special string compression; it stores UTF-8 bytes just like JSON) | Add compression (grpc-encoding: gzip); or accept the trade-off, since protobuf's advantage is more about parse speed than string-heavy payload size
CDN hit rate is 0% for REST | Responses include Cache-Control: no-store or are POST requests | Audit the Cache-Control header; serve read endpoints over GET and keep POST for mutation endpoints
GraphQL adds visible latency vs equivalent REST | N+1 resolver problem: each related object triggers a database round-trip | Profile with OpenTelemetry traces; add DataLoader batching for relational fields
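
The N+1 rows in the table above come down to how many database round-trips a resolver makes. The sketch below illustrates the batching idea behind DataLoader; it is not DataLoader's API. The fake database, the resolver functions, and the data are all hypothetical and exist only to count round-trips.

```python
class FakeUserDB:
    """Stand-in for a database that counts round-trips."""

    def __init__(self, users: dict):
        self.users = users
        self.round_trips = 0

    def get_one(self, user_id: int) -> dict:
        self.round_trips += 1
        return self.users[user_id]

    def get_many(self, user_ids) -> dict:
        self.round_trips += 1            # one query, e.g. WHERE id IN (...)
        return {uid: self.users[uid] for uid in user_ids}


def resolve_authors_naive(db: FakeUserDB, posts: list) -> list:
    # One lookup per post: the N in N+1.
    return [db.get_one(post["author_id"]) for post in posts]


def resolve_authors_batched(db: FakeUserDB, posts: list) -> list:
    # Collect the distinct keys first, then fetch them in a single round-trip.
    unique_ids = {post["author_id"] for post in posts}
    by_id = db.get_many(unique_ids)
    return [by_id[post["author_id"]] for post in posts]


users = {i: {"id": i, "name": f"user-{i}"} for i in range(5)}
posts = [{"id": n, "author_id": n % 5} for n in range(20)]  # a 20-item feed

naive_db, batched_db = FakeUserDB(users), FakeUserDB(users)
naive = resolve_authors_naive(naive_db, posts)
batched = resolve_authors_batched(batched_db, posts)

print(f"naive round-trips:   {naive_db.round_trips}")
print(f"batched round-trips: {batched_db.round_trips}")
```

Both resolvers return the same authors in the same order; only the number of database calls differs.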

Decision-validation checklist:

  1. Measure actual payload size on the wire (not the JSON representation) — use curl -sw "%{size_download}" or tcpdump.
  2. Run a load test at realistic concurrency — wrk for HTTP, ghz for gRPC — and compare p99 latency, not just mean.
  3. Check cache hit rates in CDN or proxy access logs — a 0% hit rate on cacheable REST endpoints is a misconfiguration.
  4. Profile server CPU under load to confirm serialisation is not the bottleneck (it usually isn't until very high call rates).
  5. If a style change is being evaluated, run both old and new side by side on production shadow traffic before committing.
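
Item 2 asks for p99 rather than the mean; the reason is easy to show with numbers. A sketch using synthetic latencies (98 fast requests and 2 slow ones, invented for illustration) and a nearest-rank percentile, which is one of several common percentile definitions:

```python
import math
import statistics


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic data: most requests are fast, a few hit a slow path.
latencies_ms = [5.0] * 98 + [250.0] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean: {mean_ms:.1f} ms, p50: {p50_ms:.1f} ms, p99: {p99_ms:.1f} ms")
```

The mean looks healthy while one request in fifty takes a quarter of a second, which is exactly what a mean-only comparison between two styles would hide.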

🧠 Quick check

1. A startup launches a developer platform for third-party integrations. Which style minimises the barrier to entry for external developers?

REST wins for external developer experience: every language has HTTP client libraries, curl works, documentation is easy to write, and there's no special toolchain to install. GraphQL and gRPC both require additional tooling investment from the integrating party.

2. Which style requires the most explicit work to add caching support?

GraphQL requests are typically POSTs to a single endpoint with no per-query URL, so standard HTTP caching doesn't apply out of the box. You must implement persisted queries, client-side normalised caches, or schema-level cache hints explicitly. gRPC also lacks built-in caching, but GraphQL's additional complexity around partial field overlap makes its caching problem especially nuanced.

3. An internal analytics pipeline needs to stream millions of events per second between Go and Java services. The team wants compile-time contract guarantees. Best fit?

This is gRPC's home territory: internal, polyglot, high-frequency, streaming, needs compile-time safety. REST over JSON would use more bandwidth and CPU; GraphQL subscriptions are designed for client-initiated real-time reads, not high-throughput internal pipelines.

4. A new iOS screen needs exactly 4 fields from 3 different REST endpoints. The back-end team is busy. Best architectural choice?

Option A avoids N API calls and avoids the back-end bottleneck. Option B wastes bandwidth and battery (3 trips, mostly wasted fields). Option C is the "backend for frontend" REST approach — viable but requires a back-end deployment for every new screen, which the prompt said is blocked.

✍️ Exercise A: design the API layer for a logistics platform

A logistics company has: (1) a partner portal for shipping companies to submit and track shipments; (2) a driver mobile app that needs live tracking updates and receives only the fields relevant to the current delivery; (3) a fleet-routing engine that exchanges thousands of route-optimisation messages per second with the dispatch service (both internal Go services).

Design the API layer. For each boundary, name the style and give two reasons for the choice.

Model answer:

Rubric: ✓ correct style for each boundary ✓ two distinct reasons each (not just "it's better") ✓ no style applied where it doesn't fit ✓ total solution is consistent (e.g., does not put gRPC at the public partner boundary).

✍️ Exercise B: critique a mixed-style architecture

A team reports: "We use GraphQL for our public API, REST for our mobile BFF, and REST for our internal microservices." Identify what's potentially backwards and suggest corrections.

Model answer:

Rubric: ✓ identifies the mismatch of GraphQL for external use ✓ recognises REST-for-BFF as a missed opportunity if field flexibility is needed ✓ suggests gRPC for internal if frequency warrants it ✓ does not simply say "everything is wrong" without nuance.

Key takeaways

Sources & further reading