API Design

Reliability & Scale · Lesson 04

API Gateway Deep Dive

Microservices multiply the surfaces clients must talk to. An API gateway collapses them into a single front door — and that front door does far more than just route traffic. Understanding what lives in the gateway (and what doesn't) is one of the clearest ways to distinguish junior from senior API thinking.

⏱ 20 min · Difficulty: advanced · Prereq: HTTP, rate limiting, auth basics


The single front door

Imagine a large hospital. Hundreds of departments — radiology, oncology, emergency, pharmacy — each with their own internal procedures, staff, and locations. The hospital's main reception is where all visitors arrive first. The receptionist checks your ID, confirms your appointment, routes you to the right department, translates your request ("I need an MRI" → "proceed to Building C, 3rd floor, room 312"), and keeps a log of everyone who came in. If reception closed, you'd have to know the internal layout, carry credentials for each department separately, and the hospital would have no central audit trail.

An API gateway is that reception desk for your services. Clients talk to one address. Everything behind it — identity verification, routing, quota enforcement, protocol translation, logging — lives in the gateway so it doesn't have to be duplicated in every service.

The formal definition: an API gateway is a single-entry-point reverse proxy that sits in front of a collection of backend services and implements shared cross-cutting concerns on behalf of those services.

[Figure 1: Client (mobile/web) connects over HTTPS to the API Gateway, which handles TLS termination, AuthN/AuthZ, rate limiting, routing/dispatch, request/response transformation, response caching, aggregation/BFF, protocol translation, logging/tracing, and observability. The gateway forwards over the internal network (plain HTTP or gRPC) to the Users, Orders, Inventory, Payments, and Notifications services.]
Fig 1 — The API gateway as a single entry point. Every cross-cutting concern is enforced once in the gateway instead of being reimplemented in each service. Internal services communicate on the private network without TLS re-termination overhead.

Gateway responsibilities, one by one

1. Routing and dispatch

The gateway inspects the incoming request — the URL path, HTTP method, host header, or custom headers — and decides which backend service should handle it. A rule like "any request to /v1/orders/* goes to the orders service; /v1/users/* goes to the users service" is the fundamental routing table. More sophisticated gateways support traffic splitting (send 5% of /v1/search traffic to a new service version for canary deployments) and request mirroring (duplicate traffic to a shadow environment).

# Conceptual gateway routing table (Kong / NGINX style)

route users_service:
  match:
    path: "/v1/users"
    method: ["GET", "POST", "PATCH"]
  upstream: "http://users-service:8080"
  strip_path: false

route orders_service:
  match:
    path: "/v1/orders"
    method: ["GET", "POST"]
  upstream: "http://orders-service:8080"

route canary_search:
  match:
    path: "/v1/search"
  upstream:
    - target: "http://search-v1:8080"
      weight: 95
    - target: "http://search-v2:8080"
      weight: 5     # canary: ~5% of /v1/search traffic
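The routing table above can be sketched in Python as longest-prefix matching plus a weighted pick for the canary route. This is a minimal illustration, not Kong's or NGINX's actual matcher: the `Route` class, the segment-aware prefix rule (so `/v1/users` does not match `/v1/usersettings`; some gateways use plain string prefixes instead), and the injectable `rand` function are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Route:
    name: str
    prefix: str                              # e.g. "/v1/orders"
    upstreams: List[Tuple[str, int]]         # (target URL, weight)
    methods: Optional[Sequence[str]] = None  # None = any method


def prefix_matches(prefix: str, path: str) -> bool:
    # Match whole path segments only (an assumption; see the lead-in).
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def match_route(routes: List[Route], method: str, path: str) -> Optional[Route]:
    candidates = [
        r for r in routes
        if prefix_matches(r.prefix, path) and (r.methods is None or method in r.methods)
    ]
    # Most specific (longest) prefix wins; None means the gateway answers 404 itself.
    return max(candidates, key=lambda r: len(r.prefix), default=None)


def pick_upstream(route: Route, rand: Callable[[], float] = random.random) -> str:
    # Weighted choice: with weights 95/5, about 5% of requests reach the canary.
    total = sum(weight for _, weight in route.upstreams)
    point = rand() * total
    for target, weight in route.upstreams:
        point -= weight
        if point < 0:
            return target
    return route.upstreams[-1][0]  # guard against floating-point edge cases


ROUTES = [
    Route("users_service", "/v1/users", [("http://users-service:8080", 1)], ["GET", "POST", "PATCH"]),
    Route("orders_service", "/v1/orders", [("http://orders-service:8080", 1)], ["GET", "POST"]),
    Route("canary_search", "/v1/search",
          [("http://search-v1:8080", 95), ("http://search-v2:8080", 5)]),
]
```

For example, `pick_upstream(match_route(ROUTES, "GET", "/v1/search"))` returns the v1 target most of the time and the v2 target roughly one call in twenty.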

2. Authentication and authorization offload

Instead of each service re-implementing "is this JWT valid? what permissions does this token grant?", the gateway verifies credentials once on every inbound request and either rejects the request early (401 Unauthorized, 403 Forbidden) or stamps the request with a verified identity header for downstream services to trust.

A common pattern: the gateway validates the Bearer JWT against a public key or introspects the token with an auth service, extracts the user ID and scopes, then adds trusted internal headers like X-User-Id: 42 and X-User-Scopes: read:orders write:orders. Downstream services read these headers without re-validating the token — they trust the gateway.

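# Gateway auth middleware (pseudo-code)
function auth_middleware(request):
  # Never trust identity headers sent by the client; only the gateway may set them
  delete request.headers['X-User-Id']
  delete request.headers['X-User-Scopes']

  token = extract_bearer(request.headers['Authorization'])
  if not token:
    return Response(401, { "error": "missing token" })

  # Verify signature, exp, iss, aud using keys fetched (and cached) from the JWKS URI
  claims = verify_jwt(token, keys=fetch_jwks(JWKS_URI))
  if not claims:
    return Response(401, { "error": "invalid token" })

  # Look up the scope for the matched route: /v1/orders/99 must map to the /v1/orders rule
  required_scope = ROUTE_SCOPE_MAP[matched_route(request)]
  if required_scope not in claims.scopes:
    return Response(403, { "error": "insufficient scope" })

  # Stamp trusted identity headers for downstream
  request.headers['X-User-Id']     = claims.sub
  request.headers['X-User-Scopes'] = join(claims.scopes, " ")
  delete request.headers['Authorization']   # strip raw token
  return FORWARD(request)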
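To make the verification step concrete, here is a stdlib-only Python sketch of the same middleware flow. To stay self-contained it swaps in HS256 (a shared HMAC secret) for the public-key/JWKS verification described above; production gateways typically verify RS256 or ES256 signatures with keys fetched from a JWKS endpoint. `SECRET`, `ROUTE_SCOPES`, the space-separated `scope` claim, and the dict-based headers are hypothetical choices for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional, Tuple

SECRET = b"demo-shared-secret"                    # hypothetical; load real keys from a secret store
ROUTE_SCOPES = {"orders_service": "read:orders"}  # hypothetical matched-route -> required scope


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the HS256 signature and exp are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":  # pin the algorithm; never accept "none"
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(SECRET, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:  # missing or past exp: reject
        return None
    return claims


def auth_middleware(headers: Dict[str, str], route_name: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers to forward); 200 means 'forward upstream'."""
    # HTTP header names are case-insensitive, so normalize before filtering.
    fwd = {k.lower(): v for k, v in headers.items()}
    # Drop client-supplied identity headers: only the gateway may set them.
    fwd.pop("x-user-id", None)
    fwd.pop("x-user-scopes", None)
    auth = fwd.pop("authorization", "")  # the raw token is never forwarded
    if not auth.startswith("Bearer "):
        return 401, {}
    claims = verify_hs256(auth[len("Bearer "):])
    if claims is None or "sub" not in claims:
        return 401, {}
    scopes = str(claims.get("scope", "")).split()
    if ROUTE_SCOPES.get(route_name) not in scopes:  # unknown route: deny by default
        return 403, {}
    fwd["x-user-id"] = str(claims["sub"])
    fwd["x-user-scopes"] = " ".join(scopes)
    return 200, fwd
```

Stripping client-supplied `X-User-*` headers before stamping new ones matters: without it, a caller could send its own `X-User-Id` and impersonate another user to every service that trusts the gateway.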

3. Rate limiting

Covered in depth in the previous lesson, but the gateway is the canonical enforcement point: it sits before every backend service, can enforce per-key quotas using a shared Redis store, and can return HTTP 429 with Retry-After headers without any backend service being involved.
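A minimal sketch of the counter behind that 429: a fixed-window limiter keyed by caller and route, mirroring the Redis `INCR`-per-window pattern. An in-process dict stands in for Redis here (an assumption; a shared store is what lets several gateway instances enforce one quota), and the class name, `X-RateLimit-*` header names, and injectable clock are illustrative.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    def __init__(self, limit: int, window_s: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counters: Dict[Tuple[str, int], int] = {}  # (key, window index) -> count

    def check(self, key: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers): 200 to forward, 429 to reject at the gateway."""
        now = self.clock()
        window = int(now // self.window_s)
        count = self.counters.get((key, window), 0) + 1  # like Redis INCR
        self.counters[(key, window)] = count
        # A real store would also expire old windows (Redis EXPIRE); omitted here.
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
        }
        if count > self.limit:
            reset_at = (window + 1) * self.window_s
            headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
            return 429, headers
        return 200, headers
```

A fixed window is the simplest scheme; it allows bursts of up to twice the limit across a window boundary, which is why token buckets or sliding windows are common in production.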

4. TLS termination

Clients connect to the gateway over HTTPS. The gateway terminates the TLS session (decrypts the traffic), then forwards the request to backend services over plain HTTP on the private internal network. Backend services therefore don't need to manage TLS certificates, and the gateway can apply TLS policies (minimum TLS version, allowed cipher suites) in one place. High-security environments that require encryption even on the internal network can have the gateway re-encrypt the upstream leg (TLS re-encryption, often called end-to-end TLS). This is different from TLS passthrough, where the gateway forwards the encrypted bytes without terminating TLS at all and therefore cannot inspect or act on the HTTP request.
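Centralizing TLS policy means one place sets the floor for every client connection. In a Python-based edge it might look like the sketch below, using the standard-library `ssl` module. The TLS 1.2 minimum and the cipher string are example choices, not recommendations from this lesson, and the certificate paths in the usage comment are hypothetical.

```python
import ssl


def make_edge_tls_context() -> ssl.SSLContext:
    """Server-side TLS policy for client-facing connections (certificates loaded separately)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)  # server-side defaults
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.0/1.1 clients
    # Restricts TLS 1.2 suites only; TLS 1.3 suites are configured separately by OpenSSL.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx


# Usage (hypothetical paths):
# ctx = make_edge_tls_context()
# ctx.load_cert_chain("/etc/gateway/tls/cert.pem", "/etc/gateway/tls/key.pem")
```

Because every service sits behind this one context, raising the minimum version later is a single change rather than a fleet-wide rollout.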

5. Request and response transformation

The gateway can modify requests and responses in flight: add or remove headers, rewrite paths, reshape JSON bodies, convert query parameters to headers. This is used for API versioning (the gateway rewrites /v2/users to /v1/users?v=2 before the backend sees it), for hiding internal implementation details from the public API surface, and for injecting standard fields (request IDs, correlation headers).

# Request transformation examples

# 1. Inject a correlation ID on every inbound request
request.headers['X-Request-Id'] = generate_uuid()

# 2. Path rewrite: hide the internal service layout from the public API
#    Public:   GET /v1/products/123
#    Internal: GET /api/catalog/product?id=123
if request.path.startswith('/v1/products/'):
  product_id = request.path.removeprefix('/v1/products/').strip('/')
  request.path = "/api/catalog/product"
  request.query['id'] = product_id

# 3. Response transformation: add deprecation warning
#    Match on the path the client sent, not the rewritten one from step 2
if request.original_path.startswith('/v1/'):
  response.headers['Deprecation'] = "true"
  response.headers['Sunset']      = "Thu, 31 Dec 2026 23:59:59 GMT"   # Sunset (RFC 8594) takes an HTTP-date

6. Aggregation and composition

A single client request may need data from multiple backend services. Instead of forcing the client to make three separate calls (and incur three round-trips over a mobile connection), the gateway can fan-out the three requests in parallel, merge the responses, and return one combined payload. This is sometimes called the "API composition" or "aggregator" pattern and is closely related to the Backend-for-Frontend pattern described below.

7. Response caching

The gateway can cache upstream responses for cacheable resources (GET requests with appropriate Cache-Control headers). Subsequent identical requests are served from the gateway's cache without hitting the backend, reducing latency and backend load. This is safe only for idempotent, read-only operations and requires careful cache key design (include auth headers if responses are user-specific).
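A minimal sketch of gateway-side caching under a deliberately simplified policy: only `GET` 200 responses with a `max-age` are stored, `no-store` and `private` responses are skipped, the caller's identity joins the cache key when the route is user-specific, and entries expire after `max-age` seconds. The `ResponseCache` class and the `per_user` flag are assumptions for illustration; a real gateway also honors `Vary`, `s-maxage`, and more.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]   # (status, headers, body)
CacheKey = Tuple[str, str, str, Optional[str]]


class ResponseCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.entries: Dict[CacheKey, Tuple[float, Response]] = {}  # key -> (expires_at, response)

    @staticmethod
    def key(method: str, path: str, query: str, user_id: Optional[str], per_user: bool) -> CacheKey:
        # For user-specific routes the caller's identity must be part of the key,
        # otherwise one user's response could be served to another.
        return (method, path, query, user_id if per_user else None)

    def get(self, key: CacheKey) -> Optional[Response]:
        hit = self.entries.get(key)
        if hit is None:
            return None
        expires_at, response = hit
        if expires_at <= self.clock():
            del self.entries[key]  # expired: evict and report a miss
            return None
        return response

    def put(self, key: CacheKey, response: Response) -> None:
        status, headers, _body = response
        cc = headers.get("Cache-Control", "").lower()
        max_age = re.search(r"max-age=(\d+)", cc)
        cacheable = (key[0] == "GET" and status == 200 and max_age is not None
                     and "no-store" not in cc and "private" not in cc)
        if cacheable:
            self.entries[key] = (self.clock() + int(max_age.group(1)), response)
```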

8. Protocol translation

Clients might speak REST over HTTP/1.1 while backend services use gRPC (HTTP/2 + Protocol Buffers) or WebSockets. The gateway translates between protocols. For example, AWS API Gateway can expose a REST endpoint that internally invokes a Lambda function — the gateway handles the HTTP→Lambda invocation translation completely transparently to the client.
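To see what "translation" means concretely, the sketch below turns an HTTP request into an event dictionary, calls a handler function, and maps the handler's returned dictionary back into an HTTP response. The field names follow the general shape of API Gateway's Lambda proxy integration (payload format 1.0), but this is a simplified, hypothetical model: the real event carries many more fields, and `handler` here is a local function rather than an actual Lambda invocation.

```python
import json
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import parse_qsl, urlsplit


def http_to_event(method: str, url: str, headers: Dict[str, str], body: Optional[str]) -> dict:
    parts = urlsplit(url)
    return {
        "httpMethod": method,
        "path": parts.path,
        "headers": headers,
        "queryStringParameters": dict(parse_qsl(parts.query)) or None,  # None when no query
        "body": body,
        "isBase64Encoded": False,
    }


def result_to_http(result: dict) -> Tuple[int, Dict[str, str], str]:
    return result["statusCode"], result.get("headers", {}), result.get("body", "")


def handler(event: dict, context: object = None) -> dict:  # hypothetical function code
    order_id = (event["queryStringParameters"] or {}).get("id")
    return {"statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"id": order_id, "status": "shipped"})}


def invoke(method: str, url: str, headers: Dict[str, str], body: Optional[str],
           fn: Callable[[dict, object], dict] = handler) -> Tuple[int, Dict[str, str], str]:
    # The client only ever sees plain HTTP in and plain HTTP out.
    return result_to_http(fn(http_to_event(method, url, headers, body), None))
```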

9. Observability: logging, tracing, and metrics

Because every request passes through the gateway, it's the ideal place to emit a single structured log line per request, start a distributed trace span, and increment request-count, latency, and error-rate metrics — for every service simultaneously. Services don't need their own logging middleware for the standard fields (path, method, status, latency, client ID).
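A sketch of the one-line-per-request access log: the gateway records the standard fields once, including the correlation ID, so every service inherits consistent request logging. The field names and the `emit_access_log` helper are assumptions; real gateways expose their own configurable log formats.

```python
import json
import time
from typing import Optional


def emit_access_log(method: str, path: str, status: int, started: float, finished: float,
                    upstream_ms: Optional[float], client_id: Optional[str],
                    request_id: str, upstream: Optional[str]) -> str:
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(finished)),
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round((finished - started) * 1000, 2),
        "upstream_latency_ms": upstream_ms,  # None when the gateway answered by itself
        "client_id": client_id,
        "request_id": request_id,
        "upstream": upstream,
    }
    line = json.dumps(record, separators=(",", ":"))
    print(line)  # stand-in for a log shipper
    return line
```

A `null` upstream latency is itself a useful signal: it marks requests (401, 429, 404) that the gateway rejected without touching any service.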

The Backend-for-Frontend (BFF) pattern

A general-purpose API gateway serves all clients — web, mobile, partner integrations. But different clients have very different needs: a mobile app wants compact responses with minimal fields to save bandwidth; a desktop web app wants richer payloads with nested objects; a third-party partner wants a different versioning and auth model. Serving all of them from one generic API means every response is a compromise.

The Backend-for-Frontend (BFF) pattern solves this by creating one gateway per major client type. Each BFF is a thin aggregation and translation layer tailored to exactly one consumer. The "backend services" (users, orders, inventory) remain generic and unchanged; the BFF composes, filters, and reshapes their responses for its specific client.

[Figure 2: Mobile app → Mobile BFF (compact payloads); Web app → Web BFF (rich nested data); Partner API → Partner BFF (versioned REST/OAuth). All three BFFs call the same Users, Orders, and Inventory services; each BFF is owned by the team that owns its client.]
Fig 2 — Backend-for-Frontend (BFF) pattern. Each client type gets its own gateway tailored to its data shape, auth model, and versioning needs. Backend services remain generic and are not polluted by client-specific concerns.

The BFF pattern is typically owned by the frontend team — the same team that owns the mobile app owns the mobile BFF, which means they can change the API contract without coordinating with every other team. The pattern introduces a maintenance burden (N gateway codebases) but unlocks independent evolution per client.
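A sketch of the shaping a mobile BFF might do: call the same generic service the web BFF uses, then project each record down to the handful of fields the mobile screen renders. The field list and the `fetch_orders` stub are hypothetical; the point is that the orders service itself never changes.

```python
from typing import Any, Dict, Iterable, List

MOBILE_ORDER_FIELDS = ("id", "status", "total", "eta", "thumbnail_url")  # hypothetical


def fetch_orders(user_id: str) -> List[Dict[str, Any]]:
    # Stand-in for the generic orders service, which returns many more fields per order.
    return [{"id": 99, "status": "shipped", "total": 42.5, "eta": "2025-06-22",
             "thumbnail_url": "https://cdn.example.com/99.jpg",
             "warehouse_id": "w-3", "audit_trail": [], "tax_breakdown": {}}]


def project(record: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    return {f: record[f] for f in fields if f in record}


def mobile_orders_screen(user_id: str) -> List[Dict[str, Any]]:
    return [project(o, MOBILE_ORDER_FIELDS) for o in fetch_orders(user_id)]


def web_orders_screen(user_id: str) -> List[Dict[str, Any]]:
    return fetch_orders(user_id)  # the web BFF keeps the rich payload
```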

Gateway vs. load balancer vs. reverse proxy vs. service mesh

These four components often appear together in architecture diagrams and are frequently confused in interviews. They overlap in capability, but each has a distinct primary job and reason for existence.

[Figure 3: Client → Reverse proxy (TLS, DDoS, cache; e.g. NGINX / Cloudflare) → API Gateway (auth, rate limit, routing, transform; e.g. Kong / AWS GW) → Load balancer (L4/L7 dispatch, health checks) → Service A, Service B, Service C. A service mesh of Envoy sidecars carries east-west traffic between the services.]
Fig 3 — The four layers. The reverse proxy and API gateway handle north-south traffic (client to services). The load balancer dispatches to specific instances. The service mesh handles east-west traffic (service to service). In practice these layers can collapse or overlap.

Comparison table

Component | Primary job | Operates at | Knows about | Real examples | Does NOT typically do
Reverse proxy | TLS termination, DDoS mitigation, static caching | L7 HTTP (north-south) | HTTP requests, hostnames, URLs | NGINX, Caddy, Cloudflare | Business auth logic, API quotas per user
API Gateway | Single entry point: auth, rate limiting, routing, transform | L7 HTTP/gRPC (north-south) | API keys, JWT claims, routes, quotas, versions | AWS API Gateway, Kong, Apigee, Traefik | Health-based instance selection, TCP load distribution
Load balancer | Distribute connections across healthy instances | L4 TCP or L7 HTTP | Server health, connection counts, response times | AWS ALB/NLB, HAProxy, GCP Cloud LB | Auth, quotas, response transformation
Service mesh | Secure, observable east-west traffic between microservices | L4/L7 (east-west) | Service identity (mTLS), circuit breakers, retries, traces | Envoy, Istio, Linkerd, Consul Connect | Client-facing auth, API versioning, external routing
⚠️ "Can't a load balancer do what the gateway does?"

A Layer 7 load balancer (like AWS ALB) can inspect HTTP and do path-based routing, which looks like a gateway. But a load balancer's job is instance selection: among the healthy instances of a service, which one gets this connection? It has little or no native notion of API keys, per-consumer quotas, or response transformation, and any built-in authentication (ALB, for example, can require an OIDC login) is coarse compared with a gateway's per-route policy. An API gateway's job is cross-cutting policy: is this caller allowed? how many calls have they made? what format does the response need to be in? The two often appear stacked: a load balancer in front to spread traffic across healthy gateway instances, and the gateway behind it for API policy.
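The contrast shows up clearly in code. A load balancer's core decision is the one sketched below: among instances that passed health checks, pick one (here by fewest active connections). Nothing in it looks at who the caller is or what they are allowed to do. The `Instance` type and the least-connections policy are illustrative; real balancers offer several algorithms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    address: str
    healthy: bool
    active_connections: int


def pick_instance(instances: List[Instance]) -> Optional[Instance]:
    """Least-connections selection among instances that passed health checks."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return None  # typically surfaces to the caller as a 502/503
    return min(healthy, key=lambda i: i.active_connections)
```

A gateway's decisions (is this token valid, is this caller over quota, how should the response be reshaped) all happen before a function like this is ever called.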

✅ A simple mental model for interviews

Reverse proxy = "protect my server from the raw internet." Load balancer = "spread connections across healthy instances." API gateway = "enforce policy for all my APIs in one place." Service mesh = "secure and observe how my services talk to each other." These four jobs rarely compete; they stack.

Real gateway products

Product | Type | Key characteristic | Best for
AWS API Gateway | Managed cloud | Native integration with Lambda, IAM, Cognito; pay-per-call pricing | Serverless APIs on AWS
Kong | Open source + enterprise | Plugin architecture (auth, rate limit, transform as first-class plugins); runs on Kubernetes | On-premise or multi-cloud; teams that need custom plugins
NGINX | Reverse proxy / gateway | High performance; Lua/njs scripting for custom logic; battle-tested at high load | High-throughput deployments; replacing Apache
Envoy | Proxy / service mesh data plane | Dynamic xDS config, first-class observability, HTTP/2 and gRPC native | Service mesh data plane (Istio); edge proxy for gRPC-heavy stacks
Traefik | Cloud-native reverse proxy | Auto-discovers routes from Kubernetes Ingress/CRDs; Let's Encrypt ACME built in | Kubernetes-native teams who want zero-config routing

Gateway failure modes

Single point of failure

Because the gateway is the single front door, if it goes down, every service it fronts becomes unreachable — even services that are perfectly healthy. This is the fundamental availability tax of the pattern. The mitigation: run the gateway in a highly available (HA) cluster. Most managed gateways (AWS API Gateway, Cloudflare) handle this for you. Self-hosted gateways (Kong, NGINX) require you to run multiple instances behind a load balancer with health checks, automatic instance replacement, and a rolling upgrade strategy.

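# Minimal HA gateway topology (Kubernetes example)
# Kong running as a Deployment with ≥2 replicas
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kong-gateway
spec:
  replicas: 3               # min 2 for HA; 3 for maintenance safety
  selector:                 # required in apps/v1; must match the pod template labels
    matchLabels:
      app: kong-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # zero-downtime rolling upgrade
      maxSurge: 1           # with the hard anti-affinity below, the surge pod needs a spare node
  template:
    metadata:
      labels:
        app: kong-gateway
    spec:
      affinity:
        podAntiAffinity:    # spread across nodes; don't co-locate replicas
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:          # without a selector the rule matches no pods
                matchLabels:
                  app: kong-gateway
              topologyKey: kubernetes.io/hostname
      containers:
        - name: kong
          image: kong:3.6   # pin the version you have tested; add probes for your setup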

Added latency

Every request through the gateway adds one extra network hop: client → gateway → service instead of client → service directly. Inside a data center this hop typically costs 0.2–1 ms, which is negligible for most APIs. For edge deployments, a geographically distributed gateway sits on or near the path the request already takes to the origin, so the added latency can be close to zero. The real concern is a slow gateway: a sluggish auth plugin, an overloaded rate-limit Redis, or an expensive transformation can add 10–50 ms to every request. Monitor gateway-added latency (total latency minus upstream latency) as a dedicated metric and alert on it.
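A sketch of that gateway-added latency metric: subtract upstream time from total time per request, then alert on a high percentile rather than the mean, because a slow plugin or a struggling Redis often hurts only a fraction of requests. The nearest-rank percentile and the 10 ms default threshold are assumptions for illustration.

```python
import math
from typing import List, Tuple


def gateway_overhead_ms(samples: List[Tuple[float, float]]) -> List[float]:
    """samples: (total_ms, upstream_ms) pairs taken from the access log."""
    return [max(total - upstream, 0.0) for total, upstream in samples]


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty list, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def should_alert(samples: List[Tuple[float, float]], threshold_ms: float = 10.0) -> bool:
    return percentile(gateway_overhead_ms(samples), 99) > threshold_ms
```

With the trace numbers from later in this lesson (1.45 ms total, about 1.175 ms upstream), each healthy request contributes about 0.275 ms of overhead; two 50 ms outliers per hundred requests are enough to push the p99 over the threshold while barely moving the mean.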

Configuration drift

A gateway's power comes from its routing table and policy configuration. As services evolve, old routes that point to deprecated or renamed services can linger in the gateway config, causing mysterious 404s or routing to the wrong service. Treat gateway configuration as code: version it in git, review changes, and run integration tests against the routing table on every deployment.
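One cheap "routing table as code" check can run in CI before every gateway deploy: fail the build if a route points at a service that no longer exists, or if two routes claim the same method and path. The dictionary shapes below are hypothetical, loosely modeled on a declarative config with `services` and `routes` lists; you would first load your real config file into similar structures.

```python
from collections import Counter
from typing import Dict, List


def lint_routes(services: List[Dict], routes: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the table passed."""
    problems = []
    known = {s["name"] for s in services}
    for r in routes:
        if r["service"] not in known:
            problems.append(f"route {r['name']!r} points at unknown service {r['service']!r}")
    claims = Counter(
        (method, path)
        for r in routes
        for path in r["paths"]
        for method in r.get("methods", ["*"])  # no methods listed = any method
    )
    for (method, path), n in sorted(claims.items()):
        if n > 1:
            problems.append(f"{n} routes claim {method} {path}")
    return problems
```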

🎯 Interview angle — "What does an API gateway do?" and "Gateway vs. load balancer"

Two of the most common questions in system design and infrastructure interviews. For "what does a gateway do?" — don't just say "routing." Walk through all 9 responsibilities and explain the unifying theme: the gateway centralizes cross-cutting concerns so services don't have to each implement them. For "gateway vs. load balancer" — the mental model is everything: the load balancer selects which instance handles a request; the gateway decides whether and how the request is permitted and transformed before it ever reaches an instance. They solve different problems and are typically both present, stacked.

Under the hood: one request through every gateway stage

Describing what a gateway does is easy. Understanding in what order, with what data structure, and which errors originate where is what lets you debug a production incident. Here is a single HTTPS GET /v1/orders/99 traced through every stage of a Kong-style gateway. The numbers on the left are approximate wall-clock microseconds from connection accept.

[Figure: gateway stages in order. ① TLS terminate (decrypt, validate cert) → ② AuthN (verify JWT / API key; 401 if token invalid) → ③ Rate limit (quota check in Redis; 429 if quota exceeded) → ④ Route match (select upstream host; 404 if no route) → ⑤ Request transform (add X-Request-Id header) → ⑥ Upstream call (HTTP to orders-svc:8080) → ⑦ Response transform (add Deprecation header) → ⑧ Return (TLS re-encrypt → client).]

Annotated trace log (µs elapsed since connection accept):

+0 µs      TCP accept; TLS ClientHello received
+180 µs    TLS handshake complete; plaintext HTTP/1.1 visible
+185 µs    Authorization: Bearer eyJ… extracted
+210 µs    JWT sig verified (JWKS cached); claims: sub=user_7, scope=read:orders
+215 µs    Rate-limit key: user_7 / GET /v1/orders/*; Redis INCR → 43 / 1000 OK
+220 µs    Route matched: orders-svc:8080 (path prefix /v1/orders)
+222 µs    Request transform: inject X-Request-Id: a3f1-7c2e; strip raw Authorization
+225 µs    Upstream TCP connect to orders-svc:8080
+1.4 ms    orders-svc responds: 200 OK {"id":99,"status":"shipped"…}
+1.41 ms   Response transform: add Deprecation: true, Sunset: 2026-12-31 (v1 path)
+1.42 ms   TLS encrypt response; write to client socket
+1.45 ms   Access log: GET /v1/orders/99 200 1.45ms user_7 a3f1-7c2e orders-svc
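Every gateway stage in order, with the actual decision made and the error it can produce. The upstream call (step ⑥) is the only stage that touches a backend service; all others are gateway-local (apart from the rate limiter's Redis round-trip). Gateway time in this trace is about 275 µs: 225 µs before the upstream connect, of which the TLS handshake is 180 µs, plus 50 µs after the response arrives. The upstream call itself takes about 1.2 ms.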

Two details that matter for debugging: (a) the gateway strips the raw Authorization token before forwarding — the upstream service never sees the original credential; it trusts only the gateway-stamped X-User-Id and X-User-Scopes headers. (b) the X-Request-Id injected at step ⑤ is the correlation handle that lets you join the gateway access log to the upstream service log to a distributed trace span — you must log it consistently in every service.
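The ordering in the trace can be modeled as a list of stages where any stage may short-circuit with its own status code, which is exactly why a 401 or 429 never produces an upstream log line. The stage functions and the dict-based request are hypothetical simplifications (TLS is assumed already terminated). The order follows the trace above; some gateways, Kong included, match the route before running auth and rate-limit plugins, because those plugins are attached to routes.

```python
import uuid
from typing import Callable, Dict, List, Optional

Request = Dict[str, object]
Stage = Callable[[Request], Optional[int]]  # return a status to stop, or None to continue


def authn(req: Request) -> Optional[int]:
    return None if req.get("token") == "valid" else 401


def rate_limit(req: Request) -> Optional[int]:
    return 429 if int(req.get("calls_this_minute", 0)) >= 1000 else None


def route_match(req: Request) -> Optional[int]:
    if str(req["path"]).startswith("/v1/orders"):
        req["upstream"] = "orders-svc:8080"
        return None
    return 404


def req_transform(req: Request) -> Optional[int]:
    req["headers"] = {"X-Request-Id": str(uuid.uuid4())}  # raw token is not forwarded
    return None


PIPELINE: List[Stage] = [authn, rate_limit, route_match, req_transform]


def handle(req: Request, call_upstream: Callable[[Request], int]) -> int:
    for stage in PIPELINE:
        status = stage(req)
        if status is not None:
            return status          # gateway-generated: upstream never called
    return call_upstream(req)      # the only step that touches a backend
```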

# Minimal Kong-style route + plugin config (declarative YAML)

services:
  - name: orders-svc
    url: http://orders-service:8080

routes:
  - name: orders-route
    service: orders-svc
    paths: ["/v1/orders"]
    methods: [GET, POST, PATCH]

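plugins:  # Kong runs plugins by built-in priority, not by their order in this file
  - name: jwt              # ② AuthN: reject 401 if token invalid
    config:
      key_claim_name: sub
      claims_to_verify: [exp]

  - name: rate-limiting    # ③ Rate limit: reject 429 if exceeded
    config:
      second: 50
      minute: 1000
      policy: redis        # also needs Redis connection settings (field names vary by Kong version)

  - name: correlation-id   # ⑤ Req transform: inject X-Request-Id
    config:
      header_name: X-Request-Id
      generator: uuid
      echo_downstream: true

  - name: request-transformer  # ⑤ Req transform: strip the raw token before forwarding
    config:
      remove:
        headers: [Authorization]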

How to debug & inspect it

Gateway errors split cleanly into two buckets: gateway-generated (the gateway itself produced the error before touching any upstream) and upstream-proxied (the upstream returned a non-2xx and the gateway forwarded it, possibly translated). Mixing them up wastes hours. The fastest separator is the response body and the presence of upstream-specific headers.

$ curl -i -H "Authorization: Bearer $TOKEN" https://api.example.com/v1/orders/99
HTTP/1.1 502 Bad Gateway
X-Request-Id: a3f1-7c2e
Via: kong/3.6.1
Content-Type: application/json

{"message":"An invalid response was received from the upstream server"}

# ^^^ This error body is in Kong's own format: the upstream service was reached
# but returned something Kong couldn't parse (e.g. a TCP reset or empty body).
# Compare with a gateway-auth 401, which has a different message format.

Use the X-Request-Id (or whatever correlation header your gateway injects) to pivot from the gateway access log to the upstream log:

# 1. Find the gateway access log entry for the request
$ grep 'a3f1-7c2e' /var/log/kong/access.log
2025-06-20T10:14:55Z GET /v1/orders/99 502 42ms upstream=orders-svc latency_upstream=38ms request_id=a3f1-7c2e
# gateway total: 42ms, upstream responded in 38ms (so gateway overhead = 4ms; the error came from upstream)

# 2. Search upstream service logs with the same id
$ kubectl logs deployment/orders-svc | grep 'a3f1-7c2e'
2025-06-20T10:14:55Z request_id=a3f1-7c2e panic: runtime error: nil pointer dereference
# ^^^ The upstream panicked: that's why the gateway saw an empty/broken response
# Note: deployment/orders-svc reads logs from a single pod. To search every replica, use a label
# selector, e.g. kubectl logs -l app=orders-svc --prefix --tail=-1 (the label is an assumption).

Distinguishing gateway-generated errors from upstream errors:

Symptom | Cause | Fix
502 with gateway-branded body, no upstream log entry | Gateway could not connect to the upstream at all (DNS failure, upstream down, wrong port) | Check the upstream host/port in the gateway route config; verify the upstream service is running; check network policy
502 with gateway-branded body, upstream log shows a crash or 5xx | Upstream crashed or returned malformed HTTP (e.g. missing status line) | Fix the upstream application bug; the gateway is just reporting faithfully
504 with gateway-branded body | Upstream did not respond within the gateway's read timeout | Profile the upstream endpoint; increase the gateway timeout if justified; add caching upstream of the slow query
401 with gateway-branded body, no upstream log | Token failed JWT validation in the gateway; upstream was never called | Decode the token (jwt decode $TOKEN), check exp, check the algorithm, check the JWKS endpoint the gateway uses
429 with gateway-branded body, no upstream log | Rate-limit quota exhausted in gateway (Redis counter hit ceiling) | Inspect X-RateLimit-Remaining and Retry-After headers; check the rate-limit plugin config; tune quotas or add burst allowance
404 with gateway-branded body (not upstream 404) | No route matched in the gateway routing table | List the configured routes (for Kong, GET /routes on the Admin API, e.g. curl http://localhost:8001/routes, or export the config with decK); verify the path prefix and method match exactly; check for trailing-slash mismatches
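The table reduces to a small decision function, handy as a runbook helper. It is a sketch that encodes only the rows listed; the three boolean signals are read off the response body and the logs, and the function name is hypothetical.

```python
def triage(status: int, gateway_branded_body: bool, upstream_log_entry: bool) -> str:
    """Map the symptom signals from the table to the most likely origin of the error."""
    if not gateway_branded_body:
        return "upstream-proxied: read the upstream service's own error"
    if status == 502:
        if upstream_log_entry:
            return "upstream reached but crashed or sent malformed HTTP"
        return "gateway could not connect: check host/port, DNS, network policy"
    if status == 504:
        return "upstream exceeded the gateway read timeout"
    if status == 401:
        return "JWT failed gateway validation: check exp, alg, JWKS"
    if status == 429:
        return "gateway rate-limit quota exhausted"
    if status == 404:
        return "no gateway route matched: check path prefix, method, trailing slash"
    return "unclassified: start from the X-Request-Id"
```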

Debug checklist for gateway incidents:

  1. Capture the X-Request-Id (or traceparent) from the failing response — this is your pivot key.
  2. Check the gateway access log for that ID: note upstream latency vs. total latency. If upstream latency is absent or zero, the error was gateway-side (auth, rate limit, no route).
  3. If the upstream was called, search the upstream service log for the same request ID and read the upstream error.
  4. For 502/504: test direct connectivity from the gateway pod to the upstream with curl or nc — rules out DNS/network issues independent of the gateway config.
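  5. For auth 401: decode the JWT header and payload, e.g. with jwt decode "$TOKEN" (each segment is unpadded base64url, so piping the whole token through base64 -d fails); check that exp has not passed and that alg and iss match what the gateway is configured to accept.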
  6. For 429: confirm which rate-limit counter is exhausted — per-user vs. per-IP vs. global — and whether the Retry-After header is being honored.
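For step 5, a short Python helper avoids the usual pitfalls of piping JWT segments through `base64 -d`: the segments are base64url-encoded (`-` and `_` instead of `+` and `/`) and unpadded, and each one must be decoded separately. The script name in the usage comment is hypothetical. The helper only decodes; it does not verify the signature, so treat its output as untrusted.

```python
import base64
import json
import sys
import time
from typing import Any, Dict, Optional


def decode_segment(segment: str) -> Dict[str, Any]:
    padded = segment + "=" * (-len(segment) % 4)  # base64url segments are unpadded
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(token: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Decode (NOT verify) a JWT and report the fields a gateway usually checks."""
    header_b64, payload_b64, _signature = token.split(".")
    header, payload = decode_segment(header_b64), decode_segment(payload_b64)
    current = time.time() if now is None else now
    return {
        "alg": header.get("alg"),
        "kid": header.get("kid"),  # which JWKS key should verify it
        "iss": payload.get("iss"),
        "exp": payload.get("exp"),
        "expired": "exp" in payload and payload["exp"] <= current,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python inspect_jwt.py "$TOKEN"
    print(json.dumps(inspect_jwt(sys.argv[1]), indent=2))
```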

In production: how leading APIs do it

The gateway landscape splits into two camps: managed cloud gateways that handle availability, scaling, and certificate management for you, and self-hosted gateways that give you full control at the cost of operational burden. Large-scale architectures have largely converged on centralizing the same cross-cutting concerns; the differences are in deployment model and extension mechanism.

System | Type | What it handles
AWS API Gateway | Managed cloud | Routing; token-bucket throttling with configurable burst and rate; usage plans + API keys for per-consumer quotas; authorizers (Lambda custom, Cognito user pools, or native JWT); request/response mapping templates (Velocity); stages + canary deployments (traffic split by percentage); response caching with configurable TTL; WAF integration for IP-based rules and managed rule groups.
Kong | Self-hosted (NGINX/OpenResty core) | Plugin model where every cross-cutting concern (authentication, rate limiting, request/response transformation, logging, CORS) is a first-class plugin applied per route or globally. Declarative configuration via YAML (deck); Kubernetes-native via the Kong Ingress Controller. Enterprise edition adds OIDC, RBAC, and a developer portal.
Envoy | L7 proxy / service-mesh data plane | Dynamic xDS API for configuration (no restart required); routing with retries, circuit breaking, and outlier detection; first-class HTTP/2 and gRPC support; rich observability via stats and tracing sinks. Used as the data-plane sidecar in Istio and as a standalone edge proxy. Does not come with a management UI; typically configured by a control plane.
Netflix Zuul / Spring Cloud Gateway | JVM edge routing + filter chain | Netflix open-sourced Zuul as a filter-chain edge router that handled auth, dynamic routing, and resilience for hundreds of services. Spring Cloud Gateway is the Spring-ecosystem successor, using a predicate/filter model. Both illustrate the pattern of edge routing + filters at JVM scale, and Netflix's tech blog documents the architectural decisions in detail.
Cloudflare / Apigee | Managed edge | Cloudflare Workers and API Shield operate at the CDN edge: DDoS mitigation, rate limiting, bot management, and JWT validation happen before traffic reaches your origin. Apigee (Google Cloud) adds a full developer portal, analytics, and monetization layer targeted at enterprise API programs.

The common thread. Every system in this table — regardless of vendor, deployment model, or underlying technology — implements the same architectural insight: a gateway centralises cross-cutting concerns so individual services do not have to reimplement them. Authentication, rate limiting, TLS termination, and request routing appear in every gateway because these concerns affect every API call and have no business living in individual services. When a payment service also validates JWTs and enforces quotas, you have N copies of that logic to keep in sync, N places for a security misconfiguration, and N deployment targets every time a policy changes. Moving those concerns to the gateway reduces them to one. The managed vs. self-hosted distinction changes who operates the gateway — it does not change the architectural pattern.

AWS API Gateway's usage-plan documentation, Kong's plugin hub, Envoy's architecture overview, and the Netflix tech blog on Zuul all describe the same decomposition from different angles. Each is worth reading once — the vocabulary differences are superficial; the structural decisions are identical.


🧠 Quick check

1. Your company runs 12 microservices. Each service currently validates JWTs independently using the same shared library. A new security requirement mandates key rotation every 24 hours. Which approach best solves this?

Answer: move JWT validation into the gateway. Auth offload to the gateway means the key rotation logic lives in exactly one place. Services don't need to be redeployed; only the gateway configuration changes. This is precisely the "centralize cross-cutting concerns" benefit of the gateway pattern.

2. An AWS Application Load Balancer (ALB) can do path-based routing — so it can route /v1/users to the users service and /v1/orders to the orders service. Does that make it an API gateway?

Answer: no. The ALB's primary job is distributing connections across healthy instances. Path-based routing is a convenience feature for instance selection, not a policy enforcement mechanism. An API gateway owns per-consumer auth, quotas, rate limiting, and transformation; ALB offers at most basic login-style authentication (OIDC/Cognito) and none of the rest natively.

3. You run a self-hosted Kong gateway as a single instance. What is the first reliability concern to address?

Answer: the gateway is a single point of failure. A single instance means the entire API surface depends on one process: a crash, OOM, or bad deployment takes down every service simultaneously, even perfectly healthy ones. Run at minimum 2 instances behind a load balancer with health checks. Added latency is not the first concern; a gateway typically adds well under a millisecond on the internal network.

4. A mobile team complains that the public REST API returns too much data (they only use 3 of 40 fields) and forces them to make 4 separate calls to render one screen. Which pattern directly addresses this?

Answer: the Backend-for-Frontend (BFF) pattern. A mobile BFF is a gateway layer tailored to the mobile client: it makes the 4 calls to the underlying services in parallel, merges them, strips the 37 unused fields, and returns a single compact response. The mobile team owns the BFF and can evolve it independently without touching any backend service.

🏗️ Exercise 1 — Design a gateway architecture for a multi-platform product
Scenario: You are the lead engineer for a SaaS product that serves three client types: (1) a React web dashboard, (2) an iOS/Android mobile app, (3) a REST API consumed by enterprise partners. The backend has 8 microservices. The current architecture has each client talking directly to each service: 24 possible pairs of client×service, each with its own auth logic. You are asked to design a gateway layer.

Questions to answer:

  1. Should you use a single shared gateway or the BFF pattern? Justify your choice.
  2. List 5 responsibilities the gateway should own that are currently duplicated across the 8 services.
  3. What is the single biggest risk introduced by adding a gateway layer, and how do you mitigate it?
  4. The mobile app needs responses with ≤5 fields; the web dashboard needs the same endpoint to return 30+ fields. How does your gateway design handle this?

Model answer:

  1. BFF pattern. Three distinct client types with fundamentally different auth models (partner REST uses API keys + OAuth; mobile uses device tokens; web uses session cookies), different payload requirements, and owned by different teams — BFF is the right call. A single shared gateway would become a compromise layer that serves no client well and creates cross-team coordination overhead.
  2. JWT/API key validation; rate limiting; request/response logging; TLS termination; correlation ID injection. These five are currently copied across all 8 services.
  3. Single point of failure. Mitigation: run each BFF as a multi-instance deployment (≥2 replicas) behind a load balancer, with health checks and automated restart. Use a managed gateway where possible to offload availability guarantees.
  4. The mobile BFF fetches the full response from the relevant service and strips it to the 5 required fields before returning. The web BFF fetches the same endpoint and returns the full payload. Two BFFs, same upstream service, different response shapes — neither service changes.

Rubric: ✓ BFF vs. single gateway decision with justification ✓ At least 4 cross-cutting concerns named ✓ SPOF identified + HA mitigation ✓ Response shaping per client explained at the BFF layer. Hitting all 4 = strong answer.

🔍 Exercise 2 — Gateway vs. load balancer vs. service mesh distinction
Scenario: A colleague shows you this architecture diagram description and asks you to critique it: "We run Envoy as our API gateway in front of the internet. Behind Envoy is an NGINX reverse proxy that does TLS termination. Behind NGINX is an AWS ALB that routes paths to services. Inside the cluster, Istio handles service-to-service mTLS."

Questions:

  1. Identify at least two redundancies or misassignments of responsibility in this stack.
  2. Redraw (in words) a leaner version that eliminates the redundancy.
  3. What does Istio provide that the other layers cannot?

Model answer:

  1. Redundancy 1: Both Envoy and NGINX can do TLS termination; having both in sequence means TLS is terminated and re-established unnecessarily, adding latency and complexity. Redundancy 2: Both Envoy and ALB can do L7 path-based routing; running both in sequence doubles the routing config surface and adds another hop. NGINX as a dedicated reverse proxy between Envoy and ALB adds a third hop with no added value.
  2. Leaner stack: Cloudflare/CDN (DDoS mitigation, anycast) → Envoy as API gateway (TLS termination, auth, rate limiting, path routing) → backend services directly (Envoy routes to service instances and supports health-based upstream selection). Inside the cluster, Istio sidecars handle east-west mTLS. This removes NGINX and ALB entirely: north-south traffic crosses two proxy layers (CDN, Envoy) instead of three (Envoy, NGINX, ALB).
  3. Istio (running Envoy sidecars as a service mesh) handles east-west mTLS — encrypted, authenticated communication between services inside the cluster. None of the other layers (CDN, API gateway, load balancer) operate in the east-west path. Istio also provides circuit breaking, retry policies, distributed tracing, and traffic shifting for service-to-service calls without modifying service code.

Rubric: ✓ Both redundancies identified ✓ Leaner architecture removes at least one hop ✓ Envoy/Istio east-west vs. north-south distinction correct. Hitting all 3 = strong answer.


Sources & further reading