API Design

Interview Prep · Lesson 01

API Interview Question Bank

Eighty-plus original questions — from first-principles fundamentals to live system-design prompts — each with a tight, correct model answer you can actually say out loud.

Self-paced · Difficulty: LOW → HIGH · Prereq: Course lessons 01–10
How to use this bank

Work section by section. For each question: say your answer out loud (or write it in one sentence) before opening the panel. Compare against the model answer — not to memorize it word-for-word, but to check whether you hit the key idea. If you couldn't answer confidently, mark the lesson that covers it and revisit. The difficulty chips (LOW / MEDIUM / HIGH) match the seniority bar at which interviewers expect a solid answer unprompted.

1. Fundamentals

LOW What is an API, in one sentence?
An API is a published contract that lets one piece of software ask another for a service, defined by the shapes of the request and the response — not by the implementation behind it. The key word is contract: the caller depends on the interface staying stable, not on the internals staying the same.
LOW What does "idempotent" mean and which HTTP methods must be idempotent?
An operation is idempotent when performing it N times (N ≥ 1) produces the same server state as performing it once. GET, HEAD, OPTIONS, PUT, and DELETE are required by RFC 9110 to be idempotent. POST is not — each call may create a new resource. Idempotency matters most when networks are unreliable: you can safely retry an idempotent request after a timeout without fear of double-effects.
LOW What are the main HTTP methods and what does each mean?
GET retrieves a resource (safe, idempotent; a request body has no defined meaning); POST creates a resource or triggers an action (neither safe nor idempotent); PUT replaces a resource in full (idempotent); PATCH applies a partial update (not safe, and not required to be idempotent, though implementations often make it so); DELETE removes a resource (idempotent). HEAD is GET without a response body, useful for metadata checks. OPTIONS asks which methods the server supports for a resource and is used by CORS preflight.
LOW Walk me through the main HTTP status-code families.
1xx are informational (rarely seen in practice; 101 Switching Protocols handles WebSocket upgrades). 2xx signals success: 200 OK is the standard reply, 201 Created confirms that the request (typically a POST) created a new resource, 204 No Content is success with no body. 3xx redirects: 301 (or 308) is permanent, 302 or 307 temporary, and 304 Not Modified tells the client its cached copy is still valid. 4xx are client errors: 400 bad input, 401 unauthenticated, 403 forbidden (authenticated but not authorized), 404 not found, 409 conflict, 429 too many requests. 5xx are server faults: 500 generic, 502 bad gateway, 503 service unavailable, 504 gateway timeout.
LOW What are the six constraints that define REST?
Roy Fielding's dissertation names: (1) client–server separation, (2) statelessness — each request carries all needed context, no session stored on the server, (3) cacheability — responses must declare whether they are cacheable, (4) uniform interface — fixed set of operations (methods + URIs) rather than arbitrary verbs, (5) layered system — client cannot tell if it is talking to the origin or a gateway, (6) code on demand (optional) — servers may send executable code. Statelessness and uniform interface are the constraints most interviewers probe.
LOW What is statelessness and what is the trade-off?
Statelessness means the server retains no per-client session state between requests; the client must include every credential, token, and piece of context in every call. The upside is horizontal scalability: any server in the pool can handle any request. The downside is larger per-request payloads and the cost of re-authenticating every call. For actions that span multiple steps (wizards, multi-leg transactions), the in-progress state has to live somewhere other than server memory, typically modeled as a resource of its own (such as a "cart") that the client references by ID on each call.
LOW What is the difference between synchronous and asynchronous APIs?
A synchronous API completes the operation before returning a response: the client blocks until it has the answer. An asynchronous API accepts the request and returns a job reference (e.g., 202 Accepted with a Location header pointing at a status resource) while the work continues in the background; the client polls that resource or is pushed a completion event later. Async is essential for operations that can take seconds to minutes (video encoding, report generation, cross-region data replication), where holding a connection open for the whole duration would be brittle and wasteful.
LOW What is a resource in REST?
A resource is any named concept that can be addressed individually: a user, an order, a report, even a derived aggregate like a "balance". Each resource has a stable URI that is its address; the representation (JSON, XML, binary) returned may change via content negotiation. The discipline of modelling nouns (resources) rather than verbs (actions) is what separates REST from old-style RPC over HTTP.
LOW What is content negotiation?
Content negotiation is the mechanism by which a client and server agree on the format of a response. The client sends an Accept header listing acceptable MIME types (e.g., Accept: application/json, application/xml;q=0.9) with optional quality weights. The server picks the best match it can produce and declares the chosen type in Content-Type. If no match exists it returns 406 Not Acceptable. This lets one endpoint serve JSON to apps and XML to legacy partners without duplication.
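As a rough sketch of the server side, the Python below parses an Accept header with q-weights and picks the best type the server can produce, returning None where a real handler would answer 406. The function names are hypothetical, and it simplifies RFC 9110: media-type parameters other than q are ignored, and a missing or empty Accept header is treated as accepting anything.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; other parameters are ignored."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def _specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def _matches(media_range: str, offered: str) -> bool:
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    o_type, _, o_sub = offered.partition("/")
    return r_type == o_type and r_sub in ("*", o_sub)


def negotiate(accept: Optional[str], available: List[str]) -> Optional[str]:
    """Return the best type the server can produce, or None (the caller answers 406)."""
    if not accept or not accept.strip():
        return available[0] if available else None  # no Accept header: anything goes
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for offered in available:  # server preference order breaks ties
        # The most specific matching range decides this type's weight.
        matching = [(_specificity(r), q) for r, q in ranges if _matches(r, offered)]
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = offered, q
    return best
```

Because the most specific matching range wins, a header like "*/*;q=0.1, application/xml;q=0" still refuses XML even though the wildcard would otherwise match it.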
LOW Why is it bad practice to use GET to perform mutations?
GET is defined as safe: it must not change server state. Browsers, CDNs, search-engine crawlers, and caching proxies freely issue GET requests (prefetching, link-following, retries) without expecting side effects. If a GET deletes a row, an intermediary's prefetch or crawl would silently corrupt data. It also breaks RESTful semantics, and a cache may answer a repeated GET from storage so the intended mutation never reaches the server at all. Mutations belong in POST, PUT, PATCH, or DELETE.
LOW What is the difference between a URI and a URL?
A URI (Uniform Resource Identifier) is the broader concept: a string that names or locates a resource. A URL (Uniform Resource Locator) is a URI that includes enough information to locate the resource — typically a scheme, host, and path. All URLs are URIs, but a URI that only names (e.g., urn:isbn:0451450523) is not a URL. In practice, "URL" and "URI" are used interchangeably in API conversations, but the spec distinction matters for caching keys and canonical identity.
LOW What does HTTP/1.1 keep-alive do?
Keep-alive (persistent connections, the default in HTTP/1.1) reuses a single TCP connection for multiple request–response pairs instead of opening a new connection per request. This cuts latency by skipping repeated TCP three-way handshakes and TLS negotiations. It does not add concurrency: an HTTP/1.1 connection still carries one request–response exchange at a time (pipelining exists, but responses must come back in order, so it suffers head-of-line blocking and is rarely enabled). HTTP/2 and HTTP/3 solved that with multiplexing.
LOW What is head-of-line blocking?
Head-of-line (HOL) blocking occurs when one slow or lost unit blocks all subsequent units behind it. In HTTP/1.1, a slow response on a connection holds up every request queued behind it. HTTP/2 introduced stream multiplexing within a single TCP connection to eliminate HTTP-level HOL, but TCP's own ordering guarantee re-introduces HOL at the transport layer — which HTTP/3 (over QUIC/UDP) fully eliminates with independent per-stream loss recovery.
LOW What is HATEOAS and do real-world REST APIs use it?
HATEOAS (Hypermedia as the Engine of Application State) is a REST constraint requiring servers to include navigational links in every response so clients discover available actions dynamically rather than out-of-band. In theory it decouples clients from URL structure. In practice almost no public production API fully implements it because it adds payload overhead, complicates clients, and provides limited benefit when a published OpenAPI spec already documents the endpoints. Many APIs include a subset of links for convenience (next-page URLs in pagination) without claiming full HATEOAS compliance.

2. HTTP, REST & API Styles

LOW What is the difference between safe and idempotent in HTTP?
Safe means the operation is essentially read-only: the client neither requests nor expects any change to server state. GET, HEAD, OPTIONS, and TRACE are safe. Idempotent means the effect of calling N times equals calling once. Every safe method is idempotent, and so are PUT and DELETE. A method can be idempotent without being safe: DELETE changes state on the first call, but further calls leave the state identical (the resource is still gone). POST is neither safe nor idempotent by definition.
LOW When would you choose GraphQL over REST?
GraphQL is a good fit when clients have highly variable data needs — mobile clients needing subsets of large objects, dashboards that aggregate fields from multiple resources, or rapidly iterating frontends that want to avoid server-side changes per new screen. The single endpoint with typed schema reduces over-fetching and under-fetching. The downsides: complex query execution can cause N+1 database problems, file uploads are awkward, and HTTP-level caching on query responses is hard. REST still wins for simple CRUD, public APIs with stable contracts, and anywhere HTTP caching matters.
LOW When would you choose gRPC over REST?
gRPC over HTTP/2 is ideal for internal service-to-service communication where latency and throughput matter: it uses Protobuf's binary encoding (much smaller than JSON), supports bidirectional streaming, and generates strongly-typed stubs for both ends. Choose it for microservice meshes, mobile clients with bandwidth constraints, or streaming event pipelines. REST with JSON is better for public APIs, browser clients (gRPC-web adds complexity), and teams that value human-readable payloads over performance.
LOW What are the main API versioning strategies and their trade-offs?
Four common approaches: (1) URI versioning (/v2/users) — easy to route, obvious in logs, but URL hygiene purists dislike it and old versions litter the URL space. (2) Header versioning (API-Version: 2) — clean URLs, harder to test in a browser. (3) Query-param versioning (?version=2) — optional and bookmarkable. (4) Content-type versioning (Accept: application/vnd.myapp.v2+json) — most RESTfully correct but awkward for most teams. URI versioning is by far the most common in practice because it is the most operationally transparent.
MEDIUM What is cursor-based pagination and when is it better than offset pagination?
Offset pagination (?page=5&per_page=20) tells the database to skip N rows. This is simple but degrades at high offsets because the database must still walk every skipped row; it also skips or duplicates items when rows are inserted or deleted between page requests. Cursor pagination uses an opaque pointer, typically the last item's ID or a composite sort key, that the server plugs directly into a WHERE clause (e.g., WHERE id > :cursor ORDER BY id). With an index on that key, fetching a deep page costs about the same as fetching the first one, and results stay stable under concurrent writes. Use cursors for large, frequently updated datasets like feeds; offset is fine for small, static tables and admin UIs that need arbitrary page jumps.
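A minimal keyset-pagination sketch using Python's built-in sqlite3 module, assuming a hypothetical items table with a positive integer primary key. The cursor is base64url-encoded JSON that clients treat as opaque; fetching limit + 1 rows is one common way to learn whether a next page exists.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple


def encode_cursor(last_id: int) -> str:
    # Opaque to clients: base64url of a tiny JSON payload they should not parse.
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"])


def list_items(
    conn: sqlite3.Connection, limit: int, cursor: Optional[str] = None
) -> Tuple[List[tuple], Optional[str]]:
    """Return one page plus the cursor for the next page (None on the last page)."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    after_id = decode_cursor(cursor) if cursor else 0  # assumes ids are positive
    rows = conn.execute(
        # Keyset condition: seeks via the primary-key index instead of skipping rows.
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit + 1),  # one extra row tells us whether another page exists
    ).fetchall()
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1][0]) if len(rows) > limit else None
    return page, next_cursor
```

Deleting a row from an earlier page does not shift later pages, which is exactly the case where OFFSET would have skipped an item.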
MEDIUM What is HTTP/2 multiplexing and what problem did it solve?
HTTP/2 runs multiple request–response streams concurrently over a single TCP connection, each identified by a stream ID. In HTTP/1.1 you needed multiple parallel connections (browsers typically open about six per host) to overlap requests, wasting socket and handshake overhead. Multiplexing lets a browser send all sub-resource requests at once, and header compression (HPACK) removes most repeated header bytes. HTTP/2 also introduced server push to save a round trip for predicted assets, but it saw little adoption and Chrome has since removed support for it. The remaining limitation is TCP-level head-of-line blocking when a packet is lost; HTTP/3 solves that by moving to QUIC over UDP.
MEDIUM What is the difference between HTTP/2 and HTTP/3?
HTTP/2 runs over TLS-over-TCP. TCP's reliability guarantee means a lost packet halts all streams on that connection until it is retransmitted — transport-layer HOL blocking. HTTP/3 runs over QUIC, a transport protocol built on UDP that implements its own reliability, congestion control, and TLS 1.3 in one handshake. QUIC maintains independent stream loss recovery, so a lost packet only stalls the stream that needed it. It also supports 0-RTT reconnection for returning clients and built-in connection migration (useful when a mobile device switches from Wi-Fi to LTE).
MEDIUM What is REST vs RPC philosophically?
REST organizes the API around nouns (resources) and uses a small fixed set of standard methods to operate on them; the focus is on state transfer between representations. RPC organizes the API around verbs (remote procedure calls); you invoke named functions and pass arguments. REST is a better fit when modeling entities with standard CRUD and when cacheability matters; RPC (gRPC, Thrift, JSON-RPC) is better when the domain is inherently procedural ("transcode this video", "compile this code", "send this email"), where forcing a resource model adds ceremony without clarity.
MEDIUM How do you handle breaking vs non-breaking API changes?
Non-breaking (additive) changes, such as new optional request fields, new response fields, and new endpoints, can ship without a version bump because well-behaved clients ignore what they don't know about. Breaking changes (removing or renaming fields, changing types, adding or altering required parameters, removing endpoints) require a new major version. Best practice: treat any change that would silently alter behavior as breaking even if the shape is preserved (e.g., changing a range bound from inclusive to exclusive). Maintain old versions in parallel for a published deprecation window (commonly 6–12 months for public APIs), and announce it in-band with the Deprecation header and the Sunset header (RFC 8594).
MEDIUM What is an OpenAPI spec and why does it matter?
OpenAPI (formerly Swagger) is a machine-readable YAML/JSON description of an HTTP API: its endpoints, parameters, request/response schemas, authentication, and error codes. It matters because it is a single source of truth from which you can automatically generate client SDKs, server stubs, mock servers, interactive docs (Swagger UI), and contract tests. Teams that write the spec before code ("design-first") catch contract issues before a line of implementation is written. It also enables automated breaking-change detection by diffing spec versions.
MEDIUM What is the Webhook pattern and when is it better than polling?
A webhook is a reverse HTTP call: instead of the client polling the server for updates, the server POSTs an event to a client-supplied URL when something happens. Webhooks are more efficient for infrequent, unpredictable events (payment completions, CI build results, repository pushes) because polling burns requests even when nothing has changed. Trade-offs: the receiver must expose a public HTTPS endpoint; deliveries can fail, be retried, or arrive out of order; so the receiver must handle events idempotently. For very high-frequency streams to a connected client, server-sent events or WebSockets are better still.
MEDIUM What is a MIME type and how does it relate to APIs?
A MIME type (now officially called a media type) is a two-part label, type/subtype, that describes the format of data: application/json, image/png, multipart/form-data. In HTTP APIs, the server declares the response format in Content-Type and the client advertises what it accepts in Accept. Getting media types right is critical for correct parsing: a client that treats an application/json body as plain text will show raw \u escape sequences instead of the characters they encode, and a file-upload endpoint that receives a mismatched Content-Type will reject or corrupt the upload.
MEDIUM What is long polling and how does it differ from WebSockets?
Long polling is a workaround built on plain HTTP: the client makes a request, the server holds it open until an event occurs (or a timeout), then responds, and the client immediately opens a new request. It simulates push using pull, but every delivered message costs a full HTTP request–response cycle with headers, and holding many pending requests open adds connection-management overhead at scale. WebSockets perform a one-time upgrade (101 Switching Protocols) and then communicate over a persistent, full-duplex TCP channel with small per-message framing, which is far more efficient for high-frequency bidirectional messaging like collaborative editing, gaming, or chat. Long polling still wins when WebSocket infrastructure is unavailable or blocked by firewalls and proxies.
MEDIUM What are the advantages and disadvantages of JSON over Protocol Buffers?
JSON advantages: human-readable, natively supported in every browser and language, no schema required (good for exploratory APIs). JSON disadvantages: verbose (field names repeated in every record), no native binary types, parsing is CPU-heavier than binary formats, and there is no compile-time type enforcement. Protobuf advantages: compact binary encoding (typically 3–10× smaller), schema-enforced types, code-generated serializers that are faster to parse, and first-class support for schema evolution with field numbers. Protobuf disadvantages: not human-readable, requires a toolchain, and field-number changes are breaking. Choose JSON for public APIs and developer experience; Protobuf for internal high-throughput paths.

3. Security

MEDIUM What is the difference between authentication and authorization?
Authentication answers "who are you?" — verifying identity via credentials, tokens, or certificates. Authorization answers "what are you allowed to do?" — deciding whether the authenticated identity has permission for the requested action. In an HTTP API, authentication typically happens via the Authorization header (Bearer token, Basic), and authorization is a gate inside the handler that checks the caller's roles or scopes. Mixing them up leads to security gaps: treating every authenticated user as fully authorized is a privilege-escalation vulnerability (OWASP's BOLA/IDOR).
MEDIUM Explain the OAuth 2.0 Authorization Code flow.
The client redirects the user's browser to the identity provider's (IdP's) authorization endpoint, where the user logs in and approves the requested scopes. The IdP redirects back to the client's registered redirect URI with a short-lived authorization code in the URL. The client's backend exchanges that code (plus its client secret) directly with the IdP's token endpoint for an access token and, optionally, a refresh token. Because the code is short-lived and single-use, intercepting it in the browser is useless without the secret. Never put a client secret in a single-page app or mobile binary; use PKCE instead, which replaces the static secret with a per-request code verifier that is sent only in the direct token request, never through the browser redirect.
MEDIUM What are three common pitfalls with JWT?
(1) Algorithm confusion: the header's alg field is attacker-controlled; libraries that accept alg: none or switch from RS256 to HS256 using the public key as the HMAC secret are trivially exploitable — always whitelist the expected algorithm server-side. (2) Missing expiry check: JWTs don't expire by magic; if the server doesn't validate exp, stolen tokens are permanent. (3) No revocation: unlike opaque session tokens, JWTs are self-contained and valid until expiry even after logout — keep access token lifetimes short (minutes) and implement a refresh-token rotation strategy or a revocation blocklist.
MEDIUM What is CORS and why does it exist?
CORS (Cross-Origin Resource Sharing) is a browser mechanism for selectively relaxing the same-origin policy, which by default stops JavaScript on a page from origin A from reading responses fetched from origin B. When a server returns Access-Control-Allow-Origin (and related headers) naming the calling origin, the browser lets the script read the response. For requests that are not "simple" (for example, ones with custom headers, a JSON Content-Type, or methods such as PUT and DELETE), the browser first sends a preflight OPTIONS request so the server can declare allowed methods and headers before the real request is sent. CORS is a browser-enforced policy, and server-to-server calls ignore it entirely, so it is not a substitute for authentication or authorization on the API itself.
MEDIUM What is TLS and what does it protect against?
TLS (Transport Layer Security) provides confidentiality (payload encryption), integrity (a MAC or authenticated encryption detects tampering in transit), and server authentication (the certificate proves the server holds a key certified for that domain). It does not protect against bugs in the application layer, stolen tokens, or a compromised server. Common mistakes: disabling certificate verification or trusting arbitrary self-signed certificates in production (which throws away server authentication), not pinning the certificate authority for high-value internal APIs, and mixing HTTP and HTTPS on the same service (which leaks cookies set without the Secure flag).
MEDIUM What is input validation and why is it an API concern?
Input validation checks that incoming data conforms to expected types, ranges, formats, and sizes before processing it. Without it, APIs are vulnerable to SQL injection, command injection, XML entity attacks, path traversal, and logic bugs from unexpected values (null, negative quantity, 100 000-character string). Validation should happen at the earliest possible layer — ideally via schema-validated deserialization, not scattered conditionals — and errors should return 400 with a machine-readable detail (field name, reason) without revealing internal stack traces.
MEDIUM What is BOLA (Broken Object-Level Authorization)?
BOLA (OWASP API1) occurs when an API accepts an object identifier from the caller and returns or modifies that object without checking whether the caller owns it. Example: GET /v1/orders/8834 returns another user's order if the handler only checks "is the user logged in" rather than "does order 8834 belong to this user." Fix: always scope lookups to the authenticated identity — SELECT * FROM orders WHERE id = :id AND user_id = :caller_id. BOLA is consistently the top vulnerability in REST APIs because object IDs are often sequential and predictable.
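A sketch of the fix in Python with sqlite3, assuming a hypothetical orders table and a caller_id taken from the verified token rather than from the request. Ownership is part of the query itself, so there is no code path that fetches the object first and forgets to check it.

```python
import sqlite3
from typing import Optional, Tuple


def get_order(conn: sqlite3.Connection, order_id: int, caller_id: int) -> Optional[tuple]:
    # Ownership is part of the lookup, so another user's order id simply matches no row.
    return conn.execute(
        "SELECT id, user_id, total FROM orders WHERE id = ? AND user_id = ?",
        (order_id, caller_id),
    ).fetchone()


def handle_get_order(conn: sqlite3.Connection, order_id: int, caller_id: int) -> Tuple[int, dict]:
    # caller_id must come from the verified token or session, never from the body or URL.
    row = get_order(conn, order_id, caller_id)
    if row is None:
        # 404 (rather than 403) avoids confirming that the id exists for someone else.
        return 404, {"error": "not_found"}
    return 200, {"id": row[0], "total": row[2]}
```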
MEDIUM What is API key rotation and why does it matter?
API key rotation is the practice of periodically replacing an API key with a new one — either on a schedule or immediately after a suspected compromise. It matters because keys stored in code, logs, CI environments, or configuration files tend to leak over time; rotation limits the blast radius. Best practice: issue keys through a secrets manager with versioning, support overlapping validity windows (both old and new key accepted for a transition period), and log key-level usage so you can detect anomalous access and revoke individual keys without service disruption.
MEDIUM What is the difference between API keys and OAuth access tokens?
An API key is a static, long-lived credential that identifies an application — it doesn't carry per-user context, has no built-in expiry mechanism, and cannot represent delegated permissions. An OAuth access token is issued per user session with explicit scopes, a defined expiry, and a refresh mechanism. Use API keys for machine-to-machine (M2M) or server-to-server calls where there is no user context. Use OAuth for user-delegated access, third-party integrations, and scenarios requiring granular permission scopes. Secrets like API keys must be stored server-side; never embed them in JavaScript bundles or mobile app binaries.
MEDIUM What is a timing attack in the context of authentication?
A timing attack exploits measurable differences in response time to infer secret information. For example, a naive string comparison (token == stored_token) returns early on the first non-matching byte; an attacker who sends thousands of guesses can measure response times to reconstruct a valid token one character at a time. Fix: use a constant-time comparison function (e.g., hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js) for any secret comparison. For passwords, don't compare hashes by hand: store them with bcrypt or Argon2 and check them with the library's own verify function.
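The contrast in Python, as a small sketch: naive_equals is a deliberately bad example and token_matches a hypothetical helper. hmac.compare_digest is the standard-library function named above; per its documentation it avoids content-based short-circuiting, although the length of the inputs can still leak.

```python
import hmac


def naive_equals(a: bytes, b: bytes) -> bool:
    # DON'T use for secrets: returns at the first differing byte, so the running
    # time reveals how long the matching prefix is.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def token_matches(presented: str, stored: str) -> bool:
    # compare_digest does not stop early on the first mismatch.
    return hmac.compare_digest(presented.encode("utf-8"), stored.encode("utf-8"))
```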
MEDIUM What is mass-assignment vulnerability and how do you prevent it?
Mass assignment occurs when an API accepts a JSON body and directly binds every field to a model object, allowing callers to set fields they should not control — like isAdmin: true or balance: 1000000. Prevention: use an explicit allow-list of writable fields at the deserialization boundary (an input DTO separate from the domain model) and never pass raw request bodies directly to an ORM. Frameworks like Rails have strong_parameters for this; in typed languages, a separate request schema serves the same purpose.
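A sketch of the allow-list approach with Python dataclasses. User, UpdateProfileRequest, and the helper functions are hypothetical; the point is that only fields declared on the input DTO can ever reach the domain object. Rejecting unknown fields (as a 400) rather than silently dropping them is a design choice that makes probing attempts visible.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class User:  # domain model: includes fields a client must never set
    id: int
    email: str
    display_name: str
    is_admin: bool = False
    balance: int = 0


@dataclass
class UpdateProfileRequest:  # input DTO: the allow-list of writable fields
    email: str
    display_name: str


class ValidationError(Exception):
    pass


def parse_update(body: Dict[str, Any]) -> UpdateProfileRequest:
    allowed = {f.name for f in fields(UpdateProfileRequest)}
    unknown = set(body) - allowed
    if unknown:
        # Map to 400; rejecting instead of silently dropping makes probing visible.
        raise ValidationError(f"unknown fields: {sorted(unknown)}")
    missing = allowed - set(body)
    if missing:
        raise ValidationError(f"missing fields: {sorted(missing)}")
    for name in allowed:
        if not isinstance(body[name], str):
            raise ValidationError(f"{name} must be a string")
    return UpdateProfileRequest(**body)


def apply_update(user: User, req: UpdateProfileRequest) -> None:
    # Only DTO fields are copied; is_admin and balance are unreachable from the request.
    user.email = req.email
    user.display_name = req.display_name
```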
HIGH How does PKCE improve OAuth security for public clients?
PKCE (Proof Key for Code Exchange, RFC 7636) was designed for native apps and SPAs that cannot securely store a client secret. The client generates a random code_verifier, hashes it into a code_challenge (the S256 method), and sends only the challenge with the authorization request. When exchanging the authorization code for tokens, the client sends the original verifier; the auth server hashes it and checks that it matches the stored challenge. An attacker who intercepts the authorization code from the redirect URI cannot redeem it without the verifier, which never travels through the browser redirect and is sent only in the direct, TLS-protected request to the token endpoint. This makes the authorization code grant safe for public clients.
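A sketch of the S256 computation in standard-library Python, with hypothetical function names. secrets.token_urlsafe(32) yields 43 URL-safe characters, the minimum verifier length RFC 7636 allows.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes -> 43 unpadded base64url characters (RFC 7636 allows 43-128).
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    # S256: BASE64URL(SHA256(ASCII(code_verifier))) with '=' padding stripped.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def server_checks_verifier(stored_challenge: str, presented_verifier: str) -> bool:
    # The auth server recomputes the challenge from the verifier sent to the token endpoint.
    return secrets.compare_digest(code_challenge_s256(presented_verifier), stored_challenge)
```

The client keeps the verifier in memory, sends code_challenge_s256(verifier) with the authorization request, and presents the verifier only when redeeming the code.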
HIGH What is token audience validation and why is it important?
The aud (audience) claim in a JWT names the intended recipient(s) of the token. Audience validation prevents a token issued for service A from being reused at service B. Without it, a compromised or malicious service can replay valid tokens against other services in the same system. Every resource server must check that its own identifier appears in aud and reject tokens with mismatched audiences with 401. This is frequently omitted in internal microservice setups, creating a lateral-movement risk if any service is compromised.
HIGH How do you protect against request forgery (CSRF) in an API context?
Traditional CSRF attacks forge state-changing requests by exploiting browser automatic cookie attachment. REST APIs using Bearer tokens (sent in Authorization headers) are not vulnerable to CSRF because browsers do not automatically attach arbitrary headers to cross-origin requests. However, APIs that rely on cookie authentication must use CSRF tokens (synchronizer pattern or double-submit cookie) or set cookies with SameSite=Strict/Lax. The clearest mitigation is to move session state from cookies to Authorization headers for API clients entirely.

4. Reliability & Scale

MEDIUM What is an idempotency key and how do you implement it?
An idempotency key is a client-generated unique token (UUID or similar) sent in a request header (e.g., Idempotency-Key: 8f14e45f) that the server uses to detect duplicate requests. On first receipt the server processes the request and stores the response keyed by that value. On any subsequent receipt with the same key it returns the stored response without reprocessing. Implementation: store key → response in a fast store (Redis) with TTL matching the client's retry window; use an atomic check-and-set to handle concurrent duplicates. Stripe's API is the canonical public example — see their idempotency docs.
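A single-process sketch of that flow, assuming an in-memory dict guarded by a lock as a stand-in for Redis (in Redis the claim step would typically be a SET with the NX and EX options). IdempotencyStore and handle_with_idempotency are hypothetical names. Choices worth calling out: a concurrent duplicate gets 409 while the first request is still running, a failed attempt releases the key so the client can retry, and the sketch does not check that a reused key carries the same request body, which real implementations often also enforce.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple

_IN_PROGRESS = object()  # sentinel stored while the first request is still running


class IdempotencyStore:
    """In-memory stand-in for a shared store such as Redis (hypothetical class)."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def claim(self, key: str) -> Tuple[bool, Any]:
        """Atomic check-and-set: (True, None) for a new key, else (False, stored value)."""
        with self._lock:
            now = self._clock()
            entry = self._entries.get(key)
            if entry is not None and entry[0] > now:
                return False, entry[1]
            self._entries[key] = (now + self._ttl, _IN_PROGRESS)
            return True, None

    def save(self, key: str, response: Any) -> None:
        with self._lock:
            self._entries[key] = (self._clock() + self._ttl, response)

    def release(self, key: str) -> None:
        with self._lock:
            self._entries.pop(key, None)


def handle_with_idempotency(store: IdempotencyStore, key: str,
                            process: Callable[[], Any]) -> Any:
    first, stored = store.claim(key)
    if not first:
        if stored is _IN_PROGRESS:
            return 409, {"error": "a request with this key is still in progress"}
        return stored  # replay the original response without reprocessing
    try:
        response = process()
    except Exception:
        store.release(key)  # let the client retry an attempt that failed
        raise
    store.save(key, response)
    return response
```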
MEDIUM What are the main rate-limiting algorithms?
Four are commonly asked: (1) Fixed window: a counter resets at the start of each clock window; simple, but it allows a burst at the window boundary (up to 2× the rate across the seam). (2) Sliding window: counts requests in the trailing N seconds; smoother but heavier to compute exactly (Redis sorted sets work well). (3) Token bucket: a bucket refills at a steady rate up to its capacity and each request consumes a token, which allows controlled bursts up to the bucket's capacity. (4) Leaky bucket: requests are queued and drained at a constant rate; no bursting, which suits protecting a downstream that can only absorb a steady rate. Token bucket is the most common choice for user-facing APIs; leaky bucket for outbound calls to partner APIs.
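A minimal single-process token bucket in Python; the class name and parameters are hypothetical, and an injectable clock makes it testable. Tokens are refilled lazily on each call from the elapsed time rather than by a background timer.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, sustained throughput of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full, so an idle client may burst immediately
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # the caller would answer 429, ideally with Retry-After
```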
MEDIUM Explain the caching layers available to an API.
Four common layers: (1) Application-level cache (in-process, e.g., an LRU map): fastest, but per-instance and hard to invalidate. (2) Distributed cache (Redis, Memcached): shared across instances, supports TTLs and pub/sub invalidation. (3) CDN or reverse-proxy cache (Cloudflare, Nginx): caches at the edge based on HTTP headers (Cache-Control, Vary, ETag) and reduces origin load for cacheable GETs. (4) Database query cache: inside the database; unreliable, and deprecated or removed in some engines (MySQL 8.0 removed it). The hardest problem is invalidation: when data changes, which cache entries are stale? ETags and conditional requests let the client's cache and the origin confirm freshness without retransmitting the body.
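To make the conditional-request part concrete, here is a framework-free Python sketch (the function names and the max-age value are assumptions) that derives an ETag from the response bytes and answers If-None-Match with 304. The server still renders the representation in order to hash it; the saving is the transfer, not the rendering.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # A strong validator derived from the exact bytes of the representation.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(resource: dict,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}  # max-age is an arbitrary example
    if if_none_match is not None:
        # Simplified: ignores weak (W/) validators.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # the client's cached copy is still current
    return 200, headers, body
```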
MEDIUM What is cache invalidation and what are the main strategies?
Cache invalidation removes or updates stale entries when source data changes. Key strategies: (1) TTL expiry — entries expire after a fixed time; simple, but data can be stale up to TTL. (2) Write-through — update the cache on every write to the source; cache stays fresh but writes are slower. (3) Write-behind (write-back) — write to cache first, async persist to DB; fast writes but risk of loss on crash. (4) Event-driven invalidation — on a data-change event, explicitly delete or update affected cache keys; most precise but couples the writer to cache topology. (5) Versioned keys — embed a version or hash in the key; stale entries are unreachable rather than deleted. The right choice depends on how stale data hurts: financial balances need aggressive invalidation; product catalog can tolerate a TTL.
MEDIUM What is exponential backoff with jitter?
Exponential backoff spaces retries by doubling the wait after each failure (1 s, 2 s, 4 s, 8 s …), capped at a maximum. This prevents a thundering-herd storm of retries from hammering a recovering service. Plain backoff still lets clients that failed together retry together, so jitter randomizes each wait to desynchronize them; the "full jitter" variant sleeps random(0, min(cap, base * 2^n)). AWS's article "Exponential Backoff and Jitter" shows in simulation that full jitter substantially cuts the total number of calls, and therefore contention, compared with un-jittered backoff.
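A sketch of capped exponential backoff with full jitter in Python. retry_with_backoff and its parameters are hypothetical; sleep and rng are injectable so the behavior can be tested without waiting, and is_retryable keeps non-transient errors (such as a 400) from being retried.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.1,
    cap: float = 10.0,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op, retrying failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
            sleep(rng.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```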
MEDIUM What is the circuit breaker pattern?
A circuit breaker wraps an outbound call and tracks failure rate over a rolling window. When failures exceed a threshold the circuit "opens" and subsequent calls fail fast without attempting the network request, giving the downstream service time to recover. After a configurable timeout the circuit moves to "half-open" — it lets one probe request through. If that succeeds, the circuit "closes" (normal operation resumes); if it fails, the circuit reopens. This prevents cascading failures: a slow or dead dependency no longer ties up threads and exhausts connection pools upstream. Netflix's Hystrix popularised the pattern; Resilience4j and similar libraries implement it.
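A minimal, single-threaded circuit-breaker sketch in Python. One simplification to flag: it trips after a number of consecutive failures instead of tracking a failure rate over a rolling window, and it is not thread-safe, so concurrent callers could send more than one half-open probe. The class and exception names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, op: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")  # no network call attempted
            self.state = "half-open"  # timeout elapsed: let one probe through
        try:
            result = op()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        # A failed probe reopens immediately; otherwise trip at the threshold.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()

    def _on_success(self) -> None:
        self._failures = 0
        self.state = "closed"
```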
MEDIUM What is an API gateway and what does it do?
An API gateway is a reverse proxy that sits in front of backend services and provides cross-cutting concerns in one place: authentication/authorization, rate limiting, request routing, protocol translation (REST → gRPC), SSL termination, request/response transformation, logging, and analytics. It reduces the surface area each backend service must handle. Trade-offs: it is a single point of failure (deploy it HA), introduces latency (mitigate with connection pooling and local caching), and can become a bottleneck if it tries to do too much. Examples: Kong, AWS API Gateway, Apigee, Traefik.
MEDIUM What is the difference between horizontal and vertical scaling for an API?
Vertical scaling adds more CPU/RAM to a single server — simple to implement, no code changes, but has a hard ceiling and creates a single point of failure. Horizontal scaling adds more server instances behind a load balancer — theoretically unlimited, more complex but resilient. REST APIs are particularly well-suited to horizontal scaling because statelessness means any instance can handle any request without affinity. The blockers to horizontal scaling are usually stateful dependencies: session stores, database connections, in-memory caches — each must be externalized or shared.
MEDIUM What is a load balancer and what algorithms do they use?
A load balancer distributes incoming requests across backend instances. Common algorithms: Round-robin — cycles through instances in order; simple but ignores load variance. Least-connections — sends to the instance with the fewest active connections; better when requests have variable duration. IP hash / sticky sessions — routes the same client to the same instance; needed for session affinity but undermines true statelessness. Weighted — routes proportionally to instance capacity; useful during canary deployments. Layer-4 load balancers operate on TCP; Layer-7 operate on HTTP and can route on path, header, or host — enabling path-based API routing without DNS changes.
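The selection logic of four of these algorithms, as a toy Python sketch with hypothetical names. Real load balancers also run health checks and track connections themselves; here LeastConnections relies on the caller to report finished requests via release(), and ip_hash only stays sticky while the backend list is unchanged.

```python
import hashlib
import itertools
import random
from typing import Dict, List, Optional


class RoundRobin:
    def __init__(self, backends: List[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    def __init__(self, backends: List[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def pick(self) -> str:
        # Fewest in-flight requests wins; ties go to the earliest backend in the list.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


class Weighted:
    def __init__(self, weights: Dict[str, int], rng: Optional[random.Random] = None) -> None:
        self._backends = list(weights)
        self._weights = list(weights.values())
        self._rng = rng or random.Random()

    def pick(self) -> str:
        # E.g. {"stable": 95, "canary": 5} sends roughly 5% of traffic to the canary.
        return self._rng.choices(self._backends, weights=self._weights, k=1)[0]


def ip_hash(client_ip: str, backends: List[str]) -> str:
    # Same client, same backend, as long as the backend list does not change.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```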
HIGH What are SLIs, SLOs, and error budgets?
An SLI (Service Level Indicator) is a measured metric that reflects the user experience — availability (% of successful requests), latency (p99 response time), throughput (requests/second). An SLO (Service Level Objective) is a target for an SLI, e.g. "99.9% of requests succeed over a 28-day window" or "99% of requests complete in under 200 ms". An error budget is the margin between the SLO and perfection: a 99.9% availability SLO over 28 days allows 0.1% × 40 320 min ≈ 40 minutes of full downtime (about 43 minutes over a 30-day month), or equivalently 0.1% of requests failing. While budget remains, the team can deploy freely; when it is exhausted, reliability work takes priority over features. This framework, from the Google SRE book, turns "reliability vs. velocity" into a data-driven trade-off rather than a culture war.
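The error-budget arithmetic for an availability SLO can be sketched as two small functions. For 99.9%, a 28-day window gives about 40.3 minutes, a 30-day month 43.2 minutes, and an average-length month of roughly 30.4 days about 43.8 minutes. The request-based variant assumes the SLI is "fraction of successful requests".

```python
def error_budget_minutes(slo: float, window_days: float) -> float:
    """Minutes of full downtime an availability SLO tolerates over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative means overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For example, 250 failures out of 1 000 000 requests against a 99.9% SLO spends a quarter of the 1 000-failure budget, leaving 0.75.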
HIGH How would you implement distributed rate limiting across multiple API servers?
Per-instance rate limiting is easy but inconsistent — a client that distributes requests across N instances gets N× the allowed rate. Distributed rate limiting requires shared state. The standard approach: keep the counters in Redis and update them atomically, using built-in commands (INCR + EXPIRE for fixed windows), a Lua script, or the CL.THROTTLE command from the redis-cell module, so all servers read and update the same counter. For token-bucket semantics across nodes, store the last-refill timestamp and remaining tokens in a single Redis key and update both in one Lua script. Throughput concern: every API request incurs a Redis round trip; reduce this by batching (claim N tokens per Redis call and spend them locally) or by caching decisions locally for a short TTL and accepting slight over-quota risk.
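A minimal sketch of the fixed-window INCR + EXPIRE pattern. `SharedCounterStore` is a hypothetical in-memory stand-in for Redis, shared by every "server"; in production the increment and expiry would run atomically in Redis (for example inside a Lua script).

```python
import threading
import time
from typing import Optional


class SharedCounterStore:
    """In-memory stand-in for Redis INCR with a TTL, shared by all servers (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key: str, ttl: float, now: float) -> int:
        with self._lock:  # plays the role of Redis executing the commands atomically
            count, expires_at = self._data.get(key, (0, now + ttl))
            if now >= expires_at:  # expired key: start a fresh counter
                count, expires_at = 0, now + ttl
            count += 1
            self._data[key] = (count, expires_at)
            return count


def allow(store: SharedCounterStore, client_id: str, limit: int, window_s: int,
          now: Optional[float] = None) -> bool:
    """At most `limit` requests per client per window, counted across all servers."""
    now = time.time() if now is None else now
    window = int(now // window_s)  # every server computes the same window index
    key = f"rl:{client_id}:{window}"
    return store.incr_with_ttl(key, window_s, now) <= limit
```

Fixed windows are the simplest option but can admit up to twice the limit in a burst that straddles a window boundary; the token bucket avoids that.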
HIGH What is a thundering herd and how do you prevent it in an API context?
A thundering herd occurs when a large number of clients simultaneously make the same request — most commonly right after a popular cache entry expires (a cache stampede) or a service restarts. Every request goes to the origin at once, overwhelming it. Mitigations: (1) Request coalescing / cache lock: on a cache miss, only one process fetches from the origin while the others wait or receive the stale value. (2) Probabilistic early expiry: each reader may refresh the entry slightly before it expires, with a probability that rises as expiry approaches and scales with how expensive the value is to recompute, so one request usually refreshes it before the herd arrives. (3) Jitter on TTLs: add random variation so large batches of entries don't expire simultaneously. (4) Stale-while-revalidate: serve the stale response immediately while refreshing in the background.
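Mitigations (2) and (3) fit in a few lines. This is a sketch of one common formulation of probabilistic early expiry, where `recompute_cost` is the time the value takes to rebuild and `beta` is a tuning knob; the function names are illustrative, and `rng` is injectable only to make the behavior testable.

```python
import math
import random


def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.1, rng=random.random) -> float:
    """Spread expiries uniformly over base_ttl +/- jitter_fraction."""
    return base_ttl * (1 + jitter_fraction * (2 * rng() - 1))


def should_recompute_early(now: float, expiry: float, recompute_cost: float,
                           beta: float = 1.0, rng=random.random) -> bool:
    """Probabilistic early expiry.

    Each reader draws a random head start proportional to the recompute cost.
    The closer `now` is to `expiry`, the more likely a single reader refreshes
    the entry before it actually expires. beta > 1 refreshes earlier.
    """
    r = max(rng(), 1e-12)                    # guard against log(0)
    head_start = -recompute_cost * beta * math.log(r)   # exponential draw, mean = cost * beta
    return now + head_start >= expiry
```

Far from expiry the head start almost never bridges the gap; within a few multiples of the recompute cost it usually does, so the refresh tends to happen once, ahead of the herd.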
HIGH How does connection pooling help API performance?
Opening a TCP+TLS connection to a database or downstream service costs a TCP round trip plus one or two more for the TLS handshake (and often more for database authentication), which adds up to tens or hundreds of milliseconds across regions. A connection pool maintains a set of pre-established, reusable connections. Incoming requests borrow a connection, use it, and return it to the pool, so the handshake is paid once per connection instead of once per request. Key parameters: min/max pool size (too small = contention; too large = DB connection exhaustion), idle timeout (close connections that haven't been used recently), and max lifetime (periodically recycle connections so clients pick up DNS or failover changes, rebalance across replicas, and aren't cut off by server-side or middlebox timeouts). Without a pool, a spike in traffic that multiplies connections can bring down a database faster than the query load itself.
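A minimal bounded pool, sketched under the assumption that `connect` is any zero-argument factory that opens a connection (hypothetical here). It shows reuse, a cap on connections in use, a borrow timeout, and max lifetime; a real pool would also health-check connections, close the ones it discards, and enforce an idle timeout.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, List, Tuple


class ConnectionPool:
    def __init__(self, connect: Callable[[], object], max_size: int,
                 max_lifetime_s: float, acquire_timeout_s: float = 1.0):
        self._connect = connect
        self._max_lifetime_s = max_lifetime_s
        self._acquire_timeout_s = acquire_timeout_s
        self._slots = threading.BoundedSemaphore(max_size)  # caps connections in use
        self._idle: List[Tuple[object, float]] = []         # (conn, created_at), reused LIFO
        self._lock = threading.Lock()
        self.opened = 0                                      # handshakes paid so far

    @contextmanager
    def connection(self):
        if not self._slots.acquire(timeout=self._acquire_timeout_s):
            raise TimeoutError("connection pool exhausted")  # contention: shed load upstream
        try:
            conn, created_at = self._checkout()
        except BaseException:
            self._slots.release()
            raise
        try:
            yield conn
        finally:
            with self._lock:
                # Return the connection only while it is younger than the max lifetime.
                if time.monotonic() - created_at < self._max_lifetime_s:
                    self._idle.append((conn, created_at))
            self._slots.release()

    def _checkout(self) -> Tuple[object, float]:
        with self._lock:
            if self._idle:
                return self._idle.pop()
            self.opened += 1
        return self._connect(), time.monotonic()  # pay the handshake only when needed
```

Usage: `with pool.connection() as conn: ...`. A hundred sequential requests through a pool like this perform a single handshake.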
HIGH What is back-pressure and why does it matter for API design?
Back-pressure is a mechanism that slows producers when consumers can't keep up — preventing unbounded queue growth and memory exhaustion. In an API context: if a synchronous endpoint receives more load than it can process, it should reject the excess quickly with 503 Service Unavailable (or 429 when a specific client has exceeded its quota), ideally with a Retry-After header, rather than queuing indefinitely. For async pipelines, producers should observe queue depth or consumer lag and throttle ingestion. Without back-pressure, a slow downstream causes cascading memory growth upstream, eventually crashing the entire pipeline. gRPC has built-in flow-control at the HTTP/2 stream level; message queues like Kafka expose consumer lag metrics for external back-pressure control.
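A minimal sketch of load shedding with a bounded queue, assuming a hypothetical `BoundedIngress` that sits between an endpoint and its worker pool: work is accepted with 202 while there is room and refused immediately with 503 plus Retry-After once the queue is full.

```python
import queue
from typing import Tuple


class BoundedIngress:
    def __init__(self, max_depth: int, retry_after_s: int = 1):
        self._queue = queue.Queue(maxsize=max_depth)  # the bound is the back-pressure
        self._retry_after_s = retry_after_s

    def submit(self, job) -> Tuple[int, dict]:
        """Return (status, headers) for the caller."""
        try:
            self._queue.put_nowait(job)
        except queue.Full:
            # Refuse fast instead of letting latency and memory grow without bound.
            return 503, {"Retry-After": str(self._retry_after_s)}
        return 202, {}

    def take(self):
        """Called by workers; frees a slot for the next submission."""
        return self._queue.get_nowait()
```

The queue depth is the knob: it should be small enough that a queued job still finishes within the client's timeout.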
HIGH Explain the fan-out problem in event-driven APIs.
Fan-out occurs when a single event must be delivered to many consumers — for example, one "order placed" event triggering notifications to 10 000 followers of a seller. Naive sequential processing doesn't scale. Approaches: (1) Push fan-out: on write, immediately publish to every subscriber's queue — O(subscribers) write cost, instant read, but expensive for celebrities with millions of followers. (2) Pull fan-out: write once to a shared timeline; each subscriber fetches their own view on read — cheap writes, expensive reads. (3) Hybrid: push to smaller audiences, pull for high-follower accounts. Twitter's architecture famously pivoted between these; the right choice depends on the write-to-read ratio and follower-count distribution.
HIGH What is the two-generals problem and how does it relate to API reliability?
The two-generals problem is a thought experiment showing that two parties communicating over an unreliable channel can never achieve certainty that a message was received — even acknowledgements can be lost. In distributed API terms: a client can never know with certainty whether a non-idempotent request was processed if the response is lost. The practical mitigation is idempotency keys — the client assigns a unique ID to the intent, not the request. Even if it retries five times, the server only executes the intent once. This reframes the problem: instead of seeking certainty that a message was delivered, you make retries safe, so at-least-once delivery plus server-side deduplication produces an exactly-once effect.
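A minimal in-memory sketch of server-side deduplication by idempotency key. `IdempotencyStore` and `run_once` are illustrative names; a real store would be shared across instances (for example Redis or a database table), expire keys after a TTL, reject a reused key whose request body differs, and avoid holding one global lock across the handler.

```python
import threading
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


class IdempotencyStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._responses: Dict[str, Response] = {}

    def run_once(self, key: str, handler: Callable[[], Response]) -> Response:
        """Execute the intent once per key; replay the stored response on retries."""
        with self._lock:  # held across the handler so concurrent retries cannot both run it
            if key in self._responses:
                return self._responses[key]
            response = handler()
            self._responses[key] = response
            return response
```

However many times the client retries with the same key, the handler's side effect happens once and every retry receives the original response.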

5. Debugging Scenarios

MEDIUM You get a spike of 504 Gateway Timeout errors right after a deploy. What do you check?
A 504 means the gateway (load balancer or API gateway) received no response from the upstream service within the configured timeout. Steps: (1) Check that the new pods/containers are actually healthy — look at health-check endpoints and readiness probe status. A readiness probe that passes before the service can really serve (cold caches, unreachable dependencies) lets the load balancer route traffic to instances that then hang until the gateway times out. (2) Compare p99 response time before and after the deploy — a regression in a hot code path may push responses past the timeout threshold. (3) Check for a database migration or schema change in the deploy that might be taking an exclusive lock and blocking queries. (4) Inspect error logs on the upstream instances — a panic or uncaught exception that crashes the process prevents any response. (5) Roll back and check whether the 504s stop, which confirms the deploy as the cause.
MEDIUM Users report intermittent 502 Bad Gateway errors (maybe 1–2% of requests). What is your diagnostic approach?
502 Bad Gateway means the gateway received an invalid response from the upstream — often a connection reset. (1) Correlate 502s with instance identity — if errors cluster on one host, look for that instance's memory, CPU, or disk pressure. (2) Check keep-alive timeout mismatches: if the upstream service closes idle connections before the load balancer's keep-alive timeout, the LB reuses a half-closed connection. Fix: set upstream keep-alive timeout slightly higher than the LB's. (3) Look for GC pauses in the upstream — a stop-the-world pause long enough to exceed socket timeouts causes the LB to see a reset. (4) Check for upstream capacity exhaustion — if the upstream's accept backlog or worker pool is saturated, new connections may be refused or reset. (5) Review any recent changes to TLS configuration — certificate renewal or cipher-suite mismatches can cause handshake failures that appear as 502s.
MEDIUM The first request after idle is slow (400 ms); subsequent requests are fast (10 ms). What is causing this?
Classic cold-start signature. Possible causes, from most to least likely: (1) Connection establishment — the first request pays TCP + TLS handshake cost because the connection pool is empty after idle. Subsequent requests reuse an established connection. Mitigation: keep-alive, warm-up requests, or proactive pool maintenance. (2) JIT compilation — if the process was restarted or scaled to zero while idle, JVM or JS runtimes interpret the first execution of a code path and only compile it after repeated calls. (3) DNS resolution — no cached DNS record, so the first request pays a resolver round-trip. Check by timing each phase separately, e.g. with curl's -w write-out variables time_namelookup, time_connect, and time_appconnect. (4) On-demand resource loading — lazy database connection, lazy config load, or lazy authentication middleware on first request.
HIGH Customers report duplicate charges. How do you investigate and prevent recurrence?
Start with data, not assumptions. (1) Pull the payment event log and look for duplicate idempotency keys — if two charges share the same payment-intent ID, the payment processor was called twice. (2) Check client retry logic — a client that retried a POST on 5xx without an idempotency key is the most common cause. (3) Inspect load-balancer logs for the same client-side request ID hitting multiple backend instances simultaneously (double-submit from a UI button click). (4) Look for async job deduplication failures — a queue that delivers the "charge customer" event more than once. Prevention: (a) all charge requests must include an idempotency key, (b) the handler must check whether the key has been processed before executing, (c) button click handlers must disable the button after first submit, (d) async jobs must assume at-least-once delivery and use idempotent handlers.
HIGH You start receiving 429 Too Many Requests from a partner API you call. What do you do?
(1) Read the Retry-After or X-RateLimit-Reset header — the response tells you when the window resets. Don't retry before then. (2) Audit your outbound call rate against the partner's documented quota. Instrument your client with a counter to confirm. (3) Add exponential backoff with jitter to all retry logic — never hammer a rate-limited API at a fixed interval. (4) If you're hitting shared org limits, consider queueing outbound calls through a leaky-bucket dispatcher that self-throttles to the known rate limit. (5) If the calls are parallelized (fan-out), serialize or batch them. (6) Contact the partner to request a higher quota tier if your legitimate usage genuinely exceeds limits. Document the event as a capacity risk item.
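A minimal sketch of the retry delay calculation: honor the server's Retry-After when present (assumed here to be already parsed into seconds, although the header may also carry an HTTP date), otherwise use exponential backoff with "full jitter", a uniform random delay up to a capped exponential bound. The base and cap values are illustrative.

```python
import random
from typing import Optional


def next_delay(attempt: int, base_s: float = 0.5, cap_s: float = 60.0,
               retry_after_s: Optional[float] = None, rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after_s is not None:
        return retry_after_s  # the server told us when the window resets
    # Full jitter: spread retries uniformly so many clients don't retry in lockstep.
    return rng() * min(cap_s, base_s * (2 ** attempt))
```

Jitter matters as much as the exponent: without it, every client that was throttled at the same moment retries at the same moment too.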
HIGH Webhook events from a provider are being silently dropped. How do you diagnose this?
(1) Check the provider's delivery dashboard — most webhook platforms (Stripe, GitHub, Twilio) show delivery attempts, response codes, and retry history per event. Compare their records against your event log. (2) If your endpoint is returning 2xx but you're not seeing events, the issue is in your processing pipeline — check message queue dead-letter queues, async worker logs, and database write failures. (3) If the provider shows delivery failures (5xx or timeout from your endpoint), check: a) your endpoint's response time (many providers give up after roughly 5–10 s, so slow handlers cause retries and eventual abandonment; acknowledge with a 2xx quickly and process asynchronously), b) whether your endpoint is behind a firewall that blocks the provider's IP range, c) TLS cert expiry on your endpoint. (4) Once the root cause is fixed, backfill missed events by reconciling your records against the provider's event history (many providers offer redelivery or an events API), and log every received event ID so gaps are easy to spot.
MEDIUM A client reports random connection reset by peer errors. What do you investigate?
Connection reset means the remote end (or a middlebox on the path) sent a TCP RST packet — an abrupt close, not a graceful FIN. Most common causes: (1) Keep-alive timeout mismatch between client and server — client sends on a connection the server has already closed; fix by aligning or shortening client keep-alive to be less than server keep-alive. (2) Proxy/load-balancer idle timeout — many cloud LBs reset idle connections after 60 or 300 seconds; if the client hasn't sent traffic in that window, the next attempt gets a RST. (3) Server crash or OOM kill mid-request — the OS closes the dead process's sockets, and the client typically sees a RST. (4) Stateful firewalls or NAT gateways that expire connection-tracking state for idle flows and answer later packets with a RST (some silently drop them instead, which looks like a hang). Check TCP reset counters on both hosts, LB idle timeout config, and application crash logs first.
HIGH Latency on one endpoint is fine at p50 but terrible at p99. What does that tell you, and what would you look for?
A large p50–p99 gap means the problem is not a systematic slowdown but a tail phenomenon affecting a minority of requests. Likely causes: (1) Lock contention — most requests complete quickly; a small fraction waits for a database row lock or mutex. Check slow query logs and transaction wait events. (2) GC pauses — stop-the-world events in JVM/Go affect whichever requests are in-flight at that moment. Check GC metrics. (3) Cache misses — most requests hit cache (fast); the cache miss path includes a DB query plus cache fill (slow). (4) Resource pool contention — DB or HTTP connection pool occasionally exhausted; p99 requests waited for a slot. (5) Hot partitions — a subset of keys routes to one shard that is under higher load. Check per-shard latency metrics.
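To see why a median hides this, here is a nearest-rank percentile over hypothetical latencies in which 98% of requests hit the cache at about 10 ms and 2% miss and take about 400 ms.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical endpoint: 980 cache hits and 20 cache misses.
latencies_ms = [10.0] * 980 + [400.0] * 20
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

Here p50 is 10 ms and p99 is 400 ms, while the mean is only 17.8 ms: the slow path is invisible in the median and diluted in the average.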
HIGH An integration test passes locally but the same request fails in staging with a 403. How do you investigate?
403 Forbidden means the server refused to authorize the request: usually the caller was identified but lacks permission, though a WAF, gateway, or IP allow-list in front of the application can also return 403. So first establish which layer emitted it (gateway logs, response headers, error body format). Then: (1) Compare environment-specific configuration: does staging have a different role/permission seed than local? Is the test user in staging missing a required scope or team membership? (2) Check middleware order — a staging-only middleware (IP allow-list, feature flag) may be rejecting before the handler runs. (3) Inspect the token or API key used in staging — it may have different scopes than the local test credential. (4) Look for a tenant or org mismatch — the resource may belong to a different org in staging than in your test fixture. (5) Add detailed authorization logging to the staging service to see the exact permission check that failed.
MEDIUM A new API version is deployed but some clients are still hitting v1 endpoints. What is happening and what do you check?
Some clients haven't updated, which is expected — but if clients that should have migrated are still on v1, investigate: (1) Check the client SDK version in request headers or User-Agent — confirm which version is actually deployed in client environments. (2) Verify the migration cutover date and client rollout tracking. (3) Look for hardcoded v1 URLs in older mobile app versions that can't be force-updated. (4) If using header-based versioning, confirm new gateway routing rules are deployed and that the proxy isn't falling back to v1. (5) Check whether automation or CI scripts still hit v1. Long-term: add a Deprecation header and a Sunset header (RFC 8594) carrying the shutdown date to v1 responses, so clients can log or alert on deprecated API usage.
HIGH Your API latency spikes for 10 minutes every hour, then recovers. What patterns should you consider?
Regular, periodic spikes are almost always tied to a scheduled process. (1) Cron jobs — a scheduled report, data sync, or cleanup job running at the top of the hour competes for DB or CPU. Check your job scheduler. (2) Cache TTL expiration — if TTLs were set to 3600 s all at once (e.g., on a deploy), they all expire simultaneously. Jitter TTLs on the next deploy. (3) Scheduled maintenance — hourly log rotation and compression, or database maintenance such as vacuum, compaction, or snapshot backups, competing for disk I/O and CPU. (4) External service polling — a downstream dependency your service calls has its own scheduled tasks. (5) Rate-limit reset — if you're consuming an external API, the rate-limit window reset causes a burst of queued retries. Correlate timestamps precisely with all scheduled tasks first.
MEDIUM A third-party partner says their API calls to you are failing, but your monitoring shows 0 errors. What could explain this?
The most common explanation: the requests are failing before they reach your service. (1) The partner may be hitting the wrong URL or IP — check if they're hitting a staging endpoint or an old DNS record that no longer resolves to your infra. (2) A firewall, WAF, or DDoS protection layer in front of your API is blocking or silently dropping their requests — check cloud WAF logs, Cloudflare/Fastly logs. (3) TLS handshake failure — if the partner's TLS library doesn't support your cipher suite or certificate chain, the connection fails before HTTP. (4) If you have IP allow-listing, confirm the partner's egress IPs are permitted. (5) Ask the partner for the raw TCP-level error and the full request including headers — the error message often points directly at the layer causing the failure.

6. System & API Design

HIGH Design a rate limiter API/service. Walk me through resources, storage, and the key algorithm.
Scope: Per-client rate limiting enforced at an API gateway, target 1 000 000 active clients.

Algorithm: Token bucket per client. Stored as { tokens: float, last_refill_ts: timestamp } in Redis with a 24h TTL.

Key flow: On each request: (1) fetch the bucket for the client key, (2) compute tokens added since last_refill_ts (rate × elapsed), (3) clamp to bucket max, (4) if tokens ≥ cost, subtract, write the bucket back, and let the request through. Else return 429 with Retry-After. Steps 1–4 run as a single Lua script to prevent races.

Resources (if exposed as an API): GET /limits/{client_id} → current bucket state. PUT /limits/{client_id}/config → update rate/capacity per client (admin).

Key trade-off: Token bucket allows bursting up to capacity, which suits API consumers. Leaky bucket would give a smoother outbound rate to a downstream but feels punishing to callers. Distributed deployment: all gateway nodes hit the same Redis; a same-zone Redis round trip adds roughly 0.5 ms per request. Under high throughput, amortize Redis calls by claiming a batch of N tokens per call and spending them locally on the gateway node.
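The key flow can be sketched in a single process. Here a lock stands in for the atomicity the Lua script provides, and a dict stands in for the Redis keys; `TokenBucketLimiter` is an illustrative name, and new clients are assumed to start with a full bucket.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refill `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_refill_ts)
        self._lock = threading.Lock()

    def try_acquire(self, client: str, cost: float = 1.0,
                    now: Optional[float] = None) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        with self._lock:  # read, refill, check, and write as one atomic step
            tokens, last = self._buckets.get(client, (self.capacity, now))  # (1) fetch
            tokens = min(self.capacity, tokens + self.rate * (now - last))  # (2)+(3) refill, clamp
            if tokens >= cost:                                               # (4) spend
                self._buckets[client] = (tokens - cost, now)
                return True, 0.0
            self._buckets[client] = (tokens, now)
            return False, (cost - tokens) / self.rate  # seconds until enough tokens accrue
```

The second return value maps directly onto the Retry-After header of the 429 response.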
HIGH Design an idempotent payments API. What are the critical design decisions?
Core resource: PaymentIntent. A client creates an intent first (not a charge directly) — this separates "expressing the desire to pay" from "executing the payment".

Endpoints: POST /v1/payment-intents body: { amount, currency, customer_id } → returns { id, status: "pending" }. Requires Idempotency-Key header. POST /v1/payment-intents/{id}/confirm → transitions to "processing" then "succeeded/failed". Also requires Idempotency-Key. GET /v1/payment-intents/{id} → idempotent by nature.

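Idempotency store: Redis, mapping each Idempotency-Key (scoped to the calling account) to {response, status, created_at} with a 24 h TTL. The handler claims the key atomically (e.g., SET with the NX option) before calling the payment processor, and later requests with the same key receive the stored response.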

State machine: pending → processing → succeeded | failed, plus succeeded → refunded. Nothing transitions back to pending or processing, and failed and refunded are terminal. Prevent double charges by refusing to re-enter "processing" from "succeeded".

Key trade-off: Two-phase (create + confirm) vs single-step charge. Two-phase costs an extra RTT but allows pre-authorization checks, 3DS, and UI confirmation — worth it for real-money flows.
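A minimal sketch of the PaymentIntent state machine as an explicit transition table, assuming refunds apply only to succeeded intents. Any transition not listed, such as succeeded back to processing, is rejected.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    REFUNDED = "refunded"


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.SUCCEEDED, Status.FAILED},
    Status.SUCCEEDED: {Status.REFUNDED},
    Status.FAILED: set(),
    Status.REFUNDED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: Status, target: Status) -> Status:
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

In a real service the check and the status write would happen in one database transaction (for example an UPDATE guarded by the expected current status), so two concurrent confirms cannot both move the intent into processing.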
HIGH Design a URL shortener API. What are the endpoints, the encoding scheme, and the scaling challenges?
Endpoints: POST /links body: { long_url, custom_slug? } → { short_code, short_url }; GET /{short_code} → 301/302 redirect to long_url (cached at CDN); DELETE /links/{short_code} → deactivate; GET /links/{short_code}/stats → click analytics.

Encoding: Generate a random 7-char Base62 string (62^7 ≈ 3.5 trillion codes). Avoid sequential IDs (predictable enumeration). Store {short_code, long_url, owner, created_at, click_count} in a database indexed on short_code.

Scaling challenges: (1) Redirect latency: cache short_code → long_url at the CDN edge. A 301 (permanent) may be cached by browsers indefinitely, which is great for throughput but means you cannot reliably change the destination or count repeat clicks; use 302 if mutability or analytics matter. (2) Write throughput: short-code generation must be collision-safe across distributed writers; rely on a unique index on short_code and retry with a fresh code on conflict, or hand out codes from a pre-generated ID pool. (3) Analytics at scale: increment click counters asynchronously via a queue; counting every click synchronously at DB level creates hot-row contention.
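A minimal sketch of random Base62 code generation with retry on collision. `insert_if_absent` is a hypothetical stand-in for an INSERT that fails when the database's unique index on short_code already holds the code; `secrets` is used so codes are not predictable.

```python
import secrets
import string
from typing import Callable

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def random_code(length: int = 7) -> str:
    """Unpredictable Base62 code; 62**7 is about 3.5 trillion possibilities."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_short_link(insert_if_absent: Callable[[str, str], bool], long_url: str,
                      max_attempts: int = 5, gen: Callable[[], str] = random_code) -> str:
    """Try fresh codes until the unique-constrained insert succeeds."""
    for _ in range(max_attempts):
        code = gen()
        if insert_if_absent(code, long_url):  # False means the code is already taken
            return code
    raise RuntimeError("could not allocate a unique short code")
```

Collisions are rare while the keyspace is mostly empty, so a small retry budget is enough; a growing collision rate is a signal to lengthen the code.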
HIGH Design a notification fan-out service. How does your API handle millions of subscribers?
Endpoints (producer side): POST /v1/events body: { event_type, producer_id, payload } → { event_id }. Producers publish events; the service fans out to subscribers. Subscription management: POST /v1/subscriptions → { subscription_id }; DELETE /v1/subscriptions/{id}.

Fan-out flow: Event lands on a Kafka topic or SQS queue. A fan-out worker reads the event and looks up subscriber IDs. For producers below a tunable threshold (e.g., 10 000 subscribers): push one delivery task per subscriber into individual delivery queues. For large counts (celebrities): write the event ID to the producer's shared timeline store (pull fan-out); subscribers read their timeline and fetch event details on demand.

Delivery guarantees: At-least-once via retries with backoff, with a dead-letter queue capturing deliveries that exhaust their retries. Subscribers implement idempotent handlers (check event_id before processing).

Key trade-off: Push fan-out gives low read latency but amplifies writes (10M subscribers = 10M writes). Pull fan-out is write-cheap but every subscriber read must merge their subscriptions' timelines. A hybrid threshold (push for < 10K followers, pull otherwise) is the production answer.
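A minimal sketch of the hybrid threshold with in-memory stand-ins for delivery inboxes (push path) and producer timelines (pull path). `HybridFanOut` and its fields are illustrative; the threshold is a constructor parameter so it can be tuned from the follower-count distribution.

```python
from collections import defaultdict
from typing import Dict, List


class HybridFanOut:
    def __init__(self, followers: Dict[str, List[str]], following: Dict[str, List[str]],
                 push_threshold: int = 10_000):
        self.followers = followers  # producer -> subscriber IDs
        self.following = following  # subscriber -> producer IDs
        self.push_threshold = push_threshold
        self.inbox: Dict[str, List[str]] = defaultdict(list)     # push path
        self.timeline: Dict[str, List[str]] = defaultdict(list)  # pull path

    def publish(self, producer: str, event_id: str) -> None:
        subscribers = self.followers.get(producer, [])
        if len(subscribers) < self.push_threshold:
            for sub in subscribers:  # O(subscribers) writes, cheap reads
                self.inbox[sub].append(event_id)
        else:
            self.timeline[producer].append(event_id)  # one write; readers pull

    def read_feed(self, subscriber: str) -> List[str]:
        # Merge pushed events with events pulled from followed producers' timelines.
        # A real feed would merge by timestamp; this sketch just concatenates.
        pulled = [event for producer in self.following.get(subscriber, [])
                  for event in self.timeline.get(producer, [])]
        return list(self.inbox.get(subscriber, [])) + pulled
```

A large producer's event costs one write instead of one per follower, and only the subscribers who actually read pay the merge.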
HIGH Design a real-time chat API. What transport, resources, and consistency model do you choose?
Transport: WebSocket for client↔server real-time delivery (full-duplex, low latency). Fallback to long-polling for environments that block WS.

Resources: POST /v1/conversations → creates a conversation, returns { id }; POST /v1/conversations/{id}/messages → sends a message and triggers a push to all connected WebSocket clients in the conversation; GET /v1/conversations/{id}/messages?before_id=&limit=50 → cursor-paginated history; WS channel wss://api/v1/realtime → client subscribes to conversation IDs and receives new-message events.

Consistency model: Eventual — a message is written to the DB, then fanned out to WebSocket subscribers. The DB record is the source of truth; WS delivery is best-effort. Clients that were offline re-sync by polling REST history on reconnect.

Key trade-off: Strong ordering within a conversation requires a sequence counter (global or per-conversation) stored atomically in the DB. This creates a write bottleneck for high-volume channels — partition by conversation_id and accept per-partition ordering rather than global.
HIGH Design pagination for a feed with billions of rows. Why does offset fail and what replaces it?
Why offset fails: OFFSET 1000000 LIMIT 20 requires the database to scan and discard 1 000 000 rows before returning 20. At scale this can cost seconds of CPU and disk I/O per request. Additionally, concurrent inserts/deletes shift rows, causing skipped or duplicated items across pages.

Cursor-based replacement: Use the last seen item's sort key as the cursor. For a reverse-chronological feed sorted by (created_at DESC, id DESC), the next-page query is: SELECT * FROM posts WHERE (created_at, id) < (:cursor_ts, :cursor_id) ORDER BY created_at DESC, id DESC LIMIT 20 With a composite index on (created_at, id), this is an index seek followed by reading 20 rows, so the cost stays roughly constant no matter how deep in the feed the client is.

API design: Response includes an opaque "next_cursor" that encodes the last item's sort key, e.g. (2024-06-20T12:34:56Z, 88742), in encoded and signed form. Client sends ?cursor=…. Never expose raw DB IDs in cursors — encode and sign them to prevent tampering.

Trade-off: No arbitrary page jumping (can't go to "page 47"); only sequential forward/backward navigation. Acceptable for infinite-scroll feeds; unacceptable for admin search results that need jump-to-page.
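A runnable sketch of keyset pagination with an HMAC-signed cursor, using SQLite as a stand-in database (the row-value comparison needs SQLite 3.15 or later). `SECRET`, the `posts` columns, and the cursor layout are assumptions. Signing prevents tampering but does not hide the contents; encrypt the payload if the sort key itself is sensitive.

```python
import base64
import hashlib
import hmac
import json
import sqlite3
from typing import List, Optional, Tuple

SECRET = b"replace-with-a-real-secret"  # hypothetical signing key


def encode_cursor(created_at: str, row_id: int) -> str:
    payload = json.dumps([created_at, row_id]).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(sig + payload).decode()


def decode_cursor(cursor: str) -> Tuple[str, int]:
    raw = base64.urlsafe_b64decode(cursor.encode())
    sig, payload = raw[:16], raw[16:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")
    created_at, row_id = json.loads(payload)
    return created_at, row_id


def fetch_page(db: sqlite3.Connection, cursor: Optional[str],
               limit: int = 20) -> Tuple[List[tuple], Optional[str]]:
    """One page of posts, newest first; returns (rows, next_cursor)."""
    if cursor is None:
        rows = db.execute(
            "SELECT id, created_at FROM posts ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,)).fetchall()
    else:
        ts, last_id = decode_cursor(cursor)
        rows = db.execute(
            "SELECT id, created_at FROM posts WHERE (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (ts, last_id, limit)).fetchall()
    # A full page may have more after it; a short page is the end of the feed.
    next_cursor = encode_cursor(rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

When the total is an exact multiple of the page size, the client receives one final empty page; a common refinement is to fetch `limit + 1` rows to detect the end precisely.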
HIGH Design a file-upload API for large files (up to 5 GB). What are the key considerations?
Never stream through your API server for large files. Instead: (1) Client requests an upload session from your API — POST /v1/upload-sessions → returns { upload_url, session_id, expires_at }. The upload_url is a pre-signed S3/GCS URL valid for 15 minutes. (2) Client uploads directly to object storage at the pre-signed URL — your API server is out of the data path. (3) Object storage fires a completion event (S3 Event Notification) to your backend, which creates the database record.

Resumable uploads: For files over ~100 MB, use multipart upload. Client splits file into chunks (5–100 MB), uploads each part, then sends POST /v1/upload-sessions/{id}/complete with the list of part ETags. If a chunk fails, only that chunk retries.

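Security: Scope the pre-signed URL to the exact object key and a short expiry; include Content-Type in the signature and enforce a maximum size (for example, a content-length-range condition in an S3 presigned POST policy) to prevent payload substitution and oversized uploads.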

Key trade-off: Direct-to-storage bypass requires the client to know about your object storage; adds a presign step RTT. The alternative (streaming through your server) is simpler but makes your API tier a bandwidth bottleneck that doesn't scale.
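A minimal sketch of planning a multipart upload, assuming S3's documented limits (every part except the last at least 5 MiB, at most 10 000 parts per upload; other object stores differ). The 16 MiB preferred part size is an illustrative tuning choice within the 5–100 MB range above.

```python
import math
from typing import List, Tuple

MiB = 1024 ** 2
MIN_PART = 5 * MiB   # S3 minimum for every part except the last
MAX_PARTS = 10_000   # S3 limit on parts per multipart upload


def plan_parts(file_size: int, preferred_part: int = 16 * MiB) -> Tuple[int, int]:
    """Return (part_size, part_count), growing the part size only if needed."""
    part_size = max(MIN_PART, preferred_part, math.ceil(file_size / MAX_PARTS))
    part_count = max(1, math.ceil(file_size / part_size))
    return part_size, part_count


def part_ranges(file_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """(part_number, start, end_exclusive) for each part; part numbers start at 1."""
    return [(n + 1, start, min(start + part_size, file_size))
            for n, start in enumerate(range(0, file_size, part_size))]
```

A 5 GiB file at 16 MiB per part needs 320 parts; each range can be uploaded, and retried, independently.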
HIGH Design a search API. How do you handle ranking, pagination, and latency?
Endpoint: GET /v1/search?q=…&type=product&filters[category]=electronics&sort=relevance&limit=20&cursor=… Returns: { results: [...], next_cursor, total_count_estimate, took_ms }

Backend: Queries a search index (Elasticsearch, OpenSearch, Typesense) rather than the primary DB. The index is kept in sync via an event stream from the write path.

Ranking: Default is relevance score (BM25 + semantic embedding similarity for ML-augmented search). Expose sort parameters for price, recency, popularity; never expose internal score to callers.

Pagination: Use Elasticsearch's search_after (cursor-based) rather than from/size for deep pages — avoids the deep-pagination performance cliff. Return an opaque cursor.

Latency budget: p99 target ≤ 150 ms. Strategies: cache top-N results for high-frequency queries (Redis with short TTL), pre-warm index shards, cap the result window (e.g., refuse to page past the first 10 000 results), and return an estimated total count rather than an exact one (an exact count means counting every matching document, which is expensive for broad queries).
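A sketch of building one page's request body in Elasticsearch query DSL. The field names `title`, `category`, and `product_id` are hypothetical, as is the assumption that the opaque cursor carries both the previous page's sort values and the number of results already returned. For consistent pages across index refreshes, Elasticsearch's documentation recommends pairing `search_after` with a point in time (PIT), which is omitted here.

```python
from typing import Any, Dict, List, Optional

MAX_WINDOW = 10_000  # deepest result a client may page to (assumed policy)


def build_search_body(q: str, category: Optional[str] = None, limit: int = 20,
                      search_after: Optional[List[Any]] = None,
                      already_returned: int = 0) -> Dict[str, Any]:
    """Request body for one page; search_after/already_returned come from the decoded cursor."""
    if already_returned + limit > MAX_WINDOW:
        raise ValueError("result window exceeded; ask the client to refine the query")
    query: Dict[str, Any] = {"bool": {"must": [{"match": {"title": q}}]}}
    if category:
        query["bool"]["filter"] = [{"term": {"category": category}}]
    body: Dict[str, Any] = {
        "query": query,
        "size": limit,
        # Relevance first, then a unique field as tiebreaker so page boundaries are stable.
        "sort": [{"_score": "desc"}, {"product_id": "asc"}],
        # Count hits accurately only up to 1 000; beyond that the total is a lower bound.
        "track_total_hits": 1000,
    }
    if search_after is not None:
        body["search_after"] = search_after  # sort values of the previous page's last hit
    return body
```

The raw `_score` values end up inside the cursor, which is another reason to keep cursors opaque and signed.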
HIGH Design an API for a multi-tenant SaaS product. How do you enforce tenant isolation?
Authentication layer: Every token carries a tenant_id (org/workspace ID) in its claims. The API gateway or authentication middleware extracts and validates this before any handler runs.

Data isolation strategies: (1) Row-level isolation: single shared schema, every table has a tenant_id column; all queries include AND tenant_id = :current_tenant. Simple and cheap but a missing predicate leaks cross-tenant data. Use PostgreSQL Row Level Security (RLS) policies as a database-layer backstop, so a query that forgets the predicate still cannot read other tenants' rows. Superusers and roles with BYPASSRLS skip these policies, and so does the table owner unless the table is set to FORCE ROW LEVEL SECURITY, so the application should connect as a separate, non-owner role. (2) Schema-per-tenant: each tenant has its own schema; queries are routed to the right schema. Stronger isolation, harder to query across tenants for analytics. (3) DB-per-tenant: strongest isolation, independent scaling and backup per tenant; operationally expensive.

Key trade-off: Row-level is cheapest but requires discipline; DB-per-tenant is most compliant (GDPR data deletion is trivially "drop database") but scales to ~thousands of tenants before cost is prohibitive.
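A minimal sketch of the application-side half of row-level isolation: all data access goes through one repository that adds the tenant predicate in one place. SQLite stands in for the real database, and `TenantScopedRepo` and the `projects` table are hypothetical; `tenant_id` must come from the verified token claims, never from the request body.

```python
import sqlite3
from typing import List, Optional, Tuple


class TenantScopedRepo:
    def __init__(self, db: sqlite3.Connection, tenant_id: str):
        self._db = db
        self._tenant_id = tenant_id  # taken from validated token claims

    def list_projects(self) -> List[Tuple[int, str]]:
        return self._db.execute(
            "SELECT id, name FROM projects WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,)).fetchall()

    def get_project(self, project_id: int) -> Optional[Tuple[int, str]]:
        # Looking up by ID alone would return other tenants' rows; always pair it with tenant_id.
        return self._db.execute(
            "SELECT id, name FROM projects WHERE id = ? AND tenant_id = ?",
            (project_id, self._tenant_id)).fetchone()
```

Returning "not found" for another tenant's ID, rather than 403, also avoids confirming that the resource exists.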
HIGH How would you design a public API that needs to support both mobile apps and browser SPAs with different data requirements?
Three main approaches: (1) Backend For Frontend (BFF): build a thin API layer per client type — a mobile BFF and a web BFF — each tailored to its client's needs; both back onto shared core services. Clients never hit core services directly. Downsides: two APIs to maintain. (2) GraphQL: a single schema exposes all data; each client queries exactly what it needs. Solves over- and under-fetching elegantly. Downsides: HTTP caching is hard (queries typically go over POST to a single endpoint), and naive per-field resolvers issue N+1 database queries unless they are batched (e.g., with a DataLoader-style batching layer). (3) Field-selection on REST: add a ?fields=id,name,avatar parameter to let clients trim responses. Simple and cache-friendly but doesn't handle deeply nested needs.

Recommendation in an interview: propose BFF for large organizations (each team owns its BFF, reducing coupling) and GraphQL for mid-size products with diverse clients but a unified backend team. Combine GraphQL with persisted queries or query complexity and depth limits to prevent abuse.
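A minimal sketch of the REST field-selection option for a flat resource. `ALLOWED_FIELDS` and the user fields are hypothetical; rejecting unknown names (mapped to 400 Bad Request) is a design choice that surfaces client typos, whereas silently ignoring them is more forgiving.

```python
from typing import Any, Dict, Optional

ALLOWED_FIELDS = {"id", "name", "avatar", "email", "bio"}  # hypothetical user resource


def project_fields(resource: Dict[str, Any], fields_param: Optional[str]) -> Dict[str, Any]:
    """Apply a ?fields=id,name,avatar parameter to a flat REST resource."""
    if not fields_param:
        return resource  # no parameter: full representation
    requested = {f.strip() for f in fields_param.split(",") if f.strip()}
    unknown = requested - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")  # map to 400 Bad Request
    return {k: v for k, v in resource.items() if k in requested}
```

Because the field list is part of the URL, each distinct selection is a separate cache key, which keeps the approach HTTP-cache friendly.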
HIGH How do you ensure backward compatibility when evolving an API schema over years?
Backward compatibility is a policy, not a technique. The policy: never break existing callers without a versioned migration path. Practical rules: (1) Adding fields to responses is safe only if callers ignore unknown fields, so make tolerant parsing a documented client requirement. (2) New fields in requests are always optional with sensible defaults — old callers don't send them. (3) Never remove or rename existing fields — add a new field, deprecate the old one with a deprecated: true annotation in the schema and a Deprecation header, then remove after the sunset date. (4) Never change the type or semantic of an existing field (e.g., changing count from item count to page count). (5) Distinguish resource-level versioning (new version of a specific resource type) from API-level versioning (entire API bumps). (6) Contract-test every release: Pact or similar consumer-driven contract tests catch breaking changes in CI before they reach production.
HIGH Design a webhook delivery system that guarantees at-least-once delivery with reasonable retry semantics.
Architecture: When an event occurs, a record is written to a webhook_events table (or a Kafka topic) before returning success to the caller — delivery is decoupled from the write path. A worker pool reads from this queue and dispatches HTTP POST requests to subscriber URLs.

Retry policy: Exponential backoff: retry after 30 s, 5 min, 30 min, 2 h, 24 h. If the fifth retry also fails, mark the delivery as dead and alert the subscriber. Honor Retry-After headers from the destination.

Signature: Sign the payload with HMAC-SHA256 using a per-subscriber secret and include the signature in a header (e.g., X-Hook-Signature: sha256=…). Include a timestamp in the signed content as well. Subscribers verify both before processing: the signature prevents spoofing, and rejecting stale timestamps limits replay attacks.

Idempotency: Include a stable event_id in every payload; advise subscribers to deduplicate on it. The same event retried 5 times carries the same ID.

Key trade-off: At-least-once means subscribers may see duplicate events during retries. The alternative (exactly-once) requires a distributed transaction spanning your DB and the subscriber's endpoint — generally not worth it. Design subscribers to be idempotent instead.
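A minimal sketch of signing and verifying a delivery, assuming the sender signs the string `"<timestamp>.<body>"` and sends the timestamp in a separate header (a hypothetical X-Hook-Timestamp alongside X-Hook-Signature). Binding the timestamp into the HMAC means an attacker cannot replay an old body with a fresh timestamp.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """Signature over '<timestamp>.<body>' so the timestamp cannot be swapped."""
    message = str(timestamp).encode() + b"." + body
    return "sha256=" + hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, timestamp: int, signature: str,
           tolerance_s: int = 300, now: Optional[int] = None) -> bool:
    """Subscriber-side check: fresh timestamp and matching signature."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # stale delivery: likely a replay
    expected = sign(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

Subscribers should verify against the raw request bytes before any JSON parsing, since re-serializing the body can change it and break the signature. Retries of the same event can be re-signed with a new timestamp while keeping the same event_id for deduplication.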