Interview Prep · Lesson 07

Inside Stripe's API — a deep dive

Stripe's API is the one interviewers reach for as the gold standard, so it pays to know not just what it does but how. This is the mechanism-level walkthrough of three things Stripe is famous for getting right: rate limiting, date-based versioning, and idempotency. All of it is from Stripe's public docs and engineering talks (cited below); the explanations and traces are original.

⏱ 20 min · Difficulty: advanced · Prereq: rate limiting, versioning, idempotency

By the end you'll be able to

Explain Stripe's token-bucket limiter and the three other limiters that cooperate with it, with the bucket math.
Describe the version-change transformation layer that makes date pinning actually work.
Walk the idempotency-key store through first call, replay, concurrent duplicate, and body mismatch.

1. Rate limiting — a token bucket, plus three more

Stripe's everyday limiter is a token bucket per account, kept in Redis. Picture a bucket that holds at most B tokens and is topped up at a steady rate of r tokens per second. Every API request must remove one token; if the bucket is empty, the request is rejected with 429. The bucket model is what lets a client burst (spend the full bucket quickly) while still being held to an average of r requests/second over time.

Tokens refill continuously; each request spends one; an empty bucket yields 429 with Retry-After. Capacity B sets the burst size; rate r sets the sustained throughput.

Why Redis? Many API servers handle one account's traffic concurrently, so the bucket must be a single shared counter updated atomically — otherwise two servers both see "1 token left" and both allow a request. Stripe runs the check-and-decrement as one atomic Redis operation (a small server-side script) so the decision is race-free across the whole fleet. Here is the logic, and a worked trace:

# Atomic in Redis: refill based on elapsed time, then try to spend one token
now = current_time_seconds()
elapsed = now - bucket.last_refill
tokens = min(B, bucket.tokens + elapsed * r)    # lazy refill
if tokens >= 1:
    bucket.tokens = tokens - 1                  # consume
    bucket.last_refill = now
    return ALLOW
else:
    retry_after = ceil((1 - tokens) / r)        # seconds until 1 token
    return DENY(429, retry_after)

# Trace with B=5, r=1 token/sec, bucket starts full:
t=0.0  burst of 5 requests → all ALLOW            (tokens 5→0)
t=0.1  6th request         → 429, Retry-After: 1  (tokens ~0)
t=2.0  request             → ALLOW                (refilled ~2 → spend 1 → ~1)
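The same logic can be exercised outside Redis to check the trace by hand. The sketch below is a single-process Python stand-in, not Stripe's implementation: the class name `TokenBucket`, its field names, and the injected `now` argument are all assumptions made for illustration, and nothing here provides the cross-server atomicity that the Redis script does.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBucket:
    """Single-process sketch of the lazy-refill bucket (hypothetical names)."""
    capacity: float      # B: maximum burst size
    refill_rate: float   # r: tokens added per second
    tokens: float        # balance as of last_refill
    last_refill: float   # timestamp of the last successful spend

    def try_acquire(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request at time `now`."""
        elapsed = now - self.last_refill
        available = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if available >= 1:
            self.tokens = available - 1
            self.last_refill = now
            return True, 0
        # Denied: state is left untouched, so refill keeps accruing from last_refill.
        retry_after = math.ceil((1 - available) / self.refill_rate)
        return False, retry_after


# Replays the trace: B=5, r=1 token/sec, bucket starts full.
bucket = TokenBucket(capacity=5, refill_rate=1.0, tokens=5, last_refill=0.0)
burst = [bucket.try_acquire(0.0)[0] for _ in range(5)]   # five requests at t=0.0
sixth = bucket.try_acquire(0.1)                           # sixth request at t=0.1
later = bucket.try_acquire(2.0)                           # one request at t=2.0
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the sketch deterministic and easy to test.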

Stripe runs four limiters, not one

A single per-account rate limiter isn't enough at Stripe's scale. Stripe has described four cooperating mechanisms, each defending against a different failure mode:

Limiter	What it caps	Failure mode it prevents
1. Request rate limiter	Requests/sec per account (the token bucket above)	One account monopolising throughput; the everyday fairness limit.
2. Concurrent request limiter	Number of simultaneously in-flight requests per account	A handful of slow, expensive calls (big list queries) tying up workers even though the req/sec rate looks fine.
3. Fleet usage load shedder	Reserves a fraction of total fleet capacity for critical request types	A flood of non-critical traffic (e.g. listing objects) starving the critical money path (creating/capturing charges) during a surge.
4. Worker utilization load shedder	Sheds low-priority traffic when workers are near saturation	Total overload taking the whole fleet down — it degrades gracefully by dropping the least important work first.

The mental model to carry into an interview: limiters 1–2 enforce fairness between accounts; limiters 3–4 protect the system as a whole and prioritise the most important work when capacity runs short. When asked "how would you rate-limit a payments API?", that two-tier answer — fairness plus prioritised load shedding — is what separates a senior response from "use a token bucket."
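To make the two tiers concrete, here is a minimal in-memory sketch of limiter 2 (per-account concurrency) and limiter 3 (reserved capacity for critical traffic). The class names, the slot-counting model, and the numbers are assumptions chosen for illustration; the source describes what each limiter protects, not how Stripe implements it.

```python
class ConcurrentRequestLimiter:
    """Caps simultaneously in-flight requests per account (limiter 2, sketch)."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = {}                      # account id -> current in-flight count

    def try_start(self, account):
        current = self.in_flight.get(account, 0)
        if current >= self.max_in_flight:
            return False                         # too many slow calls already running
        self.in_flight[account] = current + 1
        return True

    def finish(self, account):
        self.in_flight[account] = max(0, self.in_flight.get(account, 0) - 1)


class FleetUsageLoadShedder:
    """Keeps part of total capacity free for critical requests (limiter 3, sketch)."""

    def __init__(self, total_slots, reserved_for_critical):
        self.total_slots = total_slots
        self.non_critical_cap = total_slots - reserved_for_critical
        self.critical_in_use = 0
        self.non_critical_in_use = 0

    def try_start(self, critical):
        if self.critical_in_use + self.non_critical_in_use >= self.total_slots:
            return False                         # fleet is completely full
        if critical:
            self.critical_in_use += 1
            return True
        if self.non_critical_in_use >= self.non_critical_cap:
            return False                         # shed: would eat into the critical reserve
        self.non_critical_in_use += 1
        return True


limiter = ConcurrentRequestLimiter(max_in_flight=2)
starts = [limiter.try_start("acct_demo") for _ in range(3)]      # third is refused
limiter.finish("acct_demo")
after_finish = limiter.try_start("acct_demo")                    # a slot freed up

shedder = FleetUsageLoadShedder(total_slots=10, reserved_for_critical=2)
lists = [shedder.try_start(critical=False) for _ in range(9)]    # ninth list call is shed
charge = shedder.try_start(critical=True)                        # money path still admitted
```

The point of the second class is the asymmetry: non-critical work hits its ceiling before the fleet is full, so a critical request arriving during the surge still finds a slot.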

✅ Client behaviour Stripe expects

On a 429, read Retry-After and back off — ideally exponential backoff with jitter (see retries & backoff). Stripe's own SDKs retry safely because write calls carry an idempotency key (section 3), so a retried create can't double-charge.
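A client-side retry loop along those lines might look like the sketch below. It is an illustration under stated assumptions, not the behaviour of any Stripe SDK: `send` is a hypothetical callable returning `(status, headers, body)`, `Retry-After` is assumed to be a number of seconds, and the `base` and `cap` defaults are arbitrary. For write calls, `send` is expected to reuse the same idempotency key on every attempt.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before the next try; `attempt` is 0 for the first retry."""
    ceiling = min(cap, base * (2 ** attempt))     # exponential growth, capped
    jittered = rng() * ceiling                    # "full jitter": uniform in [0, ceiling)
    if retry_after is not None:
        # Never come back sooner than the server asked.
        return max(float(retry_after), jittered)
    return jittered


def call_with_retries(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call `send` until it stops returning 429 or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            break
        sleep(backoff_delay(attempt, headers.get("Retry-After"), rng=rng))
    return status, body


# Demo with a canned response sequence and a fixed "random" value of 0.5.
responses = iter([
    (429, {"Retry-After": "1"}, None),
    (429, {}, None),
    (200, {}, {"ok": True}),
])
waits = []                                        # records sleeps instead of sleeping
result = call_with_retries(lambda: next(responses), sleep=waits.append, rng=lambda: 0.5)
```

Jitter matters because many clients throttled at the same moment would otherwise all retry at the same moment and empty the bucket again.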

2. Date versioning — the version-change transformation layer

Everyone repeats the headline: Stripe versions are dates (2024-06-20), and your account is pinned to the version that was current when it made its first API request. But that's the policy, not the mechanism. The interesting question is: how does one codebase serve dozens of old versions without drowning in if version < X branches everywhere? The answer is the part most people don't know.

Stripe keeps one current internal representation of every object — the code always works with "latest." For each breaking change ever made, they write a small, self-contained version change: a module that knows how to transform a response (and, where needed, request handling) between two adjacent versions. At request time, the response is produced in the latest shape and then run backwards through the chain of version changes until it matches the caller's pinned version.

Core code stays on "latest." Old callers get their shape by replaying version-change transforms in reverse — each change is small, isolated, and individually testable.

Concretely, suppose a version change renamed a field and introduced a new one. A single version-change module captures exactly that delta:

# A version change is a small, isolated transform between two adjacent versions.
VersionChange "2024-06-20":
    description: "Renamed `card` to `payment_method`; added `captured`"

    transform_response(obj, to_older):
        # when morphing a LATEST response back to the PREVIOUS version:
        obj["card"] = obj.pop("payment_method")   # restore the old field name
        obj.pop("captured", None)                 # field didn't exist before

# Request flow for an account pinned to an OLDER version:
1. controller builds the response in LATEST shape   { payment_method, captured }
2. apply each version change between LATEST and pinned, newest-first
3. caller receives exactly the OLD shape            { card }
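The backward replay can be sketched as a runnable Python chain. Everything named here is hypothetical: the `2023-08-16` change (renaming `refunded_amount` to `amount_refunded`) is invented so the chain has more than one link, and `render_for` and `VERSION_CHANGES` are illustrative names rather than Stripe's internals.

```python
from copy import deepcopy


def downgrade_2024_06_20(obj):
    """Undo the 2024-06-20 change from the text: restore `card`, drop `captured`."""
    obj["card"] = obj.pop("payment_method")
    obj.pop("captured", None)
    return obj


def downgrade_2023_08_16(obj):
    """Undo an invented earlier change: `refunded_amount` became `amount_refunded`."""
    obj["refunded_amount"] = obj.pop("amount_refunded")
    return obj


# Newest first. Each entry names the version that introduced the change.
VERSION_CHANGES = [
    ("2024-06-20", downgrade_2024_06_20),
    ("2023-08-16", downgrade_2023_08_16),
]


def render_for(latest_response, pinned_version):
    """Morph a latest-shape response back to the caller's pinned version."""
    obj = deepcopy(latest_response)               # never mutate the core's object
    for introduced_in, downgrade in VERSION_CHANGES:
        if introduced_in <= pinned_version:       # ISO dates sort correctly as strings
            break                                 # caller already has this change and older ones
        obj = downgrade(obj)
    return obj


latest = {"id": "ch_demo", "payment_method": "pm_demo", "captured": True, "amount_refunded": 0}
current_caller = render_for(latest, "2024-06-20")   # no transforms applied
middle_caller = render_for(latest, "2023-08-16")    # only the newest change undone
oldest_caller = render_for(latest, "2022-11-15")    # both changes undone
```

Note that the controller code producing `latest` never looks at the caller's version; all version awareness is confined to the ordered list of small transforms.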

Why this design wins:

The core code never accumulates version branches — it always speaks "latest." Complexity lives in small, named, testable change modules.
Adding a version is a single, reviewable unit — write one version change describing the delta; it becomes the new "latest" and every older caller is unaffected.
Additive changes need no version at all — a new optional field just appears; only breaking changes get a new date. (Tie this back to evolving APIs: tolerant readers + additive-by-default.)

Operationally: clients can read their current version and override it per request with the Stripe-Version header; SDKs pin a version so an SDK upgrade can't silently change behaviour; upgrading is a deliberate action (in the Dashboard) after reading the changelog of which version changes you'll cross.
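Two small helpers illustrate those operational rules: which version a request is served at, and which dated changes an upgrade would cross. This is a sketch only; the `CHANGELOG` entries, including the `2023-08-16` item, are made-up examples, and the function names are hypothetical.

```python
# Hypothetical changelog: version date -> summary of the breaking change it introduced.
CHANGELOG = {
    "2023-08-16": "Renamed `refunded_amount` to `amount_refunded` (invented example)",
    "2024-06-20": "Renamed `card` to `payment_method`",
}


def effective_version(account_pinned, stripe_version_header=None):
    """A per-request Stripe-Version header, when present, overrides the account pin."""
    return stripe_version_header or account_pinned


def changes_crossed(from_version, to_version):
    """Dated changes a caller must review when moving from one version to another."""
    return [date for date in sorted(CHANGELOG) if from_version < date <= to_version]


default_request = effective_version("2023-08-16")
overridden_request = effective_version("2023-08-16", "2024-06-20")
to_review = changes_crossed("2022-11-15", "2024-06-20")
nothing_to_review = changes_crossed("2024-06-20", "2024-06-20")
```

The per-request override is what makes a safe upgrade possible: a caller can test individual requests against the target version before changing the account-wide pin.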

🎯 Interview angle

If asked "how would you let an API evolve for a decade without breaking anyone?", describe this exact mechanism: keep the core on latest, express each breaking change as an isolated transform, and replay transforms backward to each caller's pinned version. It demonstrates you understand that the hard part of versioning isn't choosing URL-vs-header — it's keeping the implementation maintainable across many live versions.

3. Idempotency — the key store, step by step

A network timeout never tells the client whether the charge succeeded, so clients must be able to retry safely. Stripe's answer: the client generates an idempotency key (a UUID) and sends it as the Idempotency-Key header on a POST. The server records the outcome against that key and replays it for any retry within ~24 hours, so a retried "create charge" creates one charge.

The store holds, per key: a fingerprint of the request (a hash of the params), a status (in-progress / complete), and the stored response. Four cases fall out of that, and knowing all four is the deep part:

# Server logic keyed by Idempotency-Key K (insert-if-absent is ATOMIC)
row = store.insert_if_absent(K, status="in_progress", fingerprint=hash(params))

CASE A (new key):
    row created → process the charge → save response → return 200
CASE B (key seen, done):
    stored fingerprint == hash(params) → replay stored response, add Idempotent-Replayed: true (no new charge)
CASE C (key in-flight):
    a concurrent duplicate is still processing → return 409 "a request with this Idempotency-Key is already in progress"
CASE D (key reused, different body):
    stored fingerprint != hash(params) → return 4xx: the client reused a key for a DIFFERENT request (a bug)
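The four cases can be exercised with a small in-memory sketch. It makes several assumptions that the source does not state: a plain dict stands in for the atomic insert-if-absent, the fingerprint is a SHA-256 of the JSON-encoded params, the mismatch check runs before the in-flight check, and Case D is given status 400 as a placeholder for the "4xx" above. `IdempotencyStore`, `create_charge`, and the `ch_demo_` ids are hypothetical names.

```python
import hashlib
import json


def fingerprint_of(params):
    """Stable hash of the request params (key order must not matter)."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


class IdempotencyStore:
    def __init__(self):
        self.rows = {}                            # key -> {status, fingerprint, response}

    def begin(self, key, params):
        """Classify a request as new / replay / in_flight / mismatch."""
        fingerprint = fingerprint_of(params)
        row = self.rows.get(key)
        if row is None:                           # a real store does this atomically
            self.rows[key] = {"status": "in_progress", "fingerprint": fingerprint, "response": None}
            return "new", None
        if row["fingerprint"] != fingerprint:
            return "mismatch", None
        if row["status"] == "in_progress":
            return "in_flight", None
        return "replay", row["response"]

    def complete(self, key, response):
        self.rows[key].update(status="complete", response=response)


def create_charge(store, key, params, charges):
    """Return (status, body, extra_headers) for a create-charge request."""
    case, stored = store.begin(key, params)
    if case == "new":                             # CASE A
        charge = {"id": "ch_demo_%d" % (len(charges) + 1), **params}
        charges.append(charge)
        store.complete(key, (200, charge))
        return 200, charge, {}
    if case == "replay":                          # CASE B
        status, body = stored
        return status, body, {"Idempotent-Replayed": "true"}
    if case == "in_flight":                       # CASE C
        return 409, {"error": "request with this key already in progress"}, {}
    return 400, {"error": "key reused with different params"}, {}   # CASE D (placeholder status)


store, charges = IdempotencyStore(), []
params = {"amount": 2000, "currency": "usd"}
first = create_charge(store, "key-1", params, charges)
retry = create_charge(store, "key-1", params, charges)
mismatch = create_charge(store, "key-1", {"amount": 5000, "currency": "usd"}, charges)
store.begin("key-2", params)                      # simulate a request still processing
in_flight = create_charge(store, "key-2", params, charges)
```

Storing the full `(status, body)` pair, rather than just the charge id, is what lets a replay return exactly what the first call returned.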

Walk the common path as a trace:

$ POST /v1/charges
  Idempotency-Key: 9f1c…
  amount=2000 currency=usd
200 OK  { "id": "ch_3Abc", "amount": 2000, ... }     # CASE A

# …client's network drops the response, so it retries the identical request…

$ POST /v1/charges
  Idempotency-Key: 9f1c…
  amount=2000 currency=usd
200 OK  { "id": "ch_3Abc", ... }
Idempotent-Replayed: true                             # CASE B: SAME charge, same stored status

Design details worth stating: keys are scoped per-account and expire after ~24 hours (the store isn't infinite); idempotency is for POST (creating things) because GET/PUT/DELETE are already idempotent by HTTP semantics; and the body-fingerprint check (Case D) catches the classic bug of generating one key and reusing it across different operations. (See the mechanism foundations in idempotency and idempotency in practice.)
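Per-account scoping and expiry can be sketched by keying the store on the pair (account, key) and comparing timestamps. The exact 24-hour TTL, the class name `ExpiringKeyIndex`, and the lazy expire-on-lookup approach are assumptions for illustration; the source only says keys expire after roughly 24 hours.

```python
TTL_SECONDS = 24 * 60 * 60        # assumed; the text says "~24 hours"


class ExpiringKeyIndex:
    """Remembers idempotency keys per account for a limited time (sketch)."""

    def __init__(self):
        self.created_at = {}      # (account, key) -> timestamp when first recorded

    def seen(self, account, key, now):
        """True if this account used this key within the TTL; otherwise record it."""
        scoped = (account, key)   # the same key under another account is unrelated
        created = self.created_at.get(scoped)
        if created is not None and now - created < TTL_SECONDS:
            return True
        self.created_at[scoped] = now   # first use, or the old entry expired
        return False


index = ExpiringKeyIndex()
first_use = index.seen("acct_a", "key-1", now=0)
retry_in_window = index.seen("acct_a", "key-1", now=3600)
other_account = index.seen("acct_b", "key-1", now=3600)
after_expiry = index.seen("acct_a", "key-1", now=TTL_SECONDS + 1)
```

The practical consequence for callers is that a retry sent after the window has closed is treated as a brand-new request, so long-delayed retries need their own duplicate check.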

How to debug these as a caller

Symptom	Likely cause	What to do
`429` with `Retry-After`	Token bucket empty for your account	Back off for the stated seconds (exponential + jitter); batch/cache to cut call volume
A field you expected is missing/renamed	Your account is pinned to an older version	Check your pinned version; read the changelog of version changes before upgrading
SDK behaves differently after an upgrade	SDK bumped its pinned API version	Pin the version explicitly; review crossed version changes
A retry created a duplicate charge	No `Idempotency-Key` on the original	Always send a key on writes; reuse the same key when retrying the same logical request
`409` "already in progress"	A concurrent duplicate of the same key is mid-flight	Wait and retry; don't fire parallel requests with the same key

🧠 Quick check

1. Why does Stripe run the token-bucket check as a single atomic Redis operation?

The bucket is a shared counter across many servers. Check-and-decrement must be atomic, or two servers could each read "1 left" and both allow — over-admitting traffic.

2. How does Stripe serve many old API versions without version branches all over the core code?

Each breaking change is an isolated, testable transform. A response is built in the latest shape, then replayed backward through the version changes down to the caller's pinned version — so the core never accumulates conditionals.

3. A client sends the same Idempotency-Key but with a different request body. Stripe responds with an error because:

The store keeps a fingerprint (hash) of the original request. A same-key/different-body request signals the caller reused a key across logical operations, so the server surfaces it as an error rather than silently replaying the wrong response.

4. Which limiter stops a flood of cheap "list" calls from starving the "create charge" path during a surge?

The fleet usage load shedder reserves a slice of total capacity for critical requests, so non-critical traffic can't consume everything and block the money path. The token bucket enforces per-account fairness; concurrency caps in-flight count.

Key takeaways

Rate limiting: a per-account token bucket in Redis (atomic check-and-decrement), plus a concurrency limiter and two load shedders that protect critical traffic under overload.
Versioning: dates + account pinning, made maintainable by a version-change transformation layer — core code stays on latest; responses are morphed backward to the caller's pinned version.
Idempotency: client Idempotency-Key + a server store of fingerprint + status + response; four cases — new, replay, in-flight (409), and body-mismatch (error) — over a ~24h window.
The unifying theme: these three let Stripe stay safe to retry, safe to evolve, and stable under load simultaneously — exactly what a payments API needs.

Sources & further reading

Stripe — Scaling your API with rate limiters (the four limiters + token bucket)
Stripe — APIs as infrastructure: future-proofing Stripe with versioning (the version-change layer)
Stripe docs — Versioning · Idempotent requests · Rate limits
Compare across vendors: How leading APIs do it