Interview Prep · Lesson 07
Inside Stripe's API — a deep dive
Stripe's API is the one interviewers reach for as the gold standard, so it pays to know not just what it does but how. This is the mechanism-level walkthrough of three things Stripe is famous for getting right: rate limiting, date-based versioning, and idempotency. All of it is from Stripe's public docs and engineering talks (cited below); the explanations and traces are original.
By the end you'll be able to
- Explain Stripe's token-bucket limiter and the three limiters that work alongside it, with the bucket math.
- Describe the version-change transformation layer that makes date pinning actually work.
- Walk the idempotency-key store through first call, replay, concurrent duplicate, and body mismatch.
1. Rate limiting — a token bucket, plus three more
Stripe's everyday limiter is a token bucket per account, kept in Redis. Picture a bucket that holds at most B tokens and is topped up at a steady rate of r tokens per second. Every API request must remove one token; if the bucket is empty, the request is rejected with 429. The bucket model is what lets a client burst (spend the full bucket quickly) while still being held to an average of r requests/second over time.
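The bucket math can be written out with the same B and r. This is the standard token-bucket bound rather than a formula quoted from Stripe; T(t) for the token count at time t and N(Δ) for the number of requests admitted in any window of length Δ are notation introduced here.

```latex
% Refill between two requests at times t_0 < t (no tokens spent in between):
T(t) = \min\bigl(B,\; T(t_0) + r\,(t - t_0)\bigr)

% A request at time t is admitted iff T(t) \ge 1, after which T(t) \leftarrow T(t) - 1.

% Upper bound on requests admitted in any window of length \Delta:
N(\Delta) \le B + r\,\Delta

% Long-run average rate:
\lim_{\Delta \to \infty} \frac{N(\Delta)}{\Delta} \le r
```

With illustrative values B = 100 and r = 25, a client can send at most 100 + 25 · 10 = 350 requests in any 10-second window: the first 100 as a burst, the rest paced by the refill.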
Why Redis? Many API servers handle one account's traffic concurrently, so the bucket must be a single shared counter updated atomically — otherwise two servers both see "1 token left" and both allow a request. Stripe runs the check-and-decrement as one atomic Redis operation (a small server-side script) so the decision is race-free across the whole fleet. Here is the logic, and a worked trace:
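One way to sketch that check-and-decrement in Python follows. It is a minimal in-memory stand-in, not Stripe's code: the names `TokenBucket` and `allow` and the values B = 3, r = 1 are assumptions for illustration, and a `threading.Lock` plays the role that the atomic Redis script plays across a real fleet. Time is passed in explicitly so the trace is deterministic.

```python
import threading


class TokenBucket:
    """In-memory stand-in for one account's shared bucket (hypothetical names)."""

    def __init__(self, capacity: float, refill_rate: float, start: float = 0.0) -> None:
        self.capacity = capacity        # B: most tokens the bucket can hold
        self.refill_rate = refill_rate  # r: tokens added per second
        self.tokens = capacity          # start full so the client can burst
        self.last_refill = start
        self._lock = threading.Lock()   # stands in for the atomic Redis script

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then take one token if there is one."""
        with self._lock:
            elapsed = max(0.0, now - self.last_refill)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False  # the API layer would answer 429 here


# Worked trace with B = 3 tokens and r = 1 token/second.
bucket = TokenBucket(capacity=3, refill_rate=1.0)
trace = [(t, bucket.allow(now=t)) for t in (0.0, 0.25, 0.5, 0.75, 2.0)]
for t, allowed in trace:
    print(f"t={t:>4}s  {'allowed' if allowed else '429'}")
```

The first three requests spend the burst. At t = 0.75 the bucket holds only 0.75 tokens, so the fourth request is rejected. By t = 2.0 the refill has brought it back to 2.0 tokens, so the fifth request is allowed.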
Stripe runs four limiters, not one
A single per-account rate limiter isn't enough at Stripe's scale. They described four cooperating mechanisms, each defending against a different failure mode:
| Limiter | What it caps | Failure mode it prevents |
|---|---|---|
| 1. Request rate limiter | Requests/sec per account (the token bucket above) | One account monopolising throughput; the everyday fairness limit. |
| 2. Concurrent request limiter | Number of simultaneously in-flight requests per account | A handful of slow, expensive calls (big list queries) tying up workers even though the req/sec rate looks fine. |
| 3. Fleet usage load shedder | Reserves a fraction of total fleet capacity for critical request types | A flood of non-critical traffic (e.g. listing objects) starving the critical money path (creating/capturing charges) during a surge. |
| 4. Worker utilization load shedder | Sheds low-priority traffic when workers are near saturation | Total overload taking the whole fleet down — it degrades gracefully by dropping the least important work first. |
The mental model to carry into an interview: limiters 1–2 enforce fairness between accounts; limiters 3–4 protect the system as a whole and prioritise the most important work when capacity runs short. When asked "how would you rate-limit a payments API?", that two-tier answer — fairness plus prioritised load shedding — is what separates a senior response from "use a token bucket."
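The second tier is easier to reason about with a concrete admission rule. The sketch below shows one plausible shape for a fleet usage load shedder: non-critical requests may only occupy the unreserved part of capacity. The class name, the two traffic labels, and the slot counts are all hypothetical, and a real fleet would need a shared, atomically updated counter rather than a local dict.

```python
class FleetUsageShedder:
    """Holds back part of fleet capacity so critical requests always have room."""

    def __init__(self, total_slots: int, reserved_for_critical: int) -> None:
        self.total_slots = total_slots
        # Non-critical traffic may only ever occupy the unreserved slots.
        self.noncritical_cap = total_slots - reserved_for_critical
        self.in_flight = {"critical": 0, "noncritical": 0}

    def try_admit(self, kind: str) -> bool:
        """Admit a request of the given kind, or return False to shed it."""
        if sum(self.in_flight.values()) >= self.total_slots:
            return False  # the fleet is completely full
        if kind == "noncritical" and self.in_flight["noncritical"] >= self.noncritical_cap:
            return False  # admitting it would eat into the reserved slice
        self.in_flight[kind] += 1
        return True

    def release(self, kind: str) -> None:
        """Call when a request finishes so its slot frees up."""
        self.in_flight[kind] -= 1


# Ten slots, two of them reserved for the money path.
shedder = FleetUsageShedder(total_slots=10, reserved_for_critical=2)
list_calls = [shedder.try_admit("noncritical") for _ in range(9)]
charge_calls = [shedder.try_admit("critical") for _ in range(3)]
```

A flood of nine list calls fills only eight slots, so the next two charge requests still get in. The third charge is refused only because the fleet is then full.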
On a 429, read Retry-After and back off — ideally exponential backoff with jitter (see retries & backoff). Stripe's own SDKs retry safely because write calls carry an idempotency key (section 3), so a retried create can't double-charge.
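The client side of that advice can be sketched as a delay calculation. This is a generic exponential-backoff-with-full-jitter helper, not the algorithm any Stripe SDK uses: the function name `backoff_delay`, the 0.5-second base, and the 30-second cap are assumptions, and `retry_after` is the number of seconds parsed from the Retry-After header, if the response carried one.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0 for the first retry)."""
    # Exponential ceiling: base, 2*base, 4*base, ... but never more than cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly below the ceiling so clients do not retry in lockstep.
    delay = rng() * ceiling
    # Never retry sooner than the server asked for.
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


# Delays for five consecutive retries after a 429 that said "Retry-After: 1".
delays = [backoff_delay(attempt, retry_after=1.0) for attempt in range(5)]
```

Passing `rng` in keeps the helper testable. The retried request must reuse the original idempotency key, otherwise the backoff protects the server but not the customer.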
2. Date versioning — the version-change transformation layer
Everyone repeats the headline: Stripe versions are dates (2024-06-20), and your account is pinned to the version that was current when it made its first API request. That is the policy, not the mechanism. The interesting question is how one codebase serves dozens of old versions without drowning in if version < X branches everywhere. The answer is the part most people don't know.
Stripe keeps one current internal representation of every object — the code always works with "latest." For each breaking change ever made, they write a small, self-contained version change: a module that knows how to transform a response (and, where needed, request handling) between two adjacent versions. At request time, the response is produced in the latest shape and then run backwards through the chain of version changes until it matches the caller's pinned version.
Concretely, suppose a version change renamed a field and changed a default. A single version-change module captures exactly that delta:
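A minimal sketch of the idea in Python is below. Everything in it is hypothetical: the field names, the older date 2023-10-16, the second change, and the function names are invented for illustration and are not Stripe's actual changes or internal API. Each change is a small function that rewrites a response one step back in time, and `render_for` applies every change newer than the caller's pinned version, newest first.

```python
from typing import Callable, Dict, List, Tuple

Response = Dict[str, object]


def undo_rename_and_default(resp: Response) -> Response:
    """Hypothetical change dated 2024-06-20: `total` became `amount_total`,
    and an empty description now defaults to None instead of ""."""
    older = dict(resp)  # copy so the latest-shape response is never mutated
    older["total"] = older.pop("amount_total")
    if older.get("description") is None:
        older["description"] = ""
    return older


def undo_status_field(resp: Response) -> Response:
    """Hypothetical change dated 2023-10-16: boolean `paid` became string `status`."""
    older = dict(resp)
    older["paid"] = older.pop("status") == "succeeded"
    return older


# (date the change shipped, transform that undoes it), one entry per breaking change.
VERSION_CHANGES: List[Tuple[str, Callable[[Response], Response]]] = [
    ("2024-06-20", undo_rename_and_default),
    ("2023-10-16", undo_status_field),
]


def render_for(latest: Response, pinned_version: str) -> Response:
    """Walk backward from latest, undoing each change newer than the pin."""
    resp = latest
    for shipped, undo in sorted(VERSION_CHANGES, reverse=True):  # newest first
        if shipped > pinned_version:  # ISO dates compare correctly as strings
            resp = undo(resp)
    return resp


latest = {"id": "ch_1", "amount_total": 500, "description": None, "status": "succeeded"}
for_2024_caller = render_for(latest, "2024-01-01")  # crosses one change
for_2023_caller = render_for(latest, "2023-01-01")  # crosses both changes
```

The code that builds `latest` never mentions a version. Supporting another old shape means adding one entry to the list, not touching the core.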
Why this design wins:
- The core code never accumulates version branches — it always speaks "latest." Complexity lives in small, named, testable change modules.
- Adding a version is a single, reviewable unit — write one version change describing the delta; it becomes the new "latest" and every older caller is unaffected.
- Additive changes need no version at all — a new optional field just appears; only breaking changes get a new date. (Tie this back to evolving APIs: tolerant readers + additive-by-default.)
Operationally: clients can read their current version and override it per request with the Stripe-Version header; SDKs pin a version so an SDK upgrade can't silently change behaviour; upgrading is a deliberate action (in the Dashboard) after reading the changelog of which version changes you'll cross.
If asked "how would you let an API evolve for a decade without breaking anyone?", describe this exact mechanism: keep the core on latest, express each breaking change as an isolated transform, and replay transforms backward to each caller's pinned version. It demonstrates you understand that the hard part of versioning isn't choosing URL-vs-header — it's keeping the implementation maintainable across many live versions.
3. Idempotency — the key store, step by step
A network timeout never tells the client whether the charge succeeded, so clients must be able to retry safely. Stripe's answer: the client generates an idempotency key (a UUID) and sends it as the Idempotency-Key header on a POST. The server records the outcome against that key and replays it for any retry within ~24 hours, so a retried "create charge" creates one charge.
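The store holds, per key: a fingerprint of the request (a hash of the params), a status (in-progress / complete), and the stored response. Four cases fall out of that, and knowing all four is the deep part:
- Case A, first call: the key is new, so the server records the fingerprint, marks the key in-progress, runs the operation, and stores the response as complete.
- Case B, replay: the key is complete and the fingerprint matches, so the server returns the stored response without running the operation again.
- Case C, concurrent duplicate: the key is still in-progress, so the second request gets a 409 rather than racing the first.
- Case D, body mismatch: the key exists but the fingerprint differs, so the server returns an error instead of replaying a response that belongs to a different request.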
Walk the common path as a trace:
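The store and its four outcomes (first call, replay, concurrent duplicate, body mismatch) can be sketched in Python as follows. This is an illustrative in-memory model, not Stripe's implementation: the class and function names are hypothetical, the 400 used for a body mismatch is an assumption (the text above only says "error"), and expiry, failure handling, and the atomic insert a real store needs are left out.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


def fingerprint(params: dict) -> str:
    """Stable hash of the request params."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


@dataclass
class Record:
    fingerprint: str
    status: str                      # "in_progress" or "complete"
    response: Optional[dict] = None


class IdempotencyStore:
    def __init__(self) -> None:
        # Keys are scoped per account, so the account id is part of the lookup key.
        self._records: Dict[Tuple[str, str], Record] = {}

    def handle(self, account: str, key: str, params: dict,
               do_work: Callable[[dict], dict]) -> Tuple[int, dict]:
        fp = fingerprint(params)
        record = self._records.get((account, key))
        if record is None:                      # first call: run the operation once
            record = Record(fp, "in_progress")
            self._records[(account, key)] = record
            record.response = do_work(params)
            record.status = "complete"
            return 200, record.response
        if record.fingerprint != fp:            # body mismatch: same key, different params
            return 400, {"error": "idempotency key reused with different params"}
        if record.status == "in_progress":      # concurrent duplicate
            return 409, {"error": "request with this key is already in progress"}
        return 200, record.response             # replay: stored response, no new work


# Common path: a first call, then a retry of the same request.
charges = []

def create_charge(params: dict) -> dict:
    charges.append(params)
    return {"id": f"ch_{len(charges)}", "amount": params["amount"]}

store = IdempotencyStore()
first = store.handle("acct_1", "key-1", {"amount": 500}, create_charge)
retry = store.handle("acct_1", "key-1", {"amount": 500}, create_charge)
wrong_body = store.handle("acct_1", "key-1", {"amount": 999}, create_charge)
```

Here `first` and `retry` carry the same response and only one charge exists, while `wrong_body` is refused without running `create_charge` at all.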
Design details worth stating: keys are scoped per-account and expire after ~24 hours (the store isn't infinite); idempotency is for POST (creating things) because GET/PUT/DELETE are already idempotent by HTTP semantics; and the body-fingerprint check (Case D) catches the classic bug of generating one key and reusing it across different operations. (See the mechanism foundations in idempotency and idempotency in practice.)
How to debug these as a caller
| Symptom | Likely cause | What to do |
|---|---|---|
| 429 with Retry-After | Token bucket empty for your account | Back off for the stated seconds (exponential + jitter); batch/cache to cut call volume |
| A field you expected is missing/renamed | Your account is pinned to an older version | Check your pinned version; read the changelog of version changes before upgrading |
| SDK behaves differently after an upgrade | SDK bumped its pinned API version | Pin the version explicitly; review crossed version changes |
| A retry created a duplicate charge | No Idempotency-Key on the original | Always send a key on writes; reuse the same key when retrying the same logical request |
| 409 "already in progress" | A concurrent duplicate of the same key is mid-flight | Wait and retry; don't fire parallel requests with the same key |
🧠 Quick check
1. Why does Stripe run the token-bucket check as a single atomic Redis operation?
The bucket is a shared counter across many servers. Check-and-decrement must be atomic, or two servers could each read "1 left" and both allow — over-admitting traffic.
2. How does Stripe serve many old API versions without version branches all over the core code?
Each breaking change is an isolated, testable transform. A response is built in the latest shape, then replayed backward through the version changes down to the caller's pinned version — so the core never accumulates conditionals.
3. A client sends the same Idempotency-Key but with a different request body. Stripe responds with an error because:
The store keeps a fingerprint (hash) of the original request. A same-key/different-body request signals the caller reused a key across logical operations, so the server surfaces it as an error rather than silently replaying the wrong response.
4. Which limiter stops a flood of cheap "list" calls from starving the "create charge" path during a surge?
The fleet usage load shedder reserves a slice of total capacity for critical requests, so non-critical traffic can't consume everything and block the money path. The token bucket enforces per-account fairness, and the concurrent request limiter caps the number of in-flight requests.
Key takeaways
- Rate limiting: a per-account token bucket in Redis (atomic check-and-decrement), plus a concurrency limiter and two load shedders that protect critical traffic under overload.
- Versioning: dates + account pinning, made maintainable by a version-change transformation layer — core code stays on latest; responses are morphed backward to the caller's pinned version.
- Idempotency: client Idempotency-Key + a server store of fingerprint + status + response; four cases over a ~24h window: new, replay, in-flight (409), and body-mismatch (error).
- The unifying theme: these three let Stripe stay safe to retry, safe to evolve, and stable under load simultaneously, which is exactly what a payments API needs.
Sources & further reading
- Stripe — Scaling your API with rate limiters (the four limiters + token bucket)
- Stripe — APIs as infrastructure: future-proofing Stripe with versioning (the version-change layer)
- Stripe docs — Versioning · Idempotent requests · Rate limits
- Compare across vendors: How leading APIs do it