Platform & API Product Engineering · Lesson 09

Capstone: design a developer platform (Stripe/HubSpot-style)

Every concept in this module — app models, key rotation, scoped auth, nested rate limits, webhook delivery, error envelopes, tenant isolation, usage metering — was a separate instrument. This lesson assembles them into one instrument that plays. You will design a production-grade developer platform from requirements through trade-offs to a concrete API surface and the numbers that show where it breaks.

⏱ 35 min Difficulty: advanced Prereq: All plat-01 through plat-08 lessons

By the end you'll be able to

State the complete functional and non-functional requirements for a multi-app developer platform and explain which non-functional requirement is hardest to satisfy simultaneously with the others.
Justify each major design decision — key model, auth, rate limiting strategy, webhook subsystem, error model, metering — in terms of the trade-offs between consistency, throughput, and operational complexity.
Sketch the architecture diagram from memory, trace a single API call end-to-end through every layer, and calculate which limit binds first given concrete load numbers.

1 — Requirements

Good design begins with precise requirements. Vague requirements produce systems that satisfy no one; over-specified requirements close off the right trade-offs. Start by separating what the platform must do from what constraints bound how it does it.

Functional requirements

App management: each account can create up to 20 private apps. A private app is an isolated principal within the account — it has its own identity, its own keys, and its own permission surface. Apps cannot see each other's keys or webhook history.
API key lifecycle: every app has at least one active API key at all times. Keys can be created, rotated (overlap period during which old and new key both work), and revoked. A revoked key must be rejected within a bounded propagation window, not eventually.
Webhook subscriptions: each app subscribes to a named set of event topics (e.g. contact.created, deal.stage_changed). The platform delivers signed payloads to one or more HTTPS endpoints registered per app. Subscriptions are scoped — an app that does not subscribe to billing.invoice.paid never receives that event, even if the account generates it.
Scopes / permissions: each app is granted a set of OAuth-style scopes at install time. A key from an app that holds crm.contacts:read but not crm.contacts:write is rejected on mutation attempts at the authorization layer. Scopes are an access-control mechanism, not only a billing or plan distinction.
Per-app rate limit: the platform enforces a request rate limit at app granularity. A runaway app cannot exhaust the account's capacity.
Per-account daily cap: in addition to per-app rate limits, each account has a hard daily API call cap that aggregates across all its apps. Once the cap is reached, all apps in that account receive 429 until the UTC-midnight reset.
Reliable webhook delivery: events are delivered at least once. Non-2xx responses and timeouts trigger retries with exponential backoff. Deliveries are signed so recipients can verify authenticity without trusting the source IP.
Consistent error model: every error — auth failure, rate limit, validation, internal fault — uses the same JSON envelope. Clients parse one shape, not a different one per error class.
Usage metering: every billable API call is recorded for downstream billing. The metering record is written as part of the request path — not asynchronously after the fact — so billing is exact, not approximate.

Non-functional requirements

Constraint	Target	Why it matters
Accounts	50,000 active accounts	Drives key-store cardinality and tenant isolation cost
Apps per account	Up to 20 (hard cap)	Up to 1,000,000 apps total; affects rate-limiter key space
API call volume	100 M calls / day (~1,160 req/s avg; ~8,000 req/s peak at 7× daily load factor)	Gateway CPU, rate-limit store throughput
Webhook delivery volume	20 M deliveries / day (~230 deliveries/s avg)	Queue depth, worker pool sizing, retry amplification
Key lookup latency	< 2 ms p99 added by auth layer	Key validation sits on every request's critical path
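Rate-limit decision latency	< 1 ms p99	Two sequential Redis round trips at ~0.3 ms each already consume ~0.6 ms, leaving little room for decision logic; the nested checks need to share as few round trips as possible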
Webhook delivery latency	< 30 s for first attempt (p95)	First attempt defines perceived responsiveness for integrations
Metering write durability	Zero loss; synchronous before response	Billing disputes require an exact count, not an estimate
Daily-cap reset window	UTC midnight; exact, not approximate	Predictable for customers who schedule batch jobs at 00:01 UTC

🎯 Interview angle — the hardest non-functional requirement

Interviewers often ask "which requirement is hardest to satisfy together with the rest?" The answer here is synchronous metering durability at 8,000 req/s peak. Writing a durable metering record on every request's hot path means you need a write path that is (a) fast enough to not blow the <1 ms rate-limit budget, (b) durable enough that you can bill from it, and (c) cheap enough to run at 100 M records/day. The solution — covered in plat-08 — is a hybrid: increment an atomic counter in Redis synchronously (fast, cheap), and flush counter snapshots to a durable store asynchronously in batches. The counter is the source-of-truth for real-time enforcement; the durable store is the source-of-truth for invoicing. You trade a small flush-delay billing window for an acceptable write latency on the hot path.

2 — Design decisions

Each decision below is a module concept applied to the specific requirements above. For each, there is a key trade-off that you must be able to state — not just describe the mechanism.

Decision 1 — App and key model

The plat-06 lesson established that a developer platform key is not just an auth credential — it is the anchor for every other per-app concern: rate limit counters, webhook subscriptions, scope grants, and metering buckets all hang off the app's identity. The key embeds or resolves to three identifiers: account_id, app_id, and key_id. The gateway validates the key, extracts these three, and injects them as context for every downstream decision.

The 20-app cap is a product decision, not a technical limit. The platform can technically support any number of apps per account — the rate-limit key space and the metering aggregation are both keyed by app_id and scale horizontally. The cap exists so that accounts do not create "shadow apps" to route around per-app rate limits, and so that the daily-cap math remains meaningful (a per-account cap only makes sense if the number of apps is bounded). Enforcing the cap is a write-path check: CREATE APP counts existing apps and rejects at 20 with a 400 error.

Key rotation trade-off: the clean solution is to let old and new keys overlap for a configurable window (default: 1 hour). During overlap, both keys are valid; the gateway checks a small local cache of active key IDs. The trade-off is cache coherence: when a key is revoked, the platform must propagate the revocation within a bounded window (the SLA is 60 seconds, not eventual). This requires either a short cache TTL (60 s) or a pub/sub invalidation channel from the key store to all gateway replicas. Short TTL is simpler to operate; pub/sub invalidation is faster. Most platforms at this scale use both: a 60 s TTL as a safety net and a pub/sub channel for instant revocations.
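The TTL-plus-invalidation combination can be sketched in a few lines. This is a minimal in-process illustration under stated assumptions, not the platform's implementation: `KeyCache`, its method names, and the injected `now` clock are hypothetical, and the pub/sub subscriber is represented by a direct call to `invalidate()`.

```python
from typing import Dict, Optional, Tuple


class KeyCache:
    """Per-replica cache of validated key contexts (hypothetical sketch)."""

    def __init__(self, ttl_s: float = 60.0) -> None:
        self.ttl_s = ttl_s
        # key_id -> (context, expires_at)
        self._entries: Dict[str, Tuple[dict, float]] = {}

    def put(self, key_id: str, context: dict, now: float) -> None:
        self._entries[key_id] = (context, now + self.ttl_s)

    def get(self, key_id: str, now: float) -> Optional[dict]:
        entry = self._entries.get(key_id)
        if entry is None:
            return None
        context, expires_at = entry
        if now >= expires_at:
            # TTL safety net: an expired entry forces a re-check
            # against the key store, even if no invalidation arrived.
            del self._entries[key_id]
            return None
        return context

    def invalidate(self, key_id: str) -> None:
        # Would be called by the pub/sub subscriber on a revocation message.
        self._entries.pop(key_id, None)
```

With both mechanisms in place, a lost invalidation message delays a revocation by at most one TTL rather than indefinitely.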

Key structure

hbsp_live_v2_a1B2c3D4e5F6g7H8i9J0k_accountXXX_appYYY_keyZZZ
Prefix hbsp_live_v2 — environment + version (safe to log the prefix for debugging)
Opaque token segment — the secret portion (never logged, never stored in plaintext)
Suffix segments — account_id, app_id, key_id encoded in the key itself, so the gateway can extract context without a database round trip on every request
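A rough sketch of how a gateway might split a key of this shape into its parts. The function and class names are hypothetical, and the parsing assumes the identifiers embedded in the key contain no underscores, as in the example above. The extracted identifiers are only a routing hint until the secret segment has been verified against the stored hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyContext:
    prefix: str        # environment + version; safe to log
    account_id: str
    app_id: str
    key_id: str
    secret_hash: str   # compared against the stored hash; raw secret is discarded


def parse_key(raw: str) -> KeyContext:
    parts = raw.split("_")
    # vendor, env, version, secret, account, app, key
    if len(parts) != 7 or not all(parts):
        raise ValueError("malformed key")
    vendor, env, version, secret, account_id, app_id, key_id = parts
    return KeyContext(
        prefix="_".join((vendor, env, version)),
        account_id=account_id,
        app_id=app_id,
        key_id=key_id,
        secret_hash=hashlib.sha256(secret.encode("utf-8")).hexdigest(),
    )
```

Hashing with plain SHA-256 is an assumption made for brevity; the text only says the secret is never stored in plaintext.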

Decision 2 — Scopes and install-time authorization

The plat-07 lesson covered OAuth-style scopes for platform apps. The scope model here has one platform-specific wrinkle: scopes are granted at app install time and are immutable per key. If an integration needs an additional scope, it must re-install and get a new key with the expanded grant — it cannot add scopes to an existing key. This is a deliberate design choice from the account-holder's perspective: a key you granted last year should not be able to request new permissions without your explicit re-consent.

Trade-off: static vs. dynamic scope grants. Static grants (baked at install time) are auditable, predictable, and immune to confused-deputy attacks where a compromised key tries to escalate its own permissions. Dynamic grants (scope expansion on-demand) reduce friction for iterative integrations. Stripe uses static grants for restricted keys; HubSpot requires a new OAuth authorization flow for scope expansion. For a platform where security is the product (accounts trusting third-party integrations with their customer data), static grants are the right default.

Scope enforcement happens in the gateway, before the request reaches any service. The gateway reads the scope list from the resolved key context and compares it to a route-to-scope map: POST /v1/contacts → [crm.contacts:write]. A key that lacks the required scope receives a 403 with "code": "insufficient_scope" before any service is invoked.
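The route-to-scope comparison might look like the sketch below. The map contents beyond the POST /v1/contacts entry, the function name, and the choice to deny unmapped routes are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical route-to-scope map; the POST entry mirrors the example in the text.
ROUTE_SCOPES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("POST", "/v1/contacts"): frozenset({"crm.contacts:write"}),
    ("GET", "/v1/contacts"): frozenset({"crm.contacts:read"}),
}


def check_scope(method: str, route: str, granted: Set[str]) -> Optional[dict]:
    """Return None when the request may proceed, else the fields of a 403 error."""
    required = ROUTE_SCOPES.get((method, route))
    # Assumption: a route with no entry is denied (fail closed).
    if required is None or not required <= granted:
        return {"status": 403, "code": "insufficient_scope"}
    return None
```

Failing closed on unmapped routes means a newly added endpoint is unreachable until someone assigns it a scope, which is usually the safer default.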

Decision 3 — Nested rate limits and the daily cap

The plat-01 lesson laid out the nested-bucket model: every request must pass two independent rate-limit checks before it is allowed. Here the hierarchy has three levels:

Per-app burst limit — token bucket, enforced in real time. Default: 100 req/s. A runaway script in one app cannot flood the gateway. Checked first because it is the tightest and most frequently triggered.
Per-account burst limit — token bucket, enforced in real time. Default: 500 req/s aggregate across all apps. Prevents an account from spinning up 20 apps each doing 100 req/s simultaneously to achieve an effective 2,000 req/s.
Per-account daily cap: fixed-window counter with daily granularity. Default: 1,000,000 calls/day. This is a billing-tier enforcement. When the cap is exhausted, all requests from all apps in the account receive 429 until UTC midnight. The counter is stored in Redis with a TTL aligned to UTC midnight.

The gateway checks level 1 first (cheapest — only one Redis key per app), then level 2 (one Redis key per account), then level 3 (daily counter). A request that passes all three is allowed; the response carries all three sets of rate-limit headers so a client SDK can observe which limit is closest to exhaustion.
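The three-level check order can be sketched in-process as follows. This is an illustration only: plain dictionaries stand in for Redis, the class names are hypothetical, and the "per_account" label is assumed by analogy with the "per_app" and "daily_cap" values used elsewhere in this lesson. Tokens are consumed only after all three levels pass, so a request rejected at a later level does not drain an earlier bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenBucket:
    rate: float                      # tokens refilled per second
    capacity: float                  # maximum burst size
    tokens: float = field(init=False)
    updated: float = 0.0             # timestamp of the last refill

    def __post_init__(self) -> None:
        self.tokens = self.capacity

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


class NestedLimiter:
    def __init__(self, app_rps: float = 100.0, account_rps: float = 500.0,
                 daily_cap: int = 1_000_000) -> None:
        self.app_rps, self.account_rps, self.daily_cap = app_rps, account_rps, daily_cap
        self.app_buckets: Dict[str, TokenBucket] = {}
        self.account_buckets: Dict[str, TokenBucket] = {}
        self.daily_used: Dict[str, int] = {}

    def check(self, account_id: str, app_id: str, now: float) -> Optional[str]:
        """Return None if allowed, else the name of the limit that rejected."""
        if app_id not in self.app_buckets:
            self.app_buckets[app_id] = TokenBucket(self.app_rps, self.app_rps)
        if account_id not in self.account_buckets:
            self.account_buckets[account_id] = TokenBucket(self.account_rps, self.account_rps)
        app, acct = self.app_buckets[app_id], self.account_buckets[account_id]
        app.refill(now)
        acct.refill(now)
        if app.tokens < 1.0:
            return "per_app"
        if acct.tokens < 1.0:
            return "per_account"
        if self.daily_used.get(account_id, 0) >= self.daily_cap:
            return "daily_cap"
        # All three levels passed: consume from every level together.
        app.tokens -= 1.0
        acct.tokens -= 1.0
        self.daily_used[account_id] = self.daily_used.get(account_id, 0) + 1
        return None
```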

Trade-off: where to enforce the daily cap. The daily cap counter is the most expensive to maintain at 8,000 req/s peak: every request increments it and reads its current value. A single Redis key per account creates a hot-key problem for any high-volume account, because all of its increments funnel to one slot. The solution is a local-increment / periodic-flush pattern: each gateway replica keeps a local counter per account and flushes it to Redis every 100 ms. The in-Redis counter is approximately accurate for enforcement; the exact billing count comes from the metering store. The overshoot is bounded by the traffic that arrives within one flush interval. Even in the extreme case where the entire 8,000 req/s platform peak came from one account, that is 8,000 × 0.1 = 800 calls above the cap, or 0.08% of the 1M daily cap. An account held to the default 500 req/s burst limit can overshoot by at most 500 × 0.1 = 50 calls. This is an acceptable enforcement window, because invoicing relies on the metering store rather than on this counter.
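To make the local-increment / periodic-flush idea concrete, here is a small sketch with a shared dictionary standing in for the Redis counter. The class and function names are hypothetical, and time is passed in as integer milliseconds so the example stays deterministic.

```python
from typing import Dict


class LocalDailyCounter:
    """One gateway replica's view of the daily cap (hypothetical sketch)."""

    def __init__(self, shared: Dict[str, int], flush_interval_ms: int = 100) -> None:
        self.shared = shared                  # stand-in for the Redis counter
        self.flush_interval_ms = flush_interval_ms
        self.pending: Dict[str, int] = {}     # increments not yet flushed
        self.last_flush_ms = 0

    def record(self, account_id: str, now_ms: int) -> None:
        self.pending[account_id] = self.pending.get(account_id, 0) + 1
        if now_ms - self.last_flush_ms >= self.flush_interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms: int) -> None:
        for account_id, count in self.pending.items():
            self.shared[account_id] = self.shared.get(account_id, 0) + count
        self.pending.clear()
        self.last_flush_ms = now_ms

    def over_cap(self, account_id: str, cap: int) -> bool:
        # A replica sees the shared total plus only its OWN unflushed calls.
        seen = self.shared.get(account_id, 0) + self.pending.get(account_id, 0)
        return seen >= cap


def max_overshoot(rate_per_s: int, flush_interval_ms: int) -> float:
    """Upper bound on calls admitted past the cap within one flush interval."""
    return rate_per_s * flush_interval_ms / 1000
```

The overshoot comes from the blind spot in `over_cap`: each replica cannot see what the other replicas have accepted since their last flush.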

Decision 4 — Webhook delivery subsystem

The plat-02 lesson described the full delivery pipeline. For this platform, the webhook system has two properties that interact: fan-out and scoped subscriptions. When an event is generated, the platform must determine which apps in the originating account have subscribed to that event type, then enqueue one delivery task per (app, endpoint) tuple. At 20 M deliveries/day, this fan-out step is the most CPU-intensive part of the webhook system.

Subscription index: the platform maintains a materialized index: (event_type, account_id) → [(app_id, endpoint_url, signing_secret)]. When an event fires, a single lookup against this index returns all delivery targets for the account that generated it. The index is maintained in a fast read store (Redis or an in-process cache backed by a relational store). Invalidation happens on subscription create/update/delete via the same pub/sub channel used for key invalidation.

Delivery reliability: each delivery task is written to a durable queue (Kafka or a Postgres-backed job queue) before the fan-out process returns. The queue is the durability guarantee — even if the delivery worker crashes, the task survives. Workers consume from the queue, attempt the HTTPS delivery, and either acknowledge (success) or nack with a delay (retry). Retries use exponential backoff capped at 24 hours, after which the delivery is marked permanently failed and a webhook.delivery.failed event is written to the account's event log.
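One way to read "exponential backoff capped at 24 hours" is as a total retry window: delays keep doubling until the next retry would fall outside 24 hours from the first failure. The sketch below follows that reading. The 60-second base delay and the doubling factor are assumptions, not values from this lesson.

```python
from typing import List


def retry_delays(base_s: int = 60, factor: int = 2,
                 window_s: int = 24 * 3600) -> List[int]:
    """Delays (seconds) between attempts, stopping once the window is used up."""
    delays: List[int] = []
    elapsed = 0
    delay = base_s
    while elapsed + delay <= window_s:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays


schedule = retry_delays()
# With the assumed defaults: 10 retries, the last one 30,720 s (about 8.5 h)
# after the previous attempt, 61,380 s after the first failure in total.
```

Production schedulers usually add random jitter to each delay so that deliveries failing together do not retry together; it is omitted here to keep the schedule deterministic.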

Trade-off: Kafka vs. Postgres job queue. Kafka gives higher throughput and native consumer-group parallelism but requires separate infrastructure and adds operational complexity. A Postgres-backed job queue (using SELECT FOR UPDATE SKIP LOCKED) is operationally simpler and sufficient up to ~5,000 jobs/s — which covers the 230 deliveries/s average load with significant headroom for spikes. At 20 M deliveries/day the Postgres queue is the right choice; at 200 M deliveries/day (10× growth) Kafka becomes necessary. See plat-02 for the delivery queue mechanics.

Decision 5 — Error model

The plat-03 lesson established the standard error envelope. Every error from this platform — regardless of which service generated it — is normalized to the same JSON shape at the gateway before it reaches the client:

{
  "error": {
    "code":    "rate_limit_exceeded",          // machine-readable; stable across versions
    "message": "App rate limit reached: 100 req/s. Retry after 12 s.",
    "status":  429,
    "type":    "rate_limit_error",               // error class; maps to documentation section
    "param":   null,                             // populated for validation errors: which field
    "request_id": "req_01J9W3KZR4TY8X2N6M5L7P"   // traceable in logs; always present
  }
}

Trade-off: normalized gateway vs. pass-through service errors. Normalizing at the gateway means every service can return its own internal error format — only the gateway translation layer needs to know the canonical envelope. The downside is that the gateway must map unfamiliar error shapes (e.g. a database timeout that surfaces as a Go context deadline exceeded) to the correct canonical code. This mapping is a small translation table, not a free-form transformation — each internal error code maps to exactly one canonical code and HTTP status. Unmapped errors become 500 / internal_error with the request_id for tracing.
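The translation table might be sketched like this. Every internal code, canonical code other than rate_limit_exceeded and internal_error, and message string below is a hypothetical placeholder; only the envelope shape follows the example above.

```python
from typing import Dict, Optional, Tuple

# internal code -> (canonical code, HTTP status, type, client-facing message)
# All entries are illustrative placeholders.
TRANSLATION: Dict[str, Tuple[str, int, str, str]] = {
    "context_deadline_exceeded": ("upstream_timeout", 504, "api_error",
                                  "The request timed out upstream."),
    "bucket_empty": ("rate_limit_exceeded", 429, "rate_limit_error",
                     "Rate limit reached."),
}

# Fallback for anything the table does not know about.
UNMAPPED = ("internal_error", 500, "api_error", "An internal error occurred.")


def normalize(internal_code: str, request_id: str,
              param: Optional[str] = None) -> dict:
    code, status, err_type, message = TRANSLATION.get(internal_code, UNMAPPED)
    return {
        "error": {
            "code": code,
            "message": message,      # never the raw internal error text
            "status": status,
            "type": err_type,
            "param": param,
            "request_id": request_id,
        }
    }
```

Because each internal code maps to exactly one tuple, the gateway never has to interpret free-form error text from a service.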

Decision 6 — Tenant isolation

The plat-05 lesson described the isolation models for multi-tenant platforms. For this platform, tenant isolation operates at three layers simultaneously:

Data isolation: every row in the platform's data store carries an account_id column, and every query is filtered by it. No cross-account read is possible through the application layer. The only way to access another account's data is through the admin service, which has separate credentials and a mandatory audit log entry on every use.
Rate-limit isolation: the per-app and per-account buckets ensure one tenant's traffic spike does not consume shared capacity. The daily cap is the final backstop.
Webhook isolation: event fan-out uses the subscription index keyed by (event_type, account_id) — events generated by account A are never delivered to apps registered in account B, even if both subscribe to the same event type.
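The webhook isolation property follows directly from how the index is keyed. A minimal sketch, with hypothetical class and method names and an in-memory dictionary in place of the real read store:

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Tuple


class Target(NamedTuple):
    app_id: str
    endpoint_url: str


class SubscriptionIndex:
    def __init__(self) -> None:
        # Keyed by (event_type, account_id), as described above.
        self._index: DefaultDict[Tuple[str, str], List[Target]] = defaultdict(list)

    def subscribe(self, account_id: str, app_id: str,
                  event_type: str, endpoint_url: str) -> None:
        self._index[(event_type, account_id)].append(Target(app_id, endpoint_url))

    def fan_out(self, account_id: str, event_type: str) -> List[Target]:
        """Delivery targets for an event generated inside account_id."""
        return list(self._index.get((event_type, account_id), []))
```

Because the account is part of the lookup key, there is no code path in which an event from one account can return another account's endpoints, even when both subscribe to the same topic.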

Trade-off: shared infrastructure vs. dedicated resources. All 50,000 accounts share the same Redis cluster, gateway fleet, and webhook worker pool. This is economically necessary and is standard practice (Stripe, HubSpot, and GitHub all run shared-infrastructure multi-tenancy at this scale). The risk is noisy-neighbor effects. Mitigation: per-app and per-account rate limits cap any single tenant's resource consumption; the Redis cluster uses consistent hashing so no single shard holds all keys for a popular account; webhook workers use per-account FIFO lanes so a backlogged account does not starve other accounts' delivery queues.

Decision 7 — Usage metering

The plat-08 lesson covered the metering pipeline in detail. The critical constraint for this platform is billing durability: if a call is made, it must be billed. This means the metering write must complete before the response is returned to the client, not asynchronously afterward. The mechanism is the hybrid counter introduced in the Section 1 interview angle: an atomic Redis increment is the synchronous write (fast, durable within the Redis cluster's replication SLA); background flush jobs serialize snapshots to a billing-grade relational store every 60 seconds. The billing store is the invoice source; the Redis counter is the enforcement source. They diverge by at most one flush interval (60 s × throughput), which is acceptable for billing reconciliation but must be documented in the platform's billing terms.

Architecture

The platform's request path passes through seven discrete checkpoints before reaching any product service. Understanding each checkpoint is the difference between guessing where a failure comes from and knowing.

Fig 1 — Full platform architecture. Seven numbered gateway checkpoints sit on the hot path; the event bus, webhook workers, and metering flush are all off the critical path (async after response). The blueprint-grid background reflects how this diagram lives on engineer whiteboards — it is a system to be understood precisely, not approximated.

Request lifecycle through the gateway

For any individual request, the gateway executes a strict linear sequence. Understanding the sequence is how you debug: a 401 means auth failed; a 403 means auth passed but scope check failed; a 429 means auth and scope passed but a rate limit or daily cap rejected the request. No 429 is ever returned before the key is valid.

Fig 2 — Request lifecycle through the gateway. Each step can short-circuit the request with a specific HTTP status. A 429 is only possible after steps 2 and 3 succeed — an invalid or unauthorized key never gets to rate limiting.
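The short-circuit order described above can be written as a few lines of control flow. This is a sketch with hypothetical names: `key_scopes` is None when the key failed validation, and `rate_check` is any callable that returns the name of the limit that rejected the request, or None.

```python
from typing import Callable, Optional, Set


def gateway_status(key_scopes: Optional[Set[str]], required_scope: str,
                   rate_check: Callable[[], Optional[str]]) -> int:
    if key_scopes is None:
        return 401                      # authentication failed
    if required_scope not in key_scopes:
        return 403                      # authenticated, but scope is missing
    if rate_check() is not None:
        return 429                      # only evaluated for valid, authorized keys
    return 200
```

Passing the rate check as a callable makes the ordering visible: for a 401 or 403 it is never invoked, so unauthenticated traffic cannot consume rate-limit state.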

3 — The API model

The platform's public surface consists of four resource families: apps, keys, webhook endpoints, and event subscriptions. All other resources (contacts, deals, billing records) belong to the product services — the platform surface is the configuration layer that wraps them.

App CRUD

# Create a private app (max 20 per account)
POST /v1/apps
Authorization: Bearer <account-level-key>
Content-Type: application/json

{
  "name":   "Nightly CRM Sync",
  "scopes": ["crm.contacts:read", "crm.deals:read"],
  "description": "Reads contacts and deals nightly for data warehouse sync"
}

─── 201 Created ───────────────────────────────────────────────────
{
  "id":          "app_01J9W3KZR4TY8X2N6M5L7P",
  "account_id":  "acct_9pXwL4mQ",
  "name":        "Nightly CRM Sync",
  "scopes":      ["crm.contacts:read", "crm.deals:read"],
  "status":      "active",
  "created_at":  "2025-11-15T09:00:00Z",
  "rate_limits": {
    "per_app_rps":   100,
    "per_acct_rps":  500,
    "daily_cap":    1000000
  }
}

# List apps (paginated)
GET /v1/apps?limit=20&cursor=app_01J9W3…

# 400 when the 20-app cap is hit
{
  "error": {
    "code":    "app_limit_exceeded",
    "message": "This account has reached the maximum of 20 private apps.",
    "status":  400,
    "type":    "validation_error",
    "param":   null,
    "request_id": "req_…"
  }
}

Key management — create, rotate, revoke

# Create a new key for an app
POST /v1/apps/app_01J9W3KZR4TY8X2N6M5L7P/keys

─── 201 Created ───────────────────────────────────────────────────
{
  "id":           "key_3mZqR7tXv",
  "secret":       "hbsp_live_v2_a1B2c3D4e5F6g7H8i9J0k_acct9pXwL4_app01J9W3_key3mZqR7",
  "created_at":   "2025-11-15T09:01:00Z",
  "last_used_at": null
}
// secret shown ONCE at creation time; store it immediately

# Rotate: creates new key, old key remains valid for overlap_seconds (default 3600)
POST /v1/apps/app_01J9W3…/keys/key_3mZqR7tXv/rotate
{ "overlap_seconds": 3600 }

─── 201 Created ───────────────────────────────────────────────────
{
  "new_key": { "id": "key_9nYsQ4uWm", "secret": "hbsp_live_v2_…" },
  "old_key": {
    "id":         "key_3mZqR7tXv",
    "expires_at": "2025-11-15T10:01:00Z"   // overlap window end
  }
}

# Revoke immediately (propagated within 60 s to all gateway replicas)
DELETE /v1/apps/app_01J9W3…/keys/key_3mZqR7tXv

─── 200 OK ────────────────────────────────────────────────────────
{ "id": "key_3mZqR7tXv", "status": "revoked", "revoked_at": "2025-11-15T09:45:00Z" }

Webhook endpoint configuration

# Register a webhook endpoint and subscribe to event topics
POST /v1/apps/app_01J9W3…/webhooks
{
  "url":         "https://integrations.acme.com/hs-hooks",
  "event_types": ["contact.created", "contact.updated", "deal.stage_changed"],
  "description": "Sync CRM events to Acme data warehouse"
}

─── 201 Created ───────────────────────────────────────────────────
{
  "id":             "wh_7kLpM2nQr",
  "url":            "https://integrations.acme.com/hs-hooks",
  "event_types":    ["contact.created", "contact.updated", "deal.stage_changed"],
  "signing_secret": "whsec_K9mP3rT7vX2qL5nY1wZ8…",  // shown once; use for HMAC verification
  "status":         "active",
  "created_at":     "2025-11-15T09:05:00Z"
}

# Example delivery (signed)
POST https://integrations.acme.com/hs-hooks
X-HubSpot-Signature-v3: t=1731660305,v3=sha256=3b4f…
X-HubSpot-Request-Id: req_8xNtR5kZv
Content-Type: application/json

{
  "event_id":    "evt_2YhJp6mWq",
  "event_type":  "contact.created",
  "occurred_at": "2025-11-15T09:10:00Z",
  "account_id":  "acct_9pXwL4mQ",
  "app_id":      "app_01J9W3KZR4TY8X2N6M5L7P",
  "object": {
    "type": "contact",
    "id":   "ct_5pQnX9rYz",
    "properties": { "email": "alice@acme.com", "firstname": "Alice" }
  }
}

Rate-limit response headers on every request

HTTP/1.1 200 OK
Content-Type: application/json
X-RateLimit-App-Limit: 100
X-RateLimit-App-Remaining: 82
X-RateLimit-App-Reset: 1731660360
X-RateLimit-Account-Limit: 500
X-RateLimit-Account-Remaining: 437
X-RateLimit-Account-Reset: 1731660360
X-RateLimit-Daily-Limit: 1000000
X-RateLimit-Daily-Remaining: 14382
X-RateLimit-Daily-Reset: 1731715200
X-Request-Id: req_01J9W3KZR4TY8X2N6M5L7P

─── 429 rate limit — per-app burst ────────────────────────────────
HTTP/1.1 429 Too Many Requests
X-RateLimit-App-Limit: 100
X-RateLimit-App-Remaining: 0
X-RateLimit-App-Reset: 1731660362
Retry-After: 2

{
  "error": {
    "code":       "rate_limit_exceeded",
    "message":    "App rate limit: 100 req/s. Reset in 2 s.",
    "status":     429,
    "type":       "rate_limit_error",
    "limit_type": "per_app",          // which of the three limits triggered
    "param":      null,
    "request_id": "req_01J9W3KZR4TY8X2N6M5L7P"
  }
}

─── 429 daily cap exhausted ───────────────────────────────────────
{
  "error": {
    "code":       "daily_cap_exceeded",
    "message":    "Daily API cap of 1,000,000 calls reached. Resets at 2025-11-16T00:00:00Z (UTC midnight).",
    "status":     429,
    "type":       "rate_limit_error",
    "limit_type": "daily_cap",
    "param":      null,
    "request_id": "req_7pTmN8qXv"
  }
}

4 — Evaluation & by the numbers

Design decisions are only real when they survive contact with concrete numbers. This section traces the load through the system, identifies where the first bottleneck appears, and works out the math on the two most important limits: the 20-app cap and the daily cap.

The 20-app, 1M-call-per-day math: which limit binds first?

An account has 20 apps, each with the default per-app rate limit of 100 req/s. If all 20 run simultaneously at their per-app limit:

Per-app limit:         100 req/s per app
Apps at full throttle: 20 apps
Combined throughput:   20 × 100 = 2,000 req/s

But the per-account burst limit: 500 req/s
→ Per-account limit BINDS before per-app can be fully exploited across all 20 apps

At 500 req/s sustained, daily call volume:
  500 req/s × 3,600 s/hr × 24 hr = 43,200,000 calls/day
  43.2M >> 1M daily cap
→ The DAILY CAP binds long before the burst rate limit is a concern

Time to exhaust the 1M daily cap at sustained 500 req/s:
  t = 1,000,000 / 500 = 2,000 s = 33.3 minutes

Time to exhaust at the more typical 100 req/s (one busy app):
  t = 1,000,000 / 100 = 10,000 s = 2.78 hours
  → A single app running at its burst limit all day consumes the full daily cap in ~3 hours.
  → All other apps in the account are locked out for the remaining ~21 hours.

Implication: the daily cap is a HARD account-level governance tool, not just billing.
Apps should watch X-RateLimit-Daily-Remaining and, once it reaches 0, pause until X-RateLimit-Daily-Reset; retrying with backoff cannot succeed before the reset.
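The same arithmetic as a small calculator, useful for trying other tier values. Function names are illustrative; the inputs are the defaults used in this lesson.

```python
def effective_rps(apps: int, per_app_rps: int, per_account_rps: int) -> int:
    """Sustained rate an account can reach once both burst limits apply."""
    return min(apps * per_app_rps, per_account_rps)


def seconds_to_exhaust(daily_cap: int, rps: int) -> float:
    return daily_cap / rps


def sustainable_rps(daily_cap: int) -> float:
    """Average rate that spreads the daily cap evenly over 24 hours."""
    return daily_cap / 86_400


all_apps = effective_rps(20, 100, 500)            # 500: the account limit binds
t_all = seconds_to_exhaust(1_000_000, all_apps)   # 2000.0 s, about 33.3 min
t_one = seconds_to_exhaust(1_000_000, effective_rps(1, 100, 500))  # 10000.0 s
avg = sustainable_rps(1_000_000)                  # about 11.57 req/s
```

The last figure is the one to keep in mind: an account that wants API access all day has to average under roughly 11.6 req/s, far below either burst limit.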

⚠️ The day-boundary cliff

The daily cap resets at UTC midnight — not at midnight in the account's timezone, not at the time the account was created. This matters because accounts with batch jobs scheduled for "end of business day" in New York (UTC-5) may run their largest jobs at 20:00–23:00 UTC, consuming most of the daily cap in the hours just before the UTC midnight reset. A cap exhausted at 23:40 UTC leaves only 20 minutes until reset — but a cap exhausted at 00:10 UTC leaves nearly 24 hours. Document the UTC-midnight reset explicitly and surface the X-RateLimit-Daily-Reset header prominently in your SDK so clients can schedule around it.

Webhook fan-out volume and queue sizing

At 20 M webhook deliveries/day, understand the distribution between generation and delivery:

Metric	Value (modeled)	Derivation
Deliveries/day	20,000,000	Given requirement
Avg deliveries/s	231	20M / 86,400 s
Peak deliveries/s (4× avg)	925	Typical peak:avg ratio for B2B SaaS
Avg delivery attempt latency	~800 ms	Network RTT to customer endpoint + their processing
Worker concurrency needed at peak	740	Little's Law: L = λW = 925 × 0.8
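Retry amplification (10% fail rate, up to 3 retries each)	~+2.2 M attempts/day (worst case +6 M)	20M × 0.10 = 2M first retries. Assuming each retry also fails 10% of the time: 2M + 200K + 20K ≈ 2.2M extra attempts. Worst case, every failed delivery uses all 3 retries: 2M × 3 = 6M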
Queue depth at sustained peak	~55,000 tasks	925 tasks/s × 60 s backlog = 55,500

The bottleneck is not queue throughput — Postgres can handle this insert rate. The bottleneck is outbound connection concurrency: 740 simultaneous open HTTPS connections from the worker pool to customer endpoints. At this scale, workers need a connection pool with per-domain connection limits (to avoid overwhelming any single customer endpoint), and a circuit breaker per endpoint to avoid wasting worker slots on consistently failing destinations. See plat-02 for the circuit breaker pattern on webhook workers.
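The sizing figures in the table can be reproduced with integer arithmetic. Function names are illustrative; latency is given in milliseconds to avoid floating-point rounding.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def workers_needed(rate_per_s: int, latency_ms: int) -> int:
    """Little's Law, L = lambda * W: in-flight deliveries at a given rate and latency."""
    return ceil_div(rate_per_s * latency_ms, 1000)


def backlog(rate_per_s: int, seconds: int) -> int:
    """Tasks that accumulate if workers stall for the given number of seconds."""
    return rate_per_s * seconds


avg_rate = 20_000_000 / 86_400          # about 231.5 deliveries/s
peak_workers = workers_needed(925, 800)  # 740
peak_backlog = backlog(925, 60)          # 55500
```

Little's Law also shows why slow customer endpoints are expensive: doubling the average attempt latency to 1,600 ms doubles the required concurrency to 1,480 at the same delivery rate.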

Metering volume and flush math

API calls/day:                  100,000,000
Peak calls/s:                   ~8,000 req/s (7× daily avg)
Metering writes/s (Redis INCR): 8,000  ← same as call volume (one per request)
Redis INCR throughput per node: ~100,000/s (single-threaded pipeline)
  → 1 Redis node handles peak metering load with 92% headroom
  → Shard by account_id across 4 Redis nodes for isolation; no single node hot

Flush interval to billing DB:   60 s
Max calls un-flushed at peak:   8,000 × 60 = 480,000 calls per flush batch
Billing DB writes/day:          50,000 accounts × 1,440 flushes/day (one per 60 s)
  = 72,000,000 writes/day if every account is active in every interval
  → Use upsert: UPDATE counter WHERE date=today AND account_id=X, not 72M inserts
  → Actual write: UPDATE metering_daily SET calls = calls + <batch> WHERE ...
  → Rows stay bounded: one per (account_id, app_id, date), updated every 60 s
    (an account with several active apps issues one update per active app)
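The upsert-instead-of-insert point can be demonstrated with an in-memory SQLite database standing in for the billing store. The table layout and function names are assumptions for illustration; the lesson does not specify the schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metering_daily ("
        " account_id TEXT NOT NULL,"
        " app_id TEXT NOT NULL,"
        " day TEXT NOT NULL,"
        " calls INTEGER NOT NULL,"
        " PRIMARY KEY (account_id, app_id, day))"
    )
    return conn


def flush_batch(conn: sqlite3.Connection, account_id: str, app_id: str,
                day: str, batch: int) -> None:
    # First flush of the day inserts the row; later flushes add to it.
    # (SQLite syntax; PostgreSQL needs the target column qualified with
    # the table name on the right-hand side.)
    conn.execute(
        "INSERT INTO metering_daily (account_id, app_id, day, calls) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (account_id, app_id, day) "
        "DO UPDATE SET calls = calls + excluded.calls",
        (account_id, app_id, day, batch),
    )
    conn.commit()


def total_calls(conn: sqlite3.Connection, account_id: str, day: str) -> int:
    row = conn.execute(
        "SELECT COALESCE(SUM(calls), 0) FROM metering_daily "
        "WHERE account_id = ? AND day = ?",
        (account_id, day),
    ).fetchone()
    return row[0]
```

However many flushes run during the day, the table holds one row per (account, app, day), and the invoice total is a single aggregate over those rows.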

Platform-scale system limits trace

Layer	Throughput at peak (modeled)	Bottleneck risk	Mitigation
TLS termination + routing	8,000 req/s	CPU (TLS handshakes; keep-alive reduces this to ~5% new connections)	Session resumption (TLS 1.3 0-RTT); HTTP/2 multiplexing per SDK
Key auth (Redis cache lookup)	8,000 lookups/s; only local-LRU misses reach Redis	Cache miss storm on cold deploys	Local LRU with 60 s TTL absorbs 95%+ of lookups; Redis handles <400 miss/s
Rate-limit check (Redis)	16,000 Redis ops/s (2 buckets × 8,000 req/s)	Redis latency variance	Lua script for atomic two-bucket check in a single round trip; p99 < 0.5 ms
Daily cap check	Local counter + 100 ms flush to Redis	Flush contention at midnight (all counters reset simultaneously)	Stagger flush by `shard(account_id) % 60` seconds, which spreads reset load
Metering INCR	8,000 Redis INCR/s	Hot keys for high-volume accounts	4-shard Redis; per-account key striping
Webhook fan-out	925 deliveries/s peak	Outbound connection exhaustion to customer endpoints	Per-endpoint connection pool; circuit breaker; per-account FIFO lane
Billing DB flush	~833 upserts/s (50,000 accounts × 1/60 s)	Lock contention on hot accounts	UPSERT with conflict target; each write holds its row lock only briefly; partitioned by date

How real platforms do it

The design above is not hypothetical — it is a synthesis of documented practices from three platforms that have operated at comparable or larger scale for years.

Concern	HubSpot	Stripe	GitHub
App model	Private apps are first-class objects with isolated keys and scopes. The "daily API limit" (tier-dependent; cited here as 500,000 requests/day for standard plans, so confirm the current figure in the documentation) is an account-level hard cap, not per-app. See HubSpot Private Apps documentation.	Restricted keys carry per-resource permission scopes (read, write, per-object type). A restricted key that lacks write permission to `charges` receives 403 on any mutation attempt, so the request is refused before the operation runs. See Stripe: Restricted API Keys.	GitHub Apps have their own identity separate from the installing account's OAuth token. Each installation generates an installation access token limited to the permissions the App requested and the owner approved at install time. Those permissions stay fixed for the installation until the App requests a change and the installation owner approves it. See GitHub Apps documentation.
Nested rate limits	HubSpot enforces per-app burst limits (typically 100–150 req/10 s depending on tier) alongside the per-account daily cap. The daily cap is a billing-tier feature: different plans get different caps. The `X-HubSpot-RateLimit-Daily-Remaining` header is returned on every response. See HubSpot API Usage & Limits.	Stripe uses token-bucket limits (approximately 100 read req/s, 100 write req/s in live mode) enforced per Stripe account, not per restricted key. The distinction matters: all keys on the same account share one bucket — there is no per-key isolation. This is a simpler model but means one misbehaving integration can starve all others on the same account. See Stripe Rate Limits.	GitHub enforces a primary rate limit of 5,000 authenticated requests/hour per installation, with secondary limits on concurrent requests and specific resource mutations. The `x-ratelimit-used`, `x-ratelimit-remaining`, and `x-ratelimit-reset` headers appear on every response. GitHub explicitly documents that secondary limits are unpublished to prevent gaming — the "feel the limit" approach as distinct from hard-coded per-app caps. See GitHub REST API rate limits.
Webhook signatures	HubSpot v3 webhook signatures use HMAC-SHA256, keyed with the app's client secret, over `http_method + url + request_body + request_timestamp`. The timestamp is included in the signed string specifically to prevent replay attacks: a valid signature on a 10-minute-old request is still rejected. See HubSpot: Validating Requests.	Stripe signatures cover `timestamp + "." + raw_body` via HMAC-SHA256 using the endpoint's signing secret. Stripe recommends a 300 s tolerance window and ships helper methods in every official SDK that handle the constant-time comparison. See Stripe: Webhook Signatures.	GitHub uses a secret token set per webhook endpoint, and signs the payload with HMAC-SHA256 into the `X-Hub-Signature-256` header. No timestamp is included in GitHub's signed payload, so replay protection is left to the recipient. See GitHub: Validating Webhook Deliveries.
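For recipients, a timestamped HMAC check in the style the table attributes to Stripe (`timestamp + "." + raw_body`) can be sketched with the Python standard library. This is an illustration of the general technique, not any vendor's SDK: the function names are hypothetical, and the hex encoding of the signature is an assumption.

```python
import hashlib
import hmac


def sign(secret: str, timestamp: int, raw_body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<raw body>", hex-encoded."""
    signed_payload = str(timestamp).encode("ascii") + b"." + raw_body
    return hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()


def verify(secret: str, timestamp: int, raw_body: bytes, signature: str,
           now: int, tolerance_s: int = 300) -> bool:
    # Reject stale (or far-future) timestamps first: this is the replay defence.
    if abs(now - timestamp) > tolerance_s:
        return False
    expected = sign(secret, timestamp, raw_body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

The check must run over the raw request bytes. Re-serializing parsed JSON before verifying can change whitespace or key order and make a genuine signature fail.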

✅ The one pattern that separates mature platforms from immature ones

Mature platforms such as HubSpot and GitHub return rate-limit state on every response, not just on 429, as the header names in the table above show. A well-built SDK reads these headers on every 2xx response and proactively slows its request rate as the remaining budget drops toward zero. The implication for platform designers: the 429 is a failure mode, not the primary rate-limit communication channel. If your clients are hitting 429 frequently, your headers or your SDK are failing them. The limit is not the problem; the feedback loop is. Design the SDK first, then the headers, then the HTTP status codes.
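
One way an SDK might act on those headers is to spread the remaining budget evenly over the time left in the window. The sketch below is illustrative only: the header names `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a Unix timestamp) are assumptions for this example, and real platforms name and encode these differently.

```python
from typing import Mapping


def pacing_delay(headers: Mapping[str, str], now_s: float) -> float:
    """Seconds to wait before the next request, given rate-limit headers.

    Assumes hypothetical headers: X-RateLimit-Remaining (integer count)
    and X-RateLimit-Reset (Unix time at which the window resets).
    """
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = float(headers["X-RateLimit-Reset"])
    seconds_left = max(0.0, reset_at - now_s)
    if remaining <= 0:
        return seconds_left  # budget spent: wait for the window to reset
    return seconds_left / remaining  # spread what is left evenly
```

A client that sleeps for this delay between calls converges on the permitted rate without ever needing a 429 to tell it to slow down.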

🧠 Quick check

An account has 20 apps each running at their 100 req/s per-app burst limit. The account's per-account burst limit is 500 req/s and the daily cap is 1,000,000 calls. At what sustained throughput will the daily cap be exhausted in under 34 minutes?

At 500 req/s (the account burst ceiling), the 1M daily cap is exhausted in 1,000,000 / 500 = 2,000 seconds ≈ 33.3 minutes. The 20 × 100 = 2,000 req/s that the apps could generate in aggregate is throttled to 500 req/s by the per-account limit. For sustained load the daily cap is the binding constraint, not the burst limit: the burst limit only slows traffic, while the daily cap stops it for the rest of the day.
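
The same arithmetic as a small script, using only the numbers given in the question. The function names are illustrative.

```python
APPS = 20
PER_APP_BURST = 100        # req/s per app
ACCOUNT_BURST = 500        # req/s per account
DAILY_CAP = 1_000_000      # calls per day per account


def effective_rate(apps: int, per_app: int, account: int) -> int:
    """Sustained req/s the account can actually reach: the tighter of the two burst limits."""
    return min(apps * per_app, account)


def seconds_to_exhaust(daily_cap: int, rate_per_s: int) -> float:
    """How long the daily cap lasts at a constant request rate."""
    return daily_cap / rate_per_s


rate = effective_rate(APPS, PER_APP_BURST, ACCOUNT_BURST)
minutes = seconds_to_exhaust(DAILY_CAP, rate) / 60
print(f"{rate} req/s exhausts the daily cap in {minutes:.1f} minutes")
```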

A gateway receives a request with a valid API key but the key belongs to an app that has only crm.contacts:read scope, and the request is DELETE /v1/contacts/ct_abc. What HTTP status does the gateway return, and at which step in the lifecycle?

A valid key that lacks the required scope receives 403, not 401, and the rejection happens at the scope check in step 3. The distinction matters for clients: 401 means "re-authenticate," 403 means "you cannot do this with any valid credential from this app; you need a different scope." Rate limiting (steps 4–6) only runs after scope is confirmed at step 3, so the request never reaches rate limiting.
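
A minimal sketch of that decision order follows. The route-to-scope table, the `crm.contacts:write` scope name, and the 404 for unknown routes are assumptions made for the example; only `crm.contacts:read` and the 401/403 split come from the question.

```python
from typing import Optional, Set

# Hypothetical mapping from (method, resource) to the scope it requires.
REQUIRED_SCOPE = {
    ("GET", "/v1/contacts"): "crm.contacts:read",
    ("DELETE", "/v1/contacts"): "crm.contacts:write",  # assumed scope name
}


def authorize(key_scopes: Optional[Set[str]], method: str, resource: str) -> int:
    """Return the HTTP status the gateway would produce before rate limiting runs.

    key_scopes is None when the key is unknown or revoked.
    """
    if key_scopes is None:
        return 401  # no valid identity: the client must re-authenticate
    required = REQUIRED_SCOPE.get((method, resource))
    if required is None:
        return 404  # unknown route (an assumption of this sketch)
    if required not in key_scopes:
        return 403  # valid identity, insufficient scope
    return 200  # proceed to rate limiting


print(authorize({"crm.contacts:read"}, "DELETE", "/v1/contacts"))
```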

A webhook delivery worker sends a signed payload to a customer endpoint. The endpoint returns HTTP 200 but also sends back a JSON body with {"status": "processing"}. What does the platform's delivery worker do?

The platform's delivery contract is defined by HTTP status codes, not response bodies. Any 2xx response is a success; the endpoint acknowledged receipt. The pattern the platform enforces is: return 200 immediately, process asynchronously. The response body is irrelevant to the delivery outcome — the worker should never parse it for delivery decisions.
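
Expressed as code, the rule is a function of the status code alone. In this sketch the outcome labels and the choice to retry every non-2xx response are assumptions; the body parameter exists only to make the point that it is ignored.

```python
def delivery_outcome(status_code: int, body: bytes = b"") -> str:
    """Classify a webhook delivery attempt from the HTTP status only.

    The response body is accepted but deliberately never inspected.
    """
    if 200 <= status_code < 300:
        return "delivered"
    return "retry"  # assumption: every non-2xx response is retried


print(delivery_outcome(200, b'{"status": "processing"}'))
```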

At 8,000 req/s peak, the platform needs to perform two atomic rate-limit checks per request (app bucket + account bucket) against Redis. What is the recommended implementation to stay within the <1 ms budget?

A Lua script executes atomically on the Redis server in a single round trip. Both bucket checks (read current tokens, decrement if available, return allowed/denied for each) happen server-side. This is exactly one network RTT (~0.3 ms) regardless of how many keys the script touches. Two sequential commands would be 2 RTTs (0.6 ms), and a non-atomic MGET/MSET can return incorrect decisions under concurrent requests from other gateway instances.
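
The logic such a script would run can be modelled in plain Python. This is an in-memory sketch, not Redis code: the `Bucket` class and `check_both` are hypothetical names, and a `threading.Lock` stands in for the guarantee that Redis runs one script at a time. The key property is that a token is taken from both buckets or from neither.

```python
import threading
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Bucket:
    capacity: float
    refill_per_s: float
    tokens: float
    updated_at: float

    def refill(self, now: float) -> None:
        """Add tokens for the time elapsed since the last update, capped at capacity."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now


_script_lock = threading.Lock()  # stand-in for Redis's one-script-at-a-time execution


def check_both(app: Bucket, account: Bucket, now: float) -> Tuple[bool, str]:
    """Consume one token from both buckets, or from neither."""
    with _script_lock:
        app.refill(now)
        account.refill(now)
        if app.tokens < 1:
            return False, "per_app"
        if account.tokens < 1:
            return False, "per_account"
        app.tokens -= 1
        account.tokens -= 1
        return True, ""
```

Checking both buckets before decrementing either is what keeps a denied request from consuming budget in the bucket that did have room.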

An account's daily cap counter is stored in Redis and flushed to the billing database every 60 seconds. At 8,000 req/s peak, what is the maximum number of calls that could be charged to an account before the billing database reflects them?

With a 60-second flush interval and an 8,000 req/s platform peak, up to 8,000 × 60 = 480,000 calls can be recorded in Redis but not yet in the billing database. This is the reconciliation window. That figure is platform-wide: a single account held to the 500 req/s per-account burst limit from the first question can contribute at most 500 × 60 = 30,000 of those calls. For rate-limit enforcement, Redis is the source of truth; for billing invoices, the database is the source of truth. The two can diverge by up to one flush interval (480K calls at peak), which is documented as the billing lag in the platform's terms.
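
The divergence between the two stores can be modelled with two dictionaries. This is an in-memory sketch under stated assumptions: `HybridMeter` and its method names are hypothetical, `live` stands in for the Redis counters, and `durable` stands in for the billing database.

```python
from typing import Dict


class HybridMeter:
    """Synchronous in-memory increment, periodic flush to a durable copy."""

    def __init__(self) -> None:
        self.live: Dict[str, int] = {}     # stand-in for Redis counters
        self.durable: Dict[str, int] = {}  # stand-in for the billing database

    def record(self, account_id: str) -> int:
        """Called on the request path, before the response is sent."""
        self.live[account_id] = self.live.get(account_id, 0) + 1
        return self.live[account_id]

    def flush(self) -> None:
        """Called off the request path, once per flush interval."""
        self.durable.update(self.live)

    def lag(self, account_id: str) -> int:
        """Calls counted for enforcement but not yet visible to billing."""
        return self.live.get(account_id, 0) - self.durable.get(account_id, 0)


def max_unflushed_calls(rate_per_s: int, flush_interval_s: int) -> int:
    """Upper bound on the lag at a constant request rate."""
    return rate_per_s * flush_interval_s


print(max_unflushed_calls(8_000, 60))
```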

✏️ Extend the platform: add per-app daily sub-caps

The current design has a per-account daily cap of 1,000,000 calls shared across all apps. Your product team wants to add the ability for accounts to set a per-app daily sub-cap — so that a single noisy app cannot consume all of the account-level budget.

Your task: design the changes required to support per-app daily sub-caps. Cover all of the following:

How you store the per-app cap configuration (where, what schema change, what API surface).
How the gateway checks both the per-app daily sub-cap AND the per-account daily cap in the correct order, and what headers you add.
What happens when an app hits its sub-cap but the account-level cap has not been reached — what does the 429 response look like, and which other apps are still allowed to make calls?
The Redis key schema for two independent daily counters — one per app, one per account — and the Lua script (pseudo-code) that checks both atomically.
One trade-off of per-app sub-caps that you would want to surface to the product team before shipping.

Model answer:

1. Storage: Add an optional daily_cap field to the app resource, backed by a nullable column in the app config table (ALTER TABLE apps ADD COLUMN daily_cap_override INTEGER). If the column is null, the app has no sub-cap and only the account-level cap applies. API surface: PATCH /v1/apps/:id { "daily_cap": 100000 } returns the updated app object. The cap configuration is cached in the same key-auth cache entry, so the gateway resolves the per-app cap during key validation rather than through a separate lookup.

2. Gateway order: Per-app daily sub-cap check comes before the per-account daily cap check (since per-app is tighter). The header set expands to include X-RateLimit-App-Daily-Limit, X-RateLimit-App-Daily-Remaining, and X-RateLimit-App-Daily-Reset alongside the existing account-level daily headers.

3. Sub-cap hit, account cap not hit: The 429 response carries "limit_type": "per_app_daily" and the reset time for the app's counter. Other apps in the account are unaffected — their calls continue until they hit their own sub-caps or the account daily cap. The error message should say "This app's daily limit of X calls has been reached. Other apps on your account are still active."

4. Redis schema: rl:daily:app:{account_id}:{app_id}:{date} for per-app; rl:daily:acct:{account_id}:{date} for per-account. The Lua script checks the per-app key first (INCR, compare against app cap), then the per-account key (INCR, compare against account cap). If either check fails, the script decrements every key it incremented for this request before returning denied: the per-app key alone when the app check fails, both keys when the account check fails. Otherwise a denied request would still consume daily quota.

5. Trade-off: Correct decrement-on-denial requires the Lua script to conditionally decrement on failure, which makes the script more complex and harder to test. The simpler "increment first, check after" approach counts denied requests as usage. At high denial rates (common when a sub-cap is set too low), that overcounting could exhaust an app's sub-cap hours before its successful calls alone would have. The product team should decide: is it acceptable for an app to exhaust its sub-cap faster than actual successful calls consumed it?
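
Points 3 and 4 of the model answer can be sketched as an in-memory model of the script's logic. It is not Redis code: a plain dictionary stands in for the Redis keyspace, `check_daily` is a hypothetical name, and skipping the per-app counter when no sub-cap is set is an assumption of this sketch. The key names and the `per_app_daily` and `daily_cap` labels follow the text.

```python
from typing import Dict, List, Optional, Tuple


def check_daily(counters: Dict[str, int], account_id: str, app_id: str, date: str,
                app_cap: Optional[int], account_cap: int) -> Tuple[bool, Optional[str]]:
    """Increment both daily counters, rolling back every increment on denial."""
    app_key = f"rl:daily:app:{account_id}:{app_id}:{date}"
    acct_key = f"rl:daily:acct:{account_id}:{date}"
    incremented: List[str] = []

    def deny(limit_type: str) -> Tuple[bool, Optional[str]]:
        for key in incremented:  # undo every INCR made for this request
            counters[key] -= 1
        return False, limit_type

    if app_cap is not None:  # assumption: no sub-cap means no per-app counter
        counters[app_key] = counters.get(app_key, 0) + 1
        incremented.append(app_key)
        if counters[app_key] > app_cap:
            return deny("per_app_daily")

    counters[acct_key] = counters.get(acct_key, 0) + 1
    incremented.append(acct_key)
    if counters[acct_key] > account_cap:
        return deny("daily_cap")

    return True, None
```

An app that hits its sub-cap is denied with `per_app_daily` while the account counter stays untouched, so the other apps on the account keep drawing from the shared budget.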

Rubric: Full marks require all five points. Half marks if the Redis key schema is present but the atomic decrement-on-denial behavior is missing. No marks if the design allows a sub-cap hit to block all other apps (violates the isolation principle).

Key takeaways

A developer platform is a layered system of trust: keys establish identity, scopes constrain what that identity can do, rate limits constrain how fast, daily caps constrain how much, and metering records it all for billing. Each layer is independent — a misconfigured scope check is not a rate-limit problem.
At scale, the daily cap is almost always the binding constraint for sustained load, not the burst rate limit. A single app at its burst rate can exhaust the account's daily budget in hours. Educate your customers with prominent headers; don't wait for the 429.
The metering write must be synchronous (before the response) to be billing-exact. The hybrid pattern — atomic Redis INCR sync, flush to durable store async — is the standard solution: it satisfies both the latency requirement (<1 ms) and the durability requirement (billing-grade counts).
The webhook delivery bottleneck at scale is outbound connection concurrency, not queue throughput. A per-endpoint connection limit and circuit breaker are not optimizations; they are correctness features that prevent a single slow customer endpoint from blocking your entire worker pool.
The rate-limit error response should encode which of the three limits triggered (per_app, per_account, or daily_cap). A generic "rate limited" error forces clients to guess which counter to back off against. The distinction between a 2-second burst retry and a 23-hour daily-cap wait is the difference between a good client SDK and a broken one.
Scope enforcement belongs in the gateway, not the service. If a service checks scopes itself, the system's security model depends on every service being correct simultaneously. Gateway enforcement means a scope bypass requires compromising the gateway — a much harder attack than finding a single service that forgot to check.
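
The per-endpoint circuit breaker named in the webhook takeaway above can be sketched in a few lines. This is a minimal illustration, assuming a consecutive-failure threshold and a fixed cooldown; the class name and both default values are hypothetical, not figures from this lesson.

```python
from typing import Optional


class EndpointBreaker:
    """Stops deliveries to one endpoint after repeated failures, then probes after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self, now: float) -> bool:
        """True if a worker may attempt a delivery to this endpoint right now."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.cooldown_s  # cooldown over: allow a probe

    def record(self, success: bool, now: float) -> None:
        """Update state from the result of a delivery attempt."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open, or re-open after a failed probe
```

Workers call `allow` before taking a connection slot, so a failing endpoint costs one probe per cooldown instead of tying up workers on every queued event.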