API Design

Platform & API Product Engineering · Lesson 06

Developer Platform: Apps, Keys, Scopes & Rotation

One key per user is an anti-pattern. Real developer platforms issue credentials to applications, not accounts — enabling blast-radius isolation, independent rotation, and precise attribution. This lesson covers every layer of that model, from the bit layout of a key to the database schema behind zero-downtime rotation.

⏱ 18 min advanced Prereq: API keys (sec-08), OAuth (sec-06)

By the end you'll be able to

The platform model: apps own credentials, accounts own apps

Most internal APIs start with a simple model: one API key per team, per user, or per environment. That works until it doesn't. When a key leaks, you have to rotate everything. When a key is misused, you can't tell which integration did it. When a contractor leaves, you can't revoke only their access.

Developer platforms — Stripe, HubSpot, GitHub, AWS — solved this by introducing an intermediate entity: the App. An App represents a specific integration: a webhook listener, a billing sync, a data pipeline. Each App gets its own credentials, its own scopes, its own rate-limit tier, and its own webhook endpoints. A single account can have many apps.

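[Figure: credential hierarchy with four layers: ACCOUNT, APPS, KEYS, SCOPES. Account acme-corp owns 1..20 apps (billing-sync, webhook-relay); each app holds 1..5 keys (Key A sk_live_3xKj…, Key B sk_live_9mRp…, Key C sk_live_7nQz…); each key carries granted scopes (read:invoices, write:invoices, read:customers).]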
One account, multiple apps, multiple keys per app, scopes per key. Each app has independent credentials — compromise of one app does not expose others.

This hierarchy gives you three things "one key per account" cannot: blast-radius isolation, independent rotation, and precise attribution.

🎯 Interview question

"Design an API key management system for a developer platform — what are the key components?" A strong answer covers: the App entity as a first-class credential container, key format with routing prefix and entropy, hash-at-rest with show-once issuance, scope model, and dual-key rotation for zero-downtime rollover. That's the full surface. This lesson walks through each component in detail.

Key format: what's inside sk_live_3xKj9mRpQ…

An API key is not just a random string. A well-designed key encodes enough structure to be useful without leaking secrets.

| Part | Example | Purpose |
|---|---|---|
| Environment prefix | sk_live_ / sk_test_ | Separates live from test key spaces. A test key sent to a live endpoint fails immediately, not silently. Also the first thing a secret scanner looks for. |
| Key type indicator | sk_ (secret), pk_ (public), rk_ (restricted) | Routes the key to the correct validation path. pk_ keys are safe to embed in client-side code; sk_ keys are not. The prefix makes this visually obvious. |
| Random body | 3xKj9mRpQf… (32–44 chars) | 32 bytes of CSPRNG output, base58 or base64url encoded. Provides ≥128 bits of entropy; brute force is computationally infeasible for any plausible adversary. |
| Display hint (stored, not in key) | sk_live_3xKj | The platform stores the first 12 characters in plaintext as a display hint. This lets developers identify which key is which in the dashboard without storing the full secret. |

The prefix-first design also enables a critical defensive property: secret scanning. GitHub, GitLab, and dozens of third-party tools scan public repositories for strings matching known key prefixes. If a developer accidentally commits sk_live_3xKj9mRp…, the scanner matches sk_live_ and alerts within minutes. Without a recognizable prefix, the key is invisible to scanners — it looks like any other long random string.
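To make the scanning idea concrete, here is a minimal Python sketch of prefix-based detection. The pattern is an assumption built from this lesson's example format (a type prefix, an environment, then a body of at least 24 characters); real scanners use the patterns each platform registers with them, and `find_candidate_keys` is a hypothetical helper name.

```python
import re

# Hypothetical pattern for this lesson's key format: type prefix, environment,
# then a body of 24+ characters (the base64url alphabet is a superset of base58).
KEY_PATTERN = re.compile(r"\b(?:sk|pk|rk)_(?:live|test)_[A-Za-z0-9_-]{24,}")

def find_candidate_keys(text: str) -> list[str]:
    """Return every substring that looks like a platform key."""
    return KEY_PATTERN.findall(text)

# A made-up key next to an unprefixed random string of the kind scanners cannot classify.
sample = 'API_KEY = "sk_live_3xKj9mRpQf7nQzAbCdEfGhJkLmNpQrSt"\nSESSION = "a8f3c1d9e2b7"'
matches = find_candidate_keys(sample)
```

The unprefixed session string is not reported, which is the point: without structure, a scanner has nothing to anchor on.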

Entropy: why 128 bits is the floor

A 32-byte CSPRNG output gives 256 bits of entropy. Encoding that as base58 produces ~44 characters; base64url gives ~43 characters. Either is fine. The question is why 128 bits is treated as the minimum acceptable floor.

At 128 bits, the key space contains $2^{128} \approx 3.4 \times 10^{38}$ possible keys. Assuming an adversary can check $10^{9}$ guesses per second (an extremely generous assumption given that each guess requires a database lookup), brute-forcing would take approximately $10^{22}$ years. For reference, the age of the universe is about $1.4 \times 10^{10}$ years. This is a modeled bound, not a proof of security, but it establishes that the attack surface for brute force is not the key's randomness; it's everything else (social engineering, key exposure via logs, insecure storage).
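The arithmetic behind that estimate can be checked in a few lines of Python. This is a sketch of the modeled bound only; the guess rate and the universe-age figure are the assumptions stated in the text, and `years_to_exhaust` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.16e7 seconds

def years_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key in a 2**bits key space."""
    return (2 ** bits) / guesses_per_second / SECONDS_PER_YEAR

years_128 = years_to_exhaust(128, 1e9)   # roughly 1e22 years
universe_ages = years_128 / 1.4e10       # how many universe lifetimes that is
```

An attacker expects to succeed after searching half the space on average, which halves the figure but does not change the order of magnitude.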

⚠️ Pitfall: storing the full API key in plaintext

API keys are secrets. Storing them in your database in plaintext is the same category of mistake as storing passwords in plaintext. If your keys table is exfiltrated — via SQL injection, a leaked database backup, or a misconfigured export — every single key is immediately usable. The fix is the same as for passwords: store only the hash, never the plaintext. But note the choice of hash function matters, and it's different from the password case — see the next section.

Hashing at rest: SHA-256, not bcrypt

Both passwords and API keys should be hashed before storage. But the right hash function is different for each, and the reasoning matters.

Passwords have low entropy — humans choose them, so they cluster around common patterns. bcrypt, scrypt, and Argon2 are designed to be slow, making brute-force attacks against a stolen hash database expensive. The slowness is the feature.

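API keys have high entropy by construction: 32 bytes of CSPRNG output. An attacker who steals a hash database cannot brute-force a 256-bit key space. bcrypt's slowness is therefore irrelevant as a defense, but it is very relevant as a cost: every API request would require a bcrypt computation, adding tens to hundreds of milliseconds of CPU time per call depending on the cost factor. bcrypt is also salted per record, so the stored value cannot be found with a single indexed lookup on the hash; the key would have to carry a separate record identifier. SHA-256 takes microseconds and is deterministic, so the digest itself can serve as the lookup key. The protection comes from entropy, not hash slowness.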

Numbers

bcrypt at cost=12: ~300ms per verification. SHA-256: ~1µs per hash. An API serving 10,000 req/s would need about 3,000 CPU cores just for bcrypt verification (10,000 × 0.3 s of CPU work every second). SHA-256 at the same throughput needs ~10ms of CPU time per second, a small fraction of one core. The tradeoff is clear: use bcrypt for passwords (low entropy, low traffic), SHA-256 for API keys (high entropy, high traffic).

In practice, many platforms use HMAC-SHA256 keyed with a server secret, with the API key as the message, rather than bare SHA-256. The server secret (stored separately from the database, e.g., in a secrets manager) acts as a pepper: even if the hash database is exfiltrated, an attacker cannot test candidate keys against the stored hashes without it. This adds a second layer of protection without meaningful performance cost.
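A minimal sketch of the keyed-hash variant, using Python's standard `hmac` and `hashlib` modules. The function names and the demo secret are illustrative assumptions; a real deployment would load the secret from a secrets manager rather than embed it.

```python
import hashlib
import hmac

def hash_key(full_key: str, server_secret: bytes) -> str:
    """HMAC-SHA256 of the API key, keyed with a server-side secret (pepper)."""
    return hmac.new(server_secret, full_key.encode("utf-8"), hashlib.sha256).hexdigest()

def hashes_match(a: str, b: str) -> bool:
    """Constant-time comparison, for code paths that compare digests directly."""
    return hmac.compare_digest(a, b)

# Placeholder values for illustration only.
DEMO_SECRET = b"demo-only-server-secret"
DEMO_KEY = "sk_live_demo_key_for_illustration"
digest = hash_key(DEMO_KEY, DEMO_SECRET)   # 64 hex characters, stored in place of the key
```

The output is still deterministic, so it works as an indexed lookup key exactly like bare SHA-256. The cost is that rotating the server secret invalidates every stored hash unless old secrets are kept for verification.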

✅ Store the display hint separately

Always store the first N characters of the key — e.g., sk_live_3xKj — as a separate plaintext column. This is not a security leak: the prefix is structurally public (it's a routing hint, not secret), and 12 characters gives developers enough to match "which key is this?" in the dashboard. Without this, all keys look identical after creation — developers have no way to identify which key belongs to which integration. The hash is never displayed; the hint is always displayed.

The show-once rule: minting a key

The full key is shown to the developer exactly once: at creation. From that moment, the platform only has the hash. It cannot show the key again even if asked. This is not a UX decision; it's a security requirement. If the platform could retrieve the full key on demand, then an XSS attack on the developer dashboard, a compromised admin session, or a database query by a rogue employee could expose every customer's keys.

[Figure: key minting flow. Generate (entropy source: CSPRNG, 32 bytes) → Assemble (key assembly: prefix + body, sk_live_3xKj9…) → SHA-256 (one-way hash: hash(full_key), e3b0c44298fc…) → DB stores hash + hint, never the full key (persisted). Separately, the full key is shown once to the developer and discarded after display.]
The full key travels two paths after hashing: to the database (hash only), and to the developer (full key, one time). After that, only the hash exists on the server.

Complete pseudocode: minting and verifying

Here is the complete flow — issuance and verification — precise enough to implement directly:

# Key creation — called when a developer clicks "Create new API key"
function create_api_key(app_id, scopes):
    raw_random   = crypto.randomBytes(32)           # 256 bits of CSPRNG
    prefix       = "sk_live_"
    key_body     = base58encode(raw_random)           # ~44 chars, URL-safe
    full_key     = prefix + key_body                  # e.g. sk_live_3xKj9mRpQf…

    key_hash     = SHA256(full_key)                   # this is stored
    display_hint = full_key[:12]                     # e.g. "sk_live_3xKj" — safe to show

    DB.insert({
        app_id:       app_id,
        key_hash:     key_hash,        # ONLY this — never full_key
        display_hint: display_hint,    # first 12 chars for dashboard identification
        scopes:       scopes,
        created_at:   now(),
        expires_at:   now() + 90_days, # optional TTL; null = never expires
        revoked:      false
    })

    return full_key   # shown ONCE to developer; server discards it immediately after

# Request verification: called on every inbound API request
function verify_request(authorization_header):
    raw_key  = strip_bearer(authorization_header)     # e.g. "sk_live_3xKj9…"

    # Basic prefix sanity-check (fast rejection before DB lookup)
    if not raw_key.startsWith("sk_live_") and not raw_key.startsWith("sk_test_"):
        return 401

    key_hash = SHA256(raw_key)                        # recompute hash
    record   = DB.lookup_by_hash(key_hash)            # single indexed lookup

    if not record:
        return 401   # key doesn't exist
    if record.revoked:
        return 401   # key has been revoked — don't hint which
    if record.expires_at and record.expires_at < now():
        return 401   # key expired

    # Attach identity context for downstream scope checks
    return {
        app_id: record.app_id,
        scopes: record.scopes,
        env:    parse_env(raw_key)  # "live" or "test" from prefix
    }
$ curl -i -H "Authorization: Bearer sk_live_BADKEY123" https://api.example.com/v1/invoices
HTTP/1.1 401 Unauthorized
{"error": "invalid_credentials", "message": "API key is invalid or has been revoked."}

$ curl -i -H "Authorization: Bearer sk_live_3xKj9mRpQf..." https://api.example.com/v1/invoices
HTTP/1.1 200 OK
{"invoices": [...], "has_more": false}
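For readers who want something executable, the pseudocode above can be sketched in Python roughly as follows. Several things here are assumptions, not part of the lesson's design: an in-memory dict stands in for the database, base64url replaces base58 because the standard library has no base58 encoder, and `verify_request` returns `None` where the pseudocode returns 401.

```python
import base64
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

KEYS: dict[str, dict] = {}   # in-memory stand-in for the api_keys table, keyed by hash

def create_api_key(app_id: str, scopes: list[str], env: str = "live") -> str:
    # 32 bytes of CSPRNG output, base64url-encoded without padding (43 chars)
    body = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    full_key = f"sk_{env}_{body}"
    KEYS[hashlib.sha256(full_key.encode()).hexdigest()] = {
        "app_id": app_id,
        "display_hint": full_key[:12],   # safe to show in the dashboard
        "scopes": list(scopes),
        "expires_at": datetime.now(timezone.utc) + timedelta(days=90),
        "revoked": False,
    }
    return full_key   # the only time the full key leaves this function

def verify_request(authorization_header: str):
    scheme, _, raw_key = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not raw_key.startswith(("sk_live_", "sk_test_")):
        return None   # caller maps None to a uniform 401
    record = KEYS.get(hashlib.sha256(raw_key.encode()).hexdigest())
    if record is None or record["revoked"]:
        return None
    if record["expires_at"] and record["expires_at"] < datetime.now(timezone.utc):
        return None
    return {"app_id": record["app_id"], "scopes": record["scopes"], "env": raw_key.split("_")[1]}
```

Calling `create_api_key("billing-sync", ["read:invoices"])` and passing the result back as `"Bearer " + key` yields the identity context; any other input yields `None`, with no distinction between unknown, revoked, and expired keys.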

Scoped keys: least privilege per integration

Every key carries a set of permission scopes. The scope set is stored with the key record in the database and returned on every successful key lookup. Authorization is checked against that set, never against account-level permissions.

Scope design conventions that appear across real platforms:

How the middleware checks scopes on every request

  1. Authentication middleware extracts the key from the Authorization header, hashes it, looks up the record, returns { app_id, scopes } or 401.
  2. Route handler declares the required scope(s) — e.g., required_scope = "write:invoices".
  3. Authorization middleware checks whether required_scope ∈ context.scopes. If not: 403 Forbidden.
  4. Business logic proceeds, with context.app_id available for attribution and audit logging.
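Step 3 of the list above is a set-membership test. A minimal sketch, assuming the authentication step has already produced a context dict with `app_id` and `scopes`; the `authorize` function and its return shape are hypothetical.

```python
def authorize(context: dict, required_scope: str) -> tuple[int, dict]:
    """Return (status, body) for a request whose key was already authenticated."""
    if required_scope in context["scopes"]:
        return 200, {"app_id": context["app_id"]}   # app_id flows on for attribution
    return 403, {
        "error": "insufficient_scope",
        "message": f"This key does not have the {required_scope} scope.",
        "required_scope": required_scope,
    }

example_context = {"app_id": "billing-sync", "scopes": ["read:orders", "read:customers"]}
allowed = authorize(example_context, "read:orders")
denied = authorize(example_context, "admin:invoices")
```

The check reads only the key's own scope set, never anything at account level, which is what keeps a narrowly scoped key narrow.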

Worked example: scope mismatch

A developer creates an app for reading order data. They scope the key to ["read:orders", "read:customers"]. Later, their pipeline code accidentally calls DELETE /v1/invoices/inv_9x.

→ Auth middleware: key lookup succeeds, scopes = ["read:orders", "read:customers"]
→ Route: DELETE /v1/invoices/:id requires scope "admin:invoices"
→ Authorization: "admin:invoices" NOT IN ["read:orders", "read:customers"]

HTTP/1.1 403 Forbidden
{"error": "insufficient_scope", "message": "This key does not have the admin:invoices scope.", "required_scope": "admin:invoices", "doc_url": "https://docs.example.com/api/scopes"}

→ Auth event logged: app_id=billing-sync, key_hint=sk_live_3xKj, scope_denied=admin:invoices

The 403 response reveals the required scope explicitly. This is intentional — the caller is authenticated (the key is valid), so there's no oracle risk in telling them which permission is missing. The alternative — a vague "access denied" — makes debugging unnecessarily painful.

Compare to the 401 case: when the key itself is invalid or expired, the response is deliberately uninformative. Telling an anonymous caller "your key is revoked" versus "your key doesn't exist" would reveal whether a guessed key is a valid (revoked) credential — a subtle oracle. Both cases return the same message.

Key lifecycle: issuance to decommission

A key's lifecycle has five distinct phases. Each requires a deliberate decision: get any of them wrong and you create downtime (bad rotation design), security exposure (no expiry or revocation), or operational friction (no leak detection).

a. Issuance

Developer creates an app → selects scopes → clicks "Create key." Platform generates the key (CSPRNG → prefix + body), stores only the hash and display hint, and shows the full key exactly once in a modal with a "Copy" button. The modal carries a warning: "This key will not be shown again. Store it securely." The developer closes the modal. The key is gone from the server's memory.

b. Rotation without downtime — the dual-key overlap window

This is the most operationally important design decision in key lifecycle management. Naive key rotation — revoke old, create new, update config — has a mandatory downtime window between step 1 and step 3. The correct approach is an overlap window:

  1. Developer creates a new key (Key B). Both Key A and Key B are now valid. The platform supports multiple active keys per app simultaneously.
  2. Developer updates their application's configuration to use Key B. This is deployed to production — could be immediate, could take 24–72 hours across multiple regions and services.
  3. Developer verifies that Key B is receiving traffic (check the platform's per-key request logs).
  4. Developer retires Key A via the dashboard. It is immediately revoked — all subsequent requests using Key A receive 401.

The overlap window length should match your deployment cycle. If your services can be updated in under an hour, a 24-hour overlap gives 24× headroom. Most platforms support 2–5 simultaneous valid keys per app — enough for rotation without becoming an unmanaged proliferation risk.
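The overlap window amounts to allowing more than one active key per app, with a cap. A small sketch of that state machine in Python; the `AppKeys` class, the key ids, and the cap of 5 are illustrative assumptions.

```python
MAX_ACTIVE_KEYS = 5   # assumed cap; the text suggests 2 to 5 per app

class AppKeys:
    """Tracks key ids for one app; each id maps to True while the key is active."""

    def __init__(self) -> None:
        self.active: dict[str, bool] = {}

    def add(self, key_id: str) -> None:
        if sum(self.active.values()) >= MAX_ACTIVE_KEYS:
            raise ValueError("too many active keys; retire one first")
        self.active[key_id] = True

    def retire(self, key_id: str) -> None:
        self.active[key_id] = False   # immediate: the next request with this key gets 401

    def is_valid(self, key_id: str) -> bool:
        return self.active.get(key_id, False)

app = AppKeys()
app.add("key_a")                                           # steady state
app.add("key_b")                                           # step 1: overlap window opens
overlap = (app.is_valid("key_a"), app.is_valid("key_b"))   # both accepted
app.retire("key_a")                                        # step 4: overlap window closes
after = (app.is_valid("key_a"), app.is_valid("key_b"))     # only the new key accepted
```

Nothing in the sketch enforces a maximum overlap duration; that policy (for example, nagging about keys that have been idle since a newer one appeared) sits on top.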

Key rotation timeline. The overlap window is the safe period during which both keys are valid — deployments complete, traffic migrates, then the old key is retired. No downtime required.

c. Revocation

Revocation is immediate and unconditional. The platform sets revoked = true on the key record. The next request using that key gets 401 — there is no grace period. This is why the overlap window happens before revocation, not after. You never revoke first.

Revocation is surfaced in the audit log with: timestamp, who triggered it (user/system/leak-detection), and the key display hint. The full key is never logged — only the hint and hash.

d. Expiry (TTL keys)

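Keys can carry an optional expires_at timestamp. When set, the verification path rejects the request if expires_at < now(). Expired keys return 401. The message may distinguish expiry from revocation, because developers need to know whether to rotate or investigate, but doing so relaxes the uniform-401 rule described earlier (a distinct message confirms that the key once existed), so decide deliberately and apply one policy consistently.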

Compliance use cases typically mandate 90-day expiry. Developer tools (CI tokens, local dev keys) can use shorter TTLs (24 hours or 7 days) to bound exposure. Long-lived production service keys can be issued without expiry, relying on explicit rotation policies instead.
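The three TTL policies in the paragraph above differ only in the value written to expires_at. A minimal sketch with timezone-aware datetimes; the helper names and the fixed issue date are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def expires_at_for(ttl: Optional[timedelta], now: datetime) -> Optional[datetime]:
    """Compute the expiry timestamp; None means the key never expires."""
    return None if ttl is None else now + ttl

def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    return expires_at is not None and expires_at < now

issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
ci_token = expires_at_for(timedelta(hours=24), issued)       # short-lived developer tool key
compliance_key = expires_at_for(timedelta(days=90), issued)  # 90-day mandate
service_key = expires_at_for(None, issued)                   # no expiry; rotated by policy
```

Storing timestamps in UTC and comparing against a UTC clock avoids keys expiring an hour early or late around daylight-saving changes.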

e. Leak detection

GitHub's secret scanning API notifies registered platforms when a commit or public gist contains a string matching the platform's key pattern. This requires registering your key prefix pattern with GitHub's secret scanning partner program. When a match is found:

  1. GitHub sends a webhook to the platform's leak-detection endpoint with the raw matched key.
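  2. Platform verifies that the request really came from GitHub (the partner program signs its payloads), then hashes the leaked key and looks it up in the database.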
  3. If found and not already revoked: auto-revoke, send email/notification to the key owner, log the incident.
  4. Platform sends a confirmation response to GitHub within 5 seconds — required by the protocol.

The response to the developer should be specific: "Your key sk_live_3xKj… was found in a public repository at [URL] and has been automatically revoked. Create a new key and ensure it is not committed to version control."
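The lookup-and-revoke part of that flow (steps 2 and 3) can be sketched as below. This is not GitHub's payload format or API: `handle_leak_report`, the dict-based store, and the return shape are all hypothetical, and signature verification of the incoming request is assumed to have happened before this function is called.

```python
import hashlib

def handle_leak_report(leaked_key: str, source_url: str, keys_by_hash: dict) -> dict:
    """Revoke a reported key if it is ours and still active; report what happened."""
    record = keys_by_hash.get(hashlib.sha256(leaked_key.encode()).hexdigest())
    if record is None:
        return {"action": "ignored"}            # pattern matched, but not one of our keys
    if record["revoked"]:
        return {"action": "already_revoked"}    # idempotent on repeated reports
    record["revoked"] = True
    record["revoked_by"] = "leak_detection"
    return {
        "action": "revoked",
        "notify": f"Your key {record['display_hint']}… was found in a public "
                  f"repository at {source_url} and has been automatically revoked.",
    }
```

Only the display hint appears in the notification. The raw leaked key is hashed for the lookup and then dropped, so it never reaches logs or email.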

Per-app configuration beyond keys

The App entity is the configuration boundary for everything about an integration, not just keys:

| Config | What lives here | Notes |
|---|---|---|
| Webhook endpoints | URL + signing secret per endpoint | Each app has its own HMAC signing secret for webhook payloads. Compromise of one app's signing secret doesn't expose other apps' webhooks. |
| Granted scopes | Set of scopes the app is permitted to hold | Even if a developer tries to create a key with admin:billing, the app must first have that scope granted. A two-tier model: app-level scope grant + key-level scope assignment. |
| Rate-limit tier | Requests per second / per day per key | Enforced per-key, not per-account. A single account with 10 apps gets 10× the rate allowance of a single-app account. See the rate-limiting lesson for implementation. |
| Environment | Test vs Live key namespaces | Test keys (sk_test_…) hit a sandboxed environment with mock data and no billing. Live keys (sk_live_…) hit production. The environments are completely separated: a test key cannot call a live endpoint. |
| IP allowlist | Optional CIDR ranges that may use this app's keys | A server-side integration running from known IPs can add this as a defense-in-depth measure. Requests from outside the allowlist get 403 regardless of key validity. |
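Two rows of this table reduce to short checks: the two-tier scope model is a subset test, and the IP allowlist is a CIDR membership test. A sketch using Python's standard `ipaddress` module; the function names and the "empty list means unrestricted" convention are assumptions.

```python
import ipaddress

def key_scopes_allowed(requested: set[str], app_granted: set[str]) -> bool:
    """Key-level scopes must be a subset of the app-level grant."""
    return requested <= app_granted

def ip_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """An empty allowlist means no IP restriction is configured."""
    if not allowlist:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```

For example, a key requesting `{"admin:billing"}` on an app granted only `{"read:invoices"}` is refused at creation time, before any request is ever made with it.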

API key vs OAuth token: choosing the right tool

These are not competing credentials — they solve different problems. The mistake is using one where the other is clearly right.

| Dimension | API Key | OAuth Token |
|---|---|---|
| Represents | An application. No user context. | A user's delegation to an application. Carries user identity. |
| Complexity | Low: generate, store, send in header. No protocol overhead. | High: authorization code flow, token endpoint, refresh tokens, PKCE. |
| User delegation | Not possible. The key acts as the app identity only. | Yes: the token encodes which user authorized which scopes for which app. |
| Revocation | Platform-controlled. Immediate via hash lookup. | User-controlled AND platform-controlled. User can revoke access from their settings. |
| Lifespan | Long-lived by default. Explicit rotation required. | Short-lived (access token, 15min–1h). Refresh token extends session without re-auth. |
| When to use | Server-to-server automation, CI/CD, data pipelines, integrations where no user is in the loop. | Apps that act on behalf of a user: read their data, post on their behalf, access their account settings. |

The principle is simple: if a human user's permission is required, use OAuth. If the integration is purely machine-to-machine, use an API key. Stripe's payment processing API uses keys — no user is "delegating" access; you're calling Stripe on behalf of your platform. GitHub's user API uses OAuth — a third-party app needs a specific user's permission to read their repositories.

Trade-off tables

Long-lived vs short-lived keys

| Dimension | Long-lived (no expiry) | Short-lived (TTL) |
|---|---|---|
| Security exposure | Higher: a leaked key is valid indefinitely until manually revoked | Lower: leaked key self-expires; useful alongside leak detection |
| Operational burden | Lower for developers: no renewal cycle to manage | Higher: developers must implement key renewal before expiry |
| Compliance | May not meet policies requiring periodic rotation (e.g., 90-day rotation mandates) | Satisfies rotation requirements by design: expiry is rotation |

One account-wide key vs per-app keys

| Dimension | One shared key | Per-app keys |
|---|---|---|
| Blast radius | Full: all integrations exposed | Isolated: only the compromised app's scopes are exposed |
| Attribution | None: you can't tell which integration made a request | Full: every request tagged to a specific app |
| Rotation granularity | All-or-nothing: rotating affects every integration simultaneously | Per-integration: rotate one app's key without touching others |
| Setup complexity | Trivial: one key to manage | Moderate: developers must create and manage apps; platform must implement the app model |

Prefix routing: benefits vs complexity

| Dimension | No prefix | Structured prefix (sk_live_, sk_test_) |
|---|---|---|
| Secret scanning | Scanner can't recognize the key; leaks go undetected | Scanners match prefix pattern; leaks detected in minutes |
| Environment separation | Test key can accidentally reach live environment (silent) | Key fails immediately on wrong environment; fast feedback |
| Key type routing | Platform must infer type from DB lookup; extra latency | Type known immediately from prefix; can short-circuit early |
| Format maintenance | None | Prefix must be registered with secret scanning services; format must be documented for developers |

How real platforms do it

Every major developer platform has converged on the app model, each with slight variations that reflect their specific threat model and developer experience choices.

| Platform | Key model | Key format / scopes | Docs |
|---|---|---|---|
| Stripe | Secret keys (sk_), publishable keys (pk_), restricted keys (rk_). Restricted keys carry explicit scope grants, e.g., read-only access to charges. | sk_live_… / sk_test_…. Restricted keys show scope checkboxes at creation time. | docs.stripe.com/keys |
| GitHub | Fine-grained Personal Access Tokens (PATs): repo-scoped, expiry required (max 1 year), minimal permissions model. Recommended over classic PATs, whose scopes are coarse (for example, repo covers every repository the user can access). | github_pat_… prefix for fine-grained tokens. Permissions per resource type (contents, pull_requests, etc.). | GitHub PAT docs |
| HubSpot | Private Apps replaced legacy API keys in 2022. Each Private App generates an access token scoped to CRM objects. The old hapikey= query param was deprecated: a key in a URL, a pattern this lesson covers in the pitfall section. | Access tokens scoped per CRM object type and action (read/write/delete). | HubSpot Private Apps docs |
| AWS IAM | Access Key ID + Secret Access Key pair. Keys are tied to IAM users or roles. STS generates short-lived session tokens for cross-account and federated access. | Access Key ID: AKIA… (20 chars, non-secret). Secret Access Key: 40 chars, secret. STS tokens: ASIA… prefix. | AWS IAM access key docs |

Note the convergence: all four platforms have moved toward shorter-lived, narrower-scoped credentials and away from long-lived, broad-access keys. GitHub steers users from classic PATs toward fine-grained ones. HubSpot deprecated the hapikey pattern. AWS pushes IAM roles with STS over long-lived IAM user keys. The direction of travel is clear.

For complementary coverage of key security from the threat and hardening angle — JWT pitfalls, the bcrypt vs SHA-256 tradeoff in depth, and the OWASP hardening checklist — see Lesson sec-08: API Keys, JWTs & a Hardening Checklist.

⚠️ The HubSpot anti-pattern: keys in URLs

HubSpot's legacy API accepted ?hapikey=your_api_key as a query parameter. This means the key appeared in server access logs, browser history, CDN logs, proxy logs, and any Referer header sent to third-party resources on the page. It was a known bad practice and they fixed it — but only after years of wide adoption. Always require credentials in the Authorization header, never in the URL. The lesson is that "it works" is not the same as "it is safe."
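One way to enforce the header-only rule is to refuse any request that carries a key-like query parameter, even when a valid header is also present. A sketch using `urllib.parse`; the `extract_key` name and the list of forbidden parameter names are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

FORBIDDEN_PARAMS = {"api_key", "apikey", "hapikey"}   # illustrative, not exhaustive

def extract_key(url: str, headers: dict[str, str]) -> Optional[str]:
    """Return the bearer credential, or None if it is missing or was sent in the URL."""
    query = parse_qs(urlsplit(url).query)
    if FORBIDDEN_PARAMS & {name.lower() for name in query}:
        return None   # refuse outright so URL-borne keys never become a habit
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    return value if scheme.lower() == "bearer" and value else None
```

Rejecting the request is stricter than ignoring the parameter. By the time the server sees the URL the key may already be in a proxy log, so a production version would probably also flag the key for rotation.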

Quiz

Check your understanding

1. Why is only the hash of an API key stored, not the plaintext?

The core reason is that a stolen hash database is useless to an attacker — they cannot reverse SHA-256 to obtain the original key. The show-once behavior (option B) is a consequence of this design, not the primary motivation. Hashes are not shorter than modern keys (SHA-256 is 32 bytes, similar to a 32-byte key body). OAuth does not specify API key storage.

2. What is the purpose of the dual-key overlap window during rotation?

The overlap window solves a pure operational problem: there is always a non-zero gap between "new key created" and "all services updated to use it." If the old key is revoked before that migration completes, requests fail. Both keys being valid simultaneously means deployments can roll out gradually — when the last instance stops using the old key, it can be retired without any downtime risk.

3. A developer has a key with scopes ["read:orders"]. They call POST /v1/invoices, which requires write:invoices. What response should they receive?

401 means the caller is not authenticated — but this caller is authenticated (valid key). 403 means the caller is authenticated but not authorized for this specific operation. That's the correct status here. Returning 404 to hide the endpoint is a debated pattern (security through obscurity) but is not the standard behavior for scope failures. 200 is obviously wrong — scope checks exist precisely to prevent this.

4. What is the primary advantage of per-app keys over a single account-wide key?

The primary advantage is blast-radius isolation combined with attribution and independent rotation. If billing-sync's key leaks, only billing scopes are exposed — the data-export app's credentials are unaffected, and you can rotate one without touching the other. The other options are incorrect: per-app keys add complexity for developers, DB performance is unrelated to this partitioning, and OAuth 2.0 doesn't mandate this model.

✍️ Exercise: design the key management schema

Design the database schema for a developer platform's key management system. Your design should include:

Model answer:

-- Assumes PostgreSQL 13+ (gen_random_uuid() is built in; older versions need pgcrypto).

-- Accounts: the billing/ownership entity
CREATE TABLE accounts (
    id          UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
    name        VARCHAR(255)  NOT NULL,
    created_at  TIMESTAMPTZ   NOT NULL DEFAULT now()
);

-- Apps: one per integration, owned by an account
CREATE TABLE apps (
    id              UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id      UUID          NOT NULL REFERENCES accounts(id) ON DELETE CASCADE,
    name            VARCHAR(100)  NOT NULL,
    granted_scopes  TEXT[]        NOT NULL DEFAULT '{}',
    environment     VARCHAR(10)   NOT NULL DEFAULT 'test'
                    CHECK (environment IN ('live', 'test')),
    created_at      TIMESTAMPTZ   NOT NULL DEFAULT now()
);

-- API keys: multiple per app, only hash stored
CREATE TABLE api_keys (
    id           UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
    app_id       UUID          NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    key_hash     CHAR(64)      NOT NULL UNIQUE,   -- SHA-256 hex; the only secret-derived value stored
    display_hint VARCHAR(16)   NOT NULL,          -- e.g. "sk_live_3xKj", the first 12 chars
    scopes       TEXT[]        NOT NULL,          -- subset of app.granted_scopes (enforced in application code)
    revoked      BOOLEAN       NOT NULL DEFAULT FALSE,
    revoked_at   TIMESTAMPTZ,
    revoked_by   VARCHAR(50),                     -- 'user', 'leak_detection', 'expiry_job'
    expires_at   TIMESTAMPTZ,                     -- NULL = no expiry
    created_at   TIMESTAMPTZ   NOT NULL DEFAULT now(),
    last_used_at TIMESTAMPTZ                      -- updated async; useful for key hygiene
);

-- Partial index over active keys for the per-request lookup.
-- The UNIQUE constraint above already indexes key_hash; this smaller index is
-- only usable by queries that also filter on NOT revoked, e.g.
--   SELECT ... FROM api_keys WHERE key_hash = $1 AND NOT revoked;
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE NOT revoked;

What is stored vs shown:

Zero-downtime rotation with this schema:

  1. Developer calls POST /apps/:app_id/keys with desired scopes. A new row is inserted in api_keys with a new key_hash. Both the old and new rows are present with revoked = false. The full new key is returned in the response body.
  2. Developer rolls out the new key across their services. During this period, both keys are valid — any request with either key's hash will be found and accepted by DB.lookup_by_hash(key_hash).
  3. Developer verifies via request logs (keyed off display_hint) that the old key is no longer receiving traffic.
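  4. Developer calls DELETE /apps/:app_id/keys/:key_id (or the dashboard equivalent). The platform sets revoked = true, revoked_at = now(), revoked_by = 'user' on the old row. From then on, a lookup of the form WHERE key_hash = $1 AND NOT revoked no longer returns the row; that predicate, not the index itself, is what excludes it, and it is also what lets the planner use the partial index. The effect is immediate as long as verification reads the database directly; if key records are cached, the cache entry must be invalidated too.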
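The four steps can be exercised end to end with an in-memory database. This sketch uses Python's `sqlite3` as a stand-in for PostgreSQL (an assumption made so the example runs anywhere), a trimmed-down table, and placeholder hash strings in place of real digests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE api_keys (
        id       INTEGER PRIMARY KEY,
        key_hash TEXT    NOT NULL UNIQUE,
        revoked  INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX idx_api_keys_hash ON api_keys(key_hash) WHERE NOT revoked;
""")

def lookup_active(key_hash: str):
    # The NOT revoked filter is what excludes retired keys from the result.
    return conn.execute(
        "SELECT id FROM api_keys WHERE key_hash = ? AND NOT revoked", (key_hash,)
    ).fetchone()

conn.execute("INSERT INTO api_keys (id, key_hash) VALUES (1, 'hash_of_key_a')")
conn.execute("INSERT INTO api_keys (id, key_hash) VALUES (2, 'hash_of_key_b')")   # step 1
during_overlap = (lookup_active("hash_of_key_a"), lookup_active("hash_of_key_b"))  # step 2
conn.execute("UPDATE api_keys SET revoked = 1 WHERE id = 1")                       # step 4
after_retire = (lookup_active("hash_of_key_a"), lookup_active("hash_of_key_b"))
```

During the overlap both lookups return a row; after the update only Key B's does. The revoked row stays in the table, which preserves the audit trail (revoked_at, revoked_by in the full schema).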

Rubric: Full marks for: correct table structure with the hash-not-plaintext choice justified, display hint as a separate column, show-once concept described, dual-key rotation explained via the revoked flag mechanism, and the partial index mentioned for performance. Partial marks for any three of five. Bonus: last_used_at for key hygiene reporting, revoked_by for audit trail.

Key takeaways

Sources & further reading