Platform & API Product Engineering · Lesson 07

Platform & third-party auth (installed apps, Connect)

Basic OAuth lets one user authorize one app. But when a marketplace app is installed into 10,000 merchant stores, a new problem emerges: how does a single application credential authenticate against thousands of independent accounts — each with its own permission scope, its own revocation lifecycle, and its own blast radius?

⏱ 24 min Difficulty: advanced Prereq: OAuth (sec-06), Platform Keys (plat-06)

By the end you'll be able to

Explain the GitHub App credential chain: app JWT → installation access token, including why each is short-lived.
Describe the Stripe Connect model and articulate how "act on behalf of" works without a per-account token.
Design a token management strategy for 10,000 installs that avoids cold-start thundering herds and handles revocation cleanly.

Why this is different from basic OAuth

Standard OAuth (Lesson sec-06) solves the user-delegation problem: a single user grants your app access to their account and receives a token scoped to their data. The session is 1:1 — one user, one token, one account.

The installed-app model is fundamentally different. Think of a Shopify app that a merchant installs into their store. The same app binary, running on the same servers, needs to make authenticated API calls into 10,000 separate merchant accounts — each with different permissions granted during install, each capable of revoking independently at any time. The authorization is 1:N, not 1:1.

Three distinct patterns have emerged across the industry to solve this. They differ in where the credential lives, how tokens are scoped, and what "acting on behalf of" looks like on the wire.

Pattern A — the GitHub App model (per-install tokens)

GitHub Apps introduced a credential model that cleanly separates two identities: the app identity (who is this third-party application?) and the installation identity (which account did this specific install happen into?). These correspond to two separate tokens with different lifetimes and scopes.

The app JWT

The app holds an RSA private key (generated when creating the app, uploaded as a PEM). To prove it is the app, it mints a short-lived JWT signed with RS256:

# GitHub App JWT — payload structure
{
  "iss": "123456",          // your GitHub App's numeric ID
  "iat": 1750416540,       // issued-at, set 60 s in the PAST (clock-skew buffer)
  "exp": 1750417140        // issued-at + 600 s = 10-minute expiry (GitHub's hard max)
}

# Sign with RS256 using the PEM private key
import jwt
app_jwt = jwt.encode(payload, private_key_pem, algorithm="RS256")

The 10-minute lifetime is not arbitrary — it bounds how long a stolen app JWT is useful. GitHub verifies it using the public key you uploaded during app creation; because only you hold the private key, no one else can mint a valid app JWT.

The installation access token

The app JWT proves the app's identity, but it cannot call GitHub resources directly. To access a specific installation (a specific org or user that installed the app), the app exchanges its JWT for a per-install token:

# Exchange app JWT for an installation access token
POST https://api.github.com/app/installations/{installation_id}/access_tokens
Authorization: Bearer {app_jwt}
Accept: application/vnd.github+json

# Response — scoped to THIS installation only
{
  "token": "ghs_16C7e42F292c6912E7710c838347Ae178B4a",
  "expires_at": "2025-06-20T11:50:00Z",  // 1 hour after the request (now = 1750416600)
  "permissions": {
    "contents": "read",    // only what this installation granted
    "issues":   "write"
  },
  "repository_selection": "all"
}

The installation token carries only the permissions the user granted during installation — not the full set the app requested in its manifest. If the user granted contents:read but not pull_requests:write, the returned token reflects that. This makes the blast radius of a leaked token bounded to that installation's granted scope.
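One way to act on this is to compare the token's permissions block against what an operation needs before attempting it. The sketch below is illustrative only: `missing_permissions` and the `LEVELS` ordering (read below write below admin) are assumptions introduced here, not part of any GitHub client library.

```python
# Hypothetical helper: the names and the level ordering are assumptions.
LEVELS = {"read": 1, "write": 2, "admin": 3}


def missing_permissions(required: dict[str, str], granted: dict[str, str]) -> dict[str, str]:
    """Return the required permissions that the granted set does not satisfy."""
    missing = {}
    for name, needed in required.items():
        have = granted.get(name)
        if have is None or LEVELS[have] < LEVELS[needed]:
            missing[name] = needed
    return missing


# The "permissions" object from the token response shown above
token_permissions = {"contents": "read", "issues": "write"}

# What a hypothetical sync job would need
needed_for_sync = {"contents": "read", "pull_requests": "write"}

gaps = missing_permissions(needed_for_sync, token_permissions)
```

Here `gaps` holds the `pull_requests` entry, so the job can skip that installation or ask for additional consent instead of failing on a 403 mid-run.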

The 1-hour lifetime means your system must refresh tokens continuously. For 10,000 installs, that is a background process, not a one-time setup. See "By the numbers" below.

Pattern B — user OAuth tokens within an app

Installation tokens authenticate the app acting autonomously. But sometimes your app needs to act as a specific user — opening pull requests on their behalf, posting comments attributed to them, or accessing resources only that user can see. For this, GitHub Apps support a standard OAuth flow (sec-06) layered on top of the installation.

The user grants a user-level token via a redirect flow. That token carries the user's identity and only the permissions they consented to (which may be a subset of the installation's permissions). The key distinction: actions taken with a user token appear as that user in audit logs and comply with the user's own permission model, not the app's installation permissions. Use user tokens when the action should be attributed to a person; use installation tokens when the app is operating autonomously (scheduled syncs, webhooks, background jobs).

Pattern C — Stripe Connect (platform key + account header)

Stripe Connect takes a fundamentally different approach. Rather than issuing per-account tokens, Stripe lets your platform use its own secret key with a special header indicating which connected account to operate on:

# Stripe Connect — acting on behalf of a connected account
POST https://api.stripe.com/v1/charges
Authorization: Bearer sk_live_YourPlatformSecretKey
Stripe-Account: acct_1ExampleConnectedAcct
Content-Type: application/x-www-form-urlencoded

amount=2000&currency=usd&source=tok_visa

There is no token exchange, no per-account credential to manage, and no refresh cycle. Stripe's backend uses the Stripe-Account header to scope the operation to that connected account's data and settings. The charge appears in that account's dashboard, not yours.
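A minimal sketch of how a platform might assemble that request, using only the standard library. The function name `connect_headers` and the placeholder key are assumptions for illustration; in practice an official Stripe client library would normally build these for you.

```python
from urllib.parse import urlencode


def connect_headers(platform_secret_key: str, connected_account_id: str) -> dict[str, str]:
    """Build headers for a call made on behalf of one connected account."""
    # Guard against passing some other identifier by mistake.
    if not connected_account_id.startswith("acct_"):
        raise ValueError("expected a connected account ID such as acct_...")
    return {
        "Authorization": f"Bearer {platform_secret_key}",
        "Stripe-Account": connected_account_id,
        "Content-Type": "application/x-www-form-urlencoded",
    }


# Placeholder credentials, not real keys
headers = connect_headers("sk_test_placeholder", "acct_1ExampleConnectedAcct")
body = urlencode({"amount": 2000, "currency": "usd", "source": "tok_visa"})
```

The only per-account state is the account ID string, which is why this pattern needs no cache or refresh loop.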

The trade-off is blast radius: your platform secret key is a single credential that, if leaked, gives an attacker the ability to operate against every connected account on your platform. The power is proportional to the risk. Stripe mitigates this with restricted keys that carry granular permissions (plat-06) and with the requirement that connected accounts explicitly authorize the platform during onboarding.

Stripe Connect also supports a separate OAuth flow that issues access tokens per connected account, giving you a per-account token model similar to Pattern A — useful when you need the platform to be able to act only on explicitly authorized accounts with independently revocable access.

Consent screens and scoped permissions per install

Every installed-app pattern involves a consent moment. The platform (GitHub, Stripe, Slack) shows the user a screen listing exactly what the app is requesting: which repositories, which Stripe capabilities, which Slack channels. The user's choices at that moment define the permission envelope for the installation.

Well-designed consent screens follow the principle of minimal disclosure: request only the permissions the app genuinely needs at install time. Apps that request everything "just in case" see lower conversion rates and more revocations. GitHub even lets apps request permissions on demand — the app starts with minimal permissions and requests additional ones later, triggering a new consent prompt only when needed.

Each installation's permission set is stored by the platform and included in every token it issues for that installation. Your app cannot simply request broader permissions at token refresh time — the permissions in the returned token are bounded by what was consented to at install time, not what the app manifest lists.

Diagrams: the three authentication flows

Fig 1 — The install / consent flow. Every installation produces an installation_id that your app stores. Permissions are captured at consent time and cannot be silently expanded later.

Fig 2 — The app-JWT → installation-token exchange. The app JWT proves the app's identity (10 min lifetime); the installation token is the working credential (1 hr lifetime, per-install scope).

Fig 3 — Acting on behalf of, three ways. The choice controls blast radius on credential leak and the complexity of the token management lifecycle.

Managing revocation and uninstall

When a user revokes an installation (uninstalls the app, disconnects their Stripe account, removes the Slack workspace), the platform immediately invalidates all tokens issued for that installation. Your app must handle this gracefully:

Listen for the webhook. GitHub sends an installation.deleted event, Stripe sends an account.application.deauthorized event, and Slack sends an app_uninstalled event. Subscribe to these and treat them as authoritative.
Purge the token from your cache immediately. Do not wait for the next refresh cycle. A cached token for a revoked installation will fail with 401 errors, and every attempt is also a call made on behalf of an account that has withdrawn its consent.
Stop background jobs for that installation. Any scheduled sync, poll, or background task keyed to that installation_id must be halted. Failing to do this is a privacy violation if you continue reading data after authorization is withdrawn.
Persist the revocation event. Mark the installation as revoked in your database so you do not accidentally re-add it if a stale message arrives out of order.
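The four steps above can be sketched against an in-memory store. `InstallStore` and its fields are hypothetical stand-ins for your cache, job scheduler, and database; how a genuine reinstall clears the revoked flag is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class InstallStore:
    """In-memory stand-in for the token cache, job table, and installs table."""
    tokens: dict[int, str] = field(default_factory=dict)
    jobs: dict[int, list[str]] = field(default_factory=dict)
    revoked: set[int] = field(default_factory=set)

    def handle_uninstall(self, installation_id: int) -> None:
        """Apply a revocation webhook: purge, stop jobs, persist the event."""
        self.tokens.pop(installation_id, None)   # purge the cached token now
        self.jobs.pop(installation_id, None)     # stop background work
        self.revoked.add(installation_id)        # remember the revocation

    def cache_token(self, installation_id: int, token: str) -> bool:
        """Refuse writes for revoked installs so stale messages cannot re-add them."""
        if installation_id in self.revoked:
            return False
        self.tokens[installation_id] = token
        return True


store = InstallStore(tokens={9876543: "ghs_example"}, jobs={9876543: ["sync"]})
store.handle_uninstall(9876543)

# A refresh result that arrives after the uninstall is dropped
late_write_accepted = store.cache_token(9876543, "ghs_stale")
```

Calling `handle_uninstall` twice is harmless, which matters because webhooks can be delivered more than once.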

Under the hood: the app-JWT → installation-token exchange, step by step

Here is the exact sequence a well-implemented GitHub App server executes. Every step has a failure mode worth knowing.

# Step 1: build the JWT payload
now = int(time.time())   # 1750416600 (2025-06-20T10:50:00Z)
payload = {
    "iss": "123456",     # your GitHub App numeric ID
    "iat": now - 60,     # 60 s in the past: clock-skew buffer
    "exp": now + 540     # 9 min from now (10 min is GitHub's hard max; stay under)
}

# Step 2: sign with RS256 using your PEM private key
app_jwt = jwt.encode(payload, PRIVATE_KEY_PEM, algorithm="RS256")
# Result: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxMjM0NTYi...

# Step 3: exchange for an installation access token
POST https://api.github.com/app/installations/9876543/access_tokens
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
Accept: application/vnd.github+json

# Step 4: GitHub's internal verification
#   a) Decode JWT header → alg=RS256
#   b) Extract iss=123456 → look up app by ID → retrieve public key
#   c) Verify RS256 signature against public key
#   d) Check exp > now() and iat <= now() + 60 (allow skew)
#   e) Check installation 9876543 belongs to app 123456

# Step 5: GitHub returns the installation token
HTTP/2 201
{
  "token": "ghs_16C7e42F292c6912E7710c838347Ae178B4a",
  "expires_at": "2025-06-20T11:50:00Z",   // exactly 1 hour from now
  "permissions": { "contents": "read", "issues": "write" },
  "repository_selection": "all"
}

# Step 6: cache the token
SETEX "token:install:9876543" 3500 "ghs_16C7e42F292c6912E7710c838347Ae178B4a"
# TTL = 3500 s (100 s buffer before the 3600 s expiry)

# Step 7: use the installation token for API calls
GET https://api.github.com/repos/customer-org/their-repo/contents/
Authorization: Bearer ghs_16C7e42F292c6912E7710c838347Ae178B4a

HTTP/2 200
{ "name": "README.md", ... }

# Common failure: expired app JWT
HTTP/2 401
{ "message": "Expiration time (exp) has passed" }
# → regenerate the app JWT; mint it fresh for each exchange and never reuse or cache it for long periods.
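The time checks in Steps 1 and 4d can be rehearsed locally before a request ever leaves your server. This is a sketch under stated assumptions: `build_claims` and `claims_problem` are hypothetical helpers, and the rules simply mirror the checks listed above rather than GitHub's actual verifier.

```python
from typing import Optional

MAX_LIFETIME_S = 600   # 10-minute ceiling described in the lesson
SKEW_S = 60            # clock-skew allowance


def build_claims(app_id: str, now: int) -> dict:
    """Claims as in Step 1: backdated iat, exp safely under the ceiling."""
    return {"iss": app_id, "iat": now - SKEW_S, "exp": now + 540}


def claims_problem(claims: dict, now: int) -> Optional[str]:
    """Return a reason the claims would be rejected, or None if they look valid."""
    if claims["exp"] <= now:
        return "expired"
    if claims["iat"] > now + SKEW_S:
        return "issued in the future"
    if claims["exp"] - now > MAX_LIFETIME_S:
        return "expiry too far ahead"
    return None


now = 1750416600
claims = build_claims("123456", now)
problem_at_mint = claims_problem(claims, now)
problem_after_ten_minutes = claims_problem(claims, now + 600)
```

Running the same check just before each exchange is a cheap way to decide whether to mint a fresh JWT rather than discovering the 401 from the server.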

Pros and cons: the three patterns

Pattern	Token volume	Blast radius on leak	On-behalf-of clarity	Revocation	Complexity
(A) GitHub App — per-install token	1 token per active install per hour; 10,000 installs = ~10,000 tokens in rotation	Low — one install's data only; 1-hr TTL auto-expires	Clear — the token is explicitly scoped to one installation's permissions	Platform invalidates immediately on uninstall; webhook notifies app	High — background refresh loop, cache, JWT signing, revocation listener
(B) User OAuth token	1 token per active user session; may overlap with install tokens	Low — one user's accessible resources only	Very clear — actions attributed to the user in audit logs	User can revoke via platform settings; refresh token approach controls session length	Medium — standard OAuth flow; refresh token management
(C) Stripe Connect — platform key + header	One credential (platform secret key) for all accounts; zero per-account tokens	Very high — a leaked platform key compromises every connected account	Less clear — requests appear as the platform, not a distinct per-account token	Connected account revokes platform access; platform key itself revoked centrally	Low — no token exchange, no refresh cycle, just a header

How real platforms do it

GitHub Apps pioneered the two-tier JWT model described in Pattern A. App JWTs are strictly 10 minutes; installation tokens are 1 hour. GitHub surfaces the per-install permission set in the token response and fires installation.deleted webhooks on revocation. See GitHub's JWT generation docs and authenticating as an installation.

Stripe Connect offers two modes: the platform-key-plus-header model for tight platform control, and an OAuth-based model where each connected account issues the platform an access token with explicit scopes — useful when you need independently revocable per-account credentials rather than a single platform key. See Stripe Connect authentication docs.

Slack apps use workspace-scoped OAuth tokens (Pattern B style) — one token per workspace installation. Slack's token rotation feature lets apps cycle tokens without downtime. The app_uninstalled event fires when a workspace removes the app. See Slack OAuth V2 docs.

Google Workspace Marketplace apps use service account credentials with domain-wide delegation — the service account can impersonate any user in the domain, making it conceptually similar to Pattern C (high blast radius, zero per-user token overhead). See Google Workspace Marketplace auth guide.

By the numbers: managing tokens for 10,000 installs

Installation tokens expire after 1 hour. To keep all 10,000 installs operational, your service must continuously refresh tokens as they approach expiry.

Steady-state refresh rate (modeled):

token_lifetime       = 3600   # seconds (1 hour)
buffer_before_expiry = 200    # seconds (refresh when 200 s remain)
effective_lifetime   = 3600 - 200 = 3400   # seconds before refresh needed

refresh_qps = installs / effective_lifetime
            = 10,000 / 3400 ≈ 2.94 refreshes/second   # (modeled)

# With 15% random jitter spread: peak ≈ 3.4 req/s
# GitHub's rate limit on app-installation-token requests: 5,000/hr = 1.39/s per app (as modeled here)
# → 2.94/s is about 10,600/hr, roughly 2x that modeled limit, so eagerly refreshing
#   every install does not fit; refresh only the installs that have upcoming work
# → stagger refresh: don't refresh all at once, spread over the buffer window
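The arithmetic above is easy to re-run for your own install count. A minimal sketch, with the function name and the 5,000/hr figure both taken as assumptions from the model in this section rather than as a statement of GitHub's current limits:

```python
def refresh_rate_per_second(installs: int, token_lifetime_s: int = 3600,
                            buffer_s: int = 200) -> float:
    """Steady-state refreshes per second if every install is kept warm."""
    return installs / (token_lifetime_s - buffer_s)


rate = refresh_rate_per_second(10_000)

# Modeled per-app budget from this section: 5,000 token requests per hour
modeled_limit = 5_000 / 3600
fits = rate <= modeled_limit

# Largest install count that could be kept warm at that budget
max_warm_installs = int(modeled_limit * (3600 - 200))
```

Under these modeled numbers the always-warm strategy does not fit for 10,000 installs, which is one argument for refreshing lazily, only for installations with work scheduled.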

Storage: Redis cache for all tokens (modeled):

10,000 installs × ~300 bytes/token (token string + metadata) = ~3 MB in Redis
Negligible: Redis can hold millions of such keys comfortably

# Each key: SETEX "token:install:{id}" 3400 "{token_json}"
# TTL ensures stale entries auto-expire even if your refresh job crashes

The cold-start thundering-herd problem (modeled):

# Your service restarts with an empty cache. All 10,000 installs need fresh tokens.
# GitHub's limit: 5,000 installation-token requests/hour/app (as modeled here)
# → 10,000 / 5,000 = 2 hours minimum to warm every install at that limit (modeled)
# → Firing all 10,000 at once exhausts the hourly budget partway through

# Solution: startup jitter, don't refresh all immediately
for install_id in all_installs:
    jitter = random.uniform(0, warm_up_window_seconds)   # e.g. 0-600 s
    schedule_refresh(install_id, delay=jitter)

# A 600 s window spreads 10,000 requests to ~16.7 req/s on average, no matter how
# many app servers share the load. That is about 12x the modeled 1.39 req/s limit,
# so size warm_up_window_seconds from your real rate-limit budget, or warm lazily
# (fetch a token only when an installation's first job runs).
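As a sketch of sizing the warm-up window from a rate budget instead of picking it by feel: the helper names below are hypothetical, and the 5,000/hr budget is the modeled figure from this section.

```python
import random


def min_warm_up_window_s(installs: int, max_requests_per_hour: int) -> float:
    """Shortest window that keeps the average request rate within the budget."""
    return installs / max_requests_per_hour * 3600


def jittered_delays(install_ids, window_s: float, rng: random.Random) -> dict:
    """Assign each install a uniformly random start delay inside the window."""
    return {install_id: rng.uniform(0, window_s) for install_id in install_ids}


window = min_warm_up_window_s(10_000, 5_000)

# Seeded generator so the schedule is reproducible in tests
delays = jittered_delays(range(10_000), window, random.Random(7))

# Average rate if the same installs were squeezed into 10 minutes instead
ten_minute_rate = 10_000 / 600
```

A window longer than the token lifetime is a signal that warming everything is the wrong goal, and that tokens should be fetched on first use per installation.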

Decision math (refresh buffer tradeoff): A small buffer (30 s before expiry) minimizes wasted refreshes on long-idle installs, but leaves a tight window for retry if the refresh fails. A large buffer (300 s) wastes ~8% of token lifetime on preemptive refreshes, but gives 5 retry attempts at 60-second intervals before the token actually expires. For most systems, 120–300 s is the right range. Treat these numbers as heuristics and tune them to your observed p99 refresh latency.
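The two quantities being traded off can be computed directly. A small sketch, with `buffer_tradeoff` as a hypothetical helper and the 60-second retry interval taken from the paragraph above:

```python
def buffer_tradeoff(buffer_s: int, lifetime_s: int = 3600,
                    retry_interval_s: int = 60) -> tuple[float, int]:
    """Return (fraction of lifetime given up, retries that fit in the buffer)."""
    wasted_fraction = buffer_s / lifetime_s
    retries = buffer_s // retry_interval_s
    return wasted_fraction, retries


small = buffer_tradeoff(30)    # tight buffer
medium = buffer_tradeoff(120)
large = buffer_tradeoff(300)   # generous buffer
```

With a 30-second buffer no full retry interval fits before expiry, which is the concrete cost of the "tight window" described above.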

🎯 Interview angle

"Describe how a GitHub App authenticates on behalf of an installation." The complete answer has two tiers: the app JWT (RS256, 10-min, proves app identity) and the installation access token (1-hr, per-install scope, retrieved by exchanging the app JWT). Then cover operations: cache the installation token, refresh before expiry, listen for revocation webhooks, handle cold-start thundering herd with jitter. Candidates who only describe OAuth miss the two-tier structure that makes GitHub Apps different from user-OAuth delegation.

⚠️ Common trap

Checking token existence in cache, not token expiry. An installation token may be in your Redis cache but 59 minutes old. On the next request you serve it; it expires 60 seconds later mid-flight. Always store expires_at alongside the token and check expires_at - now() > buffer before serving the cached value. A missing expiry check turns a 3,600-second lifetime into an "occasionally fails mysteriously" bug that's hard to reproduce in staging.
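A minimal sketch of that check, assuming expires_at is stored as the ISO 8601 string the token response returns. `is_servable` is a hypothetical name and the 120-second default buffer is only an example value.

```python
from datetime import datetime, timezone


def is_servable(expires_at: str, now: datetime, buffer_s: int = 120) -> bool:
    """True only if the token outlives the safety buffer."""
    # Normalize the trailing "Z" so older Python versions can parse it.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (expiry - now).total_seconds() > buffer_s


expires_at = "2025-06-20T11:50:00Z"

# Freshly issued: 59 minutes of life left
fresh = is_servable(expires_at, datetime(2025, 6, 20, 10, 51, tzinfo=timezone.utc))

# 59 minutes old: only 60 seconds left, below the buffer
stale = is_servable(expires_at, datetime(2025, 6, 20, 11, 49, tzinfo=timezone.utc))
```

Passing `now` in explicitly keeps the function deterministic, so the 59-minute case can be covered by a unit test instead of being found in production.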

✅ Do this, not that

Store the installation_id alongside the token, always. The token is what you use to call the API. The installation_id is what you use to refresh the token when it expires. If your cache stores only the token string (no installation_id, no expires_at), you cannot refresh it directly: you first have to recover the installation_id from somewhere else, such as your database or the app's list of installations. Store the full object: {token, expires_at, installation_id, permissions}. The extra 50 bytes per install is trivial; the operational capability is not.
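One possible shape for that cached object, sketched as a dataclass that round-trips through JSON. The class name and methods are illustrative assumptions; the key format follows the token:install:{id} scheme used elsewhere in this lesson.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CachedInstallToken:
    """Everything needed to use the token and to refresh it later."""
    installation_id: int
    token: str
    expires_at: str
    permissions: dict

    @property
    def cache_key(self) -> str:
        return f"token:install:{self.installation_id}"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "CachedInstallToken":
        return cls(**json.loads(raw))


entry = CachedInstallToken(
    installation_id=9876543,
    token="ghs_example",
    expires_at="2025-06-20T11:50:00Z",
    permissions={"contents": "read", "issues": "write"},
)
restored = CachedInstallToken.from_json(entry.to_json())
```

Because the installation_id travels inside the value as well as the key, a consumer that only sees the payload still knows which installation to refresh.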

🧠 Quick check

1. In the GitHub App model, the app JWT is signed with:

App JWTs are RS256-signed with the private RSA key you generate and register with GitHub. GitHub verifies the signature using the corresponding public key stored in the app configuration. The private key never leaves your server — this asymmetry is what allows GitHub to verify without holding a shared secret.

2. In the Stripe Connect model, how does your platform act on behalf of a connected account?

Stripe Connect uses your platform's own secret key for authentication; the Stripe-Account header tells Stripe which connected account's data and settings to operate on. No per-account token exchange is required — the trade-off is that the platform key carries very high blast radius if leaked.

3. A GitHub App installation access token is valid for:

Installation tokens have a 1-hour lifetime. The app JWT used to obtain them expires after 10 minutes. The two-tier design means a leaked installation token auto-expires after at most 1 hour, and a leaked app JWT is useless after 10 minutes — both intentionally short to bound the blast radius of a credential leak.

4. Your service restarts cold with 10,000 installations in a database but an empty token cache. What is the safest approach to warming the cache?

Fetching all tokens immediately creates a thundering herd: 10,000 simultaneous requests against GitHub's rate-limited token endpoint. Lazy refresh with startup jitter (random delay per install) spreads the load across the first several minutes of operation. Pre-generating tokens 24 hours in advance would produce expired tokens before the deploy window. A single shared token would be scoped to only one installation.

✍️ Exercise: design the token management system for a GitHub App with 5,000 installs

You are building a data-sync service that uses a GitHub App to read repository metadata across 5,000 customer organizations. The service runs on 4 application servers and syncs data every 15 minutes per installation.

Design the complete token caching and refresh strategy. Address: cache key schema, TTL, how refresh is triggered, what happens on cold start, how multiple servers avoid refreshing the same token simultaneously, and what happens when a refresh fails.

Model answer:

Cache: Redis HASH or STRING per install: key = token:install:{installation_id}, value = JSON {"token": "ghs_xxx", "expires_at": "...", "permissions": {...}}, TTL = 3,400 s (200 s buffer).
Refresh trigger: Before using a cached token, check expires_at - now() < 120 s. If true, refresh. Background sweeper thread wakes every 60 s and refreshes any token expiring within 5 minutes.
Cold start: On startup, do NOT fetch all 5,000 tokens immediately. Use lazy load — fetch the token only when the first sync job for that installation fires. Sync jobs are staggered by design (different customers at different offsets), so the cold-start load naturally spreads over the first 15-minute sync cycle.
Per-install locking: Use a Redis distributed lock (SET token:lock:{id} 1 NX PX 30000) before refreshing. If the lock is held, another server is already refreshing — wait and then re-read the cache. This prevents 4 servers from all refreshing the same token simultaneously, consuming 4× the rate-limit budget.
Failure handling: If a refresh returns 401 (installation revoked), mark the install as revoked in the database, emit an alert, and skip that installation in future sync runs. If it returns a transient 5xx, retry with exponential backoff (1s, 2s, 4s) before failing the sync job for that cycle.

Rubric: Full marks for covering (a) TTL with expiry buffer, (b) check-before-use expiry check, (c) background refresh sweep, (d) cold-start lazy loading (not thundering herd), (e) per-install distributed locking, (f) 401 revocation handling vs transient 5xx retry.
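The failure-handling part of the model answer can be sketched as a small retry wrapper. Everything named here is hypothetical: `InstallationRevoked` and `TransientError` stand in for however your HTTP layer classifies a revocation response versus a 5xx, and `fetch_token` is whatever function performs the exchange.

```python
import time


class InstallationRevoked(Exception):
    """The platform says this installation no longer exists or is unauthorized."""


class TransientError(Exception):
    """A temporary failure such as a 5xx response or a timeout."""


def refresh_with_retry(fetch_token, delays=(1, 2, 4), sleep=time.sleep):
    """Return a new token, or None if revoked; re-raise once retries run out."""
    # One initial attempt plus one retry per delay; None marks the last attempt.
    for delay in (*delays, None):
        try:
            return fetch_token()
        except InstallationRevoked:
            return None          # caller marks the install revoked and alerts
        except TransientError:
            if delay is None:
                raise            # fail this sync cycle
            sleep(delay)


# Demo: a fetch that fails twice with a transient error, then succeeds
attempts = {"count": 0}
slept = []


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("503")
    return "ghs_new"


result = refresh_with_retry(flaky_fetch, sleep=slept.append)
```

Injecting `sleep` keeps the backoff testable without real waiting; revocation is deliberately not retried, since repeating the call cannot succeed.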

Key takeaways

The installed-app model differs from user OAuth because one app credential must operate across thousands of independent accounts, each with its own permission scope and revocation lifecycle — a 1:N problem, not 1:1.
GitHub Apps use a short-lived app JWT (RS256, 10 min) to obtain per-install tokens (1 hr, scoped to that install's granted permissions). The two-tier design keeps blast radius small at both credential levels.
Stripe Connect uses your platform secret key plus a Stripe-Account header — no per-account token exchange needed. The simplicity is real, but so is the risk: the platform key controls all connected accounts simultaneously.
At 10,000 installs, keeping every token warm takes roughly 3 refreshes/second in steady state, a figure to check against your token-endpoint rate-limit budget. The sharper danger is cold start: a cache-empty restart must not fire all 10,000 exchanges at once. Stagger with jitter.
Revocation handling is a first-class concern: when a user uninstalls, purge the token immediately, stop all background jobs for that install, and persist the revocation event to prevent stale replays.