API Design

Design Case Studies · Lesson 18

Design: Slack API

Unlike consumer messaging apps, Slack is built around workspace hierarchies. An enterprise client with 100,000 members and thousands of channels presents a scaling profile where a simple connection boot can crash client apps, and broadcasting user presence creates O(N²) presence message cascades. Scaling Slack requires shifting from full synchronization to incremental subscription-driven APIs.

⏱ ~18 min · Advanced · Prereq: cs-08, caching, pagination

By the end you'll be able to

Requirements

Scaling a workspace-centric messaging platform introduces distinct constraints compared to point-to-point chat systems:

Design decisions

Transport: Connection Gateways & WebSocket Separation

To support millions of open TCP/TLS connections, Slack uses a dedicated gateway tier (such as Envoy proxies or custom Go/Edge proxies) to terminate WebSocket connections. These gateways do not hold application logic or state; they act as simple frame-forwarders. When a client reconnects, the gateway queries a distributed routing cache to locate the user's active session, then forwards incoming/outgoing frames. This protects internal database shards and application servers from direct exposure to TCP connection churn.

Boot Sync: Delta Syncing & Lazy Loading

In early iterations, Slack's `rtm.start` API returned the entire state of the workspace (all channels, members, history) in a single huge JSON block. For workspaces with over 10,000 users, this payload exceeded 50 MB, causing high latency, memory pressure, and client crashes.

The modern design uses a **Delta Sync API**. When the client boots or reconnects, it sends the timestamp of its last known state. The server returns only the differences (changes, additions, deletions). If the state is too stale, the client performs a bootstrap sync for the *sidebar content only* (subscribed channels and DMs), while lazy-loading members, profile cards, and channel archives as the user navigates the app.
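The delta-versus-bootstrap decision described above can be sketched as follows. This is a minimal illustration, not Slack's implementation: the `Change` record, the `delta_sync` function, the in-memory change log, and the 7-day `MAX_DELTA_AGE_SECONDS` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: beyond this age the server stops computing deltas.
MAX_DELTA_AGE_SECONDS = 7 * 24 * 3600


@dataclass
class Change:
    entity_id: str  # e.g. "C0123XYZ"
    kind: str       # "added", "updated", or "removed"
    at: int         # Unix timestamp of the change


def delta_sync(change_log: list[Change], last_sync: int, now: int) -> dict:
    """Return a delta if the client's state is fresh enough, else request a bootstrap."""
    if now - last_sync > MAX_DELTA_AGE_SECONDS:
        # State is too stale: the client should reload sidebar content only.
        return {"mode": "bootstrap", "scope": "sidebar"}

    # Collapse multiple changes to the same entity into one net change.
    net: dict[str, str] = {}
    for change in sorted(change_log, key=lambda c: c.at):
        if change.at <= last_sync:
            continue  # the client already has this change
        previous = net.get(change.entity_id)
        if previous == "added" and change.kind == "updated":
            continue  # still new from the client's point of view
        if previous == "added" and change.kind == "removed":
            del net[change.entity_id]  # created and removed while offline
            continue
        net[change.entity_id] = change.kind

    delta: dict[str, list[str]] = {"added": [], "updated": [], "removed": []}
    for entity_id, kind in net.items():
        delta[kind].append(entity_id)
    return {"mode": "delta", "sync_timestamp": now, **delta}
```

Collapsing the log to net changes keeps the response proportional to what actually differs, regardless of how many intermediate edits happened while the client was offline.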

Presence Service: Viewport-Driven Presence Subscriptions

In a workspace of 50,000 users, broadcasting every online/offline status change to all members means that a single round of status changes (each user changing once) generates roughly $50,000 \times 50,000 = 2.5 \times 10^9$ updates, which is a massive performance bottleneck.

Slack solves this by shifting from global broadcast to **Presence Subscriptions**. The client app only subscribes to presence updates for users currently visible in the active sidebar viewport or active chat channel. As the user scrolls, the client dynamically unsubscribes from old IDs and subscribes to new ones over the WebSocket control channel, dropping presence delivery overhead by over 95%.
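On the client, the scroll-driven resubscription reduces to a set difference between what is currently subscribed and what is now visible. The sketch below assumes a hypothetical `presence_unsub` frame type as the counterpart of `presence_sub`; the function name `diff_viewport` is also illustrative.

```python
def diff_viewport(subscribed: set[str], visible: set[str]) -> tuple[list[dict], set[str]]:
    """Return the control frames to send and the resulting subscription set."""
    to_add = sorted(visible - subscribed)   # newly visible users
    to_drop = sorted(subscribed - visible)  # users scrolled out of view

    frames: list[dict] = []
    if to_drop:
        # "presence_unsub" is an assumed frame name, not a documented one.
        frames.append({"type": "presence_unsub", "ids": to_drop})
    if to_add:
        frames.append({"type": "presence_sub", "ids": to_add})
    return frames, set(visible)


# Example: the user scrolls so that U1 leaves the viewport and U4 enters it.
frames, current = diff_viewport({"U1", "U2", "U3"}, {"U2", "U3", "U4"})
```

Sending only the difference keeps control-channel traffic proportional to how far the user scrolled rather than to the size of the viewport.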

Shared Channels: Multi-Tenant Federation (Slack Connect)

Slack Connect allows two different organizations (Workspace A and Workspace B) to share a channel. To maintain tenant isolation, the channel exists as a federated resource. When a user in Workspace A posts a message, the API routes the request to Workspace A's regional databases. A background fan-out process detects the federated connection, duplicates the message, maps the sender's identity to a guest profile on Workspace B's domain, and pushes the event to Workspace B's message broker to deliver to Workspace B's active WebSocket sessions.
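The background fan-out step can be sketched as below. Everything here is an illustrative assumption: the `Message` shape, the `shared_with` and `guest_ids` lookup tables, and the use of plain lists to stand in for each workspace's message broker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    channel_id: str
    sender_id: str
    workspace_id: str
    text: str


def fan_out_shared(
    message: Message,
    shared_with: dict[str, list[str]],      # channel_id -> workspaces sharing it
    guest_ids: dict[tuple[str, str], str],  # (sender_id, target workspace) -> guest id
    brokers: dict[str, list[Message]],      # workspace_id -> queue standing in for a broker
) -> list[str]:
    """Copy a message to every other workspace that shares the channel."""
    delivered = []
    for target_ws in shared_with.get(message.channel_id, []):
        if target_ws == message.workspace_id:
            continue  # the origin workspace already stored the message
        # Map the sender to the guest profile visible on the target workspace.
        guest = guest_ids[(message.sender_id, target_ws)]
        copy = Message(message.channel_id, guest, target_ws, message.text)
        brokers[target_ws].append(copy)
        delivered.append(target_ws)
    return delivered
```

Because each workspace receives its own copy with a remapped sender, neither tenant needs read access to the other's user directory.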

The API model

Establishing the Connection

Clients negotiate a WebSocket session by first requesting a gateway URL via a REST handshake. This allows load balancing and routing of the client to the closest edge datacenter.

# Request WebSocket gateway endpoint
POST /v1/apps.connections.open HTTP/1.1
Authorization: Bearer xoxb-token-123abc...
Content-Type: application/json

{
  "client_version": "desktop-4.32.0",
  "last_sync_timestamp": 1718908412
}

# Response returns gateway URL and synchronization payload
HTTP/1.1 200 OK
Content-Type: application/json

{
  "ok": true,
  "url": "wss://wss-edge-us-east.slack-edge.com/connection/v2/abc123xyz",
  "deltas": {
    "channels": {
      "added": ["C0123XYZ"],
      "removed": []
    },
    "users": {
      "updated": ["U999AAA"]
    }
  }
}

Presence Subscription Event

Once the WebSocket connection is open, the client sends subscription payloads to track presence updates for active viewports.

// Client → Server: subscribe to presence updates for visible users
{
  "type": "presence_sub",
  "ids": ["U111AAA", "U222BBB", "U333CCC"]
}

// Server → Client: status update for subscribed user
{
  "type": "presence_change",
  "user_id": "U222BBB",
  "presence": "active",
  "last_active": 1718908520
}
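On the server side, the two frames above imply a registry that maps each user ID to the connections currently watching it. A minimal in-memory sketch, assuming a hypothetical `PresenceRegistry` class and string connection IDs (a production system would shard this and expire entries):

```python
from collections import defaultdict


class PresenceRegistry:
    """Tracks which connections subscribed to which users' presence."""

    def __init__(self) -> None:
        self._watchers: dict[str, set[str]] = defaultdict(set)  # user_id -> connection ids

    def subscribe(self, conn_id: str, frame: dict) -> None:
        """Handle a client 'presence_sub' frame."""
        for user_id in frame["ids"]:
            self._watchers[user_id].add(conn_id)

    def drop_connection(self, conn_id: str) -> None:
        """Forget a connection when its socket closes."""
        for watchers in self._watchers.values():
            watchers.discard(conn_id)

    def route(self, user_id: str, presence: str, last_active: int) -> dict[str, dict]:
        """Build the 'presence_change' event for each watching connection only."""
        event = {
            "type": "presence_change",
            "user_id": user_id,
            "presence": presence,
            "last_active": last_active,
        }
        return {conn_id: event for conn_id in sorted(self._watchers.get(user_id, ()))}
```

A status change for a user nobody is watching produces no outbound frames at all, which is where the savings over global broadcast come from.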

Under the hood: Slack connection scaling & pub/sub

To keep the chat experience fast, Slack decouples connection state from data writes. The Edge tier focuses on managing socket connections while the message routing tier routes events via localized Redis rings.

[Figure: Clients (Client A, Client B, Client C, each over WebSocket) connect to the WebSocket Gateways (WS-Gateway-1 holds Clients A and B; WS-Gateway-2 holds Client C). The gateways subscribe to Redis Pub/Sub (Workspace-Channel, Presence-Channel), which is fed by the API Servers (Message & Presence Services).]
Stateless gateways keep client WebSockets persistent. The API tier publishes updates to pub/sub channels. Gateways subscribe only to active workspaces and push events to respective client connections.
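The subscribe-only-to-active-workspaces behavior can be illustrated with a small in-memory stand-in for the pub/sub tier. The `Bus` and `Gateway` classes and the `workspace:<id>` topic naming are assumptions for this sketch; a real deployment would use Redis channels and live sockets instead of Python lists.

```python
from collections import defaultdict


class Bus:
    """In-memory stand-in for the pub/sub tier."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list["Gateway"]] = defaultdict(list)

    def subscribe(self, topic: str, gateway: "Gateway") -> None:
        if gateway not in self.subscribers[topic]:
            self.subscribers[topic].append(gateway)

    def publish(self, topic: str, event: dict) -> None:
        for gateway in self.subscribers.get(topic, []):
            gateway.deliver(topic, event)


class Gateway:
    """Holds client connections and subscribes only to topics they need."""

    def __init__(self, name: str, bus: Bus) -> None:
        self.name = name
        self.bus = bus
        self.clients: dict[str, list[str]] = defaultdict(list)  # topic -> client ids
        self.sent: list[tuple[str, dict]] = []  # frames "pushed" to clients

    def attach(self, client_id: str, workspace_id: str) -> None:
        topic = f"workspace:{workspace_id}"
        self.clients[topic].append(client_id)
        self.bus.subscribe(topic, self)  # first client for a workspace triggers the subscription

    def deliver(self, topic: str, event: dict) -> None:
        for client_id in self.clients[topic]:
            self.sent.append((client_id, event))


# Mirrors the figure: Clients A and B on one gateway, Client C on another.
bus = Bus()
gw1, gw2 = Gateway("WS-Gateway-1", bus), Gateway("WS-Gateway-2", bus)
gw1.attach("A", "T1")
gw1.attach("B", "T1")
gw2.attach("C", "T2")
bus.publish("workspace:T1", {"type": "message"})  # reaches A and B only
```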

By the numbers: connection overhead & fan-out math

Let's evaluate the math behind scaling connections and presence changes inside a workspace of size $U$.

Governing Equations

Scenario Parameters

Worked Calculations: Sync & Fan-out Comparison

| Metric | Broadcast / Full Model | Subscription / Lazy Model | Reduction Factor |
| --- | --- | --- | --- |
| Workspace Boot Payload Size | $U \times 800\text{ bytes} \approx 40\text{ MB}$ | $V \times 800\text{ bytes} \approx 32\text{ KB}$ | 1,250× decrease |
| Hourly Presence Updates | $50,000 \times 49,999 \times 1.5 \approx 3.75\text{ Billion/hr}$ | $50,000 \times 40 \times 1.5 = 3\text{ Million/hr}$ | 1,250× decrease |
| Outbound Gateway Network Rate | $3.75\text{B}/3600 \times 100\text{ bytes} \approx 104\text{ MB/s}$ | $3\text{M}/3600 \times 100\text{ bytes} \approx 83.3\text{ KB/s}$ | 1,250× decrease |
| RAM for 10M Global Connections | $10\text{M} \times 128\text{ KB} \approx 1.28\text{ TB}$ | $10\text{M} \times 32\text{ KB} \approx 320\text{ GB}$ | 4× decrease |
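The first three rows can be reproduced with a few lines of arithmetic. The parameter values below are read back out of the table itself ($U = 50,000$, $V = 40$, 800 bytes per record, 1.5 changes per user per hour, 100 bytes per event); the variable names and the meaning attached to each constant are assumptions made for this sketch.

```python
U = 50_000                       # workspace members
V = 40                           # records visible in the sidebar/viewport
RECORD_BYTES = 800               # assumed size of one record in the boot payload
CHANGES_PER_USER_PER_HOUR = 1.5  # presence flips per user per hour
EVENT_BYTES = 100                # assumed size of one presence event

# Workspace boot payload size, in bytes
boot_full = U * RECORD_BYTES
boot_lazy = V * RECORD_BYTES

# Hourly presence updates delivered
updates_broadcast = U * (U - 1) * CHANGES_PER_USER_PER_HOUR
updates_subscribed = U * V * CHANGES_PER_USER_PER_HOUR

# Outbound gateway network rate, in bytes per second
rate_broadcast = updates_broadcast / 3600 * EVENT_BYTES
rate_subscribed = updates_subscribed / 3600 * EVENT_BYTES

print(f"boot payload: {boot_full / 1e6:.0f} MB vs {boot_lazy / 1e3:.0f} KB")
print(f"updates/hr:   {updates_broadcast / 1e9:.2f} B vs {updates_subscribed / 1e6:.0f} M")
print(f"network rate: {rate_broadcast / 1e6:.0f} MB/s vs {rate_subscribed / 1e3:.1f} KB/s")
```

All three reductions come out near 1,250× because each is driven by the same ratio, roughly $U / V$.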

Decision Math

When channel size exceeds a threshold (e.g., $C > 2,000$), push-based typing updates and status changes are disabled for that channel, which bounds the maximum fan-out blast radius. Clients then either fall back to **lazy polling** (checking status only when a user profile card is clicked) or receive **throttled push signals**, limited to 10-second intervals.
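The threshold rule can be expressed as a small policy function. The names `delivery_policy` and `should_emit`, the policy labels, and the `prefer_throttle` switch are illustrative assumptions; only the 2,000-member threshold and the 10-second interval come from the text.

```python
from typing import Optional

LARGE_CHANNEL_THRESHOLD = 2_000  # example threshold from the text
THROTTLE_SECONDS = 10            # throttled push interval from the text


def delivery_policy(channel_size: int, prefer_throttle: bool = False) -> str:
    """Choose how typing and status signals are delivered for a channel."""
    if channel_size <= LARGE_CHANNEL_THRESHOLD:
        return "push"
    # Above the threshold, real-time push is disabled.
    return "throttled_push" if prefer_throttle else "lazy_poll"


def should_emit(now: float, last_emitted: Optional[float]) -> bool:
    """Throttle gate: allow at most one signal per THROTTLE_SECONDS."""
    return last_emitted is None or now - last_emitted >= THROTTLE_SECONDS
```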

How to debug & inspect it

To inspect active WebSocket traffic, use a command-line socket client like `wscat` to establish connection handshakes and intercept incoming frames.

# 1. Establish connection handshake to WebSocket Gateway
$ wscat --connect wss://wss-edge-us-east.slack-edge.com/connection/v2/abc123xyz
Connected (press CTRL+C to quit)
# 2. Intercept incoming ping frame from gateway (keeping connection alive)
< {"type": "ping", "reply_to": 12}
# 3. Client responds with a pong to prevent timeout disconnection
> {"type": "pong", "time": 1718908420}
# 4. Subscribe to presence updates for users
> {"type": "presence_sub", "ids": ["U222BBB"]}
< {"type": "presence_change", "user_id": "U222BBB", "presence": "active"}

Use the guide below to resolve common failures in workspace-centric real-time APIs:

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| Client crashes on desktop load in enterprise organizations | Bootstrap API payload is too large because it loads the entire user/channel list at once | Shift to the Delta Sync API. Boot with sidebar channels only and lazy-load user profiles. |
| Gateway CPU spikes when clients reconnect after network recovery | Thundering herd (reconnect storm): thousands of clients query the boot APIs at the same moment | Implement exponential backoff with randomized jitter on clients and separate Edge gateways from the core DB. |
| Presence updates delayed by tens of seconds across the workspace | Pub/sub channel saturation caused by broadcast floods of active/away statuses | Implement Viewport Presence Subscriptions; disable active pushes for channels containing > 2,000 users. |
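For the reconnect-storm fix, one common way to randomize backoff is the "full jitter" variant, sketched below. The function name `reconnect_delay` and the 1-second base and 60-second cap are assumptions for illustration, not values from Slack's clients.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Example: delays for the first five reconnect attempts of one client.
delays = [reconnect_delay(attempt) for attempt in range(5)]
```

Because each client draws its delay independently, clients that disconnected at the same moment spread their reconnects across the whole window instead of retrying in lockstep.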

🧠 Quick check

1. Why did Slack move away from the monolithic `rtm.start` boot payload model?

In enterprise workspaces (10k+ members), returning all workspace metadata at once resulted in huge boot payloads (50MB+), creating memory constraints and slow startup times. Replacing it with sidebar-only bootstrapping and lazy loading solved the issue.

2. How do Viewport Presence Subscriptions reduce network traffic?

Instead of broadcasting every member's online/offline status changes to all workspace users (an O(N²) problem), clients subscribe only to presence changes for the subset of users currently visible in the UI viewport, reducing fan-out drastically.

3. Which architectural layer terminates client WebSocket connections in Slack's design?

Stateless connection gateways terminate raw TCP/TLS and WebSocket connections without holding application logic or state. This isolates core application servers and databases from connection churn and reconnect storms.

4. What fallback strategy is applied when a Slack channel has thousands of active members?

For large channels, fan-out write loads are minimized by disabling high-churn, non-essential real-time notifications (such as typing indicators and active presence changes) for the channel's membership list.

✍️ Exercise: design the delta-sync schema

An enterprise client has a workspace with 80,000 members and 5,000 channels. Their desktop application was closed for 3 hours and is now reconnecting.

Design the JSON payload format returned by the Delta Sync endpoint to sync this client. What information does the client send, and how does the server represent additions, updates, and deletes for channels and members?

Think through your answer before reading on.


Model answer:

To perform delta synchronization, the client must send its last known synchronization state timestamp (or token). The server queries changes that occurred after this timestamp and returns a structured patch document.

Client Request Payload:

{
  "workspace_id": "T0123ABCD",
  "last_sync_timestamp": 1718908412,
  "subscribed_channels": ["C0123XYZ", "C999888"]
}

Server Response Delta Document:

{
  "sync_timestamp": 1718919212,
  "channels": {
    "created_or_updated": [
      { "id": "C0123XYZ", "name": "announcements-new", "topic": "Updated topic text" }
    ],
    "deleted": ["C8888999"]
  },
  "members": {
    "updated": [
      { "id": "U12345", "presence": "active", "status_text": "Out for lunch" }
    ],
    "deleted": ["U999111"]
  }
}

Key design requirements:

  1. Deleted IDs only: For deletions, only the array of raw IDs is returned to minimize bandwidth.
  2. Sidebar scoping: The server should limit member updates to members of the channels the client currently has open or visible, rather than returning status updates for all 80,000 users.
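On the receiving end, the client merges the delta document from the model answer into its local cache. A minimal sketch, assuming the cache is a dictionary keyed by entity ID; the function name `apply_delta` and the cache layout are hypothetical.

```python
def apply_delta(state: dict, delta: dict) -> dict:
    """Merge a delta document into the client's local cache, in place."""
    for section in ("channels", "members"):
        patch = delta.get(section, {})
        store = state.setdefault(section, {})

        # Upserts: merge changed fields into the existing record, or create it.
        upserts = patch.get("created_or_updated", []) + patch.get("updated", [])
        for record in upserts:
            store.setdefault(record["id"], {}).update(record)

        # Deletions arrive as raw IDs only.
        for entity_id in patch.get("deleted", []):
            store.pop(entity_id, None)

    # Remember the server's timestamp for the next sync request.
    state["last_sync_timestamp"] = delta["sync_timestamp"]
    return state
```

Storing the server-issued `sync_timestamp` rather than the client's own clock avoids missed changes when the two clocks drift apart.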

Key takeaways

Sources & further reading

Original system design; these primary resources from Slack's engineering team describe the real implementation details: