Design Case Studies · Lesson 18
Design: Slack API
Unlike consumer messaging apps, Slack is built around workspace hierarchies. An enterprise client with 100,000 members and thousands of channels presents a scaling profile where a simple connection boot can crash client apps, and broadcasting user presence creates O(N²) presence message cascades. Scaling Slack requires shifting from full synchronization to incremental subscription-driven APIs.
By the end you'll be able to
- Explain why Slack's legacy monolithic boot payload (`rtm.start`) failed at scale and how Delta Sync APIs resolved it.
- Design a connection gateway routing architecture that isolates WebSocket connections from backend state servers.
- Describe the Presence Subscription model and calculate how it mitigates O(N²) presence message storms.
- Formulate trade-offs for handling large-channel fan-out (e.g., `#general` with 50k+ users) and design fallback paths.
Requirements
Scaling a workspace-centric messaging platform introduces distinct constraints compared to point-to-point chat systems:
- Workspace Segmentation. Multi-tenancy must be enforced at the API layer. Users belong to workspaces, which contain channels (public, private, shared/Slack Connect). Permissions must resolve instantly.
- Avoid Reconnect Storms. When thousands of clients reconnect simultaneously (e.g., after a network blip), client boot queries must not overwhelm databases and in-memory caches.
- Presence Management. Status changes (active, away, custom status) must propagate quickly, but must not cause a quadratic bandwidth flood in large organizations.
- Slack Connect. Secure cross-workspace channel sharing, requiring the mapping of identities and message deliveries across independent tenant spaces.
- Real-time Delivery. Chat messages, typing states, and read markers must deliver with an end-to-end latency budget under 150 ms.
Design decisions
Transport: Connection Gateways & WebSocket Separation
To support millions of open TCP/TLS connections, Slack uses a dedicated gateway tier (such as Envoy proxies or custom Go/Edge proxies) to terminate WebSocket connections. These gateways hold no application logic or application state; they keep only per-connection socket state and act as simple frame-forwarders. When a client reconnects, the gateway queries a distributed routing cache to locate the user's active session, then forwards incoming and outgoing frames. This shields internal database shards and application servers from TCP connection churn.
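The lookup-then-forward behavior described above can be sketched as follows. This is a minimal illustration, not Slack's implementation: `RoutingCache`, `Gateway`, the backend names, and the byte-sum placement rule are all hypothetical stand-ins for a real distributed cache and load-balancing policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoutingCache:
    """Hypothetical in-memory stand-in for the distributed routing cache."""
    sessions: dict = field(default_factory=dict)  # user_id -> backend name

    def lookup(self, user_id: str) -> Optional[str]:
        return self.sessions.get(user_id)

    def assign(self, user_id: str, backend: str) -> None:
        self.sessions[user_id] = backend


@dataclass
class Gateway:
    """Frame forwarder: no application logic, only routing."""
    cache: RoutingCache
    backends: list

    def route(self, user_id: str) -> str:
        """Return the backend that should receive this user's frames."""
        backend = self.cache.lookup(user_id)
        if backend is None:
            # No active session yet: pick a backend deterministically
            # (assumed placement rule) and record it for later reconnects.
            index = sum(user_id.encode()) % len(self.backends)
            backend = self.backends[index]
            self.cache.assign(user_id, backend)
        return backend

    def forward(self, user_id: str, frame: bytes) -> tuple:
        # The gateway never inspects or mutates the frame payload.
        return self.route(user_id), frame
```

Because the session mapping lives in the cache rather than in the gateway, any gateway instance can serve a reconnecting client, which is what makes the tier easy to scale horizontally.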
Boot Sync: Delta Syncing & Lazy Loading
In early iterations, Slack's `rtm.start` API returned the entire state of the workspace (all channels, members, history) in a single huge JSON block. For workspaces with over 10,000 users, this payload exceeded 50 MB, causing high latency, memory pressure, and client crashes.
The modern design uses a **Delta Sync API**. When the client boots or reconnects, it sends the timestamp of its last known state. The server returns only the differences (changes, additions, deletions). If the state is too stale, the client performs a bootstrap sync for the *sidebar content only* (subscribed channels and DMs), while lazy-loading members, profile cards, and channel archives as the user navigates the app.
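A server-side sketch of that decision might look like the following. The change-log shape, the `Change` record, and the seven-day staleness cutoff are assumptions made for illustration; the text does not specify how Slack stores changes or when it considers a client too stale.

```python
from dataclasses import dataclass

# Assumed cutoff: beyond this, replaying deltas is considered too expensive.
MAX_STALENESS_SECONDS = 7 * 24 * 3600


@dataclass(frozen=True)
class Change:
    ts: int          # Unix timestamp of the change
    kind: str        # "added", "updated", or "removed"
    entity_id: str   # channel or user ID


def compute_sync(changes, last_sync_ts, now, sidebar_ids):
    """Return a delta for a fresh client, else a sidebar-only bootstrap."""
    if now - last_sync_ts > MAX_STALENESS_SECONDS:
        # Too stale: send only what the sidebar needs; everything else
        # is lazy-loaded as the user navigates.
        return {"mode": "bootstrap", "sidebar": sorted(sidebar_ids), "sync_ts": now}

    delta = {"added": [], "updated": [], "removed": []}
    for change in sorted(changes, key=lambda c: c.ts):
        if change.ts > last_sync_ts:
            delta[change.kind].append(change.entity_id)
    return {"mode": "delta", "delta": delta, "sync_ts": now}
```

The returned `sync_ts` is what the client would store and send back on its next reconnect.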
Presence Service: Viewport-Driven Presence Subscriptions
In a workspace of 50,000 users, broadcasting every online/offline status change to all members is quadratic: if each user changes status just once, the system delivers roughly $50,000 \times 50,000 = 2.5 \times 10^9$ updates, a massive performance bottleneck.
Slack solves this by shifting from global broadcast to **Presence Subscriptions**. The client app subscribes to presence updates only for users currently visible in the active sidebar viewport or active chat channel. As the user scrolls, the client dynamically unsubscribes from old IDs and subscribes to new ones over the WebSocket control channel, cutting presence delivery volume by well over 95% (about 1,250× in the 50,000-user scenario worked through in this lesson).
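The scroll-driven subscribe and unsubscribe step reduces to a set difference. The sketch below assumes the client tracks its current subscriptions as a set of user IDs; `presence_sub` is the frame type used elsewhere in this lesson, while `presence_unsub` is a hypothetical name for the reverse operation.

```python
import json


def diff_viewport(subscribed, visible):
    """Return (to_subscribe, to_unsubscribe) as sorted lists of user IDs."""
    return sorted(visible - subscribed), sorted(subscribed - visible)


def build_frames(subscribed, visible):
    """Build the control frames needed to move from `subscribed` to `visible`."""
    to_sub, to_unsub = diff_viewport(subscribed, visible)
    frames = []
    if to_unsub:
        # "presence_unsub" is an assumed frame type for this sketch.
        frames.append(json.dumps({"type": "presence_unsub", "ids": to_unsub}))
    if to_sub:
        frames.append(json.dumps({"type": "presence_sub", "ids": to_sub}))
    return frames
```

Sending only the difference keeps control traffic proportional to how far the user scrolled, not to the size of the viewport.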
Shared Channels: Multi-Tenant Federation (Slack Connect)
Slack Connect allows two different organizations (Workspace A and Workspace B) to share a channel. To maintain tenant isolation, the channel exists as a federated resource. When a user in Workspace A posts a message, the API routes the request to Workspace A's regional databases. A background fan-out process detects the federated connection, duplicates the message, maps the sender's identity to a guest profile on Workspace B's domain, and pushes the event to Workspace B's message broker to deliver to Workspace B's active WebSocket sessions.
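The federated fan-out path can be sketched roughly as below. Everything here is illustrative: `Broker` is an in-memory stand-in for a per-workspace message broker, and the `guest:<workspace>:<user>` identity format is an assumed mapping scheme, not Slack's actual one.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical per-workspace broker that records published events."""
    events: list = field(default_factory=list)

    def publish(self, event: dict) -> None:
        self.events.append(event)


def guest_identity(user_id: str, home_workspace: str) -> str:
    # Assumed mapping: expose the sender as a guest profile in the partner tenant.
    return f"guest:{home_workspace}:{user_id}"


def fan_out(message: dict, shared_with: dict, brokers: dict) -> int:
    """Publish to the home workspace, then copy to each partner workspace.

    `shared_with` maps channel ID -> list of workspace IDs sharing it.
    Returns the number of brokers that received the message.
    """
    home = message["workspace"]
    brokers[home].publish(message)
    delivered = 1
    for partner in shared_with.get(message["channel"], []):
        if partner == home:
            continue
        # Duplicate the message and rewrite tenant-specific fields only.
        copy = dict(message, workspace=partner,
                    sender=guest_identity(message["sender"], home))
        brokers[partner].publish(copy)
        delivered += 1
    return delivered
```

Copying rather than sharing the event keeps each tenant's broker and storage independent, at the cost of writing the message once per participating workspace.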
The API model
Establishing the Connection
Clients negotiate a WebSocket session by first requesting a gateway URL via a REST handshake. This allows load balancing and routing of the client to the closest edge datacenter.
# Request WebSocket gateway endpoint
POST /v1/apps.connections.open HTTP/1.1
Authorization: Bearer xoxb-token-123abc...
Content-Type: application/json

{
  "client_version": "desktop-4.32.0",
  "last_sync_timestamp": 1718908412
}

# Response returns gateway URL and synchronization payload
HTTP/1.1 200 OK
Content-Type: application/json

{
  "ok": true,
  "url": "wss://wss-edge-us-east.slack-edge.com/connection/v2/abc123xyz",
  "deltas": {
    "channels": {
      "added": ["C0123XYZ"],
      "removed": []
    },
    "users": {
      "updated": ["U999AAA"]
    }
  }
}
Presence Subscription Event
Once the WebSocket connection is open, the client sends subscription payloads to track presence updates for active viewports.
// Client → Server: subscribe to presence updates for visible users
{
  "type": "presence_sub",
  "ids": ["U111AAA", "U222BBB", "U333CCC"]
}

// Server → Client: status update for subscribed user
{
  "type": "presence_change",
  "user_id": "U222BBB",
  "presence": "active",
  "last_active": 1718908520
}
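On the server side, these two frames imply a registry that maps each watched user to the connections subscribed to them. The sketch below is one possible shape for it; `PresenceRegistry` and the connection IDs are hypothetical, and a production service would shard this state and expire dead connections.

```python
from collections import defaultdict


class PresenceRegistry:
    """Tracks which connections want presence updates for which users."""

    def __init__(self):
        self._watchers = defaultdict(set)  # user_id -> set of connection IDs

    def handle_sub(self, conn_id, frame):
        """Register a `presence_sub` frame received from a connection."""
        for user_id in frame["ids"]:
            self._watchers[user_id].add(conn_id)

    def drop_connection(self, conn_id):
        """Remove a closed connection from every watch list."""
        for watchers in self._watchers.values():
            watchers.discard(conn_id)

    def on_change(self, user_id, presence, last_active):
        """Return {conn_id: event} for only the connections watching this user."""
        event = {
            "type": "presence_change",
            "user_id": user_id,
            "presence": presence,
            "last_active": last_active,
        }
        return {conn: event for conn in sorted(self._watchers.get(user_id, ()))}
```

A status change for a user nobody is currently viewing produces no outbound frames at all, which is where the savings over broadcast come from.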
Under the hood: Slack connection scaling & pub/sub
To keep the chat experience fast, Slack decouples connection state from data writes. The Edge tier focuses on managing socket connections while the message routing tier routes events via localized Redis rings.
By the numbers: connection overhead & fan-out math
Let's evaluate the math behind scaling connections and presence changes inside a workspace of size $U$.
Governing Equations
- Total Connection Memory: The total RAM required to hold $N$ concurrent connections is: $$M_{total} = N \times (TCP_{buf} + TLS_{ctx} + WS_{state})$$
- Global Presence Broadcast: Broadcasting presence changes to every active user in a workspace with $U$ active users scales quadratically: $$Presence_{Broadcast} = U \times (U - 1) \times R$$ where $R$ is the average number of status changes per user per hour, so the result is in updates per hour.
- Viewport Presence Subscription: If users subscribe only to visible sidebar profiles $V$ (where $V \ll U$): $$Presence_{Sub} = U \times V \times R$$
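Dividing the two presence equations shows that the saving does not depend on the change rate $R$, only on workspace size relative to viewport size:

$$\frac{Presence_{Broadcast}}{Presence_{Sub}} = \frac{U \times (U - 1) \times R}{U \times V \times R} = \frac{U - 1}{V}$$

For example, with $U = 50,000$ and $V = 40$ this gives $49,999 / 40 \approx 1,250$.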
Scenario Parameters
- Workspace Size ($U$): 50,000 active users
- Active Sidebar Viewport Size ($V$): 40 profiles
- Status Change Rate ($R$): 1.5 changes per user per hour
- Memory Overhead per WebSocket Connection: 32 KB
Worked Calculations: Sync & Fan-out Comparison
| Metric | Broadcast / Full Model | Subscription / Lazy Model | Reduction Factor |
|---|---|---|---|
| Workspace Boot Payload Size | $U \times 800\text{ bytes} \approx 40\text{ MB}$ | $V \times 800\text{ bytes} \approx 32\text{ KB}$ | 1,250× decrease |
| Hourly Presence Updates | $50,000 \times 49,999 \times 1.5 \approx 3.75\text{ Billion/hr}$ | $50,000 \times 40 \times 1.5 = 3\text{ Million/hr}$ | 1,250× decrease |
| Outbound Gateway Network rate | $3.75\text{B}/3600 \times 100\text{ bytes} \approx 104\text{ MB/s}$ | $3\text{M}/3600 \times 100\text{ bytes} \approx 83.3\text{ KB/s}$ | 1,250× decrease |
| RAM for 10M Global Connections | $10\text{M} \times 128\text{ KB} \approx 1.28\text{ TB}$ | $10\text{M} \times 32\text{ KB} \approx 320\text{ GB}$ | 4× decrease |
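The table's figures can be reproduced with a few lines of arithmetic. The sketch below takes the 800-byte profile size and 100-byte update size from the table rows and assumes decimal units (1 KB = 1,000 bytes), which is what the table's rounding implies; the function names are illustrative.

```python
U = 50_000            # active users in the workspace
V = 40                # profiles visible in the sidebar viewport
R = 1.5               # status changes per user per hour
PROFILE_BYTES = 800   # per-profile size used in the boot payload row
UPDATE_BYTES = 100    # per-update size used in the network rate row


def boot_payload_bytes(profiles):
    return profiles * PROFILE_BYTES


def broadcast_updates_per_hour(u, r):
    return u * (u - 1) * r


def subscription_updates_per_hour(u, v, r):
    return u * v * r


def bytes_per_second(updates_per_hour):
    return updates_per_hour / 3600 * UPDATE_BYTES


def connection_ram_bytes(connections, kb_per_connection):
    return connections * kb_per_connection * 1_000  # decimal KB assumed


if __name__ == "__main__":
    broadcast = broadcast_updates_per_hour(U, R)
    subscribed = subscription_updates_per_hour(U, V, R)
    print(f"boot payload: {boot_payload_bytes(U)} B vs {boot_payload_bytes(V)} B")
    print(f"presence updates/hr: {broadcast:.0f} vs {subscribed:.0f}")
    print(f"reduction factor: {broadcast / subscribed:.1f}x")
    print(f"network rate: {bytes_per_second(broadcast):.0f} B/s "
          f"vs {bytes_per_second(subscribed):.0f} B/s")
    print(f"RAM for 10M connections at 32 KB: {connection_ram_bytes(10_000_000, 32)} B")
```

Changing `V` or `U` in this sketch is a quick way to see how sensitive the presence load is to viewport size compared with workspace growth.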
Decision Math
When a channel's member count $C$ exceeds a threshold (e.g., $C > 2,000$), push-based typing updates and status changes are disabled for that channel. This bounds the maximum fan-out blast radius. The client falls back to **lazy polling** (checking status only when a user profile card is clicked) or to **throttled push signals** delivered at most once every 10 seconds.
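One way to express this policy in code is shown below. The 2,000-member threshold and 10-second interval come from the text; which signal gets which fallback (typing throttled, everything else polled lazily) is an assumption for this sketch, as are the names `Delivery`, `delivery_mode`, and `Throttle`.

```python
from enum import Enum

LARGE_CHANNEL_THRESHOLD = 2_000  # members; example threshold from the text
THROTTLE_INTERVAL_S = 10         # seconds; throttle interval from the text


class Delivery(Enum):
    PUSH = "push"
    THROTTLED_PUSH = "throttled_push"
    LAZY_POLL = "lazy_poll"


def delivery_mode(member_count: int, signal: str) -> Delivery:
    """Choose how an ephemeral signal is delivered for a channel of this size."""
    if member_count <= LARGE_CHANNEL_THRESHOLD:
        return Delivery.PUSH
    # Assumption: typing is throttled, other signals fall back to on-demand polling.
    if signal == "typing":
        return Delivery.THROTTLED_PUSH
    return Delivery.LAZY_POLL


class Throttle:
    """Allow at most one push per channel per interval."""

    def __init__(self, interval_s: float = THROTTLE_INTERVAL_S) -> None:
        self.interval_s = interval_s
        self._last_sent = {}  # channel_id -> timestamp of last allowed push

    def allow(self, channel_id: str, now: float) -> bool:
        last = self._last_sent.get(channel_id)
        if last is not None and now - last < self.interval_s:
            return False
        self._last_sent[channel_id] = now
        return True
```

Passing `now` in explicitly, rather than reading the clock inside `allow`, keeps the throttle deterministic and easy to test.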
How to debug & inspect it
To inspect active WebSocket traffic, use a command-line socket client like `wscat` to establish connection handshakes and intercept incoming frames.
Use the guide below to resolve common failures in workspace-centric real-time APIs:
| Symptom | Likely Cause | Fix |
|---|---|---|
| Client crashes on desktop load in enterprise organizations | Bootstrap API payload is too large, loading entire user/channel list at once | Shift to Delta Syncing API. Boot only with sidebar channels and lazy load user profiles. |
| Gateway CPU spikes when clients reconnect after network recovery | Thundering herd/reconnect storm: thousands of clients query boot APIs at the same moment | Implement exponential backoff with randomized jitter on clients and keep Edge gateways separate from the core DB. |
| Presence updates delayed by tens of seconds across the workspace | Pub/sub channel saturation due to broadcast floods of active/away statuses | Implement Viewport Presence Subscriptions; disable active pushes for channels containing > 2,000 users. |
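For the reconnect-storm fix, a common client-side approach is exponential backoff with "full jitter", where each retry waits a random time between zero and an exponentially growing ceiling. The sketch below uses assumed defaults (1-second base, 60-second cap); tune them to the service's recovery characteristics.

```python
import random


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the delay for this attempt: base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Full-jitter delay: uniform between 0 and the capped exponential ceiling."""
    return rng.uniform(0.0, backoff_ceiling(attempt, base, cap))


if __name__ == "__main__":
    # Show the ceiling and one sampled delay for the first few reconnect attempts.
    for attempt in range(6):
        print(attempt, backoff_ceiling(attempt), round(backoff_delay(attempt), 2))
```

The randomization matters more than the exponent: without it, clients that disconnected together retry together, and the herd simply arrives later.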
🧠 Quick check
1. Why did Slack move away from the monolithic `rtm.start` boot payload model?
In enterprise workspaces (10k+ members), returning all workspace metadata at once resulted in huge boot payloads (50 MB+), causing memory pressure and slow startup times. Replacing it with delta sync, sidebar-only bootstrapping, and lazy loading solved the issue.
2. How do Viewport Presence Subscriptions reduce network traffic?
Instead of broadcasting every member's online/offline status changes to all workspace users (an O(N²) problem), clients subscribe only to presence changes for the subset of users currently visible in the UI viewport, reducing fan-out drastically.
3. Which architectural layer terminates client WebSocket connections in Slack's design?
Connection gateways terminate raw TCP/TLS and WebSocket connections. They keep only per-connection socket state and no application state, which isolates core application servers and databases from connection churn and reconnect storms.
4. What fallback strategy is applied when a Slack channel has thousands of active members?
For large channels, fan-out write loads are minimized by disabling high-churn, non-essential real-time notifications (such as typing indicators and active presence changes) for the channel's membership list.
✍️ Exercise: design the delta-sync schema
An enterprise client has a workspace with 80,000 members and 5,000 channels. Their desktop application was closed for 3 hours and is now reconnecting.
Design the JSON payload format returned by the Delta Sync endpoint to sync this client. What information does the client send, and how does the server represent additions, updates, and deletes for channels and members?
Think through your answer before reading on.
Model answer:
To perform delta synchronization, the client must send its last known synchronization state timestamp (or token). The server queries changes that occurred after this timestamp and returns a structured patch document.
Client Request Payload:
{
  "workspace_id": "T0123ABCD",
  "last_sync_timestamp": 1718908412,
  "subscribed_channels": ["C0123XYZ", "C999888"]
}
Server Response Delta Document:
{
  "sync_timestamp": 1718919212,
  "channels": {
    "created_or_updated": [
      { "id": "C0123XYZ", "name": "announcements-new", "topic": "Updated topic text" }
    ],
    "deleted": ["C8888999"]
  },
  "members": {
    "updated": [
      { "id": "U12345", "presence": "active", "status_text": "Out for lunch" }
    ],
    "deleted": ["U999111"]
  }
}
Key design requirements:
- Deleted IDs only: For deletions, only the array of raw IDs is returned to minimize bandwidth.
- Sidebar scoping: The server should limit member updates to members of the channels the client currently has open or visible, rather than returning status updates for all 80,000 users.
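On the client, applying such a delta document to a local cache could look like the sketch below. The cache layout (a dict of `channels` and `members` keyed by ID) is an assumption; the field names `created_or_updated`, `updated`, `deleted`, and `sync_timestamp` follow the model answer's response format.

```python
def apply_delta(cache: dict, delta: dict) -> dict:
    """Merge a delta document into the local cache, in place, and return it."""
    for section in ("channels", "members"):
        store = cache.setdefault(section, {})
        patch = delta.get(section, {})

        # Upserts carry full or partial records; merge them field by field.
        upserts = patch.get("created_or_updated", []) + patch.get("updated", [])
        for record in upserts:
            store.setdefault(record["id"], {}).update(record)

        # Deletions carry raw IDs only.
        for deleted_id in patch.get("deleted", []):
            store.pop(deleted_id, None)

    # Remember the server's timestamp so the next reconnect asks for newer changes.
    cache["last_sync_timestamp"] = delta["sync_timestamp"]
    return cache
```

Merging field by field means a partial record (for example, only a changed `topic`) does not erase fields the client already holds.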
Key takeaways
- Workspace-scale real-time apps require **Delta Synchronization**; loading complete lists of users and channels causes client heap exhaustion and network congestion.
- Isolate WebSocket connection management by terminating connections at **stateless gateways**, protecting backend databases and stateful application code.
- Mitigate O(N²) presence message floods with **Viewport Presence Subscriptions**, which deliver status updates only for users currently visible in the client's viewport.
- Scale group messages in large channels (e.g. #general) by **throttling or disabling** high-frequency ephemeral updates (like typing states).
- Support cross-organization channels by routing federated messages through background brokers that map foreign user profiles into tenant domains.
Sources & further reading
Original system design; these primary resources from Slack's engineering team describe the real implementation details: