API Design

Design Case Studies · Lesson 19

Design: Collaborative Editing API

Collaborative applications like Google Docs and Figma allow multiple users to edit the same document concurrently. Under the hood, this requires resolving conflicts deterministically across geo-distributed clients. We must design a robust API and synchronization model using Operational Transformation (OT) or Conflict-Free Replicated Data Types (CRDTs) to prevent data loss and document divergence.

⏱ ~18 min · Advanced · Prereq: websockets, cs-08, pub/sub


Requirements

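Designing a collaborative editing system requires addressing distinct networking and data-integrity constraints: low-latency bidirectional messaging between every editor and the server, deterministic convergence of all replicas without losing any user's edits, and outbound traffic that stays manageable as the number of concurrent editors grows.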

Design decisions

Transport Protocol: WebSockets vs. SSE vs. HTTP/2

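Since editing requires bidirectional, low-latency, real-time message streams, WebSockets are the standard transport choice. Clients establish a persistent connection to a session gateway server; after the initial handshake, every message travels in a lightweight frame, so WebSockets avoid repeating HTTP request headers on each high-frequency keystroke message. Read-only viewers only need server-to-client updates, so Server-Sent Events (SSE) or HTTP/2 streams are simpler one-way alternatives; over HTTP/2, several streams from the same client can share one TCP connection, which can save gateway socket resources.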
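To make the header-overhead point concrete, here is a small sketch that computes the per-frame header cost defined by RFC 6455 (section 5.2). The function name `ws_frame_header_bytes` is our own, and it assumes a single unfragmented frame with no extensions. For comparison, an HTTP/1.1 request carrying the same keystroke would resend a request line plus headers such as Host and Authorization every time.

```python
def ws_frame_header_bytes(payload_len: int, client_to_server: bool) -> int:
    """Header bytes for one unfragmented WebSocket frame (RFC 6455, section 5.2)."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    header = 2  # FIN/RSV/opcode byte + MASK bit with a 7-bit payload length
    if payload_len > 0xFFFF:
        header += 8  # 64-bit extended payload length
    elif payload_len > 125:
        header += 2  # 16-bit extended payload length
    if client_to_server:
        header += 4  # clients must mask every frame with a 4-byte key
    return header


# A keystroke message shaped like this lesson's apply_op example is well under
# 126 bytes, so it costs 6 header bytes upstream and 2 bytes downstream.
keystroke = '{"type":"apply_op","base_revision":1482,"ops":[{"type":"insert","index":12,"value":"!"}]}'
print(len(keystroke), ws_frame_header_bytes(len(keystroke), True), ws_frame_header_bytes(len(keystroke), False))
```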

Operational Transformation (OT) vs. CRDTs

The choice of conflict resolution model determines your API structure and backend storage requirements:

| Metric | Operational Transformation (OT) | Conflict-Free Replicated Data Types (CRDTs) |
|---|---|---|
| How it works | Edits are operations (inserts/deletes) that the server transforms relative to a global sequence number. | Data structures are designed so that concurrent operations commute (are order-independent). |
| State & Authority | Centralized. The server is the single source of truth for operation ordering and transformation. | Decentralized. Clients can merge edits in any order, and replicas that have received the same set of operations converge to the same state. |
| Metadata Overhead | Low. Operations carry simple index offsets and string contents. | High. Each element (often each character) carries a unique ID, and deleted elements linger as tombstone markers, so memory use can be several times that of the plain text. |
| Best for | Plain-text documents and standard document editors (e.g., Google Docs). | Graph-based canvases, structured layout editors, and decentralized apps (e.g., Notion; Figma's multiplayer model is CRDT-inspired but keeps a central server as the authority). |
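The CRDT column can be illustrated with a deliberately simplified sequence CRDT, sketched under our own assumptions: every character gets a unique ID built from a fractional position, the replica ID, and a per-replica counter (a simplified Logoot-style dense-position scheme swapped in for brevity; production CRDTs such as RGA use richer IDs), and a delete only sets a tombstone flag. Because applying an op is idempotent and order-independent, replicas converge however ops are delivered. `SeqCRDT`, `Element`, and the op tuples are hypothetical. One known limitation: when concurrent inserts pick the same position, ties are ordered by replica ID, which keeps replicas consistent but may not match each user's intent exactly.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical element ID: (position in (0, 1), replica_id, per-replica counter).
# Sorting IDs gives document order; replica_id and counter keep IDs unique.
ElemId = tuple[Fraction, str, int]
Op = tuple[str, ElemId, str]  # ("insert" | "delete", element ID, value)


@dataclass
class Element:
    value: str = ""
    deleted: bool = False  # tombstone flag: the element is hidden, never removed


class SeqCRDT:
    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counter = 0
        self.elems: dict[ElemId, Element] = {}

    def _visible(self) -> list[ElemId]:
        return [eid for eid in sorted(self.elems) if not self.elems[eid].deleted]

    def text(self) -> str:
        return "".join(self.elems[eid].value for eid in self._visible())

    def local_insert(self, index: int, value: str) -> Op:
        ids = self._visible()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1
        op: Op = ("insert", ((left + right) / 2, self.replica_id, self.counter), value)
        self.apply(op)
        return op

    def local_delete(self, index: int) -> Op:
        op: Op = ("delete", self._visible()[index], "")
        self.apply(op)
        return op

    def apply(self, op: Op) -> None:
        # Idempotent and commutative: any delivery order of the same ops
        # produces the same element map, so replicas converge.
        kind, eid, value = op
        elem = self.elems.setdefault(eid, Element())
        if kind == "insert":
            elem.value = value
        else:
            elem.deleted = True


a, b = SeqCRDT("a"), SeqCRDT("b")
log = []
for i, ch in enumerate("ABC"):
    log.append(a.local_insert(i, ch))
    b.apply(log[-1])

op_x = a.local_insert(1, "x")  # concurrent: neither replica has seen the other's edit
op_y = b.local_insert(2, "y")
a.apply(op_y)
b.apply(op_x)
print(a.text(), b.text())  # AxByC AxByC

op_del = b.local_delete(0)  # "A" becomes a tombstone rather than being removed
a.apply(op_del)
log += [op_x, op_y, op_del]

c = SeqCRDT("c")
for op in reversed(log):  # deliver every op in the opposite order
    c.apply(op)
print(a.text(), b.text(), c.text(), len(c.elems))  # xByC xByC xByC 5
```

Replica `c` still stores five elements for four visible characters: that leftover tombstone is exactly the metadata overhead described in the table.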

Synchronization & Versioning Loops

To keep edit messages small, we use **Delta Synchronization**: clients send only the operations they performed, never the whole document. Each client tracks the last server revision it has applied (e.g., rev=42) and attaches it to every edit as the base revision. The server compares that base revision with the current global revision. If another edit was accepted first (e.g., the global revision is now 43), the server transforms the incoming edit to account for the change in revision 43, commits the transformed edit as revision 44, acknowledges it to the sender, and broadcasts it to all other clients.
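A minimal sketch of the server side of this loop, assuming insert-only operations (deletes need additional transform cases) and an unbounded in-memory history. `Insert`, `transform`, and `OTServer` are hypothetical names, and ties at the same index are broken by `client_id` so that every replica makes the same choice. The example replays the two concurrent edits from the synchronization diagram in this lesson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    index: int
    value: str
    client_id: str  # used only to break ties between inserts at the same index


def transform(op: Insert, committed: Insert) -> Insert:
    """Rebase `op` so it applies after `committed` (insert-only sketch)."""
    if committed.index < op.index or (
        committed.index == op.index and committed.client_id < op.client_id
    ):
        return Insert(op.index + len(committed.value), op.value, op.client_id)
    return op


class OTServer:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.history: list[Insert] = []  # history[i] produced revision i + 1

    @property
    def revision(self) -> int:
        return len(self.history)

    def submit(self, base_revision: int, op: Insert) -> tuple[int, Insert]:
        if not 0 <= base_revision <= self.revision:
            raise ValueError(f"unknown base revision {base_revision}")
        # Transform against every op committed after the client's base revision.
        for committed in self.history[base_revision:]:
            op = transform(op, committed)
        self.text = self.text[: op.index] + op.value + self.text[op.index :]
        self.history.append(op)
        # The caller acks the sender and broadcasts (revision, op) to everyone else.
        return self.revision, op


server = OTServer("ABC")
print(server.submit(0, Insert(1, "x", "client_a")))  # (1, Insert(index=1, value='x', client_id='client_a'))
print(server.submit(0, Insert(2, "y", "client_b")))  # (2, Insert(index=3, value='y', client_id='client_b'))
print(server.text)  # AxByC
```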

The API model

Establishing editing session

Before opening the WebSocket connection, the client makes a REST request to fetch the current document snapshot and locate the session server that hosts the document.

# Request the document snapshot and session endpoint
GET /v1/documents/doc_abc123/session HTTP/1.1
Authorization: Bearer user-token-xyz

# Response returns version metadata and gateway WebSocket URL
HTTP/1.1 200 OK
Content-Type: application/json

{
  "document_id": "doc_abc123",
  "revision": 1482,
  "content": "Hello, world",
  "websocket_url": "wss://coll-edge-eu.example.com/session/doc_abc123"
}
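A client-side sketch of this bootstrap step using only the Python standard library. `SessionInfo`, `build_session_request`, `parse_session`, and `fetch_session` are hypothetical helpers, and `https://api.example.com` stands in for whatever host actually serves the REST API. The returned `revision` is what the client must send as `base_revision` with its first edit.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionInfo:
    document_id: str
    revision: int        # becomes the client's base_revision for its first edit
    content: str         # snapshot the editor renders before any ops arrive
    websocket_url: str   # session gateway to connect to next


def build_session_request(base_url: str, document_id: str, token: str) -> urllib.request.Request:
    url = f"{base_url}/v1/documents/{document_id}/session"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


def parse_session(body: str) -> SessionInfo:
    data = json.loads(body)
    return SessionInfo(
        document_id=data["document_id"],
        revision=int(data["revision"]),
        content=data["content"],
        websocket_url=data["websocket_url"],
    )


def fetch_session(base_url: str, document_id: str, token: str, timeout: float = 5.0) -> SessionInfo:
    # Performs the real network call; not exercised in the example below.
    req = build_session_request(base_url, document_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_session(resp.read().decode("utf-8"))


req = build_session_request("https://api.example.com", "doc_abc123", "user-token-xyz")
print(req.get_method(), req.full_url)  # GET https://api.example.com/v1/documents/doc_abc123/session
info = parse_session('{"document_id": "doc_abc123", "revision": 1482, "content": "Hello, world", '
                     '"websocket_url": "wss://coll-edge-eu.example.com/session/doc_abc123"}')
print(info.revision, info.websocket_url)
```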

WebSocket Edit Event (OT Model)

When the client types, it sends an operations array over the WebSocket connection. The server validates the base revision, applies transformations if needed, assigns the next global revision, and broadcasts it.

// Client → Server: User inserts "!" at index 12 based on revision 1482
{
  "type": "apply_op",
  "base_revision": 1482,
  "ops": [
    { "type": "insert", "index": 12, "value": "!" }
  ]
}

// Server → other clients (broadcast): the accepted operation, committed as revision 1483
{
  "type": "op_broadcast",
  "revision": 1483,
  "client_id": "client_alice",
  "ops": [
    { "type": "insert", "index": 12, "value": "!" }
  ]
}
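The client half of this exchange can be sketched as follows, assuming the receiving client has no unacknowledged edits of its own (if it does, incoming ops must first be transformed against its pending ones). `make_apply_op`, `apply_ops`, and `ClientDoc` are hypothetical names, only the insert op shown above is supported, and Bob's starting text is made up for the example.

```python
import json
from dataclasses import dataclass


def make_apply_op(base_revision: int, ops: list[dict]) -> str:
    """Serialize a client → server edit in the apply_op shape shown above."""
    return json.dumps({"type": "apply_op", "base_revision": base_revision, "ops": ops})


def apply_ops(text: str, ops: list[dict]) -> str:
    for op in ops:
        if op["type"] != "insert":  # this sketch only handles the insert op shown above
            raise ValueError(f"unsupported op type: {op['type']}")
        i = op["index"]
        if not 0 <= i <= len(text):
            raise ValueError(f"index {i} out of range for length {len(text)}")
        text = text[:i] + op["value"] + text[i:]
    return text


@dataclass
class ClientDoc:
    revision: int
    text: str

    def on_broadcast(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] != "op_broadcast":
            return
        # Broadcasts must be applied strictly in revision order. A gap means a
        # frame was missed, so the client should resync rather than guess.
        if msg["revision"] != self.revision + 1:
            raise RuntimeError(f"expected revision {self.revision + 1}, got {msg['revision']}")
        self.text = apply_ops(self.text, msg["ops"])
        self.revision = msg["revision"]


# What Alice's client sends for her edit:
outgoing = make_apply_op(1482, [{"type": "insert", "index": 12, "value": "!"}])

# Bob's replica (hypothetical local state) receives the accepted edit.
bob = ClientDoc(revision=1482, text="Hello, world")
bob.on_broadcast(json.dumps({
    "type": "op_broadcast", "revision": 1483, "client_id": "client_alice",
    "ops": [{"type": "insert", "index": 12, "value": "!"}],
}))
print(bob.revision, bob.text)  # 1483 Hello, world!
```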

Under the hood: OT synchronization loop

Operational Transformation relies on a central server to serialize incoming edits and transform them against one another so that client states do not diverge.

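Figure: OT synchronization between Client A, the server (OT broker), and Client B.

  1. All three replicas start at "ABC" (v0).
  2. Client A sends Insert('x', 1) and Client B concurrently sends Insert('y', 2), both based on v0.
  3. The server commits A's operation as v1 and broadcasts it to Client B.
  4. B's operation arrives with base v0, so the server transforms it against v1: index 2 → 3.
  5. The server commits Insert('y', 3) as v2, broadcasts it to Client A, and acknowledges Client B.
  6. All replicas converge on "AxByC" (v2).

Concurrency resolution in OT: Client B's operation was generated at version 0, targeting index 2. Because Client A's edit (an insert at index 1, before B's target) was committed first, the server shifted Client B's edit to index 3, so every replica converges on the same document.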
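On the client side, convergence depends on each client rebasing incoming operations over its own unacknowledged ones. The sketch below uses hypothetical `Ins`, `shift`, and `Client` names with insert-only ops, and replays the diagram's scenario. Inserting "y" at index 2 of "ABC" places it between "B" and "C", so both replicas should end at "AxByC"; the hard-coded `Ins(3, "y", ...)` stands for the server's transformed op.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ins:
    index: int
    value: str
    client_id: str


def shift(op: Ins, against: Ins) -> Ins:
    """Rebase `op` over a concurrent insert; on ties, the lower client_id goes first."""
    if against.index < op.index or (against.index == op.index and against.client_id < op.client_id):
        return Ins(op.index + len(against.value), op.value, op.client_id)
    return op


def apply_insert(text: str, op: Ins) -> str:
    return text[: op.index] + op.value + text[op.index :]


class Client:
    def __init__(self, client_id: str, text: str, revision: int) -> None:
        self.client_id, self.text, self.revision = client_id, text, revision
        self.pending: list[Ins] = []  # sent to the server but not yet acknowledged

    def local_insert(self, index: int, value: str) -> Ins:
        op = Ins(index, value, self.client_id)
        self.text = apply_insert(self.text, op)  # optimistic local apply
        self.pending.append(op)
        return op

    def on_remote(self, op: Ins, revision: int) -> None:
        # The server ordered `op` before our pending ops: rebase our pending ops
        # over it, and rebase it over our pending ops, before applying it locally.
        rebased = []
        for mine in self.pending:
            rebased.append(shift(mine, op))
            op = shift(op, mine)
        self.pending = rebased
        self.text = apply_insert(self.text, op)
        self.revision = revision

    def on_ack(self, revision: int) -> None:
        self.pending.pop(0)  # the server committed our oldest pending op
        self.revision = revision


alice, bob = Client("client_a", "ABC", 0), Client("client_b", "ABC", 0)
alice.local_insert(1, "x")                    # Alice sees "AxBC"
bob.local_insert(2, "y")                      # Bob sees "AByC"
alice.on_ack(1)                               # server commits Alice's op as v1
bob.on_remote(Ins(1, "x", "client_a"), 1)     # and broadcasts it to Bob
alice.on_remote(Ins(3, "y", "client_b"), 2)   # server's transformed op, v2
bob.on_ack(2)
print(alice.text, bob.text)  # AxByC AxByC
```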

By the numbers: edit frequency & fan-out math

Let's evaluate the network capacity requirements for a real-time collaborative doc edit session.

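Governing Equations

With $N$ concurrent writers, each emitting $K$ messages per second, and every message relayed to the other $N - 1$ participants:

$$Inbound_{rate} = N \times K$$
$$Msg_{rate} = N \times (N - 1) \times K$$
$$Egress\text{ Bandwidth} = Msg_{rate} \times S_{msg}$$

where $S_{msg}$ is the average message size in bytes.

Scenario Parameters

$N = 20$ concurrent writers (scaled to $N = 100$ for the boundary case), $K = 5$ keystroke messages per second per writer, and $S_{msg} \approx 300$ bytes per operation message.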

Worked Calculations: Impact of Client-Side Buffering

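| Metric | Unbuffered / Character-by-character | Buffered / Batched (200 ms) | Reduction Factor |
|---|---|---|---|
| Messages sent per client/sec | $K = 5\text{ msgs/s}$ | $1 \div 0.2 = 5\text{ msgs/s}$ (same rate per client) | No change |
| Total server inbound rate | $N \times K = 20 \times 5 = 100\text{ msgs/s}$ | $N \times 5 = 100\text{ msgs/s}$ | No change |
| Total server fan-out rate (outbound, $N = 20$) | $20 \times 19 \times 5 = 1{,}900\text{ msgs/s}$ | $20 \times 19 \times 5 = 1{,}900\text{ msgs/s}$ | No change |
| Outbound rate at $N = 100$ writers | $100 \times 99 \times 5 = 49{,}500\text{ msgs/s}$ | $100 \times 99 \times (1 \div 0.2) = 49{,}500\text{ msgs/s}$ | No change; both grow quadratically |

A 200 ms client-side buffer only reduces traffic when a writer produces more than 5 keystrokes per second; at $K = 5$ it still flushes one message per window, so the quadratic $N \times (N - 1)$ fan-out term is untouched. The larger savings come from batching on the server side.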
🧮 Step-by-step scaling math: the 100-writer boundary

Let's calculate the bandwidth the session server needs if 100 users type in the same document concurrently, first with per-keystroke forwarding and then with server-side batching:

  1. Character-by-character (Unbuffered) Fan-out: $$Msg_{rate} = 100 \text{ typers} \times 99 \text{ recipients} \times 5 \text{ keystrokes/sec} = 49{,}500 \text{ msgs/sec}$$ $$Egress\text{ Bandwidth} = 49{,}500 \text{ msgs/sec} \times 300 \text{ bytes/msg} \approx 14.85 \text{ MB/s} \approx 118.8 \text{ Mbps}$$ That is roughly 12% of a 1 Gbps server link for a single document!
  2. With Server-Side Grouping & Throttle (100 ms sync frames):
    Instead of forwarding each edit immediately, the server gathers edits and broadcasts one consolidated patch frame to every client 10 times per second ($T_{broadcast} = 100\text{ ms}$). Each frame bundles roughly 50 operations ($100 \times 5 \times 0.1$); the calculation assumes they fit in about 800 bytes once they share a single message envelope. $$Msg_{rate} = 100 \text{ clients} \times 10 \text{ updates/sec} = 1{,}000 \text{ msgs/sec}$$ $$Egress\text{ Bandwidth} = 1{,}000 \text{ msgs/sec} \times 800 \text{ bytes/msg} = 800 \text{ KB/s} \approx 6.4 \text{ Mbps}$$ This cuts outbound messages by about 98% (49,500 → 1,000 per second) and egress bandwidth by about 95% (118.8 → 6.4 Mbps), so a 1 Gbps link could carry roughly 150 such documents instead of about 8.
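The arithmetic above can be reproduced with a few helpers (hypothetical names), which makes it easy to try other writer counts, keystroke rates, or payload sizes. The 300-byte and 800-byte message sizes are the assumptions from the worked example, not measurements.

```python
def fanout_msgs_per_sec(writers: int, msgs_per_writer: float) -> float:
    """Per-keystroke forwarding: each message is relayed to the other N - 1 clients."""
    return writers * (writers - 1) * msgs_per_writer


def batched_msgs_per_sec(clients: int, frames_per_sec: float) -> float:
    """Server-side batching: one consolidated frame per client per broadcast tick."""
    return clients * frames_per_sec


def mbps(msgs_per_sec: float, bytes_per_msg: float) -> float:
    return msgs_per_sec * bytes_per_msg * 8 / 1e6


unbatched = fanout_msgs_per_sec(100, 5)   # 49,500 msgs/s
batched = batched_msgs_per_sec(100, 10)   # 1,000 msgs/s at T_broadcast = 100 ms
print(mbps(unbatched, 300), mbps(batched, 800))  # 118.8 6.4
print(f"{1 - mbps(batched, 800) / mbps(unbatched, 300):.1%}")  # 94.6%
```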

How to debug & inspect it

To inspect real-time collaborative edits, capture and view live WebSocket frames in Google Chrome DevTools (Network tab → WS, then select the connection and open its Messages tab), or use `wscat` to watch operation payloads from the terminal.

# 1. Connect to Collaborative Session Gateway
$ wscat --connect wss://coll-edge-eu.example.com/session/doc_abc123
Connected (press CTRL+C to quit)
# 2. Receive session confirmation and initial revision state
< {"type": "session_init", "client_id": "client_bob", "revision": 1482}
# 3. Watch incoming real-time edit operations broadcast by others
< {"type": "op_broadcast", "revision": 1483, "client_id": "client_alice", "ops": [{"type": "insert", "index": 5, "value": "x"}]}
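When a capture like the one above gets long, a small script can flag missed broadcasts, one cause of the out-of-order symptom in the troubleshooting table. `find_revision_gaps` is a hypothetical helper; it assumes wscat's `< ` prefix for received frames and a read-only watcher that sends no edits itself (a client's own commits would come back as acknowledgements rather than broadcasts, so they would look like gaps). The last frame in the sample capture is invented to show a gap.

```python
import json


def find_revision_gaps(lines: list[str]) -> list[tuple[int, int]]:
    """Return (expected, received) revision pairs wherever a watcher missed a broadcast."""
    gaps = []
    last = None
    for line in lines:
        if not line.startswith("< "):
            continue  # skip shell prompts, comments, and outgoing "> " frames
        frame = json.loads(line[2:])
        if frame.get("type") not in ("session_init", "op_broadcast"):
            continue
        rev = frame["revision"]
        # A session_init (e.g., after a reconnect) resets the baseline without a check.
        if last is not None and frame["type"] == "op_broadcast" and rev != last + 1:
            gaps.append((last + 1, rev))
        last = rev
    return gaps


capture = [
    "Connected (press CTRL+C to quit)",
    '< {"type": "session_init", "client_id": "client_bob", "revision": 1482}',
    '< {"type": "op_broadcast", "revision": 1483, "client_id": "client_alice", "ops": []}',
    '< {"type": "op_broadcast", "revision": 1485, "client_id": "client_alice", "ops": []}',
]
print(find_revision_gaps(capture))  # [(1484, 1485)]
```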

Use the guide below to diagnose and resolve synchronization problems in real-time collaborative systems:

| Symptom | Likely Cause | Fix |
|---|---|---|
| Client displays mixed-up text ("axbc" instead of "abxc") during concurrent typing | The server transformed operations incorrectly (an OT bug), or the client applied them out of order | Test the transform function against every pair of operation types (insert/insert, insert/delete, delete/insert, delete/delete); verify that clients apply broadcasts strictly in global revision order. |
| Clients disconnected for minutes fail to reconnect and save their changes | The revision gap is too large for delta sync: the server has discarded the historical operations needed to transform their edits | If a client is too far behind, force a full document reload and re-merge instead of delta transformation. |
| Client memory grows and the app lags as more edits are made over time | Tombstone metadata accumulates in CRDTs (deleted items are kept in memory to preserve structure) | Run periodic garbage collection/compaction to remove tombstones that every replica has already observed. |
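The stale-client row suggests a simple server-side rule: keep a bounded window of recent operations and, if a client's base revision falls outside it, stop transforming and send a full snapshot instead. A minimal sketch with a hypothetical `BoundedHistory` class and a fixed op-count retention policy (a real system might retain by time or size instead):

```python
from collections import deque


class BoundedHistory:
    """Keeps only the newest `capacity` committed ops (hypothetical retention policy)."""

    def __init__(self, capacity: int, start_revision: int = 0) -> None:
        self.ops: deque = deque(maxlen=capacity)
        self.revision = start_revision  # revision produced by the newest op

    def append(self, op: str) -> int:
        self.ops.append(op)  # the deque drops the oldest op once full
        self.revision += 1
        return self.revision

    @property
    def oldest_base(self) -> int:
        """Oldest base revision the server can still transform from."""
        return self.revision - len(self.ops)

    def ops_since(self, base_revision: int):
        """Ops a client's edit must be transformed against, or None if it needs a full resync."""
        if not self.oldest_base <= base_revision <= self.revision:
            return None
        return list(self.ops)[base_revision - self.oldest_base:]


history = BoundedHistory(capacity=3)
for n in range(1, 6):
    history.append(f"op{n}")  # op{n} produced revision n; only op3..op5 are kept
print(history.ops_since(4))  # ['op5']
print(history.ops_since(1))  # None: op2 was discarded, so send a full snapshot instead
```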

🧠 Quick check

1. What is the fundamental difference between Operational Transformation (OT) and CRDTs?

OT depends on a central authority (the server) to order operations and transform offsets. CRDTs are mathematically designed to be order-independent, allowing decentralized clients to converge to the same state without a coordinator.

2. Why does unthrottled real-time editing scale poorly as the number of typers increases?

If every keystroke is immediately broadcast to all typers, the outbound network message rate spikes quadratically ($N \times (N-1) \times K$), overloading server network interfaces.

3. What is a "tombstone" in a CRDT-based collaborative system?

Because CRDTs resolve concurrent inserts without a central coordinator, deleted elements cannot be purged immediately: another replica's concurrent operation may still reference a deleted element (for example, as the character it was inserted after). Instead, deleted elements are kept as "tombstones" that hold their place in the structure, which adds metadata overhead.

4. In an OT system, what should the server do if a client submits an edit with an outdated base revision?

The server transforms the incoming operation's offset indices based on the history of commits that occurred between the client's outdated revision and the server's current global version.

✍️ Exercise: design the OT transform function

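Two users concurrently edit the document "API", both starting from revision 0: User A inserts "S" at the start (Insert("S", 0)), and User B appends "D" at the end (Insert("D", 3)).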

Assume the server processes User A's operation first. Write out the transformation logic the server must apply to User B's operation when it arrives, and show the final converged string.


Model answer:

When User A's operation is processed first, the document changes from "API" to "SAPI" (Revision 1). The server commits Insert("S", 0).

When User B's operation Insert("D", 3) arrives, its base revision is 0 (it was created before User B knew about User A's change). The server must transform User B's operation against User A's committed operation:

  1. User A inserted a character at index 0.
  2. User B's insert index (3) is greater than User A's insert index (0).
  3. Therefore, User B's index must be shifted right by the length of User A's insert (1 character).
  4. The transformed operation becomes: Insert("D", 4).

Applying the transformed operation to the current state "SAPI" results in: "SAPID".

When User A receives the broadcast of User B's edit, it applies Insert("D", 4) to its local "SAPI", converging to "SAPID". Both clients arrive at the exact same string.
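Generalizing the model answer, here is a sketch of a complete transform function for string inserts and single-character deletes, assuming a central server orders all operations (in that setting, satisfying the TP1 property, where both application orders of two concurrent ops yield the same text, is generally sufficient). `Ins`, `Del`, `transform`, `apply_op`, and `all_ops` are hypothetical names; a delete transformed against an identical concurrent delete becomes a no-op (`None`). The last lines check TP1 exhaustively for every pair of ops on "API".

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Ins:
    index: int
    text: str
    client_id: str  # breaks ties between inserts at the same index


@dataclass(frozen=True)
class Del:
    index: int  # deletes exactly one character in this sketch


Op = Union[Ins, Del]


def apply_op(doc: str, op: Optional[Op]) -> str:
    if op is None:  # a delete that a concurrent op already performed
        return doc
    if isinstance(op, Ins):
        return doc[: op.index] + op.text + doc[op.index :]
    return doc[: op.index] + doc[op.index + 1 :]


def transform(op: Optional[Op], other: Optional[Op]) -> Optional[Op]:
    """Rewrite `op` so it can be applied after the concurrent op `other`."""
    if op is None or other is None:
        return op
    if isinstance(op, Ins) and isinstance(other, Ins):
        if other.index < op.index or (other.index == op.index and other.client_id < op.client_id):
            return Ins(op.index + len(other.text), op.text, op.client_id)
        return op
    if isinstance(op, Ins):  # other is a Del
        return Ins(op.index - 1, op.text, op.client_id) if other.index < op.index else op
    if isinstance(other, Ins):  # op is a Del
        return Del(op.index + len(other.text)) if other.index <= op.index else op
    if other.index < op.index:  # both are Dels
        return Del(op.index - 1)
    return None if other.index == op.index else op


# The exercise: User A's Insert("S", 0) is committed first.
user_a, user_b = Ins(0, "S", "user_a"), Ins(3, "D", "user_b")
b_prime = transform(user_b, user_a)
print(b_prime)  # Ins(index=4, text='D', client_id='user_b')
print(apply_op(apply_op("API", user_a), b_prime))  # SAPID


def all_ops(doc: str, client_id: str, ch: str) -> list[Op]:
    return [Ins(i, ch, client_id) for i in range(len(doc) + 1)] + [Del(i) for i in range(len(doc))]


# TP1: both application orders must yield the same text for every pair of ops.
tp1_holds = all(
    apply_op(apply_op("API", a), transform(b, a)) == apply_op(apply_op("API", b), transform(a, b))
    for a in all_ops("API", "user_a", "x")
    for b in all_ops("API", "user_b", "y")
)
print(tp1_holds)  # True
```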
