Design Case Studies · Lesson 19
Design: Collaborative Editing API
Collaborative applications like Google Docs and Figma allow multiple users to edit the same document concurrently. Under the hood, this requires resolving conflicts deterministically across geo-distributed clients. We must design a robust API and synchronization model using Operational Transformation (OT) or Conflict-Free Replicated Data Types (CRDTs) to prevent data loss and document divergence.
By the end you'll be able to
- Contrast Operational Transformation (OT) and Conflict-Free Replicated Data Types (CRDTs) for conflict resolution.
- Design the API endpoints and JSON payload structure for operation-based synchronization loops.
- Compute keystroke broadcast fan-out bandwidth rates and model the savings from client-side buffering and server-side batching.
- Formulate a vector clock or sequence numbering schema to track revision histories and catch up stale clients.
Requirements
Designing a collaborative editing system requires addressing distinct networking and data-integrity constraints:
- Consistency. All clients editing a document must converge to the exact same state once all edits are processed. Divergence (where users see different versions) is unacceptable.
- Low Latency. Local edits must render immediately on the editor's screen (sub-10ms UI response) and propagate to other users in under 100ms.
- Offline Support. A user must be able to continue editing during brief network disconnects and merge changes cleanly upon reconnecting.
- Scale. The API must handle document sessions with dozens of active simultaneous typers and thousands of concurrent read-only viewers.
Design decisions
Transport Protocol: WebSockets vs. SSE vs. HTTP/2
Editing requires bidirectional, low-latency message streams, so WebSockets are the standard transport choice. Clients establish a persistent connection to a session gateway server, and after the initial handshake each keystroke message carries only a small frame header instead of a full set of HTTP headers. Read-only viewers only need a server-to-client stream, so Server-Sent Events (SSE), optionally multiplexed over HTTP/2, are a lighter-weight alternative that keeps the write-capable WebSocket gateways free for editors.
Operational Transformation (OT) vs. CRDTs
The choice of conflict resolution model determines your API structure and backend storage requirements:
| Metric | Operational Transformation (OT) | Conflict-Free Replicated Data Types (CRDTs) |
|---|---|---|
| How it works | Edits represent operations (inserts/deletes) transformed by the server relative to a global sequence number. | Data structures are designed so operations are mathematically commutative (order-independent). |
| State & Authority | Centralized. The server is the single source of truth for sequence ordering and transformation rules. | Decentralized. Clients can merge edits in any order, and their local states will naturally converge. |
| Metadata Overhead | Low. Operations contain simple index offsets and string contents. | High. Each character carries a unique ID plus ordering metadata, and deleted characters remain as tombstone markers, so the in-memory document can be several times larger than the raw text. |
| Best for | Plain-text documents, standard document editors (e.g., Google Docs). | Graph-based canvases, structured layout editors, decentralized apps (e.g., Figma, whose multiplayer model is CRDT-inspired but still ordered by a server; Notion). |
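To make the CRDT column concrete, here is a minimal sketch of an RGA-style sequence CRDT. It is an illustration under stated assumptions, not how any particular product implements it: the `Replica` and `Elem` names are hypothetical, element IDs are assumed to be `(Lamport counter, replica name)` pairs, and an operation is assumed to arrive only after the element it refers to (causal delivery).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ElemId = Tuple[int, str]  # assumed ID shape: (Lamport counter, replica name)


@dataclass
class Elem:
    id: ElemId
    value: str
    deleted: bool = False  # tombstone flag: deleted elements stay in the list


@dataclass
class Replica:
    elems: List[Elem] = field(default_factory=list)

    def _index_of(self, elem_id: ElemId) -> int:
        # Assumes causal delivery: the referenced element has already arrived.
        return next(i for i, e in enumerate(self.elems) if e.id == elem_id)

    def insert(self, new_id: ElemId, value: str, after: Optional[ElemId]) -> None:
        # Start just right of the reference element (or at the head if none).
        pos = 0 if after is None else self._index_of(after) + 1
        # Skip concurrent inserts with larger IDs so every replica picks the same slot.
        while pos < len(self.elems) and self.elems[pos].id > new_id:
            pos += 1
        self.elems.insert(pos, Elem(new_id, value))

    def delete(self, elem_id: ElemId) -> None:
        self.elems[self._index_of(elem_id)].deleted = True

    def text(self) -> str:
        return "".join(e.value for e in self.elems if not e.deleted)


def seeded() -> Replica:
    r = Replica()
    r.insert((1, "seed"), "A", None)
    r.insert((2, "seed"), "B", (1, "seed"))
    return r


# Three concurrent operations, delivered to two replicas in different orders.
r1, r2 = seeded(), seeded()
r1.insert((3, "alice"), "X", (1, "seed"))
r1.insert((3, "bob"), "Y", (1, "seed"))
r1.delete((2, "seed"))

r2.delete((2, "seed"))
r2.insert((3, "bob"), "Y", (1, "seed"))
r2.insert((3, "alice"), "X", (1, "seed"))

print(r1.text(), r2.text())
```

Both replicas end with the same visible text even though they applied the operations in different orders, and the deleted "B" is still present in `elems` as a tombstone, which is the metadata cost described in the table.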
Synchronization & Versioning Loops
To keep edit sizes small, we use **Delta Synchronization**. Clients track their local revision index (e.g., rev=42). When sending an edit, they attach their base revision. The server checks the global sequence. If another edit was accepted first (e.g., global sequence is now 43), the server transforms the incoming edit to account for the changes in version 43, commits the transformed edit as version 44, and broadcasts it to all other clients.
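The revision check described above can be sketched as follows. This is a simplified illustration that handles inserts only; `DocumentServer`, `Insert`, and `transform` are hypothetical names, and the tie-breaking rule (an already committed insert at the same index stays to the left) is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Insert:
    index: int
    value: str


def transform(incoming: Insert, committed: Insert) -> Insert:
    """Rebase `incoming` so it applies after `committed` (committed wins ties)."""
    if committed.index <= incoming.index:
        return Insert(incoming.index + len(committed.value), incoming.value)
    return incoming


@dataclass
class DocumentServer:
    content: str = ""
    log: List[Insert] = field(default_factory=list)  # log[i] produced revision i + 1

    @property
    def revision(self) -> int:
        return len(self.log)

    def submit(self, base_revision: int, op: Insert) -> int:
        if not 0 <= base_revision <= self.revision:
            raise ValueError("unknown base revision")
        # Transform against every op committed after the client's base revision.
        for committed in self.log[base_revision:]:
            op = transform(op, committed)
        self.content = self.content[:op.index] + op.value + self.content[op.index:]
        self.log.append(op)
        return self.revision  # new global revision to broadcast


# Two clients both start from revision 0 with the text "abc".
server = DocumentServer(content="abc")
rev_x = server.submit(0, Insert(1, "X"))  # accepted as-is
rev_y = server.submit(0, Insert(2, "Y"))  # stale base: index is shifted to 3
print(rev_x, rev_y, server.content)
```

The second edit was written against revision 0, so the server shifts it past the "X" committed in revision 1 before applying it. A production server would also need delete handling, index bounds checks, and an acknowledgement to the sender.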
The API model
Establishing editing session
Before initiating the WebSocket loop, the client makes a REST request to obtain the current snapshot of the document and locate the target session server.
# Request document snapshot and session token
GET /v1/documents/doc_abc123/session HTTP/1.1
Authorization: Bearer user-token-xyz
# Response returns version metadata and gateway WebSocket URL
HTTP/1.1 200 OK
Content-Type: application/json
{
"document_id": "doc_abc123",
"revision": 1482,
"content": "Hello, world!",
"websocket_url": "wss://coll-edge-eu.example.com/session/doc_abc123"
}
WebSocket Edit Event (OT Model)
When the client types, it sends an operations array over the WebSocket connection. The server validates the base revision, applies transformations if needed, assigns the next global revision, and broadcasts it.
// Client → Server: User inserts "!" at index 12 based on revision 1482
{
"type": "apply_op",
"base_revision": 1482,
"ops": [
{ "type": "insert", "index": 12, "value": "!" }
]
}
// Server → Client (Broadcast): Broadcasts the accepted operation
{
"type": "op_broadcast",
"revision": 1483,
"client_id": "client_alice",
"ops": [
{ "type": "insert", "index": 12, "value": "!" }
]
}
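On the receiving side, a client applies the `ops` array from each broadcast to its local text. The sketch below shows one way to do that. Only the `insert` shape appears in the payloads above; the `delete` shape (`index` plus `length`) is an assumption added for illustration, as is the rule that each index refers to the text left by the previous op in the same array.

```python
import json
from typing import Any, Dict, List


def apply_ops(content: str, ops: List[Dict[str, Any]]) -> str:
    """Apply ops in order; each index refers to the text left by the previous op."""
    for op in ops:
        index = op["index"]
        if not 0 <= index <= len(content):
            raise ValueError(f"index {index} out of range")
        if op["type"] == "insert":
            content = content[:index] + op["value"] + content[index:]
        elif op["type"] == "delete":
            # Assumed shape: {"type": "delete", "index": i, "length": n}
            end = index + op["length"]
            if end > len(content):
                raise ValueError("delete runs past end of document")
            content = content[:index] + content[end:]
        else:
            raise ValueError(f"unknown op type {op['type']!r}")
    return content


# The broadcast frame from the example above, applied to the session snapshot.
frame = json.loads(
    '{"type": "op_broadcast", "revision": 1483, "client_id": "client_alice",'
    ' "ops": [{"type": "insert", "index": 12, "value": "!"}]}'
)
updated = apply_ops("Hello, world!", frame["ops"])
print(frame["revision"], updated)
```

Rejecting out-of-range indices early is a deliberate choice here: an op that does not fit the local text usually means the client missed a revision and should catch up rather than guess.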
Under the hood: OT synchronization loop
Operational Transformation utilizes a centralized server to serialize and transform incoming edits so that client states do not drift.
By the numbers: edit frequency & fan-out math
Let's evaluate the network capacity requirements for a real-time collaborative document editing session.
Governing Equations
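- Unbuffered broadcast rate: If $N$ active writers type concurrently at keystroke rate $K$ (keystrokes/sec), and the server relays every single edit to the other $N - 1$ writers: $$Msg_{rate} = N \times (N - 1) \times K$$ This scales quadratically: $O(N^2)$.
- Client-side buffering: If clients batch keystrokes and transmit at most once per interval $T$ (seconds) rather than per character, each client sends $\min\left(K, \frac{1}{T}\right)$ messages per second: $$Msg_{rate} = N \times (N - 1) \times \min\left(K, \frac{1}{T}\right)$$ Batching therefore only lowers the message rate when $K \times T > 1$, that is, when more than one keystroke falls into each window.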
- Network Egress Bandwidth: The total egress data rate at the session server is: $$Bandwidth_{egress} = Msg_{rate} \times Payload_{size}$$
Scenario Parameters
- Active Typers ($N$): 20 concurrent users
- Keystroke Rate ($K$): 5 keystrokes per second
- Client Buffering Window ($T$): 200 ms (0.2s)
- Average JSON Payload size: 300 bytes
Worked Calculations: Impact of Client-Side Buffering
| Metric | Unbuffered / Character-by-character | Buffered / Batched (200ms) | Reduction Factor |
|---|---|---|---|
| Messages sent per client / sec | $K = 5\text{ msgs/s}$ | $1 \div 0.2 = 5\text{ msgs/s}$ (one keystroke per batch on average) | No change |
| Total server inbound rate | $N \times K = 20 \times 5 = 100\text{ msgs/s}$ | $N \times 5 = 100\text{ msgs/s}$ | No change |
| Total server fan-out rate (outbound) at $N = 20$ writers | $20 \times 19 \times 5 = 1,900\text{ msgs/s}$ | $20 \times 19 \times 5 = 1,900\text{ msgs/s}$ | No change |
| Total server fan-out rate (outbound) at $N = 100$ writers | $100 \times 99 \times 5 = 49,500\text{ msgs/s}$ | $100 \times 99 \times (1 \div 0.2) = 49,500\text{ msgs/s}$ | No change (both grow quadratically) |
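Client-side buffering itself could be sketched as below. This is a hypothetical helper, not part of the API above: `KeystrokeBuffer` collects inserts for one window of length $T$ and coalesces contiguous typing into a single op. It only reduces the message count when several keystrokes land in the same window, for example during a fast burst or a paste.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KeystrokeBuffer:
    window_s: float = 0.2  # buffering window T from the scenario
    pending: List[Dict] = field(default_factory=list)
    window_start: Optional[float] = None

    def add(self, index: int, value: str, now: float) -> None:
        last = self.pending[-1] if self.pending else None
        # Coalesce typing that continues right where the previous insert ended.
        if last and last["index"] + len(last["value"]) == index:
            last["value"] += value
        else:
            self.pending.append({"type": "insert", "index": index, "value": value})
        if self.window_start is None:
            self.window_start = now

    def flush_if_due(self, now: float) -> Optional[List[Dict]]:
        """Return the batched ops once the window has elapsed, else None."""
        if self.window_start is None or now - self.window_start < self.window_s:
            return None
        ops, self.pending, self.window_start = self.pending, [], None
        return ops


buf = KeystrokeBuffer()
buf.add(5, "a", now=0.00)
buf.add(6, "b", now=0.05)
buf.add(7, "c", now=0.10)
early = buf.flush_if_due(now=0.10)  # window still open
batch = buf.flush_if_due(now=0.25)  # window elapsed: one op instead of three
print(early, batch)
```

Timestamps are passed in explicitly so the behaviour is easy to test; a real client would read a monotonic clock and send the returned ops as one `apply_op` message.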
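With these parameters client-side buffering alone saves nothing: $K \times T = 5 \times 0.2 = 1$, so each 200 ms batch carries a single keystroke on average and the message count is unchanged. The larger win comes from grouping edits on the server. Let's calculate the egress bandwidth required at the session server if 100 users are typing in the same document concurrently, first with unthrottled fan-out and then with server-side grouping:
- Character-by-character (Unbuffered) Fan-out: $$Msg_{rate} = 100 \text{ typers} \times 99 \text{ recipients} \times 5 \text{ keystrokes/sec} = 49,500 \text{ msgs/sec}$$ $$Egress\text{ Bandwidth} = 49,500 \text{ msgs/sec} \times 300 \text{ bytes/msg} \approx 14.85 \text{ MB/s} \approx 118.8 \text{ Mbps}$$ That is roughly 12% of a 1 Gbps server network connection for just one document!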
- With Server-Side Grouping & Throttle (100ms sync frames):
Instead of forwarding each edit immediately, the server gathers edits and broadcasts a single consolidated patch frame to every client 10 times a second ($T_{broadcast} = 100\text{ ms}$). $$Msg_{rate} = 100 \text{ clients} \times 10 \text{ updates/sec} = 1,000 \text{ msgs/sec}$$ $$Egress\text{ Bandwidth} = 1,000 \text{ msgs/sec} \times 800 \text{ bytes (larger batched payload)} = 800 \text{ KB/s} \approx 6.4 \text{ Mbps}$$ This cuts network egress by about 95% (from 118.8 Mbps to 6.4 Mbps), so a 1 Gbps link that could carry only about 8 such fully loaded sessions unthrottled can carry roughly 150 with grouping.
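The arithmetic above is easy to re-run for other session sizes. The helper names below are hypothetical; the formulas are the governing equations from this section, with 1 Mbps taken as $10^6$ bits per second.

```python
def fanout_msgs_per_s(writers: int, sends_per_s: float) -> float:
    """Unthrottled relay: each writer's message goes to the other writers."""
    return writers * (writers - 1) * sends_per_s


def framed_msgs_per_s(clients: int, frame_interval_s: float) -> float:
    """Server-side grouping: one consolidated frame per client per interval."""
    return clients / frame_interval_s


def egress_mbps(msgs_per_s: float, payload_bytes: int) -> float:
    return msgs_per_s * payload_bytes * 8 / 1_000_000


unbuffered = egress_mbps(fanout_msgs_per_s(100, 5), 300)
framed = egress_mbps(framed_msgs_per_s(100, 0.1), 800)
reduction = 1 - framed / unbuffered
print(f"{unbuffered:.1f} Mbps -> {framed:.1f} Mbps ({reduction:.1%} less egress)")
```

For the scenario values this reports 118.8 Mbps dropping to 6.4 Mbps, a reduction of about 94.6%. Note that the grouped rate grows linearly with the number of clients, while the unthrottled rate grows quadratically.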
How to debug & inspect it
To inspect real-time collaborative edits, capture and view live WebSocket frames in Google Chrome DevTools (Network tab → WS) or use `wscat` to watch the operation payloads.
Use the guide below to diagnose and resolve synchronization problems in real-time collaborative systems:
| Symptom | Likely Cause | Fix |
|---|---|---|
| Client displays mixed-up text ("axbc" instead of "abxc") during concurrent typing | Server transformed operations incorrectly (OT bug), or the client applied broadcasts out of revision order | Test the transform function against concurrent-edit cases; verify that clients apply broadcasts strictly in global revision order and stamp every edit with its base revision. |
| Stale clients disconnected for minutes fail to reconnect and save changes | Revision gap is too large for delta sync. Server discarded historical operations needed to transform their edits | If client is too far behind, force a full document reload/re-merge rather than delta transformation. |
| Client memory spikes and the app lags as more edits are made over time | Tombstone metadata accumulation in CRDTs (deleted items are kept in memory to maintain structure) | Implement periodic document garbage collection/compaction to clean up historical tombstones. |
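The stale-client row can be sketched as a bounded operation log that decides between delta catch-up and a full reload. Everything here is an illustrative assumption: the `OpLog` name, the `catch_up` and `resync_required` message types, and the retention limit are not part of the API defined earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OpLog:
    max_retained: int = 1000   # assumed retention window, in revisions
    first_revision: int = 1    # revision number produced by ops[0]
    ops: List[Dict] = field(default_factory=list)

    @property
    def head(self) -> int:
        return self.first_revision + len(self.ops) - 1

    def append(self, op: Dict) -> None:
        self.ops.append(op)
        overflow = len(self.ops) - self.max_retained
        if overflow > 0:  # compaction: drop the oldest retained ops
            del self.ops[:overflow]
            self.first_revision += overflow

    def catch_up(self, client_revision: int) -> Dict:
        # The client needs every op after its revision; some may be gone.
        if client_revision + 1 < self.first_revision:
            return {"type": "resync_required", "revision": self.head}
        missed = self.ops[client_revision + 1 - self.first_revision:]
        return {"type": "catch_up", "revision": self.head, "ops": missed}


log = OpLog(max_retained=3)
for i in range(5):  # commits revisions 1..5, keeping only 3..5
    log.append({"type": "insert", "index": i, "value": "x"})

recent = log.catch_up(client_revision=4)  # one missed op: delta sync works
stale = log.catch_up(client_revision=1)   # revision 2 was discarded: reload
print(recent["type"], len(recent["ops"]), stale["type"])
```

A client that receives the resync response would fetch a fresh snapshot and then re-apply or re-merge any unsent local edits, which is outside the scope of this sketch.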
🧠 Quick check
1. What is the fundamental difference between Operational Transformation (OT) and CRDTs?
OT depends on a central authority (the server) to order operations and transform offsets. CRDTs are mathematically designed to be order-independent, allowing decentralized clients to converge to the same state without a coordinator.
2. Why does unthrottled real-time editing scale poorly as the number of typers increases?
If every keystroke is immediately broadcast to all typers, the outbound network message rate spikes quadratically ($N \times (N-1) \times K$), overloading server network interfaces.
3. What is a "tombstone" in a CRDT-based collaborative system?
Because CRDT operations reference elements by unique ID rather than by index, a deleted element cannot be purged immediately: a concurrent insert from another replica may still be positioned relative to it. Instead, the element is marked as a "tombstone" and kept in the structure, which adds metadata overhead until it is garbage-collected.
4. In an OT system, what should the server do if a client submits an edit with an outdated base revision?
The server transforms the incoming operation's offset indices based on the history of commits that occurred between the client's outdated revision and the server's current global version.
✍️ Exercise: design the OT transform function
Two users concurrently edit the document "API".
- User A inserts "S" at index 0 (intending to make it "SAPI").
- User B inserts "D" at index 3 (intending to make it "APID").
Assume the server processes User A's operation first. Write out the transformation logic the server must apply to User B's operation when it arrives, and show the final converged string.
Model answer:
When User A's operation is processed first, the document changes from "API" to "SAPI" (Revision 1). The server commits Insert("S", 0).
When User B's operation Insert("D", 3) arrives, its base revision is 0 (it was created before User B knew about User A's change). The server must transform User B's operation against User A's committed operation:
- User A inserted a character at index 0.
- User B's insert index (3) is greater than User A's insert index (0).
- Therefore, User B's index must be shifted right by the length of User A's insert (1 character).
- The transformed operation becomes:
Insert("D", 4).
Applying the transformed operation to the current state "SAPI" results in: "SAPID".
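When User A receives the broadcast of User B's edit, it applies Insert("D", 4) to its local "SAPI", converging to "SAPID". User B, whose local text is already "APID", receives User A's Insert("S", 0) and transforms it against its own pending insert: index 0 is less than index 3, so the operation is unchanged, and applying it also yields "SAPID". Both clients arrive at the exact same string.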
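The model answer can be checked mechanically. The snippet below is a small sketch with hypothetical helper names; the `other_wins_ties` flag encodes the assumption that the operation the server committed first keeps the left-hand position when two inserts target the same index.

```python
def transform_index(index: int, other_index: int, other_len: int,
                    other_wins_ties: bool) -> int:
    """Shift `index` right if the other insert lands before it (or at it, on a lost tie)."""
    if other_index < index or (other_index == index and other_wins_ties):
        return index + other_len
    return index


def insert(text: str, index: int, value: str) -> str:
    return text[:index] + value + text[index:]


# Server: A's Insert("S", 0) is committed first, then B's Insert("D", 3) arrives.
b_index_on_server = transform_index(3, other_index=0, other_len=1, other_wins_ties=True)
server_text = insert(insert("API", 0, "S"), b_index_on_server, "D")

# User B already shows "APID" and rebases A's broadcast against its own insert.
a_index_on_b = transform_index(0, other_index=3, other_len=1, other_wins_ties=False)
b_text = insert(insert("API", 3, "D"), a_index_on_b, "S")

print(b_index_on_server, server_text, b_text)
```

Running it prints `4 SAPID SAPID`: the server shifts User B's index from 3 to 4, User B leaves User A's index at 0, and both sides hold the same string.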
Key takeaways
- **Consistency in real-time** collaborative editing requires mathematical convergence models (OT or CRDTs) to ensure clients stay in sync.
- **Operational Transformation (OT)** is centralized and matches text editors (Google Docs), keeping metadata footprint low by transforming indices on the server.
- **CRDTs** are decentralized, resolving conflicts locally on client machines. They are ideal for canvas/layout editors (Figma) but suffer from memory overhead.
- **Keystroke fan-out is quadratic (O(N²))**. Without server-side grouping or buffering, concurrent writers can saturate server bandwidth and overload client CPUs.
- Track document states using **sequential revision numbers** (OT) or **vector clocks** (CRDTs) to identify gaps and safely sync disconnected clients.
Sources & further reading
- Neil Fraser (Google) — Differential Synchronization — the core algorithm behind real-time collaborative text sync
- Yjs Paper — Real-time collaborative editing using CRDTs — deep technical details on Yjs metadata structures and performance optimization
- Figma Engineering — How Figma's Multiplayer Technology Works — case study of tree/graph-based CRDTs and session sync