Platform & API Product Engineering · Lesson 04

Optimistic concurrency & conditional requests

Two API clients read the same resource, each modifies it, and the second write silently erases the first. This is the lost-update problem — one of the most insidious data-corruption bugs in distributed systems — and HTTP has a built-in, elegant mechanism to prevent it.

⏱ 20 min Difficulty: advanced Prereq: REST, idempotency, HTTP basics

By the end you'll be able to

Describe the lost-update problem precisely, including the exact interleaving that causes it and why it is silent.
Implement optimistic concurrency control using ETags and the If-Match / If-None-Match headers, and explain when to return 412 vs. 409.
Design the client-side retry-on-conflict loop and estimate its expected retry count from a conflict probability formula.

The lost-update problem

The lost-update problem occurs whenever two actors perform a read-modify-write cycle on the same resource and the operations interleave. Neither actor acts incorrectly on their own — the bug is in the interleaving. And it is silent: the second writer succeeds, returns 200, and the first writer's changes are gone with no error, no warning, and no trace in the response.

Think of two people editing the same Google Doc without the real-time sync. Person A saves a version with their changes; Person B opened the same original version, made different changes, and saves it five seconds later. Person A's changes vanish. Unlike Google Docs, most REST APIs have no live collaboration layer — the conflict is never visible unless the API is explicitly designed to surface it.

Fig 1 — The lost-update timeline. Both clients read version 1. Client A writes first (version 2). Client B, still operating on its stale read of version 1, writes next — and silently overwrites A's changes. Both writes return 200. No one knows anything went wrong.

⚠️ Why this is silent

The danger of the lost-update problem is not just that it happens — it is that it is completely invisible. Both clients succeed. Both receive 200. The second client has no idea it just destroyed the first client's work. Without concurrency control, the "winner" is always the last writer, and the losers never find out they lost.
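The interleaving can be replayed in a few lines. The sketch below uses a hypothetical in-memory stand-in for the API (the `resource` dict and the `read` / `write` helpers are illustrative only, not part of any real service) and applies no concurrency control at all.

```python
import copy

# Hypothetical in-memory "server": one order, no version checking.
resource = {"id": "42", "quantity": 5, "status": "pending"}

def read():
    """Simulates GET: returns a private copy of the current state."""
    return copy.deepcopy(resource)

def write(new_state):
    """Simulates an unconditional PUT: always succeeds, last writer wins."""
    resource.clear()
    resource.update(new_state)
    return 200

# Both clients read the same original state.
a_view = read()
b_view = read()

# Client A changes the quantity and writes first.
a_view["quantity"] = 10
status_a = write(a_view)

# Client B changes the status on its stale copy and writes second.
b_view["status"] = "shipped"
status_b = write(b_view)

print(status_a, status_b)  # both writes report success
print(resource)            # A's quantity change has been overwritten
```

Both calls return 200, yet the final state carries only Client B's change: the quantity is back at its original value.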

Optimistic concurrency control (OCC)

The fix is to carry the version of the resource through the write. A client that read version N must declare "I am updating version N" when it writes. The server performs a compare-and-swap: if the current version is still N, the write succeeds and the server increments to N+1. If the current version is now N+1 or higher (because someone else wrote in between), the server rejects the write with a 412 Precondition Failed.

The word "optimistic" refers to the strategy: you optimistically assume that no conflict will happen when you start the operation. You don't lock anything at read time. You only check for a conflict at write time, and if one occurred, you handle it then. This contrasts with pessimistic locking, which acquires an exclusive lock at read time and holds it until the write completes.
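As a rough illustration of the compare-and-swap idea, here is a minimal in-memory sketch. The `VersionedStore` class and its method names are assumptions made for this example; the internal mutex only makes the check-and-write step atomic and is not a lock held across the client's read-modify-write cycle.

```python
import threading

class VersionedStore:
    """Hypothetical in-memory store keeping one integer version per key."""

    def __init__(self):
        self._lock = threading.Lock()  # makes compare + write a single atomic step
        self._data = {}                # key -> (version, value)

    def create(self, key, value):
        with self._lock:
            self._data[key] = (1, value)

    def read(self, key):
        """Returns (version, value). Nothing is locked after this returns."""
        with self._lock:
            return self._data[key]

    def compare_and_swap(self, key, expected_version, new_value):
        """Writes only if the stored version still equals expected_version.

        Returns (succeeded, current_version).
        """
        with self._lock:
            current_version, _ = self._data[key]
            if current_version != expected_version:
                return False, current_version
            self._data[key] = (current_version + 1, new_value)
            return True, current_version + 1

store = VersionedStore()
store.create("order:42", {"quantity": 5})

# Two clients read the same version, then both try to write.
version, _ = store.read("order:42")
ok_a, version_after_a = store.compare_and_swap("order:42", version, {"quantity": 10})
ok_b, version_after_b = store.compare_and_swap("order:42", version, {"quantity": 3})

print(ok_a, version_after_a)  # first writer succeeds and bumps the version
print(ok_b, version_after_b)  # second writer is rejected; the version is unchanged by it
```

The second caller learns about the conflict at write time, which is the point of the optimistic strategy: the cost of coordination is paid only when a conflict actually occurs.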

ETags: the HTTP mechanism

HTTP has a native primitive for this: the ETag (Entity Tag). An ETag is an opaque string the server includes in every response for a resource. It represents the current version of that resource. The client stores the ETag from the GET response and sends it back in the If-Match header on the PUT or PATCH. The server compares the incoming ETag against the current version. If they match, the resource has not changed since the client read it — proceed. If they don't match, someone else changed it — reject with 412.
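On the server side, the comparison might look like the following sketch. The `handle_put` function, the shape of `resource`, and the decision to answer 428 Precondition Required when `If-Match` is missing are all assumptions for illustration. The sketch is single-threaded; a real server has to make the compare and the write atomic.

```python
def handle_put(resource, request_headers, new_body):
    """Applies a conditional PUT to `resource`, a dict with 'version' and 'body'.

    Returns (status, response_headers, response_body).
    """
    current_etag = f'"v{resource["version"]}"'
    if_match = request_headers.get("If-Match")

    if if_match is None:
        # Assumption: this API refuses unconditional writes outright.
        return 428, {}, {"error": "If-Match header required"}

    # If-Match may carry a comma-separated list of ETags, or "*".
    # Splitting on commas is a simplification that assumes tags contain no commas.
    candidates = [tag.strip() for tag in if_match.split(",")]
    if "*" not in candidates and current_etag not in candidates:
        return 412, {"ETag": current_etag}, {"error": "version_conflict"}

    resource["version"] += 1
    resource["body"] = new_body
    return 200, {"ETag": f'"v{resource["version"]}"'}, new_body

order = {"version": 3, "body": {"id": "42", "quantity": 5, "status": "pending"}}
updated = {"id": "42", "quantity": 10, "status": "pending"}

first = handle_put(order, {"If-Match": '"v3"'}, updated)
second = handle_put(order, {"If-Match": '"v3"'}, updated)  # same ETag, now stale

print(first[0], first[1])    # write accepted, new ETag issued
print(second[0], second[1])  # write rejected, current ETag reported
```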

Fig 2 — Optimistic concurrency control via ETag and If-Match. The client reads the resource and stores the ETag. When writing, it sends If-Match: "v3". The server checks the current version — if it still matches, the write proceeds and a new ETag is issued. If a concurrent write changed the version, the server rejects with 412 Precondition Failed.

# Step 1: Read the resource; server returns ETag
GET /orders/42 HTTP/1.1

# Response includes the version identifier
HTTP/1.1 200 OK
ETag: "v3"
Content-Type: application/json

{ "id": "42", "quantity": 5, "status": "pending" }

---
# Step 2: Write with the ETag in If-Match
PUT /orders/42 HTTP/1.1
If-Match: "v3"
Content-Type: application/json

{ "id": "42", "quantity": 10, "status": "pending" }

# If no concurrent write: resource at version 3 → proceed
HTTP/1.1 200 OK
ETag: "v4"
Content-Type: application/json

{ "id": "42", "quantity": 10, "status": "pending" }

# If concurrent write happened: resource now at version 4 → reject
HTTP/1.1 412 Precondition Failed
{ "error": { "type": "precondition_failed", "code": "version_conflict",
             "message": "The resource was modified since you last read it." } }

ETag variants: version number vs. content hash; weak vs. strong

An ETag can be computed two ways, and the difference matters for behaviour.

Version number vs. content hash

	Version number	Content hash
Form	`"v3"`, `"42"`, a monotonic integer or UUID	`"a3f4b2..."` — SHA-256 or MD5 of the serialised response body
Changes when	Any write, even if the result is identical to the previous state	Only when the content actually differs (a write that leaves the content unchanged keeps the same ETag)
Computation cost	Trivial — increment a counter or store the version in the row	Requires hashing the full response on every read
Best for	Concurrency control — you want to detect any write, even non-content-changing ones (metadata updates)	Caching — clients can skip re-fetching identical content regardless of how it was produced
Caveat	A version number is opaque; it tells you the resource changed, not how it changed	Two different representations of the same logical state (e.g. JSON field order) can produce different hashes
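Both variants from the table can be sketched in a few lines. The helper names are hypothetical, and truncating the digest to 16 hex characters is an arbitrary choice for readability. Serialising with sorted keys is one way to sidestep the field-order caveat; for the result to qualify as a strong ETag, the server would also need to send that same canonical serialisation as the response body.

```python
import hashlib
import json

def version_etag(version: int) -> str:
    """ETag derived from a stored version counter."""
    return f'"v{version}"'

def content_etag(body: dict) -> str:
    """ETag derived from a hash of a canonical JSON serialisation."""
    # Sorted keys and fixed separators make logically equal dicts hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'

same_a = content_etag({"id": "42", "quantity": 5})
same_b = content_etag({"quantity": 5, "id": "42"})   # same state, different field order
different = content_etag({"id": "42", "quantity": 6})

print(version_etag(3))
print(same_a == same_b)      # field order does not affect the hash
print(same_a == different)   # a content change does
```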

Weak vs. strong ETags

A strong ETag (the default, no prefix) means byte-for-byte identity: if two representations have the same strong ETag, every byte of their body is identical. A weak ETag has a W/ prefix: W/"v3". It means "semantically equivalent": the logical content is the same even if the serialisation differs (different whitespace, field order). HTTP requires the strong comparison function for If-Match and for range requests, and a weak ETag never matches under strong comparison. A weak ETag is therefore useful with If-None-Match for cache validation but not with If-Match for conditional writes. Always use strong ETags for concurrency control.

# Strong ETag (byte-identical — required for If-Match)
ETag: "a3f4b291c8e2"

# Weak ETag (semantically equivalent — NOT valid for If-Match)
ETag: W/"a3f4b291c8e2"
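The two comparison functions defined by the HTTP specification are small enough to sketch directly. The function names here are illustrative.

```python
def split_etag(etag: str):
    """Returns (is_weak, opaque_tag) for values such as W/"v3" or "v3"."""
    if etag.startswith("W/"):
        return True, etag[2:]
    return False, etag

def strong_match(a: str, b: str) -> bool:
    """Strong comparison: neither tag is weak and the opaque tags are identical."""
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b

def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque tags are identical, W/ prefixes are ignored."""
    return split_etag(a)[1] == split_etag(b)[1]

print(strong_match('"v3"', '"v3"'))     # usable for If-Match
print(strong_match('W/"v3"', '"v3"'))   # a weak tag never passes strong comparison
print(weak_match('W/"v3"', '"v3"'))     # but it does pass weak comparison
```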

The conditional request header family

ETags are consumed by a family of conditional HTTP headers. Each header expresses a precondition that must be satisfied for the server to process the request.

Header	Meaning	Primary use
`If-Match: "etag"`	Process the request only if the current ETag matches the provided value.	Conditional write (PUT, PATCH, DELETE) — the core OCC primitive. Returns 412 if the condition fails.
`If-None-Match: "etag"`	Process the request only if the current ETag does not match. With `If-None-Match: *`, means "only if the resource does not exist at all."	(1) Cache validation on GET: returns 304 Not Modified if the client's cached version is still current. (2) Conditional CREATE: `If-None-Match: *` on a PUT makes it a create-only operation — idempotent create.
`If-Unmodified-Since: <date>`	Process the request only if the resource has not been modified since the given timestamp.	Weaker alternative to `If-Match` when the client has a timestamp but no ETag. Less reliable because date granularity is one second and clocks skew.
`If-Modified-Since: <date>`	Process the GET only if the resource has been modified since the given timestamp.	Bandwidth-efficient polling: return 304 if unchanged, avoiding re-transmission of the body.
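For the cache-validation use of `If-None-Match`, a server-side check could look roughly like this. `handle_get` is a hypothetical helper, and splitting the header on commas assumes the tags themselves contain no commas.

```python
def handle_get(current_etag, body, request_headers):
    """Returns (status, response_headers, response_body) for a conditional GET."""
    def opaque(tag):
        # If-None-Match uses weak comparison, so the W/ prefix is ignored.
        return tag[2:] if tag.startswith("W/") else tag

    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        cached = {opaque(tag) for tag in candidates}
        if "*" in candidates or opaque(current_etag) in cached:
            # The client's copy is still current: send no body.
            return 304, {"ETag": current_etag}, None

    return 200, {"ETag": current_etag}, body

order = {"id": "42", "quantity": 5}
fresh = handle_get('"v3"', order, {"If-None-Match": '"v3"'})
stale = handle_get('"v4"', order, {"If-None-Match": '"v3"'})
plain = handle_get('"v4"', order, {})

print(fresh[0], stale[0], plain[0])
```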

If-None-Match: * for conditional CREATE

The If-None-Match: * header turns a PUT into a conditional create: "create this resource only if it does not already exist." This makes creation safe to retry, because a repeated request can neither duplicate nor overwrite a resource that already exists; it receives a 412 instead:

# Conditional create — only succeeds if /users/ada doesn't exist yet
PUT /users/ada HTTP/1.1
If-None-Match: *
Content-Type: application/json

{ "name": "Ada Lovelace", "email": "ada@math.io" }

# If /users/ada does NOT exist: 201 Created
# If /users/ada already EXISTS: 412 Precondition Failed
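A minimal server-side sketch of that behaviour, using a plain dict as a hypothetical user table (the function name and the 200 returned for an unconditional overwrite are assumptions of this example):

```python
def handle_put_user(users, user_id, request_headers, body):
    """Returns the HTTP status for a PUT to /users/{user_id}."""
    exists = user_id in users
    if request_headers.get("If-None-Match") == "*" and exists:
        return 412          # create-only request, but the resource is already there
    users[user_id] = body
    return 200 if exists else 201

users = {}
ada = {"name": "Ada Lovelace", "email": "ada@math.io"}
impostor = {"name": "Not Ada", "email": "other@example.com"}

first = handle_put_user(users, "ada", {"If-None-Match": "*"}, ada)
retry = handle_put_user(users, "ada", {"If-None-Match": "*"}, ada)
clash = handle_put_user(users, "ada", {"If-None-Match": "*"}, impostor)

print(first, retry, clash)
print(users["ada"]["name"])  # the original record survives both later attempts
```

A client that retries after a timeout and receives 412 can treat that as a hint that its earlier attempt may have landed, and confirm with a GET.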

409 Conflict vs. 412 Precondition Failed

Both 409 and 412 signal that a write was rejected due to state, but they mean different things and the distinction matters for client handling.

	409 Conflict	412 Precondition Failed
Meaning	The request is in conflict with the current state of the resource. The conflict exists right now, regardless of how the client read the resource.	A conditional header precondition was not satisfied. The client had an older version of the resource when it decided to write.
Cause	Unique key violation, duplicate creation, state machine violation (e.g. cannot cancel an already-delivered order)	OCC ETag mismatch: the resource version changed between the client's last read and their write attempt
Client action	Read the error, understand the constraint, potentially re-read the resource to understand current state before deciding what to do	Re-read the resource to get the new ETag and current state, re-apply your change, retry with the new ETag
Example	`PUT /orders/42/status` attempting to set status to "pending" when it is already "shipped" — the state machine forbids backward transitions	`PUT /orders/42` with `If-Match: "v3"` when the server is at `"v5"` because two other clients wrote in between
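The distinction can be made concrete with a small handler that can produce either status. The transition table and the `set_status` function are hypothetical, and checking the precondition before the business rule is a design choice of this sketch.

```python
# Hypothetical forward-only state machine for an order.
ALLOWED_TRANSITIONS = {
    "pending": {"shipped", "cancelled"},
    "shipped": {"delivered"},
    "delivered": set(),
    "cancelled": set(),
}

def set_status(order, if_match, new_status):
    """Returns the HTTP status code for a conditional status change."""
    current_etag = f'"v{order["version"]}"'
    if if_match != current_etag:
        return 412  # stale read: re-read, re-apply, retry
    if new_status not in ALLOWED_TRANSITIONS[order["status"]]:
        return 409  # fresh read, but the current state forbids this change
    order["status"] = new_status
    order["version"] += 1
    return 200

order = {"version": 5, "status": "shipped"}

print(set_status(order, '"v3"', "delivered"))  # stale ETag
print(set_status(order, '"v5"', "pending"))    # backward transition
print(set_status(order, '"v5"', "delivered"))  # valid and current
```

Retrying helps only in the 412 case. A 409 would come back unchanged no matter how fresh the client's ETag is.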

Fig 3 — The 412 retry loop. On a conflict, the client re-reads the resource to get the new ETag and current state, re-applies its intended change to that new base (not to the old base), and retries the write. Repeat up to N times. After exhausting retries, surface the conflict to the user or raise an error.

# Retry-on-conflict loop — pseudo-code
function update_with_occ(resource_url, change_fn, max_retries=5):
  for attempt in range(max_retries):
    # Step 1: Read and capture the ETag
    response = GET(resource_url)
    etag     = response.headers['ETag']
    current  = response.body

    # Step 2: Compute the desired new state
    desired = change_fn(current)  # apply your business logic to the CURRENT read

    # Step 3: Attempt the conditional write
    write_response = PUT(resource_url, body=desired,
                         headers={'If-Match': etag})

    if write_response.status == 200:
      return write_response.body  # success

    if write_response.status == 412:
      continue  # conflict: loop will re-read with new ETag

    raise UnexpectedError(write_response.status)

  raise ConflictExhausted("Could not commit after " + str(max_retries) + " attempts")

⚠️ Re-derive the delta from the new base, not from the old base

The most dangerous retry-on-conflict mistake is picking up a fresh ETag and re-sending the original request body unchanged. Imagine you want to increment a counter. You read 5, compose {"count": 6}, and get a 412 because other clients have meanwhile incremented the counter to 7. If you fetch the new ETag and resend {"count": 6}, the write succeeds and overwrites 7 with 6: you have decremented the counter. Always pass the change_fn to the retry loop, not the pre-computed new value. The function runs again on the freshly-read state.
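The loop and the counter scenario can be exercised end to end against a fake server. Everything here (`FakeCounterApi`, the response dict shape, the way a competing write is injected) is a hypothetical test harness, not a real client library.

```python
class ConflictExhausted(Exception):
    pass

class FakeCounterApi:
    """Hypothetical in-memory server with GET and conditional PUT."""

    def __init__(self, body):
        self.version = 1
        self.body = dict(body)

    def etag(self):
        return f'"v{self.version}"'

    def get(self):
        return {"status": 200, "etag": self.etag(), "body": dict(self.body)}

    def put(self, body, if_match):
        if if_match != self.etag():
            return {"status": 412, "etag": self.etag(), "body": None}
        self.version += 1
        self.body = dict(body)
        return {"status": 200, "etag": self.etag(), "body": dict(self.body)}

def update_with_occ(api, change_fn, max_retries=5):
    for _ in range(max_retries):
        current = api.get()                        # fresh state and ETag every attempt
        desired = change_fn(current["body"])       # re-derive from the fresh state
        result = api.put(desired, if_match=current["etag"])
        if result["status"] == 200:
            return result["body"]
        if result["status"] != 412:
            raise RuntimeError(f"unexpected status {result['status']}")
    raise ConflictExhausted(f"Could not commit after {max_retries} attempts")

api = FakeCounterApi({"count": 5})
seen = []

def increment(state):
    if not seen:
        # Inject a competing write during the first attempt only: 5 -> 7.
        api.put({"count": state["count"] + 2}, if_match=api.etag())
    seen.append(state["count"])
    return {"count": state["count"] + 1}

final = update_with_occ(api, increment)
print(seen)   # the states change_fn was applied to, one per attempt
print(final)  # the increment lands on top of the competing write
```

The first attempt computes 6 from a stale 5 and is rejected. The second attempt re-reads 7 and commits 8, so neither writer's work is lost.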

OCC vs. pessimistic locking vs. last-write-wins vs. CRDTs

Strategy	How it works	Pros	Cons	Best for
Last-write-wins (no control)	No version checking. Every write succeeds. The last one wins.	Zero implementation complexity	Silent data loss on any concurrent write. Only acceptable when overwrites are intentional and expected.	Idempotent state updates where the "latest" value is always correct (e.g. setting a user's current GPS location)
Pessimistic locking	Client acquires an exclusive lock at read time. Other writers are blocked until the lock is released.	Guarantees no conflicts — no retries needed	Locks are expensive to distribute; a client that crashes while holding a lock blocks everyone else until the TTL expires. Does not compose well over HTTP's stateless request model.	Short-lived, high-contention, low-latency transactions in databases. Rarely appropriate for HTTP APIs.
Optimistic concurrency control (OCC)	No lock at read time. Compare-and-swap at write time using a version/ETag.	When contention is low, most operations complete in one attempt. Stateless: no lock state to maintain. Composable over HTTP.	Under high contention (many writers, same resource), retry rates grow and throughput falls. Client must implement retry loop correctly.	Most REST APIs. Especially well-suited for resources that are frequently read but infrequently written, or where contention is low.
CRDTs (Conflict-free Replicated Data Types)	Data structures designed so that any two concurrent operations can always be merged without conflict — e.g., a grow-only counter, a last-write-wins register, a set with add/remove tombstones.	No conflicts possible; suitable for always-available, eventually-consistent systems (no central coordinator needed)	Only works for CRDT-shaped operations. Arbitrary business logic cannot be encoded as a CRDT. Complex to implement and reason about.	Collaborative applications, distributed caches, eventually-consistent shopping carts. Not a general-purpose API concurrency mechanism.

Under the hood: the compare-and-swap mechanism

The ETag check at the database layer is a compare-and-swap (CAS). Understanding how this works prevents a critical implementation mistake: doing the compare-then-write as two separate database operations.

-- WRONG: non-atomic check + update (race condition possible)
-- Between SELECT and UPDATE, another request might change the row
SELECT version FROM orders WHERE id = 42;
-- application checks: is version == 3?
UPDATE orders SET quantity = 10, version = 4 WHERE id = 42;

---

-- CORRECT: atomic compare-and-swap in a single UPDATE
-- The WHERE clause includes the version check: if the row is no longer at
-- version=3, zero rows are affected, and we know there was a conflict.
UPDATE orders
SET    quantity = 10, version = version + 1
WHERE  id = 42
  AND  version = 3;  -- the atomic compare is here, in the WHERE clause

-- rows_affected = 1 → success → return 200 + new ETag
-- rows_affected = 0 → conflict → return 412

This single-statement pattern is safe under concurrent writes because mainstream relational databases (for example PostgreSQL and MySQL/InnoDB at their default isolation levels) hold a row-level lock for the duration of the UPDATE, so two simultaneous updates on the same row serialise. One finds version = 3 and succeeds (rows_affected = 1). The other, once unblocked, evaluates its WHERE clause against the committed row, finds version = 4 (already incremented), and fails (rows_affected = 0). No explicit multi-statement transaction is needed, and there is no window between check and write.

# Trace a concurrent conflict at the database level
# Request A and B both arrived with If-Match: "v3"

-- Request A (arrives first at DB)
UPDATE orders SET quantity=10, version=4 WHERE id=42 AND version=3;
Query OK, 1 row affected
→ Request A returns 200 OK, ETag: "v4"

-- Request B (arrives immediately after)
UPDATE orders SET quantity=3, version=4 WHERE id=42 AND version=3;
Query OK, 0 rows affected   ← version is now 4, not 3
→ Request B returns 412 Precondition Failed
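The same trace can be reproduced with Python's built-in `sqlite3` module. The table layout mirrors the SQL above; the `conditional_update` helper and the mapping from row count to status code are assumptions of this sketch, and SQLite stands in for whatever database the API actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO orders (id, quantity, version) VALUES (42, 5, 3)")
conn.commit()

def conditional_update(conn, order_id, quantity, expected_version):
    """Atomic compare-and-swap: the version check lives in the WHERE clause."""
    cursor = conn.execute(
        "UPDATE orders SET quantity = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (quantity, order_id, expected_version),
    )
    conn.commit()
    # rowcount is the number of rows the UPDATE actually changed.
    return 200 if cursor.rowcount == 1 else 412

status_a = conditional_update(conn, 42, 10, expected_version=3)  # Request A
status_b = conditional_update(conn, 42, 3, expected_version=3)   # Request B, stale
row = conn.execute("SELECT quantity, version FROM orders WHERE id = 42").fetchone()

print(status_a, status_b)
print(row)  # Request A's quantity, version bumped exactly once
```

Zero affected rows can also mean the row does not exist at all, so a fuller handler would distinguish a missing resource (404) from a version mismatch (412).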

By the numbers

How likely is a conflict, and how many retries should you expect? The math depends on the write rate for a given resource and the transaction window — the time between a client reading the resource and submitting the write.

Let r = write rate on a resource (writes per second), w = transaction window (seconds, i.e. the time the client takes to read, compute, and write). The probability that at least one other write lands during the client's window is approximately:

P(conflict) ≈ 1 − e^(−r·w)   [Poisson arrival model]

For small r·w:  P(conflict) ≈ r·w   [first-order approximation]

Expected retries = P(conflict) / (1 − P(conflict))
                 ≈ r·w           [for small P(conflict)]

Worked trace: a popular order resource receiving r = 2 writes/second, with a client transaction window of w = 0.5 s (two 50 ms round trips plus 400 ms of compute):

Step	Value	Notes
Write rate (r)	2 writes/sec (modeled)	This resource is a hot order being processed by multiple workers simultaneously
Transaction window (w)	0.5 s	Time for GET → compute → PUT at 50 ms round trip each plus 400 ms compute
r·w	2 × 0.5 = 1.0	Expected competing writes during the client's window
P(conflict)	1 − e^(−1.0) ≈ 63%	High! Under significant contention this resource will need retries most of the time
Expected retries	0.63 / (1 − 0.63) ≈ 1.7 retries/operation	Average of nearly 2 retries per successful write
With r = 0.1 writes/sec, w = 0.2 s	P(conflict) ≈ 2%	Low contention. OCC is nearly free — most operations complete on the first attempt

The break-even where OCC becomes more expensive than pessimistic locking (due to retry amplification) is roughly at P(conflict) > 20–30%. Below that, OCC's zero-lock-overhead advantage dominates. Above that, consider sharding the hot resource (splitting into per-worker state), or switching to a conflict-free model (CRDT, queue-based serial processing).

The formula also quantifies the benefit of reducing the transaction window. Cutting w from 500 ms to 100 ms at r = 2/s drops P(conflict) from 63% to 18%, a dramatic improvement achievable just by shortening the client-side compute step or moving the client closer to the server.
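The figures in this section can be checked with a few lines of Python. The function names are illustrative, and the model carries the same assumptions as the formulas above: Poisson arrivals and independent attempts.

```python
import math

def conflict_probability(r: float, w: float) -> float:
    """P(at least one competing write lands within the window), Poisson model."""
    return 1.0 - math.exp(-r * w)

def expected_retries(p: float) -> float:
    """Mean number of failed attempts before one succeeds, attempts independent."""
    return p / (1.0 - p)

scenarios = [
    (2.0, 0.5),   # hot resource, slow client
    (0.1, 0.2),   # low contention
    (2.0, 0.1),   # hot resource, shortened window
]

for r, w in scenarios:
    p = conflict_probability(r, w)
    print(f"r={r}/s w={w}s  P(conflict)={p:.1%}  expected retries={expected_retries(p):.2f}")
```

The three rows come out at roughly 63%, 2%, and 18% conflict probability, matching the worked trace and the shortened-window comparison.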

How real platforms do it

Platform	Mechanism	Implementation detail	Source
Amazon S3	Strong ETag on every object. `If-Match` and `If-None-Match` are supported on GET and, since conditional writes were added in 2024, on PUT as well: a PUT with `If-None-Match: *` atomically creates an object only if it does not already exist, resolving a long-standing TOCTOU race in distributed uploads.	For a single-part upload of an object stored unencrypted or with SSE-S3, the ETag is the hex-encoded MD5 of the object data. For multipart uploads it is not an MD5 of the content; in practice it is an MD5 of the concatenated part MD5s followed by `-N`, where N is the number of parts. Objects encrypted with SSE-KMS or SSE-C also get ETags that are not content MD5s.	S3 PutObject: conditional headers
Google APIs (Drive, GCS)	Resources carry an `etag` field in the JSON body as well as the ETag response header. Writes accept `If-Match`. The etag value is an opaque hash, not a sequential version.	GCS additionally supports an `ifGenerationMatch` query parameter in its JSON API as an alternative to the HTTP `If-Match` header. The generation number identifies the object's version and changes on every overwrite.	GCS request preconditions
etcd	Every key has a `modRevision` (global revision counter). A `Txn` (transaction) with a compare clause is effectively OCC at the key-value store level: the compare block checks the current `modRevision`; if it matches, the success block runs (the write); if not, the failure block runs.	Kubernetes uses etcd's compare-and-swap as the foundation of its controller-manager reconciliation loop and resource-version-based optimistic locking.	etcd API reference — transactions
Stripe	Stripe does not expose ETags for general OCC. Instead, it uses idempotency keys and carefully designed state machines where resources can only transition forward through defined states, making the lost-update problem less dangerous by design.	For stateful resources such as PaymentIntents, each status transition is an atomic state-machine check: a confirmed PaymentIntent cannot be re-confirmed. This is server-side state-machine guarding rather than client-side OCC.	Stripe PaymentIntent lifecycle

🎯 Interview angle

"How do you prevent two concurrent updates from overwriting each other in a REST API?" A complete answer: (1) name the lost-update problem and show the exact interleaving (two actors, read-modify-write, second write wins silently); (2) describe OCC with ETags and If-Match — the compare-and-swap at the database level using WHERE id=X AND version=N; (3) explain the 412 retry loop and the requirement to re-derive the delta from the fresh read; (4) compare with pessimistic locking and explain when each is appropriate (P(conflict) threshold); (5) mention the If-None-Match: * conditional create for idempotent resource creation. Interviewers at infrastructure companies may ask about CRDTs — give the one-sentence definition and the limitation (only CRDT-shaped operations).

✅ Include the ETag in the JSON body, not just the header

Proxies, API gateways, and developer tools that don't know to preserve HTTP headers can easily strip them. Including the ETag value as a field in the resource JSON body (e.g. "version": "v3" or "etag": "a3f4b2...") gives clients a reliable fallback and makes the version visible to anyone inspecting the JSON directly. The header and body values should always be identical for a given response.

How to debug & inspect it

OCC bugs fall into two categories: false conflicts (412 when no real conflict exists) and missed conflicts (concurrent writes both succeed when they should have conflicted). Both are diagnosable with curl and careful log inspection.

Trace ETags with curl

$ curl -sv https://api.example.com/v1/orders/42 \
    -H "Authorization: Bearer sk_test_abc" \
    2>&1 | grep -E "(ETag|etag|HTTP)"
< HTTP/1.1 200 OK
< ETag: "v7"

# Store the ETag and use it in the conditional write
$ curl -sv -X PATCH https://api.example.com/v1/orders/42 \
    -H "Authorization: Bearer sk_test_abc" \
    -H "If-Match: \"v7\"" \
    -H "Content-Type: application/json" \
    -d '{"quantity": 10}'
< HTTP/1.1 200 OK
< ETag: "v8"

# Simulate a conflict: re-send with the old ETag
$ curl -sv -X PATCH https://api.example.com/v1/orders/42 \
    -H "Authorization: Bearer sk_test_abc" \
    -H "If-Match: \"v7\"" \
    -H "Content-Type: application/json" \
    -d '{"quantity": 3}'
< HTTP/1.1 412 Precondition Failed
{"error": {"type": "precondition_failed", "code": "version_conflict", ...

Symptom → cause → fix

Symptom	Likely cause	Fix
Client always gets 412 on first attempt, even without concurrent writers	The server generates a new ETag on every GET even if the data didn't change (e.g. hashing including a timestamp). The client's ETag is already stale by the time it submits.	Generate the ETag from content or version number, not from timestamps or random nonces. A re-read of unchanged data must return the same ETag.
Concurrent writes both succeed — no 412 produced	The WHERE clause version check is missing: the UPDATE runs without `AND version = N`. Rows-affected check is also missing.	Add the version predicate to the UPDATE WHERE clause. Check rows-affected == 1; if 0, return 412.
Client gets 412 and retries, but keeps conflicting even on retry	The retry loop re-uses the original stale ETag instead of re-reading and getting the new ETag first.	The retry loop must GET before every PUT. Never cache the ETag across attempts.
412 returned even with a single-threaded client and no concurrent writer	The server compares ETags case-sensitively, but the client is normalising the value (e.g. lower-casing it or stripping the quotes). ETag comparison is case-sensitive, and the surrounding double quotes are part of the value.	Echo the ETag back in If-Match exactly as received, preserving case and the surrounding double-quote characters.
Weak ETag in response; If-Match is rejected with 412 (or 400, depending on the server)	The server is returning `W/"v3"` (a weak ETag), which can never satisfy `If-Match` because that header requires strong comparison. A proxy or compression layer may have added the `W/` prefix.	Ensure the server generates strong ETags (no `W/` prefix) for resources that support conditional writes. Check that no intermediary is adding the weak prefix.

🧠 Quick check

1. Two clients both read a resource at version 3. Client A writes successfully, moving it to version 4. Client B then attempts a write with If-Match: "v3". What does the server return?

412 Precondition Failed is the correct response when an If-Match condition fails. Client B sent If-Match: "v3" but the resource is now at version 4 (Client A wrote). The precondition (current version == "v3") is false, so the server rejects the write without modifying the resource.

2. The atomic compare-and-swap at the database level for OCC is best implemented as:

The single UPDATE with the version predicate in the WHERE clause is the correct approach. It is atomic: the check and the write happen in one operation. If rows_affected == 0, no row matched — either the resource doesn't exist or the version changed (conflict). The SELECT-then-UPDATE pattern has a window between the two statements where another writer can interleave.

3. If-None-Match: * on a PUT request means:

If-None-Match: * means "proceed only if there is no current representation of this resource" — i.e., it does not exist yet. It turns a PUT into a conditional create: 201 Created if the resource is new, 412 Precondition Failed if it already exists. This is a powerful idempotent create primitive.

4. A client's retry-on-conflict loop gets a 412 and retries. The correct next step is:

The critical step is re-reading first. The resource may have changed substantially since the original read. Re-applying the original delta to stale data can produce incorrect results. The retry loop must: (1) GET the current state and ETag, (2) re-compute the intended new state by applying the change logic to the fresh current state, (3) PUT with the new ETag in If-Match.

5. OCC becomes less attractive compared to pessimistic locking when:

When P(conflict) is high, OCC forces many retries, each requiring a re-read + recompute + re-write round trip. At high contention (above ~20–30% conflict rate), the retry overhead may exceed the locking overhead of pessimistic approaches. Payload size and HTTP version have no bearing on OCC vs. pessimistic locking — the tradeoff is purely about conflict probability and retry cost.

✍️ Exercise: design the OCC contract for a flight-seat reservation API

A flight seat inventory API allows airline booking agents to reserve individual seats. Multiple agents may be trying to claim the same seat simultaneously. Design the concurrency control contract: what fields go in the resource response, what headers the client sends on a reserve request, what responses the server returns, and how the client handles a conflict.

Model answer:

Resource response includes a version/ETag. The GET response for a seat (GET /flights/AA123/seats/12A) includes both the ETag header (ETag: "v2") and the version in the body: {"seat": "12A", "status": "available", "version": "v2"}. Both are needed: the header for HTTP-aware clients, the body field for any client that reads JSON directly.
Reservation uses If-Match. The reserve request sends PUT /flights/AA123/seats/12A with If-Match: "v2" and body {"status": "reserved", "passenger_id": "pax_42"}. The server performs UPDATE seats SET status='reserved', version='v3' WHERE flight_id='AA123' AND seat='12A' AND version='v2'. rows_affected == 1 → 200 OK + ETag: "v3". rows_affected == 0 → 412 Precondition Failed.
Conflict handling. On 412, the client re-reads the seat: if the status is now "reserved" (by another agent), surface an error to the user: "Seat 12A was just taken — please select another seat." If the status is still "available" (the version changed for another reason, e.g. a metadata update), retry the reservation with the new ETag.
Conditional create for new record insertion. If the seat record needs to be created on first reservation: PUT /flights/AA123/seats/12A with If-None-Match: * ensures only one agent can create it — subsequent attempts get 412.

Rubric: Full marks for all four steps. Must name If-Match and 412 explicitly. Bonus: noting that the retry handler must inspect the new state before deciding whether to auto-retry or escalate to the user.
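The conflict-handling step of the model answer can be sketched as client code. `FakeSeatApi`, `SeatTaken`, and `reserve` are hypothetical names for this illustration, and the fake server keeps a single seat in memory.

```python
class SeatTaken(Exception):
    pass

class FakeSeatApi:
    """Hypothetical in-memory server for one seat, starting at version 2."""

    def __init__(self):
        self.version = 2
        self.seat = {"seat": "12A", "status": "available"}

    def etag(self):
        return f'"v{self.version}"'

    def get(self):
        # The version travels in both the ETag header and the JSON body.
        body = dict(self.seat, version=f"v{self.version}")
        return 200, self.etag(), body

    def put(self, if_match, changes):
        if if_match != self.etag():
            return 412, self.etag()
        self.seat.update(changes)
        self.version += 1
        return 200, self.etag()

def reserve(api, passenger_id, max_retries=3):
    for _ in range(max_retries):
        _, etag, seat = api.get()
        if seat["status"] != "available":
            # Inspect the fresh state before retrying: escalate to the user.
            raise SeatTaken(f"Seat {seat['seat']} was just taken")
        status, new_etag = api.put(
            etag, {"status": "reserved", "passenger_id": passenger_id}
        )
        if status == 200:
            return new_etag
        # 412: the version moved; loop to re-read and decide again.
    raise RuntimeError("gave up after repeated conflicts")

api = FakeSeatApi()
print(reserve(api, "pax_42"))   # first agent wins and receives the new ETag

try:
    reserve(api, "pax_43")      # second agent sees the seat is gone
except SeatTaken as exc:
    print(exc)
```

A 412 caused by an unrelated metadata update leaves the seat available, so the loop retries quietly. A 412 caused by a competing reservation surfaces as `SeatTaken` on the next read.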

Key takeaways

The lost-update problem occurs when two clients perform read-modify-write on the same resource and their windows interleave. It is silent: both receive 200 and neither knows the conflict happened.
Optimistic concurrency control fixes this with a compare-and-swap: include a version in the response (as an ETag header + body field), require the client to send it back in If-Match, and return 412 Precondition Failed if the version changed.
The atomic check lives in a single SQL statement with the version predicate in the WHERE clause. rows_affected == 0 means a conflict; no separate SELECT needed.
The retry loop on 412 must re-read the resource before retrying — never re-send the same body. The delta must be re-derived from the freshly-read current state.
If-None-Match: * is the HTTP primitive for conditional create — it prevents duplicate resource creation under concurrent requests.
OCC is low-cost when conflicts are rare (P < 20%). At high contention, reduce the transaction window to lower conflict probability, or reconsider the data model (sharding, CRDTs, queue-based serial writes).