Design Case Studies · Lesson 03
Design: Comment & rating API
Comment and rating systems look simple — they're usually the first feature a team adds — and become surprisingly hard once millions of users can like, reply, and flag the same content simultaneously. This case study shows how denormalized counts, idempotent likes, cursor pagination, and read/write path separation keep the system consistent without serialising every request.
By the end you'll be able to
- Model nested comments as a nested resource without needing recursive API calls.
- Explain why denormalized counts are correct for like/rating counts and when they drift.
- Design an idempotent like/unlike using PUT and DELETE, and explain what breaks if POST is used instead.
Requirements
Functional requirements
- Users can post comments on any post; comments can be replied to (one level of nesting — replies to replies are flat-threaded under the top-level comment).
- Users can like or unlike any comment. Each user can like a given comment at most once.
- Users can submit a numeric rating (1–5 stars) on a post. Each user can rate a post at most once, but can change their rating.
- Comments and ratings can be flagged for moderation.
- Like counts and average ratings are returned with comment and post data.
Non-functional requirements
- Reads are at least 10x more frequent than writes; the read path must be fast even under viral traffic.
- Like counts may be slightly stale (eventually consistent is acceptable; exact real-time counts are not required).
- The like/unlike operation must be idempotent — a client that retries due to a network timeout must not double-count a like.
- Rate limiting must prevent abuse: a user cannot flood a post with automated comments or likes.
Scale assumption
200 000 read requests per second across all comment and rating endpoints (viral post scenario). Write traffic: ~2 000 req/s (comments, likes, ratings). Read:write ratio is ~100:1. Average comment thread: 50 top-level comments, 10 replies each = 550 comments per post, sorted by likes descending. Expect hot posts to receive 10 000–100 000 likes within minutes of going viral.
Design decisions
Nested resources: /posts/:id/comments
Comments are scoped to a post, so the natural URL structure is /v1/posts/:post_id/comments. Replies are a second-level collection under a comment: /v1/posts/:post_id/comments/:comment_id/replies. This models the ownership hierarchy clearly and lets the API enforce that a reply always has a parent comment on the same post — without the client assembling that relationship manually. Deeper nesting (replies to replies) is capped at one level; further replies are posted to the original parent comment, keeping the schema shallow and queries simple. See RESTful maturity levels for when to nest vs. when to flatten.
Cursor pagination for comment lists
Comment lists are sorted by like count descending (for "top" order) or by creation time descending (for "new" order). Both are sortable, non-unique fields — two comments can have the same like count. Cursor-based pagination handles ties correctly by encoding both the sort field and the comment ID as the tie-breaker. Offset pagination would produce duplicate or missing comments if likes change between pages (which they do constantly on viral posts). See RESTful maturity levels for the paginator contract.
Denormalized like and rating counts
The naive approach is to SELECT COUNT(*) FROM likes WHERE comment_id = X on every read. At 200 000 reads/s on a hot post, this is an unacceptable read amplification. The correct approach is to maintain a denormalized counter directly on the comment row: like_count INTEGER NOT NULL DEFAULT 0. When a user likes a comment, an atomic UPDATE comments SET like_count = like_count + 1 WHERE id = X increments the counter. This shifts the cost from reads (which are 100x more frequent) to writes. The downside is that the count may be slightly stale if a write fails mid-transaction — which the non-functional requirements explicitly allow.
For ratings, the post row stores rating_count and rating_sum; the average is computed client-side as rating_sum / rating_count rather than storing the average directly (storing the average makes it non-additive — you can't update it with a single increment).
Idempotent like via PUT/DELETE, not POST
If liking a comment is modelled as POST /v1/posts/:pid/comments/:cid/likes, a client that retries on timeout will create two like records, incrementing the counter twice. The correct model is: a like is a named resource — "user U's like on comment C" — that either exists or does not. Creating it is PUT /v1/posts/:pid/comments/:cid/likes/me (idempotent: if the record already exists, return 200 with no further action). Removing it is DELETE /v1/posts/:pid/comments/:cid/likes/me (idempotent: if the record doesn't exist, return 204 with no error). See idempotency for the full treatment of PUT vs. POST for state transitions.
Rate limiting for abuse control
Without rate limiting, a single user can flood a post with hundreds of comments per minute or script thousands of automated likes. The token bucket is applied at two granularities: per user-IP for unauthenticated traffic (catching bots before authentication), and per authenticated user for write operations (catching authenticated abuse). Write limits: 10 comments per minute per user, 60 likes per minute per user. Read limits: 600 reads per minute per IP. See rate limiting for token bucket mechanics and the 429/Retry-After contract.
Modelling "like" as a POST action rather than a named resource. POST /comments/42/like creates ambiguity: can you like it twice? What's the response if you try? What does "unlike" look like — another POST to a different path? The resource model (PUT .../likes/me / DELETE .../likes/me) makes the semantics unambiguous, the idempotency natural, and the "did I already like this?" query trivial (check if the resource exists).
The API model
===================================================================
COMMENTS — list top-level comments on a post
===================================================================
GET /v1/posts/:post_id/comments?sort=top&cursor={cursor}&limit=20
→ 200 OK
{
"data": [
{
"id": "cmt_x7k2",
"author_id": "usr_881",
"body": "Great writeup — the cursor pagination section saved us last quarter.",
"like_count": 247, // denormalized; may be slightly stale
"reply_count": 12,
"viewer_liked": true, // false if unauthenticated or not liked
"flagged": false,
"created_at": "2026-06-18T09:12:00Z"
}
],
"next_cursor": "eyJsaWtlcyI6MjQ3LCJpZCI6...", // encodes sort key + id tie-breaker
"total_estimate": 548
}
===================================================================
COMMENTS — post a new top-level comment
===================================================================
POST /v1/posts/:post_id/comments
Authorization: Bearer <token>
Content-Type: application/json
{ "body": "Agreed — especially on the latency budget walk-through." }
→ 201 Created
{
"id": "cmt_y9m5",
"author_id": "usr_334",
"body": "Agreed — especially on the latency budget walk-through.",
"like_count": 0,
"reply_count": 0,
"viewer_liked": false,
"created_at": "2026-06-20T14:05:00Z"
}
===================================================================
REPLIES — list replies to a comment
===================================================================
GET /v1/posts/:post_id/comments/:comment_id/replies?cursor={cursor}&limit=10
→ 200 OK (same shape as comment list)
===================================================================
LIKES — like a comment (idempotent)
===================================================================
PUT /v1/posts/:post_id/comments/:comment_id/likes/me
Authorization: Bearer <token>
→ 200 OK (already liked — idempotent, no counter change)
→ 201 Created (first like — counter incremented)
{ "comment_id": "cmt_x7k2", "liked_at": "2026-06-20T14:06:00Z" }
===================================================================
LIKES — unlike a comment (idempotent)
===================================================================
DELETE /v1/posts/:post_id/comments/:comment_id/likes/me
Authorization: Bearer <token>
→ 204 No Content (whether it was liked or not — idempotent)
===================================================================
RATINGS — submit or update a star rating on a post
===================================================================
PUT /v1/posts/:post_id/ratings/me
Authorization: Bearer <token>
Content-Type: application/json
{ "stars": 4 } // 1–5 inclusive; replaces any prior rating
→ 200 OK (updated existing rating)
→ 201 Created (first rating)
{
"stars": 4,
"post_rating_avg": 4.2, // recomputed after this write
"post_rating_count": 1847
}
===================================================================
MODERATION — flag a comment
===================================================================
POST /v1/posts/:post_id/comments/:comment_id/flags
Authorization: Bearer <token>
Content-Type: application/json
{ "reason": "spam" } // spam | harassment | misinformation | other
→ 201 Created
{ "flag_id": "flg_22xp", "status": "pending_review" }
Including "viewer_liked": true/false in the comment object means the client knows immediately whether the current user has liked each comment — without making N additional GET .../likes/me calls. This is a one-time JOIN or Redis lookup per page, not per comment, which is vastly more efficient than per-item API calls. The tradeoff is that unauthenticated reads return false for all items, which is correct.
Evaluation & read vs. write paths
Read path: comment list for a hot post
| Segment | Estimated time | Note |
|---|---|---|
| Client → CDN edge | ~15 ms | Network RTT |
| CDN cache hit | ~1 ms | 10-second TTL on hot post comment lists |
| Cache hit total | ~16 ms | Most viral traffic terminates here |
| CDN miss → read replica query | +20 ms | Index on (post_id, like_count DESC, id) |
| Add viewer_liked join | +5 ms | One lookup per page, not per comment |
| Cache miss total | ~41 ms | Well under any reasonable read SLA |
Write path: like a comment
| Segment | Estimated time | Note |
|---|---|---|
| Auth + rate limit check | ~5 ms | Token check in Redis |
| Idempotency check (unique index lookup) | ~3 ms | Primary key lookup |
| INSERT likes row + UPDATE like_count (atomic) | ~8 ms | Single transaction on primary |
| Response serialise + return | ~2 ms | |
| Write total | ~18 ms | Under any reasonable write SLA |
Count consistency under viral load
When a post goes viral, the write path may process thousands of likes per minute. Each like atomically increments like_count on the primary. The read replica lags by seconds (typical async replication lag); CDN cache serves stale counts for up to 10 seconds. Users may see a count of 1 247 when the true value is 1 251. The non-functional requirements explicitly allowed this. The important guarantee is that the counter never drifts permanently — each write is atomic and the replication always catches up. If exact counts are ever required (e.g., for a prize at exactly 10 000 likes), add an explicit "counter reconciliation" background job rather than tightening consistency across the entire read path.
The most common follow-up question in this domain is: "What if a user likes a comment, then immediately refreshes the page and sees the old count?" Answer: "That's a known consequence of the eventually consistent design we chose. The non-functional requirements allow it. If it were unacceptable, we'd need to either (a) have the like endpoint return the new count and update the client locally without re-fetching, or (b) tighten the CDN TTL — which increases origin load. Choice (a) is nearly always the right answer: the client already has the new state, so don't throw it away."
Under the hood: the core mechanism
Three mechanisms are doing the real work in this design: the adjacency-list schema that stores the comment tree, the two-query + hash-map pattern that fetches it without N+1 queries, and the atomic UPSERT that keeps like counts correct under concurrent writes.
Storing nested comments: adjacency list
-- SCHEMA: adjacency list --
comments (
id TEXT PRIMARY KEY, -- e.g. "cmt_1"
post_id TEXT NOT NULL,
parent_id TEXT REFERENCES comments(id), -- NULL = top-level
author_id TEXT NOT NULL,
body TEXT NOT NULL,
like_count INTEGER NOT NULL DEFAULT 0,
reply_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
)
-- SAMPLE DATA for post "post_abc" --
id parent_id body like_count
─────── ───────── ────────────────────── ──────────
cmt_1 NULL "Great article!" 12
cmt_2 NULL "Disagree on section 3" 5
cmt_3 cmt_1 "Totally agree" 3 ← reply to cmt_1
cmt_4 cmt_1 "Me too!" 1 ← reply to cmt_1
cmt_5 cmt_2 "Can you elaborate?" 0 ← reply to cmt_2
-- FETCH THREAD: two-query + hash-map pattern --
-- (avoids N+1: one query per page, not per comment) --
Query 1: fetch top-level comments (page 1, sorted by likes DESC)
SELECT * FROM comments
WHERE post_id = 'post_abc' AND parent_id IS NULL
ORDER BY like_count DESC, id DESC
LIMIT 20;
→ returns [cmt_1 (12 likes), cmt_2 (5 likes)]
Query 2: fetch all replies for these parent IDs in one shot
SELECT * FROM comments
WHERE parent_id = ANY('{cmt_1, cmt_2}')
ORDER BY created_at ASC;
→ returns [cmt_3, cmt_4, cmt_5]
Application layer: stitch replies to parents in O(n) via hash map
reply_map = {}
for reply in query2_results:
reply_map[reply.parent_id].append(reply)
for comment in query1_results:
comment.replies = reply_map.get(comment.id, [])
Result: 2 DB queries regardless of how many top-level comments there are.
-- LIKE: atomic UPSERT keeps count correct under concurrent writes --
PUT /posts/post_abc/comments/cmt_1/likes/me (user_id = "usr_42")
BEGIN;
-- Try to insert a new like row
INSERT INTO likes (user_id, comment_id, created_at)
VALUES ('usr_42', 'cmt_1', now())
ON CONFLICT (user_id, comment_id) DO NOTHING
RETURNING id; -- returns a row only if inserted (not on conflict)
-- Only increment the counter if a new row was created
IF rows_returned > 0 THEN
UPDATE comments SET like_count = like_count + 1 WHERE id = 'cmt_1';
-- Returns 201 Created
ELSE
-- Conflict: this user already liked this comment
-- Returns 200 OK (idempotent — no counter change)
END IF;
COMMIT;
Why like_count + 1 is safe:
The expression is evaluated atomically inside the transaction.
Two concurrent like requests for cmt_1 both run the INSERT;
exactly one succeeds (unique index enforces it); exactly one UPDATE fires.
No read-modify-write race: we never read like_count before writing it.
-- MATERIALIZED PATH: an alternative for deep trees --
Add a path column: e.g. cmt_3 has path = "/cmt_1/cmt_3/"
Fetch entire subtree of cmt_1: WHERE path LIKE '/cmt_1/%'
Tradeoff: simpler subtree queries, but moving a subtree requires
updating the path column of every descendant — expensive at depth.
Adjacency list is better for flat/one-level nesting (our design).
Materialized path wins for deep, movable trees (file system, org chart).
The naive approach to fetching a comment thread runs one query per comment: fetch the 50 top-level comments, then loop and SELECT * FROM comments WHERE parent_id = ? for each one. That is 51 database queries for a single API response. The two-query + hash-map approach makes it 2 queries regardless of comment count. At 200 000 reads/s, the difference between 2 and 51 queries per request is the difference between a working system and a database meltdown. This is the single most common performance mistake on list endpoints — always batch-fetch related rows.
Operating & debugging it
Most runtime failures in this API fall into two classes: idempotency bugs (the like counter grows incorrectly) and stale-count confusion (the read path shows outdated numbers). Both are diagnosable without touching production data.
| Symptom | Likely cause | Fix |
|---|---|---|
like_count in the API is consistently lower than the actual DB value | Read replica lag — the replica has not yet applied recent write transactions from the primary | This is expected; check replica lag with SELECT now() - pg_last_xact_replay_timestamp(); if lag > 30 s, investigate replication health |
| PUT .../likes/me returns 201 on every call, even for the same user and comment | The unique index on (user_id, comment_id) is missing — every INSERT succeeds and the counter increments each time | Add the index: CREATE UNIQUE INDEX ON likes(user_id, comment_id); then reconcile the inflated counters with a background job |
GET /comments returns top-level comments but viewer_liked is always false for authenticated users | The query that joins the likes table to compute viewer_liked is not receiving the caller's user_id — either the token is not being parsed or the join is missing the user filter | Log the user_id extracted from the token on the server; add LEFT JOIN likes l ON l.comment_id = c.id AND l.user_id = :viewer_id to the list query |
| A reply appears in the top-level comment list instead of under its parent | The list query is missing the WHERE parent_id IS NULL filter | Add AND parent_id IS NULL to the top-level query; replies must be fetched in the second query using WHERE parent_id = ANY(:parent_ids) |
like_count goes negative on high-traffic comments | The unlike path decrements the counter even when the user never liked the comment (e.g., a client calling DELETE without a prior PUT) | In the unlike transaction, only run UPDATE ... SET like_count = like_count - 1 if the DELETE actually removed a row (RETURNING id returns a row); skip the decrement on a no-op DELETE |
| p99 list latency spikes under viral load even with CDN cache in place | Cache TTL has expired and all cache misses are hitting the DB simultaneously — a "thundering herd" on a hot post | Use stale-while-revalidate on the Cache-Control header (already in the lesson's design) so the CDN serves stale while one background revalidation fires, not N simultaneous origin requests |
Debug checklist:
- Verify the unique index on the likes table with
\d likesin psql — if the index is absent, double-counting is inevitable and the counter is already wrong. - Test PUT .../likes/me twice for the same user and comment; the first call must return 201 and the second must return 200 — any other pattern signals a broken idempotency check.
- Compare
like_countfrom the API response against a directSELECT like_count FROM comments— a small difference (1–5) is expected replication lag; a large or growing difference suggests the counter is not being incremented atomically or is being double-counted. - For viewer_liked always false: add a log line printing the
user_idused in the likes JOIN — a common bug is extracting the user ID from the wrong field of the JWT payload. - If the N+1 pattern is suspected, run
EXPLAIN ANALYZEon the comment-list query and count the number of sequential scans — more than 2 total for a single list request confirms the N+1 is present.
🧠 Quick check
1. Why is PUT .../likes/me idempotent but POST .../likes is not?
Idempotency is about the effect of multiple identical calls being the same as one call. PUT to a named resource ("my like on comment 42") either creates it the first time or leaves it unchanged on subsequent calls. POST to a collection creates a new resource each time, which is not idempotent.
2. The post stores rating_sum and rating_count instead of storing the average directly. Why?
An average cannot be updated atomically with a single row increment — you must recalculate it from all the underlying values. Sum and count are each independently incrementable, making them safe to update atomically and easy to reconcile if a write partially fails.
3. A user likes a comment and immediately views the same comment in the app. The like count hasn't changed. What is the most likely cause?
Eventual consistency means the like is durably written, but the cached read lags behind. The fix is either to update the client's local count immediately from the like response (without re-fetching), or to wait for the TTL to expire. A silent database failure would have returned an error response.
4. What does the unique database index on (user_id, comment_id) in the likes table enforce?
The unique index is the hard enforcement layer. Two concurrent like requests from the same user for the same comment will both attempt an INSERT; exactly one will succeed and one will receive a unique constraint violation, which the application layer maps to a 200 "already liked." Without the index, the application-level idempotency check has a race condition.
✍️ Exercise: add a "most controversial" sort order
A product requirement arrives: "Add a 'most controversial' sort that surfaces comments with both high like counts and high flag counts — comments the community is divided on." Design the change: what data does the server need to store, how does the sort key work, and how do you expose it in the API?
Model answer:
-- Data model change --
Add flag_count INTEGER NOT NULL DEFAULT 0 to comments
(same denormalized pattern as like_count)
-- Controversy score (computed, not stored) --
controversy = like_count * flag_count / (like_count + flag_count + 1)
This peaks when likes ≈ flags; near-zero when either is dominant.
Store as a generated/persisted column if sort frequency warrants it.
-- API change --
GET /v1/posts/:post_id/comments?sort=controversial&cursor=...
Cursor encodes (controversy_score DESC, comment_id) — same tie-breaker pattern
-- Response addition --
{ "flag_count": 38 } // only expose when viewer is a moderator?
// Consider hiding flag_count from general users
// to avoid "report-bombing" as a visibility hack
Rubric: ✓ denormalized flag_count column (same pattern as likes — don't join flag table on every read) ✓ controversy score formula that peaks when votes are split ✓ considered whether score should be a persisted generated column vs. computed at query time ✓ cursor encodes the new sort key ✓ raised the visibility concern about exposing flag_count to general users (potential for gaming). All five = strong answer.
Key takeaways
- Model nested resources hierarchically (
/posts/:id/comments/:id/replies) to encode ownership without extra query complexity, but cap nesting depth to keep schemas simple. - Denormalize counts onto the parent row: the read path is 100x more frequent than the write path, so pay at write time, not read time.
- Make "like" a named resource and use PUT/DELETE rather than POST actions — idempotency follows naturally and the semantics become unambiguous.
- Enforce the one-like-per-user constraint with a unique database index, not just application logic; concurrent requests require a hard constraint to be race-free.
- For eventual count consistency: update the client's local state from the write response rather than refetching — eliminates the stale-read problem without tightening the consistency model.
Sources & further reading
- Use The Index, Luke — No offset (the performance case against offset pagination on sorted result sets)
- PostgreSQL — Unique Indexes (how the database enforces constraints concurrently)
- Stripe — Idempotent Requests (production pattern for idempotency keys)
- Meta Graph API — Error Handling (rate limiting and moderation error patterns at scale)
- API Design course — Idempotency
- API Design course — Rate limiting
- API Design course — Caching strategies
- RFC 9110 §9.3.4 — PUT method semantics (idempotency by specification)