Design Case Studies · Lesson 03

Design: Comment & rating API

Comment and rating systems look simple — they're usually the first feature a team adds — and become surprisingly hard once millions of users can like, reply, and flag the same content simultaneously. This case study shows how denormalized counts, idempotent likes, cursor pagination, and read/write path separation keep the system consistent without serialising every request.

⏱ 22 min Difficulty: advanced Prereq: Case-study framework, Idempotency, Rate limiting

By the end you'll be able to

Model nested comments as a nested resource without needing recursive API calls.
Explain why denormalized counts are correct for like/rating counts and when they drift.
Design an idempotent like/unlike using PUT and DELETE, and explain what breaks if POST is used instead.

Requirements

Functional requirements

Users can post comments on any post; comments can be replied to (one level of nesting — replies to replies are flat-threaded under the top-level comment).
Users can like or unlike any comment. Each user can like a given comment at most once.
Users can submit a numeric rating (1–5 stars) on a post. Each user can rate a post at most once, but can change their rating.
Comments and ratings can be flagged for moderation.
Like counts and average ratings are returned with comment and post data.

Non-functional requirements

Reads are at least 10x more frequent than writes; the read path must be fast even under viral traffic.
Like counts may be slightly stale (eventually consistent is acceptable; exact real-time counts are not required).
The like/unlike operation must be idempotent — a client that retries due to a network timeout must not double-count a like.
Rate limiting must prevent abuse: a user cannot flood a post with automated comments or likes.

Scale assumption

200 000 read requests per second across all comment and rating endpoints (viral post scenario). Write traffic: ~2 000 req/s (comments, likes, ratings). Read:write ratio is ~100:1. Average comment thread: 50 top-level comments, 10 replies each = 550 comments per post, sorted by likes descending. Expect hot posts to receive 10 000–100 000 likes within minutes of going viral.

Design decisions

Nested resources: /posts/:id/comments

Comments are scoped to a post, so the natural URL structure is /v1/posts/:post_id/comments. Replies are a second-level collection under a comment: /v1/posts/:post_id/comments/:comment_id/replies. This models the ownership hierarchy clearly and lets the API enforce that a reply always has a parent comment on the same post — without the client assembling that relationship manually. Deeper nesting (replies to replies) is capped at one level; further replies are posted to the original parent comment, keeping the schema shallow and queries simple. See RESTful maturity levels for when to nest vs. when to flatten.

Cursor pagination for comment lists

Comment lists are sorted by like count descending (for "top" order) or by creation time descending (for "new" order). Both are sortable, non-unique fields — two comments can have the same like count. Cursor-based pagination handles ties correctly by encoding both the sort field and the comment ID as the tie-breaker. Offset pagination would produce duplicate or missing comments if likes change between pages (which they do constantly on viral posts). See RESTful maturity levels for the paginator contract.

Denormalized like and rating counts

The naive approach is to SELECT COUNT(*) FROM likes WHERE comment_id = X on every read. At 200 000 reads/s on a hot post, this is an unacceptable read amplification. The correct approach is to maintain a denormalized counter directly on the comment row: like_count INTEGER NOT NULL DEFAULT 0. When a user likes a comment, an atomic UPDATE comments SET like_count = like_count + 1 WHERE id = X increments the counter. This shifts the cost from reads (which are 100x more frequent) to writes. The downside is that the count may be slightly stale if a write fails mid-transaction — which the non-functional requirements explicitly allow.

For ratings, the post row stores rating_count and rating_sum; the average is computed client-side as rating_sum / rating_count rather than storing the average directly (storing the average makes it non-additive — you can't update it with a single increment).

Idempotent like via PUT/DELETE, not POST

If liking a comment is modelled as POST /v1/posts/:pid/comments/:cid/likes, a client that retries on timeout will create two like records, incrementing the counter twice. The correct model is: a like is a named resource — "user U's like on comment C" — that either exists or does not. Creating it is PUT /v1/posts/:pid/comments/:cid/likes/me (idempotent: if the record already exists, return 200 with no further action). Removing it is DELETE /v1/posts/:pid/comments/:cid/likes/me (idempotent: if the record doesn't exist, return 204 with no error). See idempotency for the full treatment of PUT vs. POST for state transitions.

Rate limiting for abuse control

Without rate limiting, a single user can flood a post with hundreds of comments per minute or script thousands of automated likes. The token bucket is applied at two granularities: per user-IP for unauthenticated traffic (catching bots before authentication), and per authenticated user for write operations (catching authenticated abuse). Write limits: 10 comments per minute per user, 60 likes per minute per user. Read limits: 600 reads per minute per IP. See rate limiting for token bucket mechanics and the 429/Retry-After contract.

⚠️ Common trap

Modelling "like" as a POST action rather than a named resource. POST /comments/42/like creates ambiguity: can you like it twice? What's the response if you try? What does "unlike" look like — another POST to a different path? The resource model (PUT .../likes/me / DELETE .../likes/me) makes the semantics unambiguous, the idempotency natural, and the "did I already like this?" query trivial (check if the resource exists).

The API model

===================================================================
 COMMENTS — list top-level comments on a post
===================================================================
GET /v1/posts/:post_id/comments?sort=top&cursor={cursor}&limit=20

→ 200 OK
{
  "data": [
    {
      "id": "cmt_x7k2",
      "author_id": "usr_881",
      "body": "Great writeup — the cursor pagination section saved us last quarter.",
      "like_count": 247,          // denormalized; may be slightly stale
      "reply_count": 12,
      "viewer_liked": true,       // false if unauthenticated or not liked
      "flagged": false,
      "created_at": "2026-06-18T09:12:00Z"
    }
  ],
  "next_cursor": "eyJsaWtlcyI6MjQ3LCJpZCI6...",  // encodes sort key + id tie-breaker
  "total_estimate": 548
}

===================================================================
 COMMENTS — post a new top-level comment
===================================================================
POST /v1/posts/:post_id/comments
Authorization: Bearer <token>
Content-Type: application/json

{ "body": "Agreed — especially on the latency budget walk-through." }

→ 201 Created
{
  "id": "cmt_y9m5",
  "author_id": "usr_334",
  "body": "Agreed — especially on the latency budget walk-through.",
  "like_count": 0,
  "reply_count": 0,
  "viewer_liked": false,
  "created_at": "2026-06-20T14:05:00Z"
}

===================================================================
 REPLIES — list replies to a comment
===================================================================
GET /v1/posts/:post_id/comments/:comment_id/replies?cursor={cursor}&limit=10

→ 200 OK  (same shape as comment list)

===================================================================
 LIKES — like a comment (idempotent)
===================================================================
PUT /v1/posts/:post_id/comments/:comment_id/likes/me
Authorization: Bearer <token>

→ 200 OK  (already liked — idempotent, no counter change)
→ 201 Created  (first like — counter incremented)
{ "comment_id": "cmt_x7k2", "liked_at": "2026-06-20T14:06:00Z" }

===================================================================
 LIKES — unlike a comment (idempotent)
===================================================================
DELETE /v1/posts/:post_id/comments/:comment_id/likes/me
Authorization: Bearer <token>

→ 204 No Content  (whether it was liked or not — idempotent)

===================================================================
 RATINGS — submit or update a star rating on a post
===================================================================
PUT /v1/posts/:post_id/ratings/me
Authorization: Bearer <token>
Content-Type: application/json

{ "stars": 4 }   // 1–5 inclusive; replaces any prior rating

→ 200 OK  (updated existing rating)
→ 201 Created  (first rating)
{
  "stars": 4,
  "post_rating_avg": 4.2,      // recomputed after this write
  "post_rating_count": 1847
}

===================================================================
 MODERATION — flag a comment
===================================================================
POST /v1/posts/:post_id/comments/:comment_id/flags
Authorization: Bearer <token>
Content-Type: application/json

{ "reason": "spam" }  // spam | harassment | misinformation | other

→ 201 Created
{ "flag_id": "flg_22xp", "status": "pending_review" }

✅ Return viewer_liked in the comment list response

Including "viewer_liked": true/false in the comment object means the client knows immediately whether the current user has liked each comment — without making N additional GET .../likes/me calls. This is a one-time JOIN or Redis lookup per page, not per comment, which is vastly more efficient than per-item API calls. The tradeoff is that unauthenticated reads return false for all items, which is correct.

Read and write paths are separated. Reads hit a CDN cache and read replica; writes go to the primary with atomic counter updates that replicate asynchronously to the read path.

Evaluation & read vs. write paths

Read path: comment list for a hot post

Segment	Estimated time	Note
Client → CDN edge	~15 ms	Network RTT
CDN cache hit	~1 ms	10-second TTL on hot post comment lists
Cache hit total	~16 ms	Most viral traffic terminates here
CDN miss → read replica query	+20 ms	Index on (post_id, like_count DESC, id)
Add viewer_liked join	+5 ms	One lookup per page, not per comment
Cache miss total	~41 ms	Well under any reasonable read SLA

Write path: like a comment

Segment	Estimated time	Note
Auth + rate limit check	~5 ms	Token check in Redis
Idempotency check (unique index lookup)	~3 ms	Primary key lookup
INSERT likes row + UPDATE like_count (atomic)	~8 ms	Single transaction on primary
Response serialise + return	~2 ms
Write total	~18 ms	Under any reasonable write SLA

Count consistency under viral load

When a post goes viral, the write path may process thousands of likes per minute. Each like atomically increments like_count on the primary. The read replica lags by seconds (typical async replication lag); CDN cache serves stale counts for up to 10 seconds. Users may see a count of 1 247 when the true value is 1 251. The non-functional requirements explicitly allowed this. The important guarantee is that the counter never drifts permanently — each write is atomic and the replication always catches up. If exact counts are ever required (e.g., for a prize at exactly 10 000 likes), add an explicit "counter reconciliation" background job rather than tightening consistency across the entire read path.

Count consistency timeline. The primary is immediately correct; the replica catches up within seconds; the CDN serves stale counts for up to the TTL. All acceptable per the non-functional requirements.

🎯 Interview angle

The most common follow-up question in this domain is: "What if a user likes a comment, then immediately refreshes the page and sees the old count?" Answer: "That's a known consequence of the eventually consistent design we chose. The non-functional requirements allow it. If it were unacceptable, we'd need to either (a) have the like endpoint return the new count and update the client locally without re-fetching, or (b) tighten the CDN TTL — which increases origin load. Choice (a) is nearly always the right answer: the client already has the new state, so don't throw it away."

Under the hood: the core mechanism

Three mechanisms are doing the real work in this design: the adjacency-list schema that stores the comment tree, the two-query + hash-map pattern that fetches it without N+1 queries, and the atomic UPSERT that keeps like counts correct under concurrent writes.

Storing nested comments: adjacency list

-- SCHEMA: adjacency list --
comments (
  id          TEXT PRIMARY KEY,          -- e.g. "cmt_1"
  post_id     TEXT NOT NULL,
  parent_id   TEXT REFERENCES comments(id),  -- NULL = top-level
  author_id   TEXT NOT NULL,
  body        TEXT NOT NULL,
  like_count  INTEGER NOT NULL DEFAULT 0,
  reply_count INTEGER NOT NULL DEFAULT 0,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
)

-- SAMPLE DATA for post "post_abc" --
 id      parent_id  body                    like_count
───────  ─────────  ──────────────────────  ──────────
cmt_1   NULL       "Great article!"        12
cmt_2   NULL       "Disagree on section 3" 5
cmt_3   cmt_1      "Totally agree"         3   ← reply to cmt_1
cmt_4   cmt_1      "Me too!"               1   ← reply to cmt_1
cmt_5   cmt_2      "Can you elaborate?"    0   ← reply to cmt_2

-- FETCH THREAD: two-query + hash-map pattern --
-- (avoids N+1: one query per page, not per comment) --

Query 1: fetch top-level comments (page 1, sorted by likes DESC)
SELECT * FROM comments
  WHERE post_id = 'post_abc' AND parent_id IS NULL
  ORDER BY like_count DESC, id DESC
  LIMIT 20;
  → returns [cmt_1 (12 likes), cmt_2 (5 likes)]

Query 2: fetch all replies for these parent IDs in one shot
SELECT * FROM comments
  WHERE parent_id = ANY('{cmt_1, cmt_2}')
  ORDER BY created_at ASC;
  → returns [cmt_3, cmt_4, cmt_5]

Application layer: stitch replies to parents in O(n) via hash map
reply_map = {}
for reply in query2_results:
    reply_map[reply.parent_id].append(reply)
for comment in query1_results:
    comment.replies = reply_map.get(comment.id, [])
Result: 2 DB queries regardless of how many top-level comments there are.

-- LIKE: atomic UPSERT keeps count correct under concurrent writes --

PUT /posts/post_abc/comments/cmt_1/likes/me  (user_id = "usr_42")

BEGIN;
  -- Try to insert a new like row
  INSERT INTO likes (user_id, comment_id, created_at)
    VALUES ('usr_42', 'cmt_1', now())
    ON CONFLICT (user_id, comment_id) DO NOTHING
    RETURNING id;   -- returns a row only if inserted (not on conflict)

  -- Only increment the counter if a new row was created
  IF rows_returned > 0 THEN
    UPDATE comments SET like_count = like_count + 1 WHERE id = 'cmt_1';
    -- Returns 201 Created
  ELSE
    -- Conflict: this user already liked this comment
    -- Returns 200 OK (idempotent — no counter change)
  END IF;
COMMIT;

Why like_count + 1 is safe:
  The expression is evaluated atomically inside the transaction.
  Two concurrent like requests for cmt_1 both run the INSERT;
  exactly one succeeds (unique index enforces it); exactly one UPDATE fires.
  No read-modify-write race: we never read like_count before writing it.

-- MATERIALIZED PATH: an alternative for deep trees --
Add a path column: e.g. cmt_3 has path = "/cmt_1/cmt_3/"
Fetch entire subtree of cmt_1:  WHERE path LIKE '/cmt_1/%'
Tradeoff: simpler subtree queries, but moving a subtree requires
  updating the path column of every descendant — expensive at depth.
Adjacency list is better for flat/one-level nesting (our design).
Materialized path wins for deep, movable trees (file system, org chart).

The adjacency list tree (left) is fetched with two SQL queries. The application layer stitches replies to their parents using a hash map keyed by parent_id — O(n) work, 2 round trips regardless of comment count.

⚠️ The N+1 query trap on comment lists

The naive approach to fetching a comment thread runs one query per comment: fetch the 50 top-level comments, then loop and SELECT * FROM comments WHERE parent_id = ? for each one. That is 51 database queries for a single API response. The two-query + hash-map approach makes it 2 queries regardless of comment count. At 200 000 reads/s, the difference between 2 and 51 queries per request is the difference between a working system and a database meltdown. This is the single most common performance mistake on list endpoints — always batch-fetch related rows.

Operating & debugging it

Most runtime failures in this API fall into two classes: idempotency bugs (the like counter grows incorrectly) and stale-count confusion (the read path shows outdated numbers). Both are diagnosable without touching production data.

# Test idempotency: PUT /likes/me twice and confirm 201 then 200 $ curl -si -X PUT -H "Authorization: Bearer <token>" \ https://api.example.com/v1/posts/post_abc/comments/cmt_1/likes/me HTTP/2 201 {"comment_id":"cmt_1","liked_at":"2026-06-20T14:06:00Z"} $ curl -si -X PUT -H "Authorization: Bearer <token>" \ https://api.example.com/v1/posts/post_abc/comments/cmt_1/likes/me HTTP/2 200 {"comment_id":"cmt_1","liked_at":"2026-06-20T14:06:00Z"} # 201 first time (new like), 200 second time (idempotent) — counter incremented once only # Compare API like_count with actual DB count (check for replication lag) $ curl -s -H "Authorization: Bearer <token>" \ https://api.example.com/v1/posts/post_abc/comments/cmt_1 | jq .like_count 247 $ psql -c "SELECT like_count FROM comments WHERE id = 'cmt_1';" like_count ----------- 249 # API shows 247, DB has 249 — 2 new likes not yet replicated to read replica # This is expected behaviour under eventual consistency (non-functional requirement) # Verify the unique index exists (prevents double-counting race) $ psql -c "\d likes" Indexes: "likes_user_id_comment_id_key" UNIQUE, btree (user_id, comment_id) # If this index is missing, concurrent likes will double-count

Symptom	Likely cause	Fix
`like_count` in the API is consistently lower than the actual DB value	Read replica lag — the replica has not yet applied recent write transactions from the primary	This is expected; check replica lag with `SELECT now() - pg_last_xact_replay_timestamp()`; if lag > 30 s, investigate replication health
PUT .../likes/me returns 201 on every call, even for the same user and comment	The unique index on `(user_id, comment_id)` is missing — every INSERT succeeds and the counter increments each time	Add the index: `CREATE UNIQUE INDEX ON likes(user_id, comment_id);` then reconcile the inflated counters with a background job
GET /comments returns top-level comments but `viewer_liked` is always `false` for authenticated users	The query that joins the likes table to compute `viewer_liked` is not receiving the caller's `user_id` — either the token is not being parsed or the join is missing the user filter	Log the `user_id` extracted from the token on the server; add `LEFT JOIN likes l ON l.comment_id = c.id AND l.user_id = :viewer_id` to the list query
A reply appears in the top-level comment list instead of under its parent	The list query is missing the `WHERE parent_id IS NULL` filter	Add `AND parent_id IS NULL` to the top-level query; replies must be fetched in the second query using `WHERE parent_id = ANY(:parent_ids)`
`like_count` goes negative on high-traffic comments	The unlike path decrements the counter even when the user never liked the comment (e.g., a client calling DELETE without a prior PUT)	In the unlike transaction, only run `UPDATE ... SET like_count = like_count - 1` if the DELETE actually removed a row (`RETURNING id` returns a row); skip the decrement on a no-op DELETE
p99 list latency spikes under viral load even with CDN cache in place	Cache TTL has expired and all cache misses are hitting the DB simultaneously — a "thundering herd" on a hot post	Use `stale-while-revalidate` on the Cache-Control header (already in the lesson's design) so the CDN serves stale while one background revalidation fires, not N simultaneous origin requests

Debug checklist:

Verify the unique index on the likes table with \d likes in psql — if the index is absent, double-counting is inevitable and the counter is already wrong.
Test PUT .../likes/me twice for the same user and comment; the first call must return 201 and the second must return 200 — any other pattern signals a broken idempotency check.
Compare like_count from the API response against a direct SELECT like_count FROM comments — a small difference (1–5) is expected replication lag; a large or growing difference suggests the counter is not being incremented atomically or is being double-counted.
For viewer_liked always false: add a log line printing the user_id used in the likes JOIN — a common bug is extracting the user ID from the wrong field of the JWT payload.
If the N+1 pattern is suspected, run EXPLAIN ANALYZE on the comment-list query and count the number of sequential scans — more than 2 total for a single list request confirms the N+1 is present.

🧠 Quick check

1. Why is PUT .../likes/me idempotent but POST .../likes is not?

Idempotency is about the effect of multiple identical calls being the same as one call. PUT to a named resource ("my like on comment 42") either creates it the first time or leaves it unchanged on subsequent calls. POST to a collection creates a new resource each time, which is not idempotent.

2. The post stores rating_sum and rating_count instead of storing the average directly. Why?

An average cannot be updated atomically with a single row increment — you must recalculate it from all the underlying values. Sum and count are each independently incrementable, making them safe to update atomically and easy to reconcile if a write partially fails.

3. A user likes a comment and immediately views the same comment in the app. The like count hasn't changed. What is the most likely cause?

Eventual consistency means the like is durably written, but the cached read lags behind. The fix is either to update the client's local count immediately from the like response (without re-fetching), or to wait for the TTL to expire. A silent database failure would have returned an error response.

4. What does the unique database index on (user_id, comment_id) in the likes table enforce?

The unique index is the hard enforcement layer. Two concurrent like requests from the same user for the same comment will both attempt an INSERT; exactly one will succeed and one will receive a unique constraint violation, which the application layer maps to a 200 "already liked." Without the index, the application-level idempotency check has a race condition.

✍️ Exercise: add a "most controversial" sort order

A product requirement arrives: "Add a 'most controversial' sort that surfaces comments with both high like counts and high flag counts — comments the community is divided on." Design the change: what data does the server need to store, how does the sort key work, and how do you expose it in the API?

Model answer:

-- Data model change --
Add flag_count INTEGER NOT NULL DEFAULT 0 to comments
(same denormalized pattern as like_count)

-- Controversy score (computed, not stored) --
controversy = like_count * flag_count / (like_count + flag_count + 1)
This peaks when likes ≈ flags; near-zero when either is dominant.
Store as a generated/persisted column if sort frequency warrants it.

-- API change --
GET /v1/posts/:post_id/comments?sort=controversial&cursor=...
Cursor encodes (controversy_score DESC, comment_id) — same tie-breaker pattern

-- Response addition --
{ "flag_count": 38 }  // only expose when viewer is a moderator?
                    // Consider hiding flag_count from general users
                    // to avoid "report-bombing" as a visibility hack

Rubric: ✓ denormalized flag_count column (same pattern as likes — don't join flag table on every read) ✓ controversy score formula that peaks when votes are split ✓ considered whether score should be a persisted generated column vs. computed at query time ✓ cursor encodes the new sort key ✓ raised the visibility concern about exposing flag_count to general users (potential for gaming). All five = strong answer.

Key takeaways

Model nested resources hierarchically (/posts/:id/comments/:id/replies) to encode ownership without extra query complexity, but cap nesting depth to keep schemas simple.
Denormalize counts onto the parent row: the read path is 100x more frequent than the write path, so pay at write time, not read time.
Make "like" a named resource and use PUT/DELETE rather than POST actions — idempotency follows naturally and the semantics become unambiguous.
Enforce the one-like-per-user constraint with a unique database index, not just application logic; concurrent requests require a hard constraint to be race-free.
For eventual count consistency: update the client's local state from the write response rather than refetching — eliminates the stale-read problem without tightening the consistency model.

Sources & further reading

Use The Index, Luke — No offset (the performance case against offset pagination on sorted result sets)
PostgreSQL — Unique Indexes (how the database enforces constraints concurrently)
Stripe — Idempotent Requests (production pattern for idempotency keys)
Meta Graph API — Error Handling (rate limiting and moderation error patterns at scale)
API Design course — Idempotency
API Design course — Rate limiting
API Design course — Caching strategies
RFC 9110 §9.3.4 — PUT method semantics (idempotency by specification)