Performance · Lesson 04
Data fetching & pagination
Most API performance problems are data problems in disguise: unbounded result sets, repeated per-row queries, and brute-force offsets that scan millions of rows. This lesson is a practical guide to fetching exactly the data you need, in the right shape, at the right time — without collapsing the database under the weight of your own API.
By the end you'll be able to
- Choose between offset and cursor (keyset) pagination and explain why cursors scale better under inserts and large offsets.
- Recognise and fix the N+1 query problem using batching or eager loading.
- Apply prefetching, debouncing, and partial responses to reduce wasted data transfer and unnecessary API calls.
Why pagination is a performance concern, not just a UX nicety
An unpaginated list endpoint is a time bomb. It might return 10 rows today and 10 million rows in two years. A client that calls GET /v1/orders without a limit expects a quick response; the database disagrees. Without explicit limits, a single API call can exhaust server memory, saturate the network link, crash the client parser, and deny service to all other users sharing the same connection pool.
Pagination is how you put a contractual cap on how much work one request can demand. The question is not whether to paginate — it's which pagination strategy scales to your access patterns.
Offset/limit pagination: the familiar approach and its limits
Offset pagination works like a book index: "give me 20 items starting from position 200."
# Request
GET /v1/orders?offset=200&limit=20
# Response
{
"data": [ ... 20 orders ... ],
"total": 8421,
"offset": 200,
"limit": 20
}
The SQL behind it: SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 200. This is fine for page 1. For page 10,000 (OFFSET 200000), the database must scan and discard 200,000 rows before returning 20. At offset 1 million, that scan can take seconds. Deep offset pagination does not scale.
A second problem: instability under inserts. If a new order is inserted at position 150 while the client is paginating, every record after position 150 shifts down by one. The client will see a record twice (a duplicate) and skip a different one — an invisible data integrity error that is nearly impossible to debug in production.
Cursor (keyset) pagination: stable and scalable
Cursor pagination sidesteps both problems by anchoring on a value in the data itself rather than a numeric position. Instead of "give me rows 200–220," you say "give me 20 rows that come after the order with ID ord_7892."
The SQL: SELECT * FROM orders WHERE id > 7892 ORDER BY id LIMIT 20. This is an index range scan — the database goes straight to row 7892 via the index and reads forward. It is O(page size), not O(offset). It stays fast at any depth.
Stability comes from the same property: the anchor is a data value, not a position. New inserts do not shift the cursor. The client always resumes exactly where it left off.
# First page request
GET /v1/orders?limit=20
# First page response — server encodes the cursor opaquely
{
"data": [ ... 20 orders ... ],
"next_cursor": "eyJpZCI6Nzg5Mn0", // base64({"id":7892})
"has_more": true
}
# Second page request — client passes cursor back
GET /v1/orders?after=eyJpZCI6Nzg5Mn0&limit=20
# Second page response
{
"data": [ ... next 20 orders ... ],
"next_cursor": "eyJpZCI6NzkxMn0",
"has_more": true
}
The cursor is opaque to the client: it is a base64-encoded blob (or any encoding) of the sort key values. The client should not decode or construct it — this lets the server change the cursor's internal structure without a breaking change. The client just echoes it back.
Diagram: offset vs cursor pagination
Page size limits and enforcement
Never allow clients to request unlimited result sets. Always enforce a server-side maximum page size. The right number depends on your row size and use case, but a typical range is 20–100 items for UI pagination and 100–500 for data export APIs. For large exports, prefer asynchronous jobs over synchronous large responses.
# Good: cap enforced on server
GET /v1/events?limit=10000
# Server ignores 10000, applies max: returns 100 items + next_cursor
# Pitfall: no cap at all
GET /v1/events
# Returns all 4.2 million events. Kills the server, crashes the client.
# Good response shape signals remaining pages
{
"data": [...],
"next_cursor": "eyJpZCI6MTAwfQ", // present = more pages exist
"has_more": true
}
# When next_cursor is absent or has_more is false: client is done.
The N+1 query problem
N+1 is the silent killer of data-layer performance. It appears when code fetches a list, then makes one additional query per item in the list to fetch a related resource. The name comes from 1 (list query) + N (per-item queries).
# N+1 anti-pattern: 1 query to get orders, then 1 per order for customer name
orders = db.query("SELECT * FROM orders LIMIT 20") # 1 query
for order in orders:
order.customer = db.query( # 20 more queries
"SELECT name FROM customers WHERE id = ?", order.customer_id
)
# Result: 21 queries for 20 rows. For 100 rows: 101 queries.
# Fix 1: JOIN (fetch everything in one query)
db.query("""
SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id
LIMIT 20
""") # 1 query
# Fix 2: batch lookup (collect all customer IDs, one IN query)
orders = db.query("SELECT * FROM orders LIMIT 20")
ids = [o.customer_id for o in orders]
customers = db.query(
"SELECT * FROM customers WHERE id = ANY(?)", ids # 1 query
)
customer_map = {c.id: c for c in customers}
for order in orders:
order.customer = customer_map[order.customer_id]
# Result: 2 queries regardless of page size.
GraphQL N+1 is a common variant: a resolver for a field on each list item fires independently for each row. The standard fix is the DataLoader pattern — collect all field requests for a tick, then batch them into one query. See Lesson as-04 for the GraphQL-specific approach.
Prefetching
If you can predict what data the client will need next, fetch it before they ask. The classic pattern is prefetching the next page while the user views the current one:
# Client renders page 1. In the background, fires:
GET /v1/feed?after=eyJpZCI6MjB9&limit=20
# When user hits "Load more", response is already cached in memory.
# Perceived latency: ~0 ms. Actual API call happened seconds earlier.
Prefetching works when: the next page is predictable (linear feed, sequential results), the data is not too stale by the time it is shown, and the cost of the wasted prefetch (user never scrolls) is acceptable. For infinite-scroll feeds, prefetch 1 page ahead; for paginated reports, prefetch 2–3.
Debouncing client input
Search-as-you-type, filter sliders, and autocomplete inputs can fire dozens of API calls per second if unconstrained. Each keystroke triggers a request; most responses arrive after the next keystroke has already changed the query, making them wasted work.
Debouncing delays the API call until the input has been stable for a short period (typically 200–400 ms). A user typing "coffee" at normal speed fires one API call when they pause, not six.
# Naive: fires on every keystroke
input.addEventListener('input', () => fetchSuggestions(input.value))
# Debounced: waits 300 ms of silence before calling
let timer;
input.addEventListener('input', () => {
clearTimeout(timer);
timer = setTimeout(() => fetchSuggestions(input.value), 300);
});
Also cancel in-flight requests when a new one supersedes them (use AbortController in browsers). Otherwise stale responses arriving out of order will overwrite the latest results with outdated data.
Partial responses and field selection
A list endpoint that returns 50 fields per item but the client only needs 4 is transmitting 12× more data than necessary. Field selection lets clients request only the fields they need:
# Without field selection: returns all 50 fields per user
GET /v1/users?limit=50
# With field selection: returns 4 fields per user
GET /v1/users?limit=50&fields=id,name,avatar_url,last_active
# Response (partial)
{
"data": [
{ "id": 1, "name": "Mia", "avatar_url": "...", "last_active": "2026-06-18" },
...
],
"next_cursor": "eyJpZCI6NTB9"
}
This is especially valuable on list endpoints. A user list with fields=id,name,avatar_url might return 2 KB vs 100 KB for the full schema — a 50× payload reduction that speeds up both transfer and client-side parsing.
GraphQL makes field selection the default contract: clients declare exactly which fields they want in every query, so the server never over-fetches by design.
Offset pagination with large offsets is one of the most common database killers in production. A query like SELECT * FROM events ORDER BY created_at LIMIT 20 OFFSET 5000000 must skip 5 million rows. On a table with 10 million rows, this can take 30+ seconds and hold a table lock that blocks writes. If your API supports "jump to page N" via an offset, set a hard server-side maximum offset (e.g., 10,000 rows) and require cursor pagination for deeper access. Many APIs (Stripe, GitHub, Shopify) have completely removed offset pagination from their modern APIs for this reason.
A classic system design question: "Design a paginated API for a user's message history — millions of messages per user." The winning answer covers: (1) cursor pagination keyed on (user_id, message_id) so the index range scan is O(page size); (2) a server-enforced page size cap (e.g., max 100); (3) opaque cursor to allow internal key format changes; (4) stable ordering guarantee even under concurrent inserts. Bonus: note that offset pagination would require scanning millions of rows for late pages, which is why cursors are the only realistic choice at this scale.
Do encode cursors as opaque base64 strings the client echoes verbatim — this lets you change the cursor's internal format (add a secondary sort key, switch from integer IDs to UUIDs) without a breaking API change. Don't let clients construct cursors themselves by passing raw column values — that couples clients to your storage schema and breaks the moment you restructure the index.
Under the hood: how keyset pagination actually works in SQL
Cursor (keyset) pagination works because of how a B-tree index is structured. Understanding the actual execution path makes it clear why deep offsets are a performance hazard and why keysets are O(page size) at any depth.
OFFSET: why it gets slower with depth
When the database executes SELECT * FROM events ORDER BY created_at DESC, id DESC LIMIT 20 OFFSET 100000, it cannot skip 100,000 rows by reading an index entry. The B-tree index gives a sorted traversal, but to satisfy OFFSET n the engine must walk n rows forward from the start of that traversal and then discard them. At offset 100,000 it discards 100,000 rows. At offset 1,000,000 it discards 1,000,000 rows. The cost grows linearly with depth — this is the performance cliff that shows up in production when a "paginated" endpoint suddenly takes 30 seconds for a user who scrolled to the bottom.
Keyset: an index seek, not an index scan
Keyset pagination rewrites the predicate so the database can seek directly to the start of your page. For a feed ordered by (created_at DESC, id DESC), the keyset query looks like this:
-- Page 1 (no cursor): read the newest rows
SELECT id, body, created_at
FROM events
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Suppose the last row returned had: created_at = '2026-06-19 10:00:05', id = 8821
-- Cursor encodes those two values, e.g. base64('{"ts":"2026-06-19T10:00:05Z","id":8821}')
-- Page 2 (with cursor): row-value comparison pushes the seek into the index
SELECT id, body, created_at
FROM events
WHERE (created_at, id) < ('2026-06-19 10:00:05', 8821)
ORDER BY created_at DESC, id DESC
LIMIT 20;
The composite predicate (created_at, id) < (last_ts, last_id) is a row-value comparison. On a composite index (created_at DESC, id DESC), the database evaluates this as a range condition on the leading column first, then the trailing column to break ties. The B-tree does a single seek to position just before the anchor row, then reads forward 20 rows. Zero rows are discarded. The cost is O(20) — identical whether the anchor is 20 rows from the start or 20 million rows in.
Concrete traced example: two queries, two plans
Assume events has 5 million rows and a composite index on (created_at DESC, id DESC). Run both versions and compare EXPLAIN output:
The seek in Query B lands at the anchor row via B-tree traversal — O(log N) — and then reads exactly 20 rows forward. That is 34,000× faster at this depth, and the gap widens with dataset size.
How the cursor is encoded
The cursor is a base64url encoding of the last row's sort key values. For a feed ordered by (created_at, id):
# Server produces the cursor from the last row of each page
last_row = { "ts": "2026-06-19T10:00:05Z", "id": 8821 }
cursor = base64url(json.dumps(last_row))
# → "eyJ0cyI6IjIwMjYtMDYtMTlUMTAwMDA1WiIsImlkIjo4ODIxfQ"
# Server decodes on the next request
decoded = json.loads(base64url_decode(cursor))
# → {"ts": "2026-06-19T10:00:05Z", "id": 8821}
# Uses those values in the WHERE clause above
The base64 encoding is opaque to the client — they never parse it. This lets you add a tie-breaking secondary sort key (or switch from integer IDs to UUIDs) without changing the client API contract.
N+1 and the batching fix — traced
The N+1 problem compounds pagination pain: 1 page query returns 20 rows, then 20 individual queries fire for related data. On a 1 ms database, this chain costs roughly 21 ms of query time plus 20 round-trip overhead costs. Batching collapses it to 2 queries:
How to debug & inspect it
Pagination bugs tend to be silent — a client sees duplicate rows, skipped rows, or strangely slow responses. The tools below surface these problems precisely.
Inspect a cursor manually
Cursors are base64 — decode them in the shell to confirm what the server is anchoring on:
Time a deep-offset query to prove the problem
EXPLAIN to verify an index seek
A good keyset query should show Index Scan (not Seq Scan) with an Index Cond row — not a Filter row, which indicates a post-scan filter that still scanned the table. If you see a Filter, the index is not covering the WHERE clause correctly.
Detect N+1 in application logs
Enable query logging and count identical-pattern queries per request cycle:
| Symptom | Likely cause | Fix |
|---|---|---|
| Page 1 is fast, pages 50+ take seconds | Offset pagination scanning discarded rows | Switch to keyset/cursor pagination |
| Client sees duplicate row across two pages | A row was inserted between page fetches; offset shifted | Migrate to cursor pagination (anchor is data value, not position) |
| Client skips a row when paginating | Same insert-shift cause as duplicates | Cursor pagination |
| Decoded cursor has unexpected values | Server encoding wrong row (off-by-one in last-row selection) | Confirm last row is data[-1] after sort; log cursor value on encode |
| Keyset query is slow despite index | Row-value comparison not using index seek (older DB versions) | Rewrite as expanded OR predicate; check EXPLAIN for Index Cond vs Filter |
| 20+ near-identical queries per page load | N+1 — per-row lookup in application loop | Collect IDs, one IN/ANY batch query, then map in memory |
EXPLAIN shows Seq Scan on keyset query | Missing composite index on sort columns | Add CREATE INDEX ON events (created_at DESC, id DESC) |
Debug checklist:
- Is the endpoint slow only on late pages? Run
\timing+EXPLAIN ANALYSEon the exact query — if offset is growing linearly with depth, switch to keyset. - Is the client seeing duplicates or skips? Check whether pagination uses offsets during high-write periods — offset is unstable under concurrent inserts; migrate to cursor.
- Decode the cursor (
base64 -d) and verify the anchor values match the last row of the previous page. - Does the keyset query show
Index Condin EXPLAIN? If it showsFilter, the composite row-value predicate is not being pushed into the index — check index definition and PostgreSQL version. - Count unique query shapes per request in slow-query logs. Five or more instances of the same
SELECT … WHERE id = ?pattern = N+1 — replace with anINbatch query.
The WHERE (col_a, col_b) < (val_a, val_b) syntax (SQL row-value expressions, ISO/IEC 9075) is supported in PostgreSQL 9.5+, MySQL 8+, and SQLite, but not in older MySQL or SQL Server. In those databases you must expand it manually: WHERE col_a < val_a OR (col_a = val_a AND col_b < val_b). Some query builders emit the expanded form by default. Always run EXPLAIN on the actual generated SQL — a row-value comparison that falls back to a full scan is silently slower than the index-seek form.
In production: how leading APIs do it
Stripe and Slack both moved to cursor-only pagination for their high-write, high-volume resources. The performance motivation is the same as the SQL analysis above — but seeing real production APIs commit to it, and the specific reasons they cite, makes the trade-off concrete.
| Provider | Style | Cursor parameter(s) | Has-more signal | Why cursors (performance angle) | Docs |
|---|---|---|---|---|---|
| Stripe | Cursor (object-ID keyset) | starting_after / ending_before (object id) |
has_more: true in response body |
Charges, events, and customers are written continuously; offset pagination would produce duplicate/skipped records under concurrent writes and would become unusably slow at deep pages for large accounts | Stripe API pagination |
| Slack | Cursor (opaque token) | cursor |
response_metadata.next_cursor (empty string when done) |
Channels, messages, and users are high-cardinality; Slack's documentation explicitly states that cursor-based pagination is required for large workspaces and that offset-style pagination is not supported on most endpoints | Slack API pagination |
| GitHub | Offset (page + per_page) |
page, per_page (max 100) |
RFC 5988 Link header (rel="next", rel="last") |
Repository contents and issues are relatively low-write; the offset approach is acceptable, and GitHub uses the Link header so clients do not need to construct page URLs — the main performance concern is staying within the per-page cap (max 100) rather than keyset depth |
GitHub REST pagination |
Deep dive: why high-write APIs like Stripe and Slack use cursors over offset
The SQL analysis earlier in this lesson showed that OFFSET n forces the database to discard n rows on every request. That cost is pure waste and it scales linearly — a Stripe account with millions of charges cannot afford to pay that overhead on a background sync job that walks the full charge history. But there is a second, more subtle reason that matters specifically for high-write APIs: offset instability under concurrent inserts.
Stripe processes payments continuously. If a client is walking charges in pages of 100 and a new charge is created between page 1 and page 2, the offset-based window shifts: the record that was at position 100 is now at position 101. The client's OFFSET 100 LIMIT 100 request skips that record entirely — a silent data integrity error. The same insert that moves the window forward can cause a different record to appear on two consecutive pages. For a financial reconciliation job, that means an undetected missed charge or a double-counted one. Stripe's cursor-based approach anchors on a specific object ID instead of a numeric position: "give me the 100 charges that come after ch_3NkL...." New charges created at any point do not shift that anchor. The client always resumes exactly where it left off, regardless of how many new charges have been created since the last page was fetched.
Slack faces the same problem at the workspace level: users join, leave, and post messages continuously. An offset-based member or message list would produce inconsistent results for any sync job that runs longer than a few seconds. Slack's opaque cursor approach goes one step further than Stripe's by hiding even the structure of the anchor — clients have no knowledge of whether the cursor encodes an integer ID, a timestamp, or a composite key, which means Slack can change its internal sort strategy without any client-visible contract change.
🧠 Quick check
1. A feed table has 10 million rows. A client requests LIMIT 20 OFFSET 9999980. What does the database do?
Offset pagination requires the database to count and skip all rows before the offset. At offset 9,999,980, it must scan nearly the entire table — this is why deep offset pagination is a serious production performance hazard.
2. An API returns 20 orders per page. For each order it calls the customers table once to fetch the customer name. With a page of 20 orders, how many total database queries are executed?
This is the N+1 problem: 1 query to get 20 orders, then 1 more query for each order to fetch its customer = 21 queries. The fix is a JOIN or a single IN-clause batch query to fetch all 20 customers at once.
3. Why is a cursor encoded as an opaque base64 string rather than a raw column value like ?after_id=7892?
An opaque cursor decouples the client from the server's internal sort key. If you later add a secondary sort key (e.g., to break ties), or change the primary key type, the cursor format changes but clients — who simply echo it back — are unaffected. Exposing raw column values couples clients to your schema.
4. A search input fires an API call on every keystroke. The user types "javascript" (10 characters) in 1 second. How many API calls does a 300 ms debounce reduce this to?
Debouncing resets a timer on every event. If keystrokes arrive less than 300 ms apart (normal typing), the timer is cancelled and restarted each time. Only when the user stops typing for 300 ms does the timer expire and the call fire — typically once per query phrase.
5. A new row is inserted at position 50 in a table while a client is paginating with OFFSET 40, LIMIT 10. The client has already fetched page 1 (rows 1–40). What happens when it fetches page 2?
Offset pagination is position-based, not value-based. When a new row is inserted at position 50, every row after it shifts down by one. The offset-40-limit-10 window now lands on different rows than expected — the client sees one duplicate and skips one unique row. This is a silent data integrity bug that cursor pagination eliminates.
✍️ Exercise: design a paginated messages API
You are designing a REST API endpoint to return a user's chat messages: GET /v1/conversations/{id}/messages. Requirements: up to 10 million messages per conversation; messages are inserted frequently (real-time chat); clients display newest-first; clients may need to "load older messages" by scrolling up. Design the pagination contract: method, request parameters, response shape, and cursor design. State any limits you enforce.
Model answer:
# Endpoint
GET /v1/conversations/{id}/messages
# Request parameters
limit int 1–100, default 50, server caps at 100
before str opaque cursor — return messages older than this (for "load more")
after str opaque cursor — return messages newer than this (for real-time catch-up)
# Omit both cursors for the first page (returns newest 50 messages)
# Cursor design (server-side)
# Cursor encodes: {"msg_id": 98234, "created_at": "2026-06-18T12:34:56Z"}
# Composite key handles ties if two messages have the same created_at.
# Encoded opaquely: base64(json)
# SQL for "before" cursor (load older messages)
SELECT * FROM messages
WHERE conversation_id = ? AND (created_at, id) < (cursor_time, cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 50
# Response shape
{
"data": [
{ "id": 98234, "text": "hey!", "sender_id": 1, "created_at": "..." },
...
],
"before_cursor": "eyJtc2dfaWQiOjk4MTAwfQ", // for loading older
"after_cursor": "eyJtc2dfaWQiOjk4MjM0fQ", // for catching up to new
"has_more_before": true
}
# Limits enforced server-side:
# - max limit = 100 (client cannot request more)
# - no offset parameter exposed (cursor-only)
# - index: (conversation_id, created_at DESC, id DESC)
Rubric: ✓ cursor (keyset) pagination chosen and justified over offset ✓ composite cursor handles tie-breaking on created_at ✓ opaque cursor encoding ✓ server-side max cap enforced ✓ both directions (before/after) supported for chat UX ✓ index specified. Six of six = full marks.
Key takeaways
- Cursor (keyset) pagination is O(page size) at any depth and stable under concurrent inserts. Offset pagination degrades to O(offset + page size) at depth and produces duplicates/skips under inserts. Prefer cursors for any table that will grow or be written concurrently.
- Enforce server-side page size caps. Never allow unbounded result sets — they are a time bomb on both the database and the client.
- The N+1 problem turns a 2-query page load into a 201-query one. Fix it with JOINs, batch IN queries, or DataLoader-style batching.
- Prefetch the next page while the user views the current one — perceived latency drops to ~0 ms at the cost of one speculative API call.
- Debounce high-frequency input events (search, filters) to fire at most one API call per user pause rather than one per keystroke. Cancel superseded in-flight requests.
- Field selection (partial responses) can reduce list endpoint payload by 10–50×. On mobile or high-latency connections this is a material improvement, not an optimisation afterthought.
- Encode cursors opaquely. Clients echo them; they never construct them. This decouples the client from your storage schema and keeps the cursor's internal format changeable without a breaking API change.
Sources & further reading
- Google API Design Guide — List Pagination (cursor-based pagination patterns)
- Microsoft REST API Guidelines — Filtering and Pagination
- Stripe API Docs — Pagination (real-world cursor pagination design)
- GitHub REST API — Traversing with Pagination
- Lesson as-04 — GraphQL (DataLoader pattern for N+1 in GraphQL resolvers)
- Use The Index, Luke — No Offset (deep technical explanation of why OFFSET is slow)
- MDN — AbortController (cancelling superseded in-flight fetch requests)