Platform & API Product Engineering · Lesson 08
Usage metering & monetization
Every missed billable event is lost revenue; every double-counted event produces an angry customer and a chargeback. Building metering that is simultaneously durable, accurate, and queryable — at millions of events per day — requires the same engineering rigor as your most critical data pipeline.
By the end you'll be able to
- Design a five-stage metering pipeline (emit → queue → aggregate → usage API → billing) and explain why each stage exists.
- Explain why at-least-once delivery requires idempotency-key deduplication and trace the exact Redis operations that prevent double-counting.
- Size the storage trade-off between raw event retention and pre-aggregated counters for 100 million events per day.
Why metering is harder than logging
Application logging tolerates loss: a missing log line means a gap in your dashboard, not a business error. Metering is different. A missed usage event means a customer goes unbilled for real work your system performed — lost revenue that compounds every billing cycle. A double-counted event means an overcharge, a support ticket, a refund, and erosion of trust. The asymmetry of consequences forces metering to be designed like a financial system, not a logging system.
The second challenge is scale. Stripe processes millions of payment events daily. Twilio sends billions of SMS per year. AWS meters hundreds of services per second across millions of customers. At those volumes, even a 0.01% error rate is operationally significant. The solutions that work in production — durable queues, pre-aggregated counters, idempotency keys — are not accidental. They emerged from painful lessons about what breaks at scale.
Think of the metering pipeline as a bank's double-entry ledger: every transaction is recorded first in a durable journal (the queue), then summarized in account balances (the counters). You can reconstruct the balances from the journal. You cannot reconstruct the journal from the balances. That asymmetry shapes every architectural decision below.
The metering pipeline: five stages
Stage 1 — emit a usage event per billable call
Every billable action — an API call, a sent message, a processed transaction — produces one usage event. The event schema must include an event_id (a UUID or content-hash used as the idempotency key), account_id, metric_name, quantity, and timestamp. Metadata fields (region, product_tier, API version) enable later analytics but are not required for billing correctness.
# Usage event schema — emitted per billable action
{
  "event_id": "evt_a1b2c3d4e5f6",            // idempotency key: UUID v4 or content hash
  "account_id": "acct_42",
  "metric_name": "sms_sent",
  "quantity": 1,                             // or segment_count for multi-part SMS
  "timestamp": "2026-06-20T10:30:00.123Z",   // ISO 8601 with ms precision
  "metadata": { "region": "us-east-1", "tier": "growth" }
}
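One way to get an event_id that stays the same across retries is to derive it from the fields that identify the billable action. The sketch below is illustrative only: `content_hash_event_id`, `build_usage_event`, and the `source_id` parameter (a stand-in for an upstream request or message ID) are hypothetical names, not part of any real SDK.

```python
import hashlib
import json
import uuid


def content_hash_event_id(account_id: str, metric_name: str, source_id: str) -> str:
    """Derive a stable event_id from fields that identify the billable action.

    `source_id` is a hypothetical upstream identifier (for example a request
    or message ID); retries of the same action must pass the same value.
    """
    payload = json.dumps([account_id, metric_name, source_id]).encode("utf-8")
    return "evt_" + hashlib.sha256(payload).hexdigest()[:24]


def build_usage_event(account_id, metric_name, quantity, timestamp,
                      source_id=None, metadata=None):
    """Build an event dict carrying the required billing fields."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if source_id is not None:
        event_id = content_hash_event_id(account_id, metric_name, source_id)
    else:
        # Random ID: the caller must store and reuse it when retrying.
        event_id = "evt_" + uuid.uuid4().hex
    return {
        "event_id": event_id,
        "account_id": account_id,
        "metric_name": metric_name,
        "quantity": quantity,
        "timestamp": timestamp,
        "metadata": metadata or {},
    }
```

With a content hash, two emissions of the same action produce the same key even if their timestamps differ, so the aggregator can treat the second one as a duplicate.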
The emission must be fire-and-forget from the request path. If the meter write is synchronous (the API call blocks waiting for the billing system to acknowledge), a billing system outage becomes an API outage. Emit to the queue asynchronously; the request path should never wait on the metering infrastructure.
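A minimal sketch of the fire-and-forget pattern, using only the standard library: the request path enqueues into a bounded in-process buffer and a worker thread does the publishing. `UsageEmitter` and the `publish` callable are assumptions for illustration; in a real system `publish` would wrap the producer of whichever queue you use.

```python
import queue
import threading


class UsageEmitter:
    """Non-blocking emitter: the request path only enqueues; a worker publishes."""

    def __init__(self, publish, max_buffered=10000):
        self._publish = publish      # hypothetical callable that sends one event
        self._buffer = queue.Queue(maxsize=max_buffered)
        self.dropped = 0             # events rejected because the buffer was full
        self.failed = 0              # events whose publish call raised
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def emit(self, event) -> bool:
        """Called from the request path; never waits on the metering backend."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1        # should raise an alert: this is lost revenue
            return False

    def _run(self):
        while True:
            event = self._buffer.get()
            try:
                if event is None:    # sentinel: stop the worker
                    return
                self._publish(event)
            except Exception:
                self.failed += 1     # a real system would retry or spill to disk
            finally:
                self._buffer.task_done()

    def close(self):
        """Flush buffered events, then stop the worker."""
        self._buffer.put(None)
        self._buffer.join()
```

The trade-off is that an in-memory buffer is lost if the process dies, so the `dropped` and `failed` counters need monitoring, and the buffer should stay small enough that events reach the durable queue quickly.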
Stage 2 — durable queue (the journal)
The queue — typically Kafka, but AWS Kinesis or Google Pub/Sub serve the same purpose — is the metering system's source of truth. It buffers events between emission and processing, provides durability (events survive consumer crashes), and enables replay (re-process events from any offset to correct bugs or backfill missing data).
The critical property is at-least-once delivery. Kafka retains every acknowledged event for the topic's configured retention period, whether or not it has been consumed, so a consumer that commits its offset only after processing never skips an event. The cost is that a consumer crash after processing but before the offset commit replays the event on restart. This is by design (it means you never lose revenue), but it requires the aggregator to handle duplicates. See Stage 3.
Kafka's retention period (typically 7 days, extendable to 30+) serves as the raw event store for audit and late-event replay. This is different from the counter store: the raw events answer "show me every SMS sent on June 20" while the counters answer "how many SMS did account 42 send in June?"
Stage 3 — aggregate: dedup + count
The aggregator consumer reads events from Kafka, deduplicates them using the event_id, and increments per-account per-metric per-period counters. In Redis, this looks like:
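# Aggregator sketch, run for each event from Kafka (Python; assumes the redis-py client)
import logging

SEEN_TTL_SECONDS = 2592000  # 30 days

# Atomic dedup check + increment: Redis runs the whole Lua script as one operation
DEDUP_AND_COUNT = """
local seen_key = KEYS[1]
local ctr_key  = KEYS[2]
local qty      = tonumber(ARGV[1])
local ttl_seen = tonumber(ARGV[2])            -- e.g. 2592000 (30 days)
if redis.call('EXISTS', seen_key) == 1 then
    return 0                                  -- already processed; skip
end
redis.call('SETEX', seen_key, ttl_seen, '1')  -- mark as seen
redis.call('HINCRBY', ctr_key, 'total', qty)  -- increment counter
return 1                                      -- processed
"""

def period(timestamp):
    # "2026-06-20T10:30:00.123Z" -> "2026-06" (monthly billing period, UTC)
    return timestamp[:7]

def process_event(r, event, commit_offset):
    # r: a redis.Redis client; commit_offset: callable that commits this event's Kafka offset
    seen_key = "seen:" + event["event_id"]
    counter_key = ("usage:" + event["account_id"] + ":" + period(event["timestamp"])
                   + ":" + event["metric_name"])
    result = r.eval(DEDUP_AND_COUNT, 2, seen_key, counter_key,
                    event["quantity"], SEEN_TTL_SECONDS)
    if result == 0:
        logging.info("duplicate event skipped: %s", event["event_id"])
    # Commit only after Redis has answered, and commit for duplicates too:
    # a skipped replay is fully handled, so its offset should advance as well.
    commit_offset()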
Stage 4 — usage API (customer-facing)
Customers need to see their own consumption — not just at invoice time, but on-demand. A well-designed usage API serves this from the counter store (Redis), giving near-real-time data without expensive raw-event scans:
# Usage API — GET /v1/usage
GET /v1/usage?metric=sms_sent&period=2026-06&account_id=acct_42
Authorization: Bearer {customer_api_key}
# Response
{
  "account_id": "acct_42",
  "metric": "sms_sent",
  "period": "2026-06",
  "consumed": 8432,
  "quota": 10000,
  "quota_remaining": 1568,
  "as_of": "2026-06-20T10:30:45Z"   // when the counter was last updated
}
The implementation: a single HGET usage:acct_42:2026-06:sms_sent total Redis call. Compare that to scanning raw events from Kafka — which would require reading gigabytes of data for a long-running account. Pre-aggregated counters are why the usage API is fast.
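As a rough sketch of how the response above could be assembled from that one counter read: `build_usage_response` and `FakeStore` are hypothetical names, `store` is anything exposing `hget(key, field)` (a redis-py client fits that shape), and the quota is assumed to come from the account's plan. For simplicity `as_of` here is the read time rather than the counter's last update time.

```python
from datetime import datetime, timezone


def build_usage_response(store, account_id, metric, period, quota, now=None):
    """Assemble the GET /v1/usage response from a pre-aggregated counter."""
    key = f"usage:{account_id}:{period}:{metric}"
    raw = store.hget(key, "total")       # None when nothing has been counted yet
    consumed = int(raw) if raw is not None else 0
    now = now or datetime.now(timezone.utc)
    return {
        "account_id": account_id,
        "metric": metric,
        "period": period,
        "consumed": consumed,
        "quota": quota,
        "quota_remaining": max(quota - consumed, 0),
        "as_of": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


class FakeStore:
    """In-memory stand-in for the counter store, for demonstration only."""

    def __init__(self, hashes):
        self._hashes = hashes

    def hget(self, key, field):
        return self._hashes.get(key, {}).get(field)


store = FakeStore({"usage:acct_42:2026-06:sms_sent": {"total": b"8432"}})
response = build_usage_response(store, "acct_42", "sms_sent", "2026-06", quota=10000)
print(response["consumed"], response["quota_remaining"])
```

The handler does no aggregation of its own; its cost is one key lookup regardless of how many events the account has produced.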
Stage 5 — billing export
At the end of each billing period (typically midnight UTC), a billing exporter reads the final counter values for every active account and submits invoice line items to the billing system. For Stripe Billing metered subscriptions, this means calling the meter events API or the usage records endpoint. For internal systems, it means writing records to a billing database table.
The exporter should reconcile: compare the counter total to a raw-event count (by scanning Kafka or a separate raw-event database) for a sample of accounts. Persistent discrepancies indicate a dedup bug or an aggregator lag that needs investigation before the invoice is sent.
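The reconciliation check can be sketched as a pure function over data you have already fetched. Here `reconcile` is a hypothetical helper: `raw_events` stands for the events read back from the raw store for one metric and period, and `counters` for the corresponding counter totals. Deduplicating the raw side by event_id is an assumption of this sketch, made because the raw log may itself contain replays.

```python
def reconcile(raw_events, counters, tolerance=0):
    """Compare deduplicated raw-event sums with counter totals.

    raw_events: iterable of dicts with event_id, account_id, quantity
    counters:   dict of account_id -> counter total for the same metric and period
    Returns account_id -> (raw_total, counter_total) for every mismatch.
    """
    seen = set()
    raw_totals = {}
    for event in raw_events:
        if event["event_id"] in seen:    # count each event_id once
            continue
        seen.add(event["event_id"])
        acct = event["account_id"]
        raw_totals[acct] = raw_totals.get(acct, 0) + event["quantity"]

    mismatches = {}
    for acct in set(raw_totals) | set(counters):
        raw_total = raw_totals.get(acct, 0)
        counter_total = counters.get(acct, 0)
        if abs(raw_total - counter_total) > tolerance:
            mismatches[acct] = (raw_total, counter_total)
    return mismatches
```

A counter above the raw total points at a dedup failure; a counter below it points at aggregator lag or dropped events.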
Accuracy under retries and failures
Late events and reconciliation
Network partitions and Kafka consumer lag mean some events arrive after the billing period closes. A message sent at 23:59:58 on June 30 might not reach the aggregator until July 1 00:00:03. Three strategies handle this:
- Use event timestamp, not arrival timestamp. The counter key includes the period derived from event.timestamp, not now(). An event arriving late is credited to the correct billing period.
- Keep counters writable for a grace window. Hold the billing period open for 5–15 minutes after midnight before exporting. Most late events arrive within seconds; a 5-minute grace window catches nearly all of them.
- Reconciliation run. For high-value accounts, scan the raw events from Kafka for the period and compare the sum to the counter value. If they diverge beyond a threshold, investigate before sending the invoice. This is the equivalent of a bank's end-of-day balancing run.
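The first two strategies can be sketched with the standard library. The function names are hypothetical, the billing period is assumed to be a UTC calendar month, and the 5-minute default grace window is taken from the range mentioned above.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-06-30T23:59:58.000Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def billing_period(event_ts: str) -> str:
    """Monthly period key derived from the event's own timestamp, not arrival time."""
    return parse_ts(event_ts).strftime("%Y-%m")


def period_end(period: str) -> datetime:
    """First instant after the given YYYY-MM period (UTC)."""
    year, month = (int(part) for part in period.split("-"))
    if month == 12:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, month + 1, 1, tzinfo=timezone.utc)


def route_late_event(event_ts: str, arrival: datetime,
                     grace: timedelta = timedelta(minutes=5)) -> str:
    """Return 'counter' while the event's period is still open (including the
    grace window), otherwise 'reconciliation'."""
    closes_at = period_end(billing_period(event_ts)) + grace
    return "counter" if arrival < closes_at else "reconciliation"


# The example from the text: sent 23:59:58 on June 30, arrives 00:00:03 on July 1.
arrival = datetime(2026, 7, 1, 0, 0, 3, tzinfo=timezone.utc)
print(billing_period("2026-06-30T23:59:58.000Z"))
print(route_late_event("2026-06-30T23:59:58.000Z", arrival))
```

Either way the event is attributed to June; the only question the grace window answers is whether it can still go through the normal counter path or must be picked up by reconciliation.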
Quotas, plans, and overage
Soft vs hard limits
A hard limit blocks requests with a 429 once usage reaches the quota ceiling. A soft limit allows requests to continue — emitting a warning to the customer and flagging the account for overage billing — but does not block. The right choice depends on the product contract: infrastructure APIs (Twilio SMS, AWS Lambda) typically use hard limits to protect the platform; SaaS APIs often prefer soft limits with graduated billing because an unexpected 429 in a production workflow is worse for customer trust than an unexpected charge that can be explained.
Both types connect to the rate limiter (see plat-01): the quota counter and the rate limiter both live in Redis, and the gateway reads both on every request. Rate limiting governs how fast a customer sends requests; quota governs how many over a billing period. They are different dimensions of the same enforcement problem.
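A sketch of the gateway-side decision, assuming the current period counter has already been read. `QuotaDecision` and `check_quota` are illustrative names, and returning 200 for an allowed request is a simplification of whatever the upstream service would actually return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaDecision:
    allowed: bool
    status_code: int   # HTTP status the gateway would return for this decision
    overage: bool      # True when the request should be billed as overage
    warn: bool         # True when the customer should be notified


def check_quota(consumed: int, quota: int, limit_type: str, cost: int = 1) -> QuotaDecision:
    """Decide one request against the quota, for a 'hard' or 'soft' limit."""
    if limit_type not in ("hard", "soft"):
        raise ValueError("limit_type must be 'hard' or 'soft'")
    if consumed + cost <= quota:
        return QuotaDecision(True, 200, False, False)    # still within quota
    if limit_type == "hard":
        return QuotaDecision(False, 429, False, False)   # block at the ceiling
    return QuotaDecision(True, 200, True, True)          # allow, flag for overage


print(check_quota(10000, 10000, "hard"))
print(check_quota(10000, 10000, "soft"))
```

Keeping the decision a pure function of (consumed, quota, limit type) makes the policy easy to test separately from the Redis read that supplies `consumed`.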
Pricing models: metered, seat, tiered
Metered pricing bills per unit of consumption (per SMS, per API call, per GB processed). Every unit must be metered. This lesson covers metered pricing because it requires the most infrastructure — every call is billable and must be counted accurately.
Seat pricing bills per active user regardless of usage. Metering is simpler (count provisioned users, not calls), but you still need usage data to identify inactive seats and justify renewals.
Tiered pricing combines a base quota (included in the plan) with overage billing above it. The quota counter serves both: it drives hard/soft limit enforcement during the month and provides the final total for overage calculation at period close.
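The period-close overage calculation for a tiered plan can be sketched as follows. The base fee and per-unit rate in the example are made-up numbers for illustration, and `tiered_charge` is a hypothetical helper; `Decimal` is used so that currency arithmetic does not pick up floating-point error.

```python
from decimal import Decimal


def tiered_charge(consumed: int, included: int,
                  base_fee: Decimal, overage_rate: Decimal) -> dict:
    """Compute the invoice amounts for a plan with an included quota.

    consumed:     final counter value for the period
    included:     units covered by the base fee
    overage_rate: price per unit above the included quota
    """
    overage_units = max(consumed - included, 0)
    overage_fee = overage_rate * overage_units
    return {
        "overage_units": overage_units,
        "overage_fee": overage_fee,
        "total": base_fee + overage_fee,
    }


# Hypothetical plan: 10,000 units included for 50.00, then 0.01 per extra unit.
invoice = tiered_charge(12500, 10000, Decimal("50.00"), Decimal("0.01"))
print(invoice["overage_units"], invoice["overage_fee"], invoice["total"])
```

The only metering input is the final counter value, which is why the same counter can serve both in-period enforcement and period-close billing.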
Under the hood: from API call to billing record
Trace a single SMS-send request through the entire metering pipeline, including a retry scenario:
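The retry scenario can be reproduced with a small in-memory simulation. This is a teaching sketch, not the production path: a Python list stands in for a Kafka partition, a set and a dict stand in for the Redis seen-set and counters, and `InMemoryAggregator` and `trace_with_crash` are hypothetical names.

```python
class InMemoryAggregator:
    """Mimics the Stage 3 seen-set + counter logic with plain Python structures."""

    def __init__(self):
        self.seen = set()
        self.counters = {}

    def process(self, event) -> bool:
        """Return True if counted, False if skipped as a duplicate."""
        if event["event_id"] in self.seen:
            return False
        self.seen.add(event["event_id"])
        key = (event["account_id"], event["timestamp"][:7], event["metric_name"])
        self.counters[key] = self.counters.get(key, 0) + event["quantity"]
        return True


def trace_with_crash(log, crash_after_index):
    """Consume `log`, crash once after processing the event at
    `crash_after_index` but before committing its offset, then restart
    from the last committed offset."""
    agg = InMemoryAggregator()
    committed = 0            # next offset to read after a restart
    offset = 0
    crashed = False
    steps = []
    while offset < len(log):
        counted = agg.process(log[offset])
        steps.append((offset, "counted" if counted else "duplicate-skipped"))
        if offset == crash_after_index and not crashed:
            crashed = True
            offset = committed   # restart: this offset was never committed
            continue
        committed = offset + 1
        offset = committed
    return agg, steps


log = [
    {"event_id": "evt_1", "account_id": "acct_42", "metric_name": "sms_sent",
     "quantity": 1, "timestamp": "2026-06-20T10:30:00.123Z"},
    {"event_id": "evt_2", "account_id": "acct_42", "metric_name": "sms_sent",
     "quantity": 1, "timestamp": "2026-06-20T10:30:01.456Z"},
]
agg, steps = trace_with_crash(log, crash_after_index=1)
for step in steps:
    print(step)
print(agg.counters)
```

The second event is delivered twice (once before the crash, once on replay), but the counter ends at 2 because the replay is recognized by its event_id.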
By the numbers: 100 million billable events per day
Scale the pipeline to understand where storage and compute cost actually lives.
Raw event storage in Kafka (modeled):
Pre-aggregated counter storage in Redis (modeled):
Dedup seen-set size (modeled):
Governing formula — when pre-aggregation pays off:
Aggregation window decision: Real-time stream aggregation (Flink, a Redis consumer) keeps counters current within seconds — the right choice for customer-facing usage dashboards. Hourly batch jobs reduce infrastructure complexity but mean the usage API is 0–60 minutes stale, acceptable for billing exports but frustrating for customers debugging unexpected quota exhaustion. Daily batch jobs are only appropriate if you never expose real-time usage to customers and billing periods are monthly.
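The storage figures quoted elsewhere in this lesson (20 GB/day of raw events, 8 MB of counters, a roughly 17,500× ratio) can be reproduced with back-of-envelope arithmetic. The inputs below are assumptions chosen to match those figures: about 200 bytes per event, 7 days of raw retention, and the 8 MB counter footprint taken as given rather than derived. The live seen-key count follows from the 30-day dedup TTL used in Stage 3; per-key memory overhead is not modeled here.

```python
EVENTS_PER_DAY = 100_000_000
BYTES_PER_EVENT = 200             # assumption: implied by 20 GB/day at 100M events/day
RAW_RETENTION_DAYS = 7            # assumption: the typical Kafka retention from Stage 2
COUNTER_STORE_BYTES = 8_000_000   # the 8 MB counter figure, taken as given
SEEN_TTL_DAYS = 30                # dedup TTL from Stage 3 (2592000 seconds)

raw_bytes_per_day = EVENTS_PER_DAY * BYTES_PER_EVENT
raw_bytes_retained = raw_bytes_per_day * RAW_RETENTION_DAYS
storage_ratio = raw_bytes_retained / COUNTER_STORE_BYTES
seen_keys_live = EVENTS_PER_DAY * SEEN_TTL_DAYS   # dedup keys alive at steady state

print(f"raw events per day:   {raw_bytes_per_day / 1e9:.0f} GB")
print(f"raw events retained:  {raw_bytes_retained / 1e9:.0f} GB over {RAW_RETENTION_DAYS} days")
print(f"counter store:        {COUNTER_STORE_BYTES / 1e6:.0f} MB")
print(f"raw / counter ratio:  {storage_ratio:,.0f}x")
print(f"live dedup keys:      {seen_keys_live:,}")
```

Under these assumptions the counters are negligible and the raw log is modest, while the dedup seen-set, at billions of live keys, is the component whose memory footprint most deserves a careful estimate.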
Pros and cons: the four key trade-offs
| Trade-off | Option A | Option B | When A wins | When B wins |
|---|---|---|---|---|
| Aggregation timing | Real-time stream (Flink/consumer) | Batch (nightly cron) | Customer-facing dashboard needs <60s freshness; quota enforcement needs current counters | Usage is only needed at invoice time; operational simplicity outweighs real-time visibility |
| Metering placement | At the API gateway (before service) | Inside each microservice (after processing) | Consistent enforcement across all services; simpler deployment; quota checked before work is done | You need business-logic-aware metering (e.g. count only successful operations, not all calls); gateway can't distinguish outcome |
| Counter storage | Pre-aggregated counters (Redis HASH) | Raw event store (Kafka / data warehouse) | Fast usage API (<1ms query); low storage cost; real-time quota enforcement | Flexible ad-hoc queries (arbitrary time ranges, custom groupings, retroactive plan changes); audit trail |
| Deduplication approach | Exact (Redis seen-set with event_id) | Probabilistic (Bloom filter) | Billing accuracy is non-negotiable; false positives (skipping a real event) are unacceptable | Analytics dashboards where ~1% error rate is acceptable; memory budget is extremely tight |
How real platforms do it
Stripe Billing metered subscriptions use a meter events API: you report usage by pushing events to Stripe's API (one call per billable action, with an idempotency key to prevent double-counting). Stripe aggregates these events and includes them in the customer's upcoming invoice. Stripe also supports a "billing meter" resource that lets you configure aggregation mode (sum, max, last value) per metric. See Stripe usage-based billing docs.
AWS Marketplace Metering Service requires SaaS products listed on the Marketplace to report metered usage by calling the MeterUsage API. AWS enforces that each call is idempotent (via a UsageDimension + timestamp key) and provides a 1-hour grace window for late reporting. Usage is aggregated per customer, per dimension, per hour. See AWS Marketplace metering integration docs.
Twilio meters at the point of delivery: an SMS or call is billable when the carrier confirms delivery (or attempt). Each message has a unique SID that serves as the idempotency key. Twilio's usage records API exposes per-account, per-resource, per-period consumption — the customer-facing equivalent of the usage API described in Stage 4. See Twilio pricing and usage model.
"Design a usage metering system for 100M events per day." The senior answer covers all five stages: emit (async, fire-and-forget, with event_id), durable queue (Kafka, at-least-once), aggregator (idempotency-key dedup with Redis seen-set, HINCRBY counters), usage API (Redis HGET, <1ms), billing export (period-close scan). Then add the math: 20 GB/day raw vs 8 MB counters. Then the failure modes: late events (use event timestamp not arrival time), consumer outage (Kafka replays from offset), cold start (no thundering herd issue here — aggregator catches up from Kafka). Candidates who skip either the dedup mechanism or the raw/aggregated storage split miss the core reliability story.
Metering synchronously in the API request path. If you write to the billing database as part of handling the request, a billing system slowdown or outage becomes an API latency spike or outage. More subtly, if the write fails after the service has done the work (delivered the SMS, run the computation), you must choose: return an error to the client (who will retry and get double-billed) or swallow the error (and lose the revenue). There is no good answer — the synchronous design has a fundamental race. Move metering to an async queue. The request path should fire-and-forget the usage event; billing correctness is the aggregator's job, not the API handler's.
Always retain raw events, even after aggregating. Pre-aggregated counters are compact and fast — they are the right data structure for the usage API and quota enforcement. But they are lossy: once you sum 1 million events into a counter, you cannot reconstruct which events contributed. Keep raw events in Kafka (or a data warehouse) for at least 90 days. This enables: retroactive correction if a bug caused wrong counts; re-processing for new metrics you didn't anticipate; billing dispute resolution ("show me every API call on June 15"); mid-period plan changes that need historical attribution. The raw events are cheap relative to the operational value. Counters are fast; raw events are correct.
🧠 Quick check
1. Why does at-least-once delivery require deduplication in a metering system?
At-least-once delivery guarantees no events are lost — but as a consequence, network retries and consumer restarts can cause the same event to be delivered and processed more than once. An idempotency key (event_id) stored in a seen-set lets the aggregator recognize and skip duplicate deliveries without double-incrementing the counter.
2. Your Kafka consumer had a 5-minute outage. After it recovers, a customer calls your usage API and sees fewer events than they believe they sent. What is the most likely explanation?
During a consumer outage, events accumulate in Kafka. They are not lost, because Kafka retains them for the topic's retention period whether or not they have been consumed, and a 5-minute outage is far shorter than that. When the consumer recovers, it resumes from its last committed offset. Until the backlog drains, the counter reflects only the events processed so far, so it is understated. This is a temporary lag, not data loss, and it resolves as the consumer catches up.
3. Your platform enforces a hard quota of 10,000 API calls per account per day. Where should quota enforcement happen?
Hard quota enforcement belongs at the API gateway — the single chokepoint that sees all traffic regardless of which downstream service handles it. Distributed enforcement in each microservice requires coordination and will miss cross-service usage. Billing-time enforcement is too late: the request already executed and work was already done. Client-side enforcement is advisory at best.
4. You need to show customers their current-period usage within 60 seconds of each API call. Which aggregation approach achieves this?
Nightly batch jobs introduce up to 24-hour lag. Reading raw Kafka events in the usage API handler requires scanning all events since the period start on every request — prohibitively expensive at scale. Stream aggregation keeps per-account counters updated in near real-time; the usage API then serves a single HGET from Redis, completing in under 1ms with data that is at most a few seconds stale.
✍️ Exercise: design the metering pipeline for an SMS API serving 50,000 accounts
You are building a Twilio-like SMS API. Each sent SMS must appear in the customer's usage dashboard within 60 seconds of delivery and generate a monthly invoice. The service processes 1 million SMS per day (modeled). Design the complete metering pipeline: event schema, queue, aggregation, dedup, usage API, and billing export. Also address late events and what happens if the aggregator consumer crashes mid-processing.
Model answer:
- Event schema: {event_id, account_id, metric="sms_sent", quantity=1, timestamp, segment_count}. The event_id is a UUID generated at send time, the same as the message SID. segment_count handles multi-part SMS where each segment is separately billable.
- Queue: Kafka topic usage-events, 7-day retention (14 days for audit safety), 3x replication. Partition by account_id so all events for one account are ordered and processed by the same consumer, simplifying state.
- Aggregator: Consumer group with 4 workers. For each event, run an atomic Lua script: EXISTS seen:{event_id} → if found, skip; if not found, SETEX seen:{event_id} 2592000 1 then HINCRBY usage:{account_id}:{period}:sms_sent total {quantity}. Commit Kafka offset only after successful write.
- Usage API: GET /v1/usage?metric=sms_sent&period=2026-06 → HGET usage:{account_id}:2026-06:sms_sent total. Include quota_remaining = plan_quota - consumed in the response. Cache at the API layer for 5 seconds to absorb traffic spikes.
- Billing export: Runs at midnight UTC + 5 minutes (grace window for late events). For each active account, call HGET usage:{id}:2026-06:sms_sent total, compare against plan quota, compute overage, emit invoice line item via Stripe Billing meter events API or internal billing DB.
- Late events: Counter key uses event.timestamp period, not arrival time. Events arriving up to 5 minutes after midnight still land in June's counter. Events arriving later are credited to June via a reconciliation run against raw Kafka data.
- Consumer crash mid-processing: Because Kafka offset is only committed after successful Redis write, a crash during the Lua script causes the event to be replayed. The dedup seen-set check handles the replay: if the seen-key was written before the crash, it's a skip; if not, it's a fresh process. Either way, the counter ends up correct.
Rubric: Full marks for: (a) event_id as idempotency key, (b) async queue (not synchronous), (c) atomic Lua dedup + HINCRBY, (d) Redis counter for fast usage API, (e) raw event retention for audit, (f) period-close billing export with grace window, (g) late event handling via event timestamp, (h) crash recovery via Kafka replay + dedup.
Key takeaways
- The metering pipeline has five stages — emit, queue, aggregate, usage API, billing — and each must be designed for durability. A missed event is lost revenue; a double-count is an overcharge. Both are production incidents.
- At-least-once delivery is the correct queue guarantee for metering. Use an event_id idempotency key stored in a Redis seen-set to deduplicate retried events atomically. The Lua script that checks and writes must be atomic to prevent race conditions between aggregator instances.
- Pre-aggregated counters in Redis are ~17,500× more storage-efficient than raw events at 100M events/day. But keep raw events in Kafka for 30–90 days for audit, reconciliation, and late-event replay: the raw journal is what makes the system auditable and correctable.
- Hard quota enforcement belongs at the API gateway, not in downstream services. The gateway is the only place that sees all traffic; enforcement elsewhere creates gaps and requires cross-service coordination.
- A customer-facing usage API — showing current consumption, quota remaining, and historical periods — is not an optional dashboard feature. It reduces support tickets, helps customers avoid unexpected overages, and is the single most impactful trust-building investment in a developer platform's billing UX.
Sources & further reading
- Stripe — Usage-based billing and metered subscriptions
- AWS Marketplace — SaaS metering integration
- Twilio — Pricing and usage model
- Apache Kafka — Design: consumer pull and durability
- Lesson rel-02 — Idempotency
- Lesson rel-10 — Event-driven architecture & Pub/Sub
- Lesson prep-05 — Business sense & pricing