API Design

Platform & API Product Engineering · Lesson 08

Usage metering & monetization

Every missed billable event is lost revenue; every double-counted event produces an angry customer and a chargeback. Building metering that is simultaneously durable, accurate, and queryable — at millions of events per day — requires the same engineering rigor as your most critical data pipeline.

⏱ 26 min Difficulty: advanced Prereq: Idempotency (rel-02), Event-driven systems (rel-10)

By the end you'll be able to

Why metering is harder than logging

Application logging tolerates loss: a missing log line means a gap in your dashboard, not a business error. Metering is different. A missed usage event means a customer goes unbilled for real work your system performed — lost revenue that compounds every billing cycle. A double-counted event means an overcharge, a support ticket, a refund, and erosion of trust. The asymmetry of consequences forces metering to be designed like a financial system, not a logging system.

The second challenge is scale. Stripe processes millions of payment events daily. Twilio sends billions of SMS per year. AWS meters hundreds of services per second across millions of customers. At those volumes, even a 0.01% error rate is operationally significant. The solutions that work in production — durable queues, pre-aggregated counters, idempotency keys — are not accidental. They emerged from painful lessons about what breaks at scale.

Think of the metering pipeline as a bank's double-entry ledger: every transaction is recorded first in a durable journal (the queue), then summarized in account balances (the counters). You can reconstruct the balances from the journal. You cannot reconstruct the journal from the balances. That asymmetry shapes every architectural decision below.

The metering pipeline: five stages

[Figure: App Server (emit event) → Kafka Queue (durable log) → Aggregator (dedup + count) → Counter Store (Redis / DB, per account/metric/period) → Usage API (customer-facing) and, via period-end export, Billing System (Stripe Billing / internal). Raw events stay in Kafka for audit and late-event replay; counters power the fast paths.]
Fig 1 — The five-stage metering pipeline. Kafka decouples emission from processing; the aggregator deduplicates and increments counters; the Usage API and Billing System both read from those counters.

Stage 1 — emit a usage event per billable call

Every billable action — an API call, a sent message, a processed transaction — produces one usage event. The event schema must include an event_id (a UUID or content-hash used as the idempotency key), account_id, metric_name, quantity, and timestamp. Metadata fields (region, product_tier, API version) enable later analytics but are not required for billing correctness.

# Usage event schema — emitted per billable action
{
  "event_id":        "evt_a1b2c3d4e5f6",    // idempotency key — UUID v4 or content hash
  "account_id":      "acct_42",
  "metric_name":     "sms_sent",
  "quantity":        1,                       // or segment_count for multi-part SMS
  "timestamp":       "2026-06-20T10:30:00.123Z", // ISO 8601 with ms precision
  "metadata":        { "region": "us-east-1", "tier": "growth" }
}
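
One way to build such an event is sketched below, assuming the content-hash option for event_id. The helper names (`make_event_id`, `build_usage_event`) and the `source_id` argument (for example, a message ID assigned by the service) are illustrative assumptions, not part of the schema above. The point is that a retry of the same billable action yields the same event_id.

```python
import hashlib
from datetime import datetime, timezone


def make_event_id(account_id: str, metric_name: str, source_id: str) -> str:
    """Derive a deterministic event_id from the identity of the billable action."""
    raw = f"{account_id}|{metric_name}|{source_id}".encode("utf-8")
    return "evt_" + hashlib.sha256(raw).hexdigest()[:24]


def build_usage_event(account_id, metric_name, quantity, source_id,
                      occurred_at, metadata=None):
    """Build one usage event; occurred_at must be a timezone-aware datetime."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    if occurred_at.tzinfo is None:
        raise ValueError("occurred_at must be timezone-aware")
    # ISO 8601 in UTC with millisecond precision, e.g. 2026-06-20T10:30:00.123Z
    ts = occurred_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "event_id": make_event_id(account_id, metric_name, source_id),
        "account_id": account_id,
        "metric_name": metric_name,
        "quantity": quantity,
        "timestamp": ts.replace("+00:00", "Z"),
        "metadata": metadata or {},
    }
```

A random UUID v4 works equally well as long as it is generated once per billable action and reused on every re-emit; the content hash simply makes that property automatic.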

The emission must be fire-and-forget from the request path. If the meter write is synchronous (the API call blocks waiting for the billing system to acknowledge), a billing system outage becomes an API outage. Emit to the queue asynchronously; the request path should never wait on the metering infrastructure.
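
A minimal sketch of a non-blocking emitter follows, assuming a bounded in-memory buffer drained by one background thread. `AsyncUsageEmitter` and its `send` callable (which would wrap the real queue producer) are hypothetical names for illustration. Note the trade-off the sketch makes visible: when the buffer is full the event is dropped and counted, which a billing-grade system would replace with a durable local spill or outbox.

```python
import queue
import threading


class AsyncUsageEmitter:
    """Hands usage events to a background thread so request handlers never block."""

    def __init__(self, send, max_buffered=10_000):
        self._send = send                       # hypothetical: publishes one event
        self._buffer = queue.Queue(maxsize=max_buffered)
        self._lock = threading.Lock()
        self.dropped = 0                        # events rejected because the buffer was full
        self.failed = 0                         # events whose send() raised
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event) -> bool:
        """Called from the request path: returns immediately, never waits."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def _drain(self):
        while True:
            event = self._buffer.get()
            if event is None:                   # sentinel placed by close()
                return
            try:
                self._send(event)
            except Exception:
                with self._lock:
                    self.failed += 1            # a real system would retry or spill to disk

    def close(self):
        """Flush everything already buffered, then stop the worker."""
        self._buffer.put(None)
        self._worker.join()
```

The `dropped` and `failed` counters are worth exporting as metrics: any non-zero value is unbilled usage.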

Stage 2 — durable queue (the journal)

The queue — typically Kafka, but AWS Kinesis or Google Pub/Sub serve the same purpose — is the metering system's source of truth. It buffers events between emission and processing, provides durability (events survive consumer crashes), and enables replay (re-process events from any offset to correct bugs or backfill missing data).

The critical property is at-least-once delivery. Kafka retains every acknowledged event for the configured retention period, whether or not it has been consumed, and a consumer that crashes after processing an event but before committing its offset will receive that event again on restart. This is by design: it means you never lose revenue. It does, however, require the aggregator to handle duplicates. See Stage 3.

Kafka's retention period (typically 7 days, extendable to 30+) serves as the raw event store for audit and late-event replay. This is different from the counter store: the raw events answer "show me every SMS sent on June 20" while the counters answer "how many SMS did account 42 send in June?"

Stage 3 — aggregate: dedup + count

The aggregator consumer reads events from Kafka, deduplicates them using the event_id, and increments per-account per-metric per-period counters. In Redis, this looks like:

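# Aggregator sketch (redis-py): called for each event consumed from Kafka
import logging
import redis

r = redis.Redis()
SEEN_TTL_SECONDS = 30 * 24 * 3600   # 2592000 (30 days)

# Atomic dedup check + increment: Redis runs the whole script as one unit
DEDUP_AND_COUNT = r.register_script("""
  local seen_key   = KEYS[1]
  local ctr_key    = KEYS[2]
  local qty        = tonumber(ARGV[1])
  local ttl_seen   = tonumber(ARGV[2])

  if redis.call('EXISTS', seen_key) == 1 then
    return 0  -- already processed; skip
  end
  redis.call('SETEX', seen_key, ttl_seen, '1')  -- mark as seen
  redis.call('HINCRBY', ctr_key, 'total', qty)  -- increment counter
  return 1  -- processed
""")

def period(timestamp):
    # "2026-06-20T10:30:00.123Z" -> "2026-06" (assumes UTC ISO 8601 strings)
    return timestamp[:7]

def process_event(event, commit_offset):
    # commit_offset: callable supplied by your Kafka client wrapper
    seen_key    = "seen:" + event["event_id"]
    counter_key = "usage:" + event["account_id"] + ":" + period(event["timestamp"]) \
                  + ":" + event["metric_name"]

    result = DEDUP_AND_COUNT(keys=[seen_key, counter_key],
                             args=[event["quantity"], SEEN_TTL_SECONDS])
    if result == 0:
        logging.info("duplicate event skipped: %s", event["event_id"])

    # Commit in both cases: a skipped duplicate has still been handled.
    # If the Redis call raises, this line is never reached and Kafka redelivers.
    commit_offset()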

Stage 4 — usage API (customer-facing)

Customers need to see their own consumption — not just at invoice time, but on-demand. A well-designed usage API serves this from the counter store (Redis), giving near-real-time data without expensive raw-event scans:

# Usage API — GET /v1/usage
GET /v1/usage?metric=sms_sent&period=2026-06&account_id=acct_42
Authorization: Bearer {customer_api_key}

# Response
{
  "account_id":    "acct_42",
  "metric":        "sms_sent",
  "period":        "2026-06",
  "consumed":      8432,
  "quota":         10000,
  "quota_remaining": 1568,
  "as_of":         "2026-06-20T10:30:45Z"   // when the counter was last updated
}

The implementation: a single HGET usage:acct_42:2026-06:sms_sent total Redis call. Compare that to scanning raw events from Kafka — which would require reading gigabytes of data for a long-running account. Pre-aggregated counters are why the usage API is fast.
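
The handler logic around that single read can be sketched as below. This assumes the counter value arrives exactly as a Redis client would return it (a string, bytes, or None when the key does not exist yet); `counter_key` and `build_usage_response` are illustrative names, and the quota lookup is left to the caller.

```python
from datetime import datetime, timezone


def counter_key(account_id: str, period: str, metric: str) -> str:
    """Key layout used throughout this lesson: usage:{account}:{period}:{metric}."""
    return f"usage:{account_id}:{period}:{metric}"


def build_usage_response(account_id, metric, period, counter_value, quota, as_of):
    """Shape the usage API response from one pre-aggregated counter value."""
    # A missing key means no usage has been recorded for this period yet.
    consumed = int(counter_value) if counter_value is not None else 0
    return {
        "account_id": account_id,
        "metric": metric,
        "period": period,
        "consumed": consumed,
        "quota": quota,
        "quota_remaining": max(quota - consumed, 0),   # never report a negative remainder
        "as_of": as_of.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

In a real handler the account would normally be derived from the authenticated API key rather than trusted from a query parameter, so that one customer cannot read another's usage.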

Stage 5 — billing export

At the end of each billing period (typically midnight UTC), a billing exporter reads the final counter values for every active account and submits invoice line items to the billing system. For Stripe Billing metered subscriptions, this means calling the meter events API or the usage records endpoint. For internal systems, it means writing records to a billing database table.

The exporter should reconcile: compare the counter total to a raw-event count (by scanning Kafka or a separate raw-event database) for a sample of accounts. Persistent discrepancies indicate a dedup bug or an aggregator lag that needs investigation before the invoice is sent.
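
A reconciliation pass of this kind might look like the following sketch. It assumes the raw events for the sampled accounts have already been loaded into memory as dicts in the Stage 1 schema and that timestamps are UTC ISO 8601 strings; `rebuild_counters` and `find_discrepancies` are hypothetical helper names.

```python
from collections import defaultdict


def rebuild_counters(raw_events):
    """Recompute per-(account, period, metric) totals from the raw event journal."""
    seen = set()
    totals = defaultdict(int)
    for ev in raw_events:
        if ev["event_id"] in seen:          # the journal may contain redeliveries
            continue
        seen.add(ev["event_id"])
        key = (ev["account_id"], ev["timestamp"][:7], ev["metric_name"])
        totals[key] += ev["quantity"]
    return dict(totals)


def find_discrepancies(counters, raw_events, tolerance=0):
    """Return {key: counter_minus_expected} for every key that diverges."""
    expected = rebuild_counters(raw_events)
    report = {}
    for key in set(expected) | set(counters):
        diff = counters.get(key, 0) - expected.get(key, 0)
        if abs(diff) > tolerance:
            report[key] = diff
    return report
```

A positive difference points at double-counting; a negative one points at aggregator lag or lost increments. Either should hold the invoice until it is explained.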

Accuracy under retries and failures

[Figure: First delivery (t=0): evt_abc123 arrives → EXISTS seen:evt_abc123 returns NOT FOUND → SETEX seen:evt_abc123 with 30-day TTL → HINCRBY counter +1 (8432 → 8433). Second delivery, a retry (t=500ms): evt_abc123 arrives again → EXISTS seen:evt_abc123 returns FOUND → skip, no write; counter stays at 8433. The Lua script makes CHECK + SETEX + HINCRBY atomic, so two concurrent consumers cannot race. The 30-day seen-set TTL expires old event IDs and prevents unbounded memory growth.]
Fig 2 — Deduplication via idempotency key. The seen-set lookup is atomic with the counter increment (Lua script), preventing double-counting even when two aggregator instances process the same retry simultaneously.
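
The interaction between at-least-once redelivery and the seen-set can be simulated without any infrastructure. The sketch below is an in-memory model only: a Python list stands in for a Kafka partition, a set stands in for the Redis seen-set, and `InMemoryAggregator` and `consume` are illustrative names. It models a consumer that crashes after processing an event but before committing its offset.

```python
class InMemoryAggregator:
    """Dedup + count, with a set and a dict standing in for Redis."""

    def __init__(self):
        self.seen = set()
        self.counters = {}

    def process(self, event) -> bool:
        """Return True if the event was counted, False if it was a duplicate."""
        if event["event_id"] in self.seen:
            return False
        self.seen.add(event["event_id"])
        key = (event["account_id"], event["metric_name"])
        self.counters[key] = self.counters.get(key, 0) + event["quantity"]
        return True


def consume(log, aggregator, start_offset, crash_before_commit_at=None):
    """Process log[start_offset:] and return the last committed offset."""
    committed = start_offset
    for offset in range(start_offset, len(log)):
        aggregator.process(log[offset])
        if offset == crash_before_commit_at:
            return committed        # crash: work was done, offset was not committed
        committed = offset + 1      # commit only after processing succeeds
    return committed
```

Restarting `consume` from the returned offset re-reads the event that was processed but not committed; the seen-set turns that redelivery into a no-op, so the counter ends up equal to the number of distinct events.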

Late events and reconciliation

Network partitions and Kafka consumer lag mean some events arrive after the billing period closes. A message sent at 23:59:58 on June 30 might not reach the aggregator until July 1 00:00:03. Three strategies handle this:

  1. Use event timestamp, not arrival timestamp. The counter key includes the period derived from event.timestamp, not now(). An event arriving late is credited to the correct billing period.
  2. Keep counters writable for a grace window. Hold the billing period open for 5–15 minutes after midnight before exporting. Most late events arrive within seconds; a 5-minute grace window catches nearly all of them.
  3. Reconciliation run. For high-value accounts, scan the raw events from Kafka for the period and compare the sum to the counter value. If they diverge beyond a threshold, investigate before sending the invoice. This is the equivalent of a bank's end-of-day balancing run.
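
The first two strategies reduce to a few lines of date arithmetic. The sketch below assumes monthly periods in UTC and timezone-aware datetimes; `billing_period`, `period_close`, and `accepts_late_event` are hypothetical helper names, and the 5-minute default is just the lower end of the grace window mentioned above.

```python
from datetime import datetime, timedelta, timezone


def billing_period(event_ts: datetime) -> str:
    """Period is derived from when the event happened, not when it arrived."""
    return event_ts.astimezone(timezone.utc).strftime("%Y-%m")


def period_close(period: str) -> datetime:
    """First instant after the period ends (midnight UTC on the 1st of next month)."""
    year, month = map(int, period.split("-"))
    if month == 12:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, month + 1, 1, tzinfo=timezone.utc)


def accepts_late_event(event_ts: datetime, arrival_ts: datetime,
                       grace: timedelta = timedelta(minutes=5)) -> bool:
    """True if the counter for the event's period is still writable at arrival."""
    return arrival_ts < period_close(billing_period(event_ts)) + grace
```

Events that arrive after the grace window are not discarded: they remain in the raw log, and the reconciliation run decides whether to issue an adjustment on the next invoice.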

Quotas, plans, and overage

[Figure: Incoming request → Gateway reads HGET usage:acct:period:metric and compares usage to quota. usage ≥ hard_limit → 429, block. soft_limit ≤ usage < hard_limit → allow and flag for overage billing. usage < soft_limit → allow and emit the usage event normally. Hard limit = absolute block; soft limit = allow + notify customer + overage billing.]
Fig 3 — Quota enforcement at the gateway. The counter store (Redis) is the single source of truth for current-period usage. The gateway reads it on every request — typically a single HGET at <1ms RTT.

Soft vs hard limits

A hard limit blocks requests with a 429 once usage reaches the quota ceiling. A soft limit allows requests to continue, warning the customer and flagging the account for overage billing, but does not block. The right choice depends on the product contract: infrastructure APIs (Twilio SMS, AWS Lambda) typically use hard limits to protect the platform, while SaaS APIs often prefer soft limits with graduated billing, because an unexpected 429 in a production workflow is worse for customer trust than an unexpected charge that can be explained. Either way, the counter the gateway reads is updated asynchronously by the aggregator, so enforcement lags by the aggregation delay and a hard limit can be overshot by the requests in flight during that window.

Both types connect to the rate limiter (see plat-01): the quota counter and the rate limiter both live in Redis, and the gateway reads both on every request. Rate limiting governs how fast a customer sends requests; quota governs how many over a billing period. They are different dimensions of the same enforcement problem.
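
The gateway's decision is a three-way comparison. The sketch below assumes the current usage has already been read from the counter store; the `Decision` enum and `check_quota` are illustrative names, and passing `hard_limit=None` models a plan with soft limits only.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"                             # under the soft limit
    ALLOW_WITH_OVERAGE = "allow_with_overage"   # allow, notify, bill overage
    BLOCK = "block"                             # respond with HTTP 429


def check_quota(usage: int, soft_limit: int,
                hard_limit: Optional[int] = None) -> Decision:
    """Classify a request given current-period usage and the plan's limits."""
    if hard_limit is not None and usage >= hard_limit:
        return Decision.BLOCK
    if usage >= soft_limit:
        return Decision.ALLOW_WITH_OVERAGE
    return Decision.ALLOW
```

Keeping the decision a pure function of (usage, limits) makes it easy to unit-test plan changes without touching Redis or the gateway.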

Pricing models: metered, seat, tiered

Metered pricing bills per unit of consumption (per SMS, per API call, per GB processed). Every unit must be metered. This lesson covers metered pricing because it requires the most infrastructure — every call is billable and must be counted accurately.

Seat pricing bills per active user regardless of usage. Metering is simpler (count provisioned users, not calls), but you still need usage data to identify inactive seats and justify renewals.

Tiered pricing combines a base quota (included in the plan) with overage billing above it. The quota counter serves both: it drives hard/soft limit enforcement during the month and provides the final total for overage calculation at period close.
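
For the base-quota-plus-overage case, the period-close calculation can be sketched as follows. The unit price in the usage note is a made-up figure for illustration, and `overage_charge` is a hypothetical helper; `Decimal` is used because binary floats are a poor fit for money.

```python
from decimal import Decimal


def overage_charge(consumed: int, included_quota: int, unit_price: str):
    """Return (overage_units, charge) for one metric at period close."""
    overage_units = max(consumed - included_quota, 0)
    return overage_units, Decimal(unit_price) * overage_units
```

With an included quota of 10,000 and an assumed overage price of 0.0075 per SMS, a final count of 9,847 produces no overage, while 10,432 produces 432 overage units.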

Under the hood: from API call to billing record

Trace a single SMS-send request through the entire metering pipeline, including a retry scenario:

# ── Stage 1: client request ──────────────────────────────────────────
POST https://api.yourplatform.com/v1/messages
Authorization: Bearer sk_live_acct42_key
{ "to": "+15551234567", "body": "Your order ships tomorrow." }

# ── Stage 2: gateway checks quota before processing ──────────────────
HGET usage:acct_42:2026-06:sms_sent total → "8432"
# quota = 10,000 → 8432 < 10,000 → allow

# ── Stage 3: service processes the SMS ───────────────────────────────
SMS delivered. Carrier confirms delivery at 10:30:00.500Z

# ── Stage 4: emit usage event to Kafka ───────────────────────────────
kafka.produce("usage-events", {
  "event_id":    "evt_a1b2c3d4e5f6",
  "account_id":  "acct_42",
  "metric_name": "sms_sent",
  "quantity":    1,
  "timestamp":   "2026-06-20T10:30:00.123Z"
})
# Kafka acks after all in-sync replicas write the message
ack received → return 201 to client

# ── Stage 5: aggregator consumer processes the event ─────────────────
# Lua script (atomic — no race between consumer instances):
EXISTS seen:evt_a1b2c3d4e5f6 → 0 (not seen)
SETEX seen:evt_a1b2c3d4e5f6 2592000 1 → OK (30-day TTL)
HINCRBY usage:acct_42:2026-06:sms_sent total 1 → 8433
commit Kafka offset

# ── Stage 6: RETRY SCENARIO — same event arrives again ───────────────
# (network error caused re-emit of the same event_id)
EXISTS seen:evt_a1b2c3d4e5f6 → 1 (FOUND)
→ skip; no HINCRBY; counter stays at 8433 ← dedup works
commit Kafka offset   # still commit — we processed it (by skipping)

# ── Stage 7: billing export at period close ──────────────────────────
HGET usage:acct_42:2026-06:sms_sent total → "9847"   # final count
# plan includes 10,000 SMS → no overage
# stripe.billing.meter_events.create(account="acct_42", event_name="sms_sent", value=9847)
Invoice line item: "9,847 SMS sent in June 2026" → $0 overage

By the numbers: 100 million billable events per day

Scale the pipeline to understand where storage and compute cost actually lives.

Raw event storage in Kafka (modeled):

events_per_day  = 100_000_000
bytes_per_event = 200          # JSON payload, avg (modeled)
kafka_retention = 7            # days

raw_daily_gb = events_per_day * bytes_per_event / 1e9  = 20 GB/day
raw_total_gb = raw_daily_gb * kafka_retention          = 140 GB

# With 3x replication factor: 420 GB total Kafka storage (modeled)

Pre-aggregated counter storage in Redis (modeled):

active_accounts   = 50_000
metrics_per_acct  = 20         # sms_sent, api_calls, data_gb, ... (modeled)
bytes_per_counter = 8          # int64 in Redis HASH field (value only)

counter_keys     = active_accounts * metrics_per_acct       = 1_000_000
redis_storage_mb = counter_keys * bytes_per_counter / 1e6   = 8 MB

# Ratio: 140 GB (raw) vs 8 MB (counters) = 17,500× storage reduction (modeled)
# 8 bytes counts the value only; key names and Redis per-key overhead add more,
# but the total stays far below the raw-event footprint
# The counter store fits entirely in a single Redis instance's RAM

Dedup seen-set size (modeled):

event_ids_per_day  = 100_000_000
bytes_per_event_id = 16        # UUID stored as raw bytes, not string
ttl_days           = 30        # retain 30 days of event IDs for dedup

seen_set_gb = event_ids_per_day * bytes_per_event_id * ttl_days / 1e9
            = 100M * 16 * 30 / 1e9 = 48 GB (modeled)

# 48 GB exceeds typical single-Redis RAM for a hot dataset
# Alternative: Bloom filter at a 1% false-positive rate needs ~9.6 bits per item:
#   ~120 MB per 100M items, ~3.6 GB for the full 30-day window of 3B IDs (modeled)
# Bloom filter trade-off: ~1% of events incorrectly flagged as duplicates and skipped
# Acceptable for analytics dashboards; NOT acceptable for billing (use exact Redis dedup)
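
The three sizing models above are simple enough to keep as executable arithmetic, which makes it easy to re-run them with your own volumes. The sketch below reproduces them and adds the standard optimal Bloom filter size, m = -n·ln(p) / (ln 2)² bits. The function names are illustrative, and all inputs remain modeled values, not measurements.

```python
import math


def raw_storage_gb(events_per_day, bytes_per_event, retention_days, replication=1):
    """Raw event log size across the retention window, in GB."""
    return events_per_day * bytes_per_event * retention_days * replication / 1e9


def counter_storage_mb(accounts, metrics_per_account, bytes_per_counter=8):
    """Counter payload size in MB (values only, excluding key overhead)."""
    return accounts * metrics_per_account * bytes_per_counter / 1e6


def seen_set_gb(event_ids_per_day, bytes_per_id, ttl_days):
    """Exact dedup set size in GB (IDs only, excluding key overhead)."""
    return event_ids_per_day * bytes_per_id * ttl_days / 1e9


def bloom_filter_bytes(n_items, false_positive_rate):
    """Optimal Bloom filter size in bytes for n_items at the given error rate."""
    bits = -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)
```

Calling `bloom_filter_bytes` with the full TTL window (event IDs per day times TTL days) rather than one day's volume gives the figure that is comparable to the exact seen-set.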

Governing formula — when pre-aggregation pays off:

storage_ratio = (raw_event_bytes * events_per_day * retention_days)
                / (counter_bytes * unique_keys)

# Break-even: pre-aggregation overhead (seen-set) exceeds benefit when...
#   seen_set_size > counter_store_savings + query_cost_savings
# At ~1M events/day:  raw = 200 MB/day, counters = 8 MB → marginal benefit
# At ~10M events/day: raw = 2 GB/day,   counters = 8 MB → aggregation clearly wins
# Rule of thumb: aggregate once you exceed ~5M events/day (modeled)

Aggregation window decision: Real-time stream aggregation (Flink, a Redis consumer) keeps counters current within seconds — the right choice for customer-facing usage dashboards. Hourly batch jobs reduce infrastructure complexity but mean the usage API is 0–60 minutes stale, acceptable for billing exports but frustrating for customers debugging unexpected quota exhaustion. Daily batch jobs are only appropriate if you never expose real-time usage to customers and billing periods are monthly.

Pros and cons: the four key trade-offs

| Trade-off | Option A | Option B | When A wins | When B wins |
| --- | --- | --- | --- | --- |
| Aggregation timing | Real-time stream (Flink/consumer) | Batch (nightly cron) | Customer-facing dashboard needs <60s freshness; quota enforcement needs current counters | Usage is only needed at invoice time; operational simplicity outweighs real-time visibility |
| Metering placement | At the API gateway (before service) | Inside each microservice (after processing) | Consistent enforcement across all services; simpler deployment; quota checked before work is done | You need business-logic-aware metering (e.g. count only successful operations, not all calls); gateway can't distinguish outcome |
| Counter storage | Pre-aggregated counters (Redis HASH) | Raw event store (Kafka / data warehouse) | Fast usage API (<1ms query); low storage cost; real-time quota enforcement | Flexible ad-hoc queries (arbitrary time ranges, custom groupings, retroactive plan changes); audit trail |
| Deduplication approach | Exact (Redis seen-set with event_id) | Probabilistic (Bloom filter) | Billing accuracy is non-negotiable; false positives (skipping a real event) are unacceptable | Analytics dashboards where ~1% error rate is acceptable; memory budget is extremely tight |

How real platforms do it

Stripe Billing metered subscriptions use a meter events API: you report usage by pushing events to Stripe's API (one call per billable action, with an identifier that Stripe uses to prevent double-counting). Stripe aggregates these events and includes them in the customer's upcoming invoice. Stripe also supports a "billing meter" resource that lets you configure the aggregation formula (for example sum or count) per metric. See Stripe usage-based billing docs.

AWS Marketplace Metering Service requires SaaS products listed on the Marketplace to report metered usage by calling the MeterUsage API. AWS enforces that each call is idempotent (via a UsageDimension + timestamp key) and provides a 1-hour grace window for late reporting. Usage is aggregated per customer, per dimension, per hour. See AWS Marketplace metering integration docs.

Twilio meters at the point of delivery: an SMS or call is billable when the carrier confirms delivery (or attempt). Each message has a unique SID that serves as the idempotency key. Twilio's usage records API exposes per-account, per-resource, per-period consumption — the customer-facing equivalent of the usage API described in Stage 4. See Twilio pricing and usage model.

🎯 Interview angle

"Design a usage metering system for 100M events per day." The senior answer covers all five stages: emit (async, fire-and-forget, with event_id), durable queue (Kafka, at-least-once), aggregator (idempotency-key dedup with Redis seen-set, HINCRBY counters), usage API (Redis HGET, <1ms), billing export (period-close scan). Then add the math: 20 GB/day raw vs 8 MB counters. Then the failure modes: late events (use event timestamp not arrival time), consumer outage (Kafka replays from offset), cold start (no thundering herd issue here — aggregator catches up from Kafka). Candidates who skip either the dedup mechanism or the raw/aggregated storage split miss the core reliability story.

⚠️ Common trap

Metering synchronously in the API request path. If you write to the billing database as part of handling the request, a billing system slowdown or outage becomes an API latency spike or outage. More subtly, if the write fails after the service has done the work (delivered the SMS, run the computation), you must choose: return an error to the client (who will retry, so the work runs twice and the recipient gets a duplicate SMS) or swallow the error (and lose the revenue). There is no good answer: the synchronous design cannot make the work and the billing write succeed or fail together. Move metering to an async queue. The request path should fire-and-forget the usage event; billing correctness is the aggregator's job, not the API handler's.

✅ Do this, not that

Always retain raw events, even after aggregating. Pre-aggregated counters are compact and fast — they are the right data structure for the usage API and quota enforcement. But they are lossy: once you sum 1 million events into a counter, you cannot reconstruct which events contributed. Keep raw events in Kafka (or a data warehouse) for at least 90 days. This enables: retroactive correction if a bug caused wrong counts; re-processing for new metrics you didn't anticipate; billing dispute resolution ("show me every API call on June 15"); mid-period plan changes that need historical attribution. The raw events are cheap relative to the operational value. Counters are fast; raw events are correct.

🧠 Quick check

1. Why does at-least-once delivery require deduplication in a metering system?

At-least-once delivery guarantees no events are lost — but as a consequence, network retries and consumer restarts can cause the same event to be delivered and processed more than once. An idempotency key (event_id) stored in a seen-set lets the aggregator recognize and skip duplicate deliveries without double-incrementing the counter.

2. Your Kafka consumer had a 5-minute outage. After it recovers, a customer calls your usage API and sees fewer events than they believe they sent. What is the most likely explanation?

During a consumer outage, events accumulate in Kafka. They are not lost: Kafka retains them for its configured retention period, which is far longer than a 5-minute outage. When the consumer recovers, it resumes from its last committed offset. Until the backlog drains, the counter reflects only the events processed so far, so it is understated. This is a temporary lag, not data loss, and it resolves as the consumer catches up.

3. Your platform enforces a hard quota of 10,000 API calls per account per day. Where should quota enforcement happen?

Hard quota enforcement belongs at the API gateway — the single chokepoint that sees all traffic regardless of which downstream service handles it. Distributed enforcement in each microservice requires coordination and will miss cross-service usage. Billing-time enforcement is too late: the request already executed and work was already done. Client-side enforcement is advisory at best.

4. You need to show customers their current-period usage within 60 seconds of each API call. Which aggregation approach achieves this?

Nightly batch jobs introduce up to 24-hour lag. Reading raw Kafka events in the usage API handler requires scanning all events since the period start on every request — prohibitively expensive at scale. Stream aggregation keeps per-account counters updated in near real-time; the usage API then serves a single HGET from Redis, completing in under 1ms with data that is at most a few seconds stale.

✍️ Exercise: design the metering pipeline for an SMS API serving 50,000 accounts

You are building a Twilio-like SMS API. Each sent SMS must appear in the customer's usage dashboard within 60 seconds of delivery and generate a monthly invoice. The service processes 1 million SMS per day (modeled). Design the complete metering pipeline: event schema, queue, aggregation, dedup, usage API, and billing export. Also address late events and what happens if the aggregator consumer crashes mid-processing.

Model answer:

Rubric: Full marks for: (a) event_id as idempotency key, (b) async queue (not synchronous), (c) atomic Lua dedup + HINCRBY, (d) Redis counter for fast usage API, (e) raw event retention for audit, (f) period-close billing export with grace window, (g) late event handling via event timestamp, (h) crash recovery via Kafka replay + dedup.

Key takeaways

Sources & further reading