System design · medium · scale · infrastructure · design-systems
Design a rate limiter for an API gateway
Framework
- Clarify: per-user / per-IP / per-key? burst tolerance? distributed?
- Algorithms: fixed window (simple but bursty), sliding window (smoother), token bucket (handles bursts well), leaky bucket (rate-smoothing)
- Pick token bucket — it's the standard answer and you should know why
- Storage: Redis with atomic operations (INCR + EXPIRE, or Lua script for atomicity)
- Distribution: per-edge counter is fast but inaccurate; central Redis is accurate but a bottleneck — discuss the tradeoff
- Failure mode: if Redis is down, fail open or closed? Usually fail open with circuit breaker.
Sample answer
Token bucket, per-API-key, backed by Redis.
For each key, we keep `{tokens, lastRefill}`. On each request:
1. Compute tokens to refill = (now - lastRefill) × refillRate
2. tokens = min(maxBurst, tokens + refill)
3. If tokens >= 1, decrement and allow; else return 429 with Retry-After
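The three steps above can be sketched as a single-process version first (no Redis yet; class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Single-process token bucket; fields mirror {tokens, lastRefill} above."""

    def __init__(self, refill_rate, max_burst, now=None):
        self.refill_rate = refill_rate   # tokens added per second
        self.max_burst = max_burst       # bucket capacity (burst tolerance)
        self.tokens = max_burst          # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Step 1: tokens to refill = (now - lastRefill) * refillRate
        refill = (now - self.last_refill) * self.refill_rate
        # Step 2: cap at maxBurst
        self.tokens = min(self.max_burst, self.tokens + refill)
        self.last_refill = now
        # Step 3: consume one token, or reject (caller returns 429)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Note the bucket starts full, so a new key gets its full burst immediately.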
The whole operation runs as a single Lua script in Redis for atomicity.
Without it, two concurrent requests can both read the same token count and both be allowed, exceeding the limit under load.
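A sketch of that Lua script, held as a Python string constant (field names and the idle-key TTL are illustrative; in practice you would load it with redis-py's `register_script` and invoke it per request):

```python
# Token-bucket check-and-consume as one atomic Redis Lua script.
# KEYS[1]: bucket key, e.g. "rl:{api_key}"
# ARGV[1]: refill rate (tokens/sec), ARGV[2]: max burst, ARGV[3]: now (seconds)
LUA_TOKEN_BUCKET = """
local state = redis.call('HMGET', KEYS[1], 'tokens', 'lastRefill')
local rate  = tonumber(ARGV[1])
local burst = tonumber(ARGV[2])
local now   = tonumber(ARGV[3])
local tokens = tonumber(state[1])
local last   = tonumber(state[2])
if tokens == nil then tokens, last = burst, now end  -- first request: full bucket
tokens = math.min(burst, tokens + (now - last) * rate)
local allowed = 0
if tokens >= 1 then tokens, allowed = tokens - 1, 1 end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
-- expire idle buckets after roughly 2x the time to refill fully
redis.call('EXPIRE', KEYS[1], math.ceil(2 * burst / rate))
return allowed
"""
```

Because Redis executes the script as one unit, no other command can interleave between the HMGET and the HSET.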
Distribution: each gateway instance hits central Redis directly. At 100k
QPS, that's 100k Redis ops/sec — handle it with Redis Cluster or sharding
by key hash.
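Sharding by key hash can be as simple as a stable hash modulo the shard count (the shard hostnames below are hypothetical; Redis Cluster instead maps keys to hash slots via CRC16, but the idea is the same):

```python
import hashlib

def shard_for(api_key, shards):
    """Pick a Redis shard deterministically from the API key.

    Uses SHA-1 rather than Python's built-in hash(), which is salted
    per process, so every gateway instance agrees on the shard.
    """
    digest = hashlib.sha1(api_key.encode()).digest()
    idx = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[idx]

# Hypothetical shard hosts:
SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]
```

All requests for a given key land on one shard, so the Lua script stays atomic per key while load spreads across shards.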
Failure mode: if Redis is unreachable, fail OPEN (allow requests) with a
circuit breaker — better to let some abuse through than to take down the
API for paying customers. Page on-call.
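The fail-open path can be sketched as a thin wrapper around the Redis check (the breaker here is a minimal count-based sketch with hypothetical thresholds, not a production circuit breaker):

```python
import time

class FailOpenLimiter:
    """Wraps a rate-limit check; allows traffic when Redis is unreachable."""

    def __init__(self, check, failure_threshold=3, reset_after=30.0):
        self.check = check                    # callable(key) -> bool, may raise
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after        # seconds before retrying Redis
        self.failures = 0
        self.opened_at = None

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return True                   # breaker open: fail OPEN, skip Redis
            self.opened_at = None             # half-open: try Redis again
            self.failures = 0
        try:
            allowed = self.check(key)
            self.failures = 0
            return allowed
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now          # trip breaker; page on-call here
            return True                       # fail open on this request too
```

While the breaker is open, requests skip Redis entirely, which also sheds load from the struggling store instead of hammering it with timeouts.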
Bonus: emit a per-key counter to a metrics store so customers can see
their usage in their dashboard.
Common pitfalls
- Picking fixed-window without discussing why (it's bursty)
- Doing per-instance counting instead of central — with N gateway instances each enforcing the full limit, a client can get N× the intended limit
- Not mentioning failure-mode behavior when Redis is down