System design · medium · scale · infrastructure · design-systems
Design a rate limiter for an API gateway
Framework
- Clarify: per-user / per-IP / per-key? burst tolerance? distributed?
- Algorithms: fixed window (simple but bursty), sliding window (smoother), token bucket (handles bursts well), leaky bucket (rate-smoothing)
- Pick token bucket — it's the standard answer and you should know why
- Storage: Redis with atomic operations (INCR + EXPIRE, or Lua script for atomicity)
- Distribution: per-edge counter is fast but inaccurate; central Redis is accurate but a bottleneck — discuss the tradeoff
- Failure mode: if Redis is down, fail open or closed? Usually fail open with circuit breaker.
Sample answer
Token bucket, per-API-key, backed by Redis.
For each key, we keep `{tokens, lastRefill}`. On each request:
1. Compute tokens to refill = (now - lastRefill) × refillRate
2. tokens = min(maxBurst, tokens + refill)
3. If tokens >= 1, decrement and allow; else return 429 with Retry-After
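The three steps above can be sketched as a single-process version first (no Redis yet; class and parameter names are illustrative):

```python
import time

class TokenBucket:
    """Single-process token bucket; fields mirror {tokens, lastRefill} above."""

    def __init__(self, refill_rate, max_burst, now=None):
        self.refill_rate = refill_rate   # tokens added per second
        self.max_burst = max_burst       # bucket capacity (burst tolerance)
        self.tokens = max_burst          # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Step 1: tokens to refill = (now - lastRefill) * refillRate
        refill = (now - self.last_refill) * self.refill_rate
        # Step 2: cap at maxBurst
        self.tokens = min(self.max_burst, self.tokens + refill)
        self.last_refill = now
        # Step 3: consume one token, or reject (caller returns 429)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Note the bucket starts full, so a new key gets its full burst immediately.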
The whole operation runs as a single Lua script in Redis for atomicity.
Without it, two concurrent requests can both read the same token count and both be allowed, exceeding the limit under load.
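A sketch of that Lua script, held as a Python string constant (field names and the idle-key TTL are illustrative; in practice you would load it with redis-py's `register_script` and invoke it per request):

```python
# Token-bucket check-and-consume as one atomic Redis Lua script.
# KEYS[1]: bucket key, e.g. "rl:{api_key}"
# ARGV[1]: refill rate (tokens/sec), ARGV[2]: max burst, ARGV[3]: now (seconds)
LUA_TOKEN_BUCKET = """
local state = redis.call('HMGET', KEYS[1], 'tokens', 'lastRefill')
local rate  = tonumber(ARGV[1])
local burst = tonumber(ARGV[2])
local now   = tonumber(ARGV[3])
local tokens = tonumber(state[1])
local last   = tonumber(state[2])
if tokens == nil then tokens, last = burst, now end  -- first request: full bucket
tokens = math.min(burst, tokens + (now - last) * rate)
local allowed = 0
if tokens >= 1 then tokens, allowed = tokens - 1, 1 end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
-- expire idle buckets after roughly 2x the time to refill fully
redis.call('EXPIRE', KEYS[1], math.ceil(2 * burst / rate))
return allowed
"""
```

Because Redis executes the script as one unit, no other command can interleave between the HMGET and the HSET.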
Distribution: each gateway instance hits central Redis directly. At 100k
QPS, that's 100k Redis ops/sec — handle it with Redis Cluster or sharding
by key hash.
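Sharding by key hash can be as simple as a stable hash modulo the shard count (the shard hostnames below are hypothetical; Redis Cluster instead maps keys to hash slots via CRC16, but the idea is the same):

```python
import hashlib

def shard_for(api_key, shards):
    """Pick a Redis shard deterministically from the API key.

    Uses SHA-1 rather than Python's built-in hash(), which is salted
    per process, so every gateway instance agrees on the shard.
    """
    digest = hashlib.sha1(api_key.encode()).digest()
    idx = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[idx]

# Hypothetical shard hosts:
SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]
```

All requests for a given key land on one shard, so the Lua script stays atomic per key while load spreads across shards.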
Failure mode: if Redis is unreachable, fail OPEN (allow requests) with a
circuit breaker — better to let some abuse through than to take down the
API for paying customers. Page on-call.
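The fail-open path can be sketched as a thin wrapper around the Redis check (the breaker here is a minimal count-based sketch with hypothetical thresholds, not a production circuit breaker):

```python
import time

class FailOpenLimiter:
    """Wraps a rate-limit check; allows traffic when Redis is unreachable."""

    def __init__(self, check, failure_threshold=3, reset_after=30.0):
        self.check = check                    # callable(key) -> bool, may raise
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after        # seconds before retrying Redis
        self.failures = 0
        self.opened_at = None

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return True                   # breaker open: fail OPEN, skip Redis
            self.opened_at = None             # half-open: try Redis again
            self.failures = 0
        try:
            allowed = self.check(key)
            self.failures = 0
            return allowed
        except ConnectionError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now          # trip breaker; page on-call here
            return True                       # fail open on this request too
```

While the breaker is open, requests skip Redis entirely, which also sheds load from the struggling store instead of hammering it with timeouts.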
Bonus: emit a per-key counter to a metrics store so customers can see
their usage in their dashboard.
Common pitfalls
- Picking fixed-window without discussing why (it's bursty)
- Doing per-instance counting instead of central — with N gateway instances each enforcing the full limit, a client can get N× the intended limit
- Not mentioning failure-mode behavior when Redis is down