Rate Limiting
Rate limiting is a speed limit for API requests. Just like a highway has a maximum speed to prevent accidents, APIs have request limits to prevent overload. If your app makes too many requests too quickly (1000 requests per second when the limit is 100), the API starts rejecting the extra ones with a “429 Too Many Requests” error. This protects the service from being overwhelmed and ensures fair access for all users.
Rate limiting restricts the number of requests a client can make to a service within a defined time window. It protects against abuse (scraping, brute force, DDoS), ensures fair resource allocation, and prevents cascading failures.
Common algorithms:
| Algorithm | Description | Pros | Cons |
|---|---|---|---|
| Fixed window | Count requests per time window (e.g., 100/minute) | Simple | Burst at window boundaries |
| Sliding window log | Track timestamp of each request | Precise | Memory-intensive |
| Sliding window counter | Weighted combination of current and previous window | Balanced | Slight approximation |
| Token bucket | Tokens added at fixed rate; each request consumes one | Allows bursts | Slightly complex |
| Leaky bucket | Requests queue and process at a fixed rate | Smooth output | Delays under load |
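The token bucket, for instance, fits in a few lines. A minimal sketch (a hypothetical `TokenBucket` class, not a production implementation; a real limiter would also need atomic updates under concurrency):

```javascript
// Token bucket sketch: at most `capacity` tokens, refilled at `refillRate`
// tokens per second. Timestamps are injectable so the logic is deterministic.
class TokenBucket {
  constructor(capacity, refillRate, now = Date.now()) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity; // start full: allows an initial burst
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1; // consume one token for this request
      return true;
    }
    return false; // bucket empty: reject (or queue) the request
  }
}
```

For example, `new TokenBucket(100, 100 / 60)` approximates 100 requests/minute on average while still permitting short bursts of up to 100.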
Rate limit headers (the 429 status itself is defined in RFC 6585; the X-RateLimit-* names are a widely used de facto convention, and draft-ietf-httpapi-ratelimit-headers standardizes equivalent RateLimit-* headers):
| Header | Purpose |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait before retrying (sent with a 429) |
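A client can read these headers off a response before deciding whether to send the next request. A small sketch, assuming any headers object with a `get` method (such as the `Headers` object returned by `fetch`):

```javascript
// Parse rate-limit headers into numbers and a Date.
// Missing headers come back as NaN / an invalid Date, so callers can detect them.
function parseRateLimit(headers) {
  return {
    limit: Number(headers.get("X-RateLimit-Limit")),
    remaining: Number(headers.get("X-RateLimit-Remaining")),
    resetAt: new Date(Number(headers.get("X-RateLimit-Reset")) * 1000),
  };
}
```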
Rate limiting dimensions:
- Per IP address: simplest, but shared IPs (NAT, corporate proxies) affect multiple users
- Per API key: most common for authenticated APIs
- Per user account: prevents a single user from monopolizing resources
- Per endpoint: different limits for read vs. write operations
- Global: total request rate across all clients
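These dimensions can also be combined into one limiter key. A sketch with a hypothetical helper (the key format is an assumption, not from any particular library):

```javascript
// Build a limiter key from two dimensions: API key (falling back to client IP
// for anonymous traffic) plus a read/write endpoint class.
function rateLimitKey(req) {
  const principal = req.headers["x-api-key"] || req.ip; // per API key / per IP
  const opClass = req.method === "GET" ? "read" : "write"; // per endpoint class
  return `${principal}:${opClass}`;
}
```

Separate counters per composite key mean a client that exhausts its write budget can still perform reads.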
Rate limiting implementation
```javascript
// Express.js rate limiting middleware
import rateLimit from "express-rate-limit";

const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  max: 100, // 100 requests per window
  standardHeaders: true, // send standard RateLimit-* headers
  legacyHeaders: false, // omit legacy X-RateLimit-* headers
  message: {
    error: "Too many requests",
    retryAfter: 60,
  },
  // Key by API key when present, falling back to client IP
  keyGenerator: (req) => req.headers["x-api-key"] || req.ip,
});

app.use("/api/", apiLimiter);
```

On the client side, inspect the headers and respect the limits (with exponential backoff once a 429 arrives):

```shell
$ curl -sI https://api.example.com/v1/users
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 3
X-RateLimit-Reset: 1713200060

# After hitting the limit:
HTTP/1.1 429 Too Many Requests
Retry-After: 45
```
Nginx's `limit_req` module provides server-side rate limiting. Note that its shared-memory zone is shared among the worker processes of a single Nginx instance, not across separate servers; a truly distributed limit, with counters shared by multiple instances, typically requires an external store such as Redis.

```nginx
# 10 requests/second per client IP, tracked in a 10 MB shared-memory zone
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /api/ {
    # Absorb bursts of up to 20 extra requests without delaying them
    limit_req zone=api burst=20 nodelay;
}
```

Every production API implements rate limiting. The GitHub API allows 5,000 requests per hour per authenticated user. OpenAI and Anthropic rate-limit by both tokens per minute and requests per minute. AWS API Gateway includes built-in rate limiting and throttling. In security, rate limiting is a first-line defense against brute-force login attacks and credential stuffing. The key design decision is the limit values: too restrictive and legitimate use cases break; too permissive and the service goes unprotected. Best practice is tiered limits, e.g. 100 requests/hour on a free tier and 10,000 on a paid tier. Clients must implement exponential backoff: on receiving a 429, wait the Retry-After duration (or 2^attempt seconds) before retrying, with jitter to prevent thundering-herd effects.
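That client-side retry logic can be sketched as follows (a hypothetical `fetchWithBackoff` wrapper; the delay calculation is split out so it is easy to reason about, and "full jitter" is the variant chosen here):

```javascript
// Delay before retry attempt `attempt` (0-based): honor Retry-After when the
// server sent one, otherwise use exponential backoff; full jitter either way.
function backoffDelayMs(attempt, retryAfterSeconds, rand = Math.random()) {
  const baseMs =
    Number.isFinite(retryAfterSeconds) && retryAfterSeconds > 0
      ? retryAfterSeconds * 1000
      : 2 ** attempt * 1000; // 1s, 2s, 4s, 8s, ...
  return rand * baseMs; // full jitter avoids a thundering herd of synchronized retries
}

// Retry on 429 up to maxAttempts times (assumes a global fetch, Node 18+).
async function fetchWithBackoff(url, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    const retryAfter = Number(res.headers.get("Retry-After"));
    await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, retryAfter)));
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}
```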