Rate Limiting

Plain English

Rate limiting is a speed limit for API requests. Just like a highway has a maximum speed to prevent accidents, APIs have request limits to prevent overload. If your app makes too many requests too quickly (1000 requests per second when the limit is 100), the API starts rejecting the extra ones with a “429 Too Many Requests” error. This protects the service from being overwhelmed and ensures fair access for all users.

Technical Definition

Rate limiting restricts the number of requests a client can make to a service within a defined time window. It protects against abuse (scraping, brute force, DDoS), ensures fair resource allocation, and prevents cascading failures.

Common algorithms:

| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Fixed window | Count requests per time window (e.g., 100/minute) | Simple | Bursts at window boundaries |
| Sliding window log | Track the timestamp of each request | Precise | Memory-intensive |
| Sliding window counter | Weighted combination of current and previous windows | Balanced | Slight approximation |
| Token bucket | Tokens added at a fixed rate; each request consumes one | Allows bursts | Slightly complex |
| Leaky bucket | Requests queue and are processed at a fixed rate | Smooth output | Delays under load |
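
Of these, the token bucket is a common default because it permits short bursts while bounding the sustained rate. A minimal in-memory sketch (the class name and parameter values are illustrative, not from any particular library):

// Token bucket: refill continuously, spend one token per request
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;            // start full so bursts are allowed
    this.refillPerSec = refillPerSec;
    this.lastRefill = Date.now();
  }

  tryConsume() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;                     // request allowed
    }
    return false;                      // request rejected (would return 429)
  }
}

const bucket = new TokenBucket(100, 10); // burst up to 100; refills 10 tokens/sec
console.log(bucket.tryConsume());        // true while tokens remain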

Rate limit headers (RFC 6585 defines the 429 status code; draft-ietf-httpapi-ratelimit-headers standardizes RateLimit-* fields; the X-RateLimit-* names below are the widely used de facto convention):

| Header | Purpose |
| --- | --- |
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait before retrying (sent with 429) |
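
A client can inspect these headers proactively instead of waiting for a 429. A short sketch, assuming the X-RateLimit-* convention above (the URL reuses the example host from the curl sample further down):

// Check remaining quota before sending more requests
const res = await fetch("https://api.example.com/v1/users");
const remaining = Number(res.headers.get("X-RateLimit-Remaining"));
const resetAt = Number(res.headers.get("X-RateLimit-Reset")); // Unix seconds
if (remaining === 0) {
  const waitMs = Math.max(0, resetAt * 1000 - Date.now());
  console.log(`Window exhausted; resets in ${Math.ceil(waitMs / 1000)}s`);
}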

Rate limiting dimensions:

  • Per IP address: simplest, but shared IPs (NAT, corporate proxies) affect multiple users
  • Per API key: most common for authenticated APIs
  • Per user account: prevents a single user from monopolizing resources
  • Per endpoint: different limits for read vs. write operations (see the sketch after this list)
  • Global: total request rate across all clients
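
One way to implement per-endpoint limits is to attach a separate limiter instance to each route. A sketch using the same express-rate-limit middleware shown in the implementation section below (routes and limit values are illustrative):

// Generous limit for reads, tight limit for writes
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();
const readLimiter = rateLimit({ windowMs: 60 * 1000, max: 1000 });
const writeLimiter = rateLimit({ windowMs: 60 * 1000, max: 50 });

app.get("/api/items", readLimiter, (req, res) => res.json([]));
app.post("/api/items", writeLimiter, (req, res) => res.status(201).end());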

Rate Limiting Implementation

// Express.js rate limiting middleware
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();

const apiLimiter = rateLimit({
  windowMs: 60 * 1000,      // 1 minute window
  max: 100,                  // 100 requests per window
  standardHeaders: true,     // Send standard RateLimit-* headers
  legacyHeaders: false,      // Omit legacy X-RateLimit-* headers
  message: {
    error: "Too many requests",
    retryAfter: 60,
  },
  // Key on the API key when present; fall back to the client IP
  keyGenerator: (req) => req.headers["x-api-key"] || req.ip,
});

app.use("/api/", apiLimiter);

# Client: respect rate limits with exponential backoff
$ curl -sI https://api.example.com/v1/users
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 3
X-RateLimit-Reset: 1713200060

# After hitting the limit:
HTTP/1.1 429 Too Many Requests
Retry-After: 45
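
A minimal backoff sketch in JavaScript: honor Retry-After when the server sends it, otherwise double the wait each attempt, and add jitter (fetchWithBackoff is a hypothetical helper name, not part of any library):

// Retry with exponential backoff and jitter on 429
async function fetchWithBackoff(url, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    const retryAfterSec = Number(res.headers.get("Retry-After"));
    const baseMs = retryAfterSec > 0
      ? retryAfterSec * 1000            // server told us how long to wait
      : 2 ** attempt * 1000;            // otherwise back off exponentially
    const jitterMs = Math.random() * 1000; // jitter avoids thundering herd
    await new Promise((resolve) => setTimeout(resolve, baseMs + jitterMs));
  }
  throw new Error("rate limited: retries exhausted");
}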

# Nginx rate limiting: shared-memory zone, local to one server instance
# (nginx's limit_req is not Redis-backed; for limits shared across
# multiple instances, see the Redis sketch below)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
location /api/ {
    limit_req zone=api burst=20 nodelay;
}
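
For limits genuinely shared across application instances, a common pattern is an atomic counter in Redis keyed by client and window. A fixed-window sketch, assuming the node-redis client (key layout and limit values are illustrative):

// Fixed-window counter shared across instances via Redis
import { createClient } from "redis";

const redis = createClient();          // assumes a reachable Redis instance
await redis.connect();

async function allowRequest(apiKey, limit = 100, windowSec = 60) {
  const windowId = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${apiKey}:${windowId}`;
  const count = await redis.incr(key);                  // atomic across instances
  if (count === 1) await redis.expire(key, windowSec);  // clean up old windows
  return count <= limit;
}
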
In the Wild

Every production API implements rate limiting. The GitHub API allows 5,000 requests per hour per authenticated user. OpenAI and Anthropic rate-limit by both tokens per minute and requests per minute. AWS API Gateway includes built-in rate limiting and throttling. In security, rate limiting is a first-line defense against brute-force login attacks and credential stuffing. The key design decision is choosing the limit values: too restrictive breaks legitimate use cases; too permissive fails to protect the service. A common best practice is tiered limits: for example, a free tier gets 100 requests/hour while a paid tier gets 10,000. Clients must implement exponential backoff: on receiving a 429, wait the Retry-After duration (or 2^attempt seconds) before retrying, with jitter to prevent thundering-herd effects.