Rate Limiting

Plain English

Rate limiting is a speed limit for API requests. Just like a highway has a maximum speed to prevent accidents, APIs have request limits to prevent overload. If your app makes too many requests too quickly (1000 requests per second when the limit is 100), the API starts rejecting the extra ones with a “429 Too Many Requests” error. This protects the service from being overwhelmed and ensures fair access for all users.

Technical Definition

Rate limiting restricts the number of requests a client can make to a service within a defined time window. It protects against abuse (scraping, brute force, DDoS), ensures fair resource allocation, and prevents cascading failures.

Common algorithms:

| Algorithm | Description | Pros | Cons |
| --- | --- | --- | --- |
| Fixed window | Count requests per time window (e.g., 100/minute) | Simple | Bursts at window boundaries |
| Sliding window log | Track the timestamp of each request | Precise | Memory-intensive |
| Sliding window counter | Weighted combination of current and previous windows | Balanced | Slight approximation |
| Token bucket | Tokens added at a fixed rate; each request consumes one | Allows bursts | Slightly complex |
| Leaky bucket | Requests queue and are processed at a fixed rate | Smooth output | Delays under load |
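
Of these, the token bucket is a common default because it permits short bursts while bounding the sustained rate. A minimal in-memory sketch (the class name and parameter values are illustrative, not from any particular library):

// Token bucket: refill continuously, spend one token per request
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;            // start full so bursts are allowed
    this.refillPerSec = refillPerSec;
    this.lastRefill = Date.now();
  }

  tryConsume() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;                     // request allowed
    }
    return false;                      // request rejected (would return 429)
  }
}

const bucket = new TokenBucket(100, 10); // burst up to 100; refills 10 tokens/sec
console.log(bucket.tryConsume());        // true while tokens remain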

Rate limit headers (RFC 6585 defines the 429 status code; draft-ietf-httpapi-ratelimit-headers standardizes RateLimit-* fields; the X-RateLimit-* names below are the widely used de facto convention):

| Header | Purpose |
| --- | --- |
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait before retrying (sent with 429) |
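
A client can inspect these headers proactively instead of waiting for a 429. A short sketch, assuming the X-RateLimit-* convention above (the URL reuses the example host from the curl sample further down):

// Check remaining quota before sending more requests
const res = await fetch("https://api.example.com/v1/users");
const remaining = Number(res.headers.get("X-RateLimit-Remaining"));
const resetAt = Number(res.headers.get("X-RateLimit-Reset")); // Unix seconds
if (remaining === 0) {
  const waitMs = Math.max(0, resetAt * 1000 - Date.now());
  console.log(`Window exhausted; resets in ${Math.ceil(waitMs / 1000)}s`);
}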

Rate limiting dimensions:

  • Per IP address: simplest, but shared IPs (NAT, corporate proxies) affect multiple users
  • Per API key: most common for authenticated APIs
  • Per user account: prevents a single user from monopolizing resources
  • Per endpoint: different limits for read vs. write operations (see the sketch after this list)
  • Global: total request rate across all clients
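
One way to implement per-endpoint limits is to attach a separate limiter instance to each route. A sketch using the same express-rate-limit middleware shown in the implementation section below (routes and limit values are illustrative):

// Generous limit for reads, tight limit for writes
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();
const readLimiter = rateLimit({ windowMs: 60 * 1000, max: 1000 });
const writeLimiter = rateLimit({ windowMs: 60 * 1000, max: 50 });

app.get("/api/items", readLimiter, (req, res) => res.json([]));
app.post("/api/items", writeLimiter, (req, res) => res.status(201).end());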

Rate Limiting Implementation

// Express.js rate limiting middleware
import express from "express";
import rateLimit from "express-rate-limit";

const app = express();

const apiLimiter = rateLimit({
  windowMs: 60 * 1000,      // 1 minute window
  max: 100,                  // 100 requests per window
  standardHeaders: true,     // Send standard RateLimit-* headers
  legacyHeaders: false,      // Omit legacy X-RateLimit-* headers
  message: {
    error: "Too many requests",
    retryAfter: 60,
  },
  // Key on the API key when present; fall back to the client IP
  keyGenerator: (req) => req.headers["x-api-key"] || req.ip,
});

app.use("/api/", apiLimiter);

# Client: respect rate limits with exponential backoff
$ curl -sI https://api.example.com/v1/users
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 3
X-RateLimit-Reset: 1713200060

# After hitting the limit:
HTTP/1.1 429 Too Many Requests
Retry-After: 45
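
A minimal backoff sketch in JavaScript: honor Retry-After when the server sends it, otherwise double the wait each attempt, and add jitter (fetchWithBackoff is a hypothetical helper name, not part of any library):

// Retry with exponential backoff and jitter on 429
async function fetchWithBackoff(url, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429) return res;
    const retryAfterSec = Number(res.headers.get("Retry-After"));
    const baseMs = retryAfterSec > 0
      ? retryAfterSec * 1000            // server told us how long to wait
      : 2 ** attempt * 1000;            // otherwise back off exponentially
    const jitterMs = Math.random() * 1000; // jitter avoids thundering herd
    await new Promise((resolve) => setTimeout(resolve, baseMs + jitterMs));
  }
  throw new Error("rate limited: retries exhausted");
}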

# Nginx rate limiting: shared-memory zone, local to one server instance
# (nginx's limit_req is not Redis-backed; for limits shared across
# multiple instances, see the Redis sketch below)
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
location /api/ {
    limit_req zone=api burst=20 nodelay;
}
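
For limits genuinely shared across application instances, a common pattern is an atomic counter in Redis keyed by client and window. A fixed-window sketch, assuming the node-redis client (key layout and limit values are illustrative):

// Fixed-window counter shared across instances via Redis
import { createClient } from "redis";

const redis = createClient();          // assumes a reachable Redis instance
await redis.connect();

async function allowRequest(apiKey, limit = 100, windowSec = 60) {
  const windowId = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${apiKey}:${windowId}`;
  const count = await redis.incr(key);                  // atomic across instances
  if (count === 1) await redis.expire(key, windowSec);  // clean up old windows
  return count <= limit;
}
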
In the Wild

Every production API implements rate limiting. The GitHub API allows 5,000 requests per hour per authenticated user. OpenAI and Anthropic rate-limit by both tokens per minute and requests per minute. AWS API Gateway includes built-in rate limiting and throttling. In security, rate limiting is a first-line defense against brute-force login attacks and credential stuffing. The key design decision is choosing the limit values: too restrictive breaks legitimate use cases; too permissive fails to protect the service. A common best practice is tiered limits: for example, a free tier gets 100 requests/hour while a paid tier gets 10,000. Clients must implement exponential backoff: on receiving a 429, wait the Retry-After duration (or 2^attempt seconds) before retrying, with jitter to prevent thundering-herd effects.