One of our internal tools had a bug. It retried failed requests without backoff. Within 60 seconds, our tool sent 50,000 requests to our own API. We DDoSed ourselves.
The BPOC search endpoint went down. The Maya AI chat couldn't search candidates. The ShoreAgents admin panel showed errors.
Because one internal script went rogue.
This is why rate limiting isn't just about protecting against external abuse - it's about protecting yourself from yourself.
Why Rate Limiting Matters
Rate limiting protects against:
- Self-inflicted wounds. Like our runaway script. Bugs happen. Rate limits contain the blast radius.
- Fair resource allocation. One heavy user shouldn't degrade service for everyone else.
- Cost control. External APIs charge per request. Uncontrolled internal usage = surprise bills.
- Graceful degradation. Better to reject some requests than to crash entirely.
The Patterns
I've implemented several rate limiting strategies. Here's when to use each:
Fixed Window
Count requests per time window. The simplest to implement.
```typescript
// 100 requests per minute
const WINDOW_MS = 60 * 1000;
const MAX_REQUESTS = 100;

const requestCounts = new Map<string, { count: number; resetTime: number }>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  const userData = requestCounts.get(userId);

  // First request, or the window expired: start a fresh window
  if (!userData || now > userData.resetTime) {
    requestCounts.set(userId, { count: 1, resetTime: now + WINDOW_MS });
    return true;
  }

  if (userData.count >= MAX_REQUESTS) {
    return false;
  }

  userData.count++;
  return true;
}
```
Pros: Simple, low memory, easy to understand. Cons: The burst problem - a user can exhaust the limit instantly at the start of a window. Worse, they can send 100 requests at 0:59 and another 100 at 1:00, landing 200 requests within two seconds across the window boundary.
Sliding Window
Smoother than fixed window, spreads load better.
```typescript
// Track timestamps of requests, count those in the last minute
// (reuses WINDOW_MS and MAX_REQUESTS from the fixed-window example)
const requestTimestamps = new Map<string, number[]>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  const timestamps = requestTimestamps.get(userId) || [];

  // Remove timestamps that have fallen out of the window
  const recent = timestamps.filter(t => now - t < WINDOW_MS);

  if (recent.length >= MAX_REQUESTS) {
    return false;
  }

  recent.push(now);
  requestTimestamps.set(userId, recent);
  return true;
}
```
Pros: No burst problem, smoother distribution. Cons: More memory (storing timestamps), slightly more CPU.
Token Bucket
The most flexible. Users accumulate "tokens" over time.
```typescript
// Example tuning values (assumptions, not our production numbers)
const MAX_TOKENS = 20;   // bucket size = maximum burst
const REFILL_RATE = 1;   // tokens added per second

const buckets = new Map<string, { tokens: number; lastRefill: number }>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  let bucket = buckets.get(userId);

  if (!bucket) {
    bucket = { tokens: MAX_TOKENS, lastRefill: now };
    buckets.set(userId, bucket);
  }

  // Refill tokens based on time passed
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(MAX_TOKENS, bucket.tokens + elapsed * REFILL_RATE);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) {
    return false;
  }

  bucket.tokens--;
  return true;
}
```
Pros: Allows bursts up to the bucket size while keeping a smooth average rate. Cons: More complex, and you need to tune the refill rate and bucket size - e.g. a 20-token bucket refilling at 1 token/second permits a 20-request burst but only 1 request/second sustained.
What We Use at ShoreAgents
Different endpoints get different limits:
```typescript
const RATE_LIMITS = {
  // Public endpoints - strict
  'POST /api/lead': { window: 60, max: 10 },
  'POST /api/contact': { window: 60, max: 5 },

  // Authenticated endpoints - per user
  'GET /api/candidates': { window: 60, max: 100 },
  'POST /api/quote': { window: 60, max: 20 },

  // Maya chat - per session
  'POST /api/maya/message': { window: 60, max: 30 },

  // Internal tools - higher limits
  'GET /api/internal/*': { window: 60, max: 1000 },

  // Admin - very high limits
  'admin/*': { window: 60, max: 5000 },
};
```
Key Design Decisions
Limit by what?
- Anonymous users: by IP
- Authenticated users: by user ID
- API keys: by key
- Maya chat: by session ID
```typescript
function getRateLimitKey(req: Request): string {
  if (req.user?.id) return 'user:' + req.user.id;
  if (req.apiKey) return 'apikey:' + req.apiKey;
  if (req.sessionId) return 'session:' + req.sessionId;
  return 'ip:' + req.ip;
}
```
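Tying the config and the key together: a minimal middleware sketch, assuming an Express-style app, a matchRoute() helper that resolves a request to its RATE_LIMITS entry (wildcards included), and a three-argument checkRateLimit(key, max, windowMs) variant - none of which is our exact production code:

```typescript
import { Request, Response, NextFunction } from 'express';

// Sketch only: matchRoute() and checkRateLimit(key, max, windowMs) are assumed helpers.
function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  const route = matchRoute(req); // e.g. 'POST /api/lead'
  const limits = RATE_LIMITS[route];
  if (!limits) return next(); // no limit configured for this route

  const key = getRateLimitKey(req) + ':' + route; // limit per caller, per route
  if (!checkRateLimit(key, limits.max, limits.window * 1000)) {
    return res.status(429).json({ error: 'RATE_LIMITED' });
  }
  next();
}
```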
Response when limited
Return 429 Too Many Requests with helpful headers:
```typescript
// Set headers before sending the body - they can't be set afterwards
res.setHeader('X-RateLimit-Limit', '100');
res.setHeader('X-RateLimit-Remaining', '0');
res.setHeader('X-RateLimit-Reset', '1708621234');
res.setHeader('Retry-After', '45');

res.status(429).json({
  error: 'RATE_LIMITED',
  message: 'Too many requests',
  retryAfter: 45 // seconds until limit resets
});
```
Good clients read these headers and back off automatically.
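From the client side, honoring those headers takes a few lines - a sketch, with politeFetch() being a name of our own invention:

```typescript
// Sketch: one polite retry that honors the server's Retry-After header.
async function politeFetch(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status === 429) {
    const retryAfter = Number(res.headers.get('Retry-After') ?? '1');
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    return fetch(url); // retry once, after the server-suggested delay
  }
  return res;
}
```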
The Runaway Script Incident: Post-Mortem
After our self-DDoS, here's what we implemented:
1. Circuit breaker on internal clients. If the error rate exceeds a threshold, stop making requests:
```typescript
class APIClient {
  private errorCount = 0;
  private circuitOpen = false;

  async request(url: string) {
    if (this.circuitOpen) {
      throw new Error('Circuit breaker open - backing off');
    }

    try {
      const result = await fetch(url);
      // fetch only rejects on network errors, so count HTTP errors as failures too
      if (!result.ok) throw new Error('HTTP ' + result.status);
      this.errorCount = 0; // any success resets the count
      return result;
    } catch (error) {
      this.errorCount++;
      if (this.errorCount > 10) {
        this.circuitOpen = true;
        // Close again after 30s and see if requests succeed
        setTimeout(() => this.circuitOpen = false, 30000);
      }
      throw error;
    }
  }
}
```
2. Exponential backoff on retries. Never retry at the same rate:
```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts) throw error; // out of retries
      const delay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
      const jitter = Math.random() * 1000; // add randomness so clients don't retry in lockstep
      await sleep(delay + jitter);
    }
  }
}
```
3. Alerting on rate limit hits. If internal services hit rate limits, something is wrong:
```typescript
if (isRateLimited) {
  logger.warn('Internal rate limit hit', {
    service: serviceName,
    endpoint: url,
    alertLevel: 'high'
  });
  // Triggers alert to Telegram
}
```
Distributed Rate Limiting
For single-server apps, in-memory rate limiting works. For distributed systems (multiple servers), you need shared state.
We use Supabase for this (yes, really):
```typescript
async function checkRateLimitDistributed(
  key: string,
  limit: number,
  windowMs: number
): Promise<boolean> {
  const windowStart = Date.now() - windowMs;

  // Atomic increment and check in a single database round trip
  const { data, error } = await supabase.rpc('check_rate_limit', {
    p_key: key,
    p_limit: limit,
    p_window_start: new Date(windowStart).toISOString()
  });

  if (error) return true; // fail open rather than block traffic when the limiter itself errors

  return data.allowed;
}
```
The database function:
```sql
CREATE OR REPLACE FUNCTION check_rate_limit(
  p_key TEXT,
  p_limit INTEGER,
  p_window_start TIMESTAMP
) RETURNS JSONB AS $$
DECLARE
  current_count INTEGER;
BEGIN
  -- Clean old entries
  DELETE FROM rate_limits WHERE key = p_key AND created_at < p_window_start;

  -- Count current
  SELECT COUNT(*) INTO current_count FROM rate_limits WHERE key = p_key;

  IF current_count >= p_limit THEN
    RETURN jsonb_build_object('allowed', false, 'count', current_count);
  END IF;

  -- Insert new entry
  INSERT INTO rate_limits (key, created_at) VALUES (p_key, NOW());

  RETURN jsonb_build_object('allowed', true, 'count', current_count + 1);
END;
$$ LANGUAGE plpgsql;
```
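The function assumes a rate_limits table. A minimal sketch (the exact schema and index name are assumptions, but you want an index on key, since every statement above filters on it):

```sql
CREATE TABLE IF NOT EXISTS rate_limits (
  key TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Cleanup, count, and insert all filter on key, so index it
CREATE INDEX IF NOT EXISTS idx_rate_limits_key ON rate_limits (key);
```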
Is Supabase the best choice for rate limiting? No, Redis would be faster. But we already have Supabase, it works, and we're not at scale where the latency matters.
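For comparison, here's roughly what the Redis version would look like - a fixed-window sketch using ioredis, not code we actually run:

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Sketch: INCR the counter; the first hit in a window sets its expiry.
async function checkRateLimitRedis(key: string, limit: number, windowMs: number): Promise<boolean> {
  const count = await redis.incr(key);
  if (count === 1) {
    await redis.pexpire(key, windowMs); // window starts at the first request
  }
  return count <= limit;
}
```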
Key Takeaways
After implementing rate limiting across ShoreAgents and BPOC:
1. Rate limit before you need it. Retrofitting limits in the middle of an incident is miserable.
2. Different limits for different contexts. Public vs authenticated vs internal; sensitive endpoints vs read-only endpoints.
3. Return helpful responses. Tell clients when they can retry. Let them be good citizens.
4. Monitor your limits. If legitimate users hit limits, your limits are too strict. If no one ever hits them, they may be too loose.
5. Protect yourself from yourself. Internal services need rate limiting too. That runaway script taught us the hard way.
Add rate limiting before you need it. It's way easier to implement when you're calm than when you're firefighting at 2am.

