One of our internal tools had a bug. It retried failed requests without backoff. Within 60 seconds, our tool sent 50,000 requests to our own API. We DDoSed ourselves.
The BPOC search endpoint went down. The Maya AI chat couldn't search candidates. The ShoreAgents admin panel showed errors.
Because one internal script went rogue.
This is why rate limiting isn't just about protecting against external abuse - it's about protecting yourself from yourself.
Why Rate Limiting Matters
Rate limiting protects against:
- Self-inflicted wounds. Like our runaway script. Bugs happen. Rate limits contain the blast radius.
- Fair resource allocation. One heavy user shouldn't degrade service for everyone else.
- Cost control. External APIs charge per request. Uncontrolled internal usage = surprise bills.
- Graceful degradation. Better to reject some requests than to crash entirely.
The Patterns
I've implemented several rate limiting strategies. Here's when to use each:
Fixed Window
Count requests per time window. The simplest to implement.
```typescript
// 100 requests per minute
const WINDOW_MS = 60 * 1000;
const MAX_REQUESTS = 100;

const requestCounts = new Map<string, { count: number; resetTime: number }>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  const userData = requestCounts.get(userId);

  // First request, or the window expired: start a fresh window
  if (!userData || now > userData.resetTime) {
    requestCounts.set(userId, { count: 1, resetTime: now + WINDOW_MS });
    return true;
  }

  if (userData.count >= MAX_REQUESTS) {
    return false;
  }

  userData.count++;
  return true;
}
```
Pros: Simple, low memory, easy to understand. Cons: The burst problem - a user can exhaust the limit instantly at the start of a window. Worse, they can send 100 requests at 0:59 and another 100 at 1:00, landing 200 requests within two seconds across the window boundary.
Sliding Window
Smoother than fixed window, spreads load better.
```typescript
// Track timestamps of requests, count those in the last minute
// (reuses WINDOW_MS and MAX_REQUESTS from the fixed-window example)
const requestTimestamps = new Map<string, number[]>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  const timestamps = requestTimestamps.get(userId) || [];

  // Remove timestamps that have fallen out of the window
  const recent = timestamps.filter(t => now - t < WINDOW_MS);

  if (recent.length >= MAX_REQUESTS) {
    return false;
  }

  recent.push(now);
  requestTimestamps.set(userId, recent);
  return true;
}
```
Pros: No burst problem, smoother distribution. Cons: More memory (storing timestamps), slightly more CPU.
Token Bucket
The most flexible. Users accumulate "tokens" over time.
```typescript
// Example tuning values (assumptions, not our production numbers)
const MAX_TOKENS = 20;   // bucket size = maximum burst
const REFILL_RATE = 1;   // tokens added per second

const buckets = new Map<string, { tokens: number; lastRefill: number }>();

function checkRateLimit(userId: string): boolean {
  const now = Date.now();
  let bucket = buckets.get(userId);

  if (!bucket) {
    bucket = { tokens: MAX_TOKENS, lastRefill: now };
    buckets.set(userId, bucket);
  }

  // Refill tokens based on time passed
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(MAX_TOKENS, bucket.tokens + elapsed * REFILL_RATE);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) {
    return false;
  }

  bucket.tokens--;
  return true;
}
```
Pros: Allows bursts up to the bucket size while keeping a smooth average rate. Cons: More complex, and you need to tune the refill rate and bucket size - e.g. a 20-token bucket refilling at 1 token/second permits a 20-request burst but only 1 request/second sustained.
What We Use at ShoreAgents
Different endpoints get different limits:
```typescript
const RATE_LIMITS = {
  // Public endpoints - strict
  'POST /api/lead': { window: 60, max: 10 },
  'POST /api/contact': { window: 60, max: 5 },

  // Authenticated endpoints - per user
  'GET /api/candidates': { window: 60, max: 100 },
  'POST /api/quote': { window: 60, max: 20 },

  // Maya chat - per session
  'POST /api/maya/message': { window: 60, max: 30 },

  // Internal tools - higher limits
  'GET /api/internal/*': { window: 60, max: 1000 },

  // Admin - very high limits
  'admin/*': { window: 60, max: 5000 },
};
```
Key Design Decisions
Limit by what?
- Anonymous users: by IP
- Authenticated users: by user ID
- API keys: by key
- Maya chat: by session ID
```typescript
function getRateLimitKey(req: Request): string {
  if (req.user?.id) return 'user:' + req.user.id;
  if (req.apiKey) return 'apikey:' + req.apiKey;
  if (req.sessionId) return 'session:' + req.sessionId;
  return 'ip:' + req.ip;
}
```
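Tying the config and the key together: a minimal middleware sketch, assuming an Express-style app, a matchRoute() helper that resolves a request to its RATE_LIMITS entry (wildcards included), and a three-argument checkRateLimit(key, max, windowMs) variant - none of which is our exact production code:

```typescript
import { Request, Response, NextFunction } from 'express';

// Sketch only: matchRoute() and checkRateLimit(key, max, windowMs) are assumed helpers.
function rateLimitMiddleware(req: Request, res: Response, next: NextFunction) {
  const route = matchRoute(req); // e.g. 'POST /api/lead'
  const limits = RATE_LIMITS[route];
  if (!limits) return next(); // no limit configured for this route

  const key = getRateLimitKey(req) + ':' + route; // limit per caller, per route
  if (!checkRateLimit(key, limits.max, limits.window * 1000)) {
    return res.status(429).json({ error: 'RATE_LIMITED' });
  }
  next();
}
```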
Response when limited
Return 429 Too Many Requests with helpful headers:
```typescript
// Set headers before sending the body - they can't be set afterwards
res.setHeader('X-RateLimit-Limit', '100');
res.setHeader('X-RateLimit-Remaining', '0');
res.setHeader('X-RateLimit-Reset', '1708621234');
res.setHeader('Retry-After', '45');

res.status(429).json({
  error: 'RATE_LIMITED',
  message: 'Too many requests',
  retryAfter: 45 // seconds until limit resets
});
```
Good clients read these headers and back off automatically.
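From the client side, honoring those headers takes a few lines - a sketch, with politeFetch() being a name of our own invention:

```typescript
// Sketch: one polite retry that honors the server's Retry-After header.
async function politeFetch(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status === 429) {
    const retryAfter = Number(res.headers.get('Retry-After') ?? '1');
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    return fetch(url); // retry once, after the server-suggested delay
  }
  return res;
}
```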
The Runaway Script Incident: Post-Mortem
After our self-DDoS, here's what we implemented:
1. Circuit breaker on internal clients. If the error rate exceeds a threshold, stop making requests:
```typescript
class APIClient {
  private errorCount = 0;
  private circuitOpen = false;

  async request(url: string) {
    if (this.circuitOpen) {
      throw new Error('Circuit breaker open - backing off');
    }

    try {
      const result = await fetch(url);
      // fetch only rejects on network errors, so count HTTP errors as failures too
      if (!result.ok) throw new Error('HTTP ' + result.status);
      this.errorCount = 0; // any success resets the count
      return result;
    } catch (error) {
      this.errorCount++;
      if (this.errorCount > 10) {
        this.circuitOpen = true;
        // Close again after 30s and see if requests succeed
        setTimeout(() => this.circuitOpen = false, 30000);
      }
      throw error;
    }
  }
}
```
2. Exponential backoff on retries. Never retry at the same rate:
```typescript
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts) throw error; // out of retries
      const delay = Math.pow(2, attempt) * 1000; // 2s, 4s, 8s
      const jitter = Math.random() * 1000; // add randomness so clients don't retry in lockstep
      await sleep(delay + jitter);
    }
  }
}
```
3. Alerting on rate limit hits. If internal services hit rate limits, something is wrong:
```typescript
if (isRateLimited) {
  logger.warn('Internal rate limit hit', {
    service: serviceName,
    endpoint: url,
    alertLevel: 'high'
  });
  // Triggers alert to Telegram
}
```
Distributed Rate Limiting
For single-server apps, in-memory rate limiting works. For distributed systems (multiple servers), you need shared state.
We use Supabase for this (yes, really):
```typescript
async function checkRateLimitDistributed(
  key: string,
  limit: number,
  windowMs: number
): Promise<boolean> {
  const windowStart = Date.now() - windowMs;

  // Atomic increment and check in a single database round trip
  const { data, error } = await supabase.rpc('check_rate_limit', {
    p_key: key,
    p_limit: limit,
    p_window_start: new Date(windowStart).toISOString()
  });

  if (error) return true; // fail open rather than block traffic when the limiter itself errors

  return data.allowed;
}
```
The database function:
```sql
CREATE OR REPLACE FUNCTION check_rate_limit(
  p_key TEXT,
  p_limit INTEGER,
  p_window_start TIMESTAMP
) RETURNS JSONB AS $$
DECLARE
  current_count INTEGER;
BEGIN
  -- Clean old entries
  DELETE FROM rate_limits WHERE key = p_key AND created_at < p_window_start;

  -- Count current
  SELECT COUNT(*) INTO current_count FROM rate_limits WHERE key = p_key;

  IF current_count >= p_limit THEN
    RETURN jsonb_build_object('allowed', false, 'count', current_count);
  END IF;

  -- Insert new entry
  INSERT INTO rate_limits (key, created_at) VALUES (p_key, NOW());

  RETURN jsonb_build_object('allowed', true, 'count', current_count + 1);
END;
$$ LANGUAGE plpgsql;
```
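The function assumes a rate_limits table. A minimal sketch (the exact schema and index name are assumptions, but you want an index on key, since every statement above filters on it):

```sql
CREATE TABLE IF NOT EXISTS rate_limits (
  key TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

-- Cleanup, count, and insert all filter on key, so index it
CREATE INDEX IF NOT EXISTS idx_rate_limits_key ON rate_limits (key);
```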
Is Supabase the best choice for rate limiting? No, Redis would be faster. But we already have Supabase, it works, and we're not at scale where the latency matters.
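For comparison, here's roughly what the Redis version would look like - a fixed-window sketch using ioredis, not code we actually run:

```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Sketch: INCR the counter; the first hit in a window sets its expiry.
async function checkRateLimitRedis(key: string, limit: number, windowMs: number): Promise<boolean> {
  const count = await redis.incr(key);
  if (count === 1) {
    await redis.pexpire(key, windowMs); // window starts at the first request
  }
  return count <= limit;
}
```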
Key Takeaways
After implementing rate limiting across ShoreAgents and BPOC:
1. Rate limit before you need it. Retrofitting limits in the middle of an incident is miserable.
2. Different limits for different contexts. Public vs authenticated vs internal; sensitive endpoints vs read-only endpoints.
3. Return helpful responses. Tell clients when they can retry. Let them be good citizens.
4. Monitor your limits. If legitimate users hit limits, your limits are too strict. If no one ever hits them, they may be too loose.
5. Protect yourself from yourself. Internal services need rate limiting too. That runaway script taught us the hard way.
Add rate limiting before you need it. It's way easier to implement when you're calm than when you're firefighting at 2am.

