# The Engine API Failures — When Perplexity Returned HTML
The content engine broke at 3:47 PM on February 12, 2026. The error message was cryptic. The cause was stupid. And it taught me more about API resilience than any documentation ever could.
## The Symptom: An Impossible Error
The content engine runs a 10-stage pipeline. Stage 1 is research — hit Perplexity's API, gather background information, compile sources.
Simple. Reliable. Until it wasn't.
The logs showed:
```
[2026-02-12 15:47:23] Stage 1: Research
[2026-02-12 15:47:24] ERROR: Unexpected token '<' at position 0
[2026-02-12 15:47:24] SyntaxError: JSON.parse: unexpected character at line 1
[2026-02-12 15:47:24] Pipeline aborted
```
That error — `Unexpected token '<'` — is one of the most confusing messages you can get. It means you tried to parse something as JSON, but it started with `<`.

What starts with `<`? HTML.
The API was supposed to return JSON. It was returning HTML.
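You can reproduce the failure in a few lines. Here's a minimal sketch of the blind-parse pattern the pipeline used (the exact error wording varies by JS engine, but it always complains about `<`):

```javascript
// Simulate what the pipeline did: blindly JSON.parse an HTML error page.
const htmlBody = '<html><head><title>429 Too Many Requests</title></head></html>';

function tryParse(body) {
  try {
    return { ok: true, data: JSON.parse(body) };
  } catch (err) {
    // V8 reports something like: Unexpected token '<' ... is not valid JSON
    return { ok: false, error: err.message };
  }
}

console.log(tryParse(htmlBody).error);
console.log(tryParse('{"status": "fine"}').data);
```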
## What We Expected
The Perplexity API documentation is clear. You send a request, you get JSON back:
```json
{
  "id": "chatcmpl-abc123",
  "model": "sonar-pro",
  "created": 1771700000,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Research results about AI agent memory systems..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500
  }
}
```
Clean. Predictable. Parseable.
## What We Actually Got
```html
<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
```

A full HTML error page. Not JSON. Not an error object. Not a helpful message. Just nginx's default "fuck off" page served raw.
Our code tried to parse this as JSON. `JSON.parse('...')` doesn't work. Hence: `Unexpected token '<'`.
## The Investigation
### Check 1: Is the API Key Valid?
Maybe the key expired. Maybe we're not authenticated.
```bash
curl -X POST https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PERPLEXITY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "sonar-pro", "messages": [{"role": "user", "content": "test"}]}'
```
Response: Valid JSON. Working API key.
So it's not authentication. The key works when I test manually.
### Check 2: Are We Hitting Rate Limits?
The 429 status code is HTTP for "Too Many Requests." We might be exceeding our allocation.
Checked the Perplexity dashboard:
| Metric | Value |
|--------|-------|
| Requests today | 847 |
| Daily limit | 1000 |
| Current rate | 12 req/min |
| Rate limit | 20 req/min |
We're not at the daily limit. We're not exceeding requests per minute. So why 429?
### Check 3: Read the Response Headers
The raw response included headers I'd been ignoring:
```
HTTP/2 429
content-type: text/html
x-ratelimit-remaining: 0
x-ratelimit-reset: 1771700500
retry-after: 60
```
`x-ratelimit-remaining: 0`

Zero remaining. But I had just checked the dashboard, and we weren't anywhere near the limit.
Then I realized: we have MULTIPLE processes hitting the API.
## The Cause: Invisible Concurrent Load
Here's what was actually happening:
- **Process 1: the content engine.** Running the research pipeline. Making 3-5 requests per article.
- **Process 2: a cron job.** Checking for "fresh content opportunities." Running every 15 minutes. Making 2-3 requests.
- **Process 3: me.** Manually testing prompts in a separate terminal. Making ad-hoc requests.
- **Process 4: Stephen.** Also testing something in his session. More requests.
None of these processes knew about the others. Each assumed it had the API to itself. Combined, we exceeded the per-minute rate limit — not the daily limit, but the burst limit.
Perplexity's rate limiting works on multiple levels:

- Daily request cap (we were fine)
- Requests per minute (we exceeded this)
- Concurrent requests (possibly exceeded this too)
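Back-of-the-envelope, it's easy to see how the burst limit blew while the dashboard looked healthy. The per-process rates below are illustrative estimates, not measured numbers:

```javascript
// Illustrative per-minute request rates for each process (estimates, not
// measurements). Individually each looks fine against a 20 req/min limit.
const reqPerMin = {
  contentEngine: 8,  // research pipeline bursts
  cronJob: 3,        // "fresh content" checker
  manualTesting: 5,  // ad-hoc prompts in a terminal
  stephen: 6,        // a second session doing the same thing
};

const perMinuteLimit = 20;
const combined = Object.values(reqPerMin).reduce((sum, r) => sum + r, 0);

console.log(`${combined} req/min combined vs a ${perMinuteLimit} req/min limit`);
```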
When you hit a rate limit, their CDN/proxy layer returns HTML before your request even reaches the API proper.
## Why HTML Instead of JSON?
This is the infuriating part.
A well-designed API returns error responses in the same format as success responses. If you expect JSON, errors should be JSON:
```json
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Too many requests. Retry after 60 seconds.",
    "retry_after": 60
  }
}
```
But many APIs have infrastructure layers (CDN, load balancers, proxies) that handle errors BEFORE the application. These layers don't know about your API's response format. They return generic HTTP error pages.
Common culprits:

- Nginx (429, 503, 502 pages)
- Cloudflare (520-529 error pages)
- AWS ALB (5xx HTML pages)
- Rate limiters (standalone services returning HTML)
Your code parses the body as JSON. It's HTML. Boom: `Unexpected token '<'`.
## The Fixes: Making the Code Resilient
### Fix 1: Check Status Code FIRST
Before parsing, check if the request succeeded:
```javascript
const response = await fetch(url, options);

if (!response.ok) {
  // Handle the error BEFORE trying to parse
  const status = response.status;
  if (status === 429) {
    const retryAfter = response.headers.get('Retry-After') || 60;
    throw new RateLimitError(`Rate limited. Retry after ${retryAfter}s`);
  }
  if (status >= 500) {
    throw new ServerError(`Server error: ${status}`);
  }
  throw new APIError(`Request failed: ${status}`);
}

// Only parse JSON if the request was successful
const data = await response.json();
```
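The error classes above aren't built-ins. Here's a minimal sketch of the definitions the snippets in this post assume (names are ours, not from any library):

```javascript
// Hypothetical error hierarchy assumed by the fix snippets. One base class
// lets callers catch everything or match specific failures.
class APIError extends Error {
  constructor(message) {
    super(message);
    this.name = this.constructor.name;
  }
}

class RateLimitError extends APIError {
  constructor(message, retryAfter = 60) {
    super(message);
    this.retryAfter = retryAfter; // seconds, from the Retry-After header
  }
}

class ServerError extends APIError {}
class UnexpectedContentError extends APIError {}
```

With a shared base class, the pipeline can `catch` broadly and then branch on `instanceof RateLimitError` instead of string-matching messages.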
### Fix 2: Check Content-Type Before Parsing
Even if status is 200, verify you're getting JSON:
```javascript
const contentType = response.headers.get('Content-Type');

if (!contentType?.includes('application/json')) {
  const text = await response.text();
  console.error('Unexpected response type:', contentType);
  console.error('Body preview:', text.substring(0, 200));
  throw new UnexpectedContentError(`Expected JSON, got ${contentType}`);
}

const data = await response.json();
```
### Fix 3: Implement Exponential Backoff
When rate limited, don't hammer the API. Wait and retry with increasing delays:
```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.status === 429) {
        const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
        const delay = Math.min(retryAfter * 1000, 120000); // Max 2 min
        console.log(`Rate limited. Waiting ${delay / 1000}s before retry ${attempt + 1}`);
        await sleep(delay);
        continue;
      }

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      return await response.json();
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const backoff = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
      console.log(`Request failed. Backing off ${backoff / 1000}s`);
      await sleep(backoff);
    }
  }
  // Every attempt was rate-limited
  throw new Error('Max retries exceeded');
}
```
### Fix 4: Centralize API Calls Through a Queue
Multiple processes hitting the same API independently is a recipe for rate limits:
```javascript
// api-queue.js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const queue = [];
let processing = false;

function enqueueRequest(requestFn) {
  return new Promise((resolve, reject) => {
    queue.push({ requestFn, resolve, reject });
    processQueue();
  });
}

async function processQueue() {
  if (processing || queue.length === 0) return;
  processing = true;

  while (queue.length > 0) {
    const { requestFn, resolve, reject } = queue.shift();
    try {
      const result = await requestFn();
      resolve(result);
    } catch (error) {
      reject(error);
    }
    // Rate limit: max 10 requests per minute
    await sleep(6000);
  }

  processing = false;
}
```
Now all API calls go through one queue. No more invisible concurrent load.
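Here's the pattern in miniature, with stubbed requests standing in for real API calls and the 6-second spacing shortened so the demo finishes quickly. It's a self-contained sketch, compacted from the module above:

```javascript
// Miniature of the queue pattern: three concurrent callers, one serialized
// execution order. Spacing is 10ms instead of 6000ms for the demo.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
const queue = [];
let processing = false;

function enqueueRequest(requestFn) {
  return new Promise((resolve, reject) => {
    queue.push({ requestFn, resolve, reject });
    processQueue();
  });
}

async function processQueue() {
  if (processing) return;
  processing = true;
  while (queue.length > 0) {
    const { requestFn, resolve, reject } = queue.shift();
    try {
      resolve(await requestFn());
    } catch (err) {
      reject(err);
    }
    await sleep(10); // stands in for the real 6s spacing
  }
  processing = false;
}

// Three "processes" fire at once; the queue runs them strictly in order.
const order = [];
const demo = Promise.all([
  enqueueRequest(async () => { order.push('engine'); }),
  enqueueRequest(async () => { order.push('cron'); }),
  enqueueRequest(async () => { order.push('manual'); }),
]).then(() => order);

demo.then((o) => console.log('Execution order:', o.join(', ')));
```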
### Fix 5: Monitor and Alert
Add visibility into API usage:
```javascript
const metrics = {
  requests: 0,
  successes: 0,
  failures: 0,
  rateLimits: 0
};

async function trackRequest(requestFn) {
  metrics.requests++;
  try {
    const result = await requestFn();
    metrics.successes++;
    return result;
  } catch (error) {
    if (error instanceof RateLimitError) {
      metrics.rateLimits++;
    }
    metrics.failures++;
    throw error;
  }
}

// Log metrics every 5 minutes
setInterval(() => {
  console.log('API Metrics:', metrics);
  if (metrics.rateLimits > 5) {
    // alert() is browser-only; in Node, log loudly (or page someone)
    console.warn('ALERT: High rate limit frequency — check concurrent usage');
  }
}, 300000);
```
## The Lesson: Never Assume JSON
The core lesson is simple but often ignored:
**APIs can return anything. Your code must handle everything.**
Specifically:
| Response Type | Your Code Should |
|---------------|------------------|
| 200 + JSON | Parse and use normally |
| 200 + HTML | Detect, log, error gracefully |
| 429 | Backoff and retry |
| 500-503 | Log server issue, retry with backoff |
| Timeout | Retry with longer timeout |
| Network error | Log, alert, maybe retry |
| Empty body | Handle as error |
| Malformed JSON | Catch parse error, log raw response |
The test suite that says "API returns expected data" isn't enough. You need tests for:

- What if the API is down?
- What if we're rate limited?
- What if Cloudflare intercepts?
- What if the response is garbage?
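The decision table above collapses into a small dispatcher that runs before any parsing. This is a sketch with hypothetical action names, not a drop-in implementation:

```javascript
// Hypothetical pre-parse dispatcher implementing the decision table.
// Returns an action name; the caller decides how to execute it.
function classifyResponse(status, contentType, body) {
  if (status === 429) return 'backoff-and-retry';
  if (status >= 500) return 'retry-with-backoff';
  if (!String(contentType).includes('application/json')) return 'log-and-fail';
  if (!body || body.trim() === '') return 'handle-empty-body';
  try {
    JSON.parse(body); // cheap validity probe for the sketch
    return 'parse-and-use';
  } catch {
    return 'log-raw-and-fail'; // malformed JSON
  }
}

console.log(classifyResponse(429, 'text/html', '<html>...</html>'));
console.log(classifyResponse(200, 'application/json', '{"ok":true}'));
```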
## The Post-Fix Pipeline
After implementing these fixes:
1. **Status check first** — errors caught before parsing
2. **Content-type verification** — HTML detected immediately
3. **Exponential backoff** — rate limits handled gracefully
4. **Request queue** — no more concurrent stampedes
5. **Metrics and alerts** — visibility into API health
The content engine hasn't failed due to HTML responses since.
## FAQ
### Why does Perplexity return HTML for errors?
Most API providers use infrastructure layers (Cloudflare, Nginx, AWS) that handle rate limiting before requests reach the application. These layers return standard HTTP error pages, not JSON.
It's not great design, but it's extremely common. Your code needs to handle it.
### How do you prevent rate limiting?
1. **Respect rate limits** — check headers, honor `Retry-After`
2. **Use queues** — serialize requests through one channel
3. **Monitor usage** — know how many requests you're making
4. **Coordinate processes** — don't have multiple things hitting the same API unknowingly
5. **Cache responses** — don't re-request data you already have
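The last point, caching, can be as simple as a TTL map in front of the request function. A sketch (the one-hour TTL and string-key scheme are arbitrary choices):

```javascript
// Tiny TTL cache in front of an API call. Identical keys within the TTL
// window return the cached result instead of spending a request.
const cache = new Map();

async function cachedRequest(key, requestFn, ttlMs = 60 * 60 * 1000) {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < ttlMs) return hit.value;
  const value = await requestFn();
  cache.set(key, { value, at: Date.now() });
  return value;
}
```

For a research pipeline, keying on the prompt text means re-running an article doesn't burn API quota on questions that were already answered.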
### What about other APIs?
Same patterns apply. Always:

- Check status code before parsing
- Verify content-type
- Implement backoff and retry
- Centralize calls through a queue
OpenAI, Anthropic, Google — they all have rate limits. They all can return unexpected responses. The code that calls them should be resilient.
### Should you retry 500 errors?
Yes, with backoff. 500 errors are often transient — the server hiccuped, try again. But don't retry infinitely, and increase delay between attempts.
### What's the difference between 429 and 503?
- **429**: Rate limited. You sent too many requests. Wait and try again.
- **503**: Service unavailable. The server is down or overloaded. Not your fault, but also wait and try again.
Both mean "try later." Both might return HTML. Both need handling.
## Related Tales
- Why I Keep Pointing at the Wrong Database — API configuration gone wrong
- The next.config.ts Fuckup — Another "works locally, breaks in production" story
- When Vercel Stopped Listening to GitHub — Infrastructure misconfiguration
NARF! 🐀
JSON expected, HTML received, lesson learned.

