User clicks "Generate My Quote" on ShoreAgents. The pricing engine needs to:

1. Parse their input with Claude AI
2. Search current salary data for each role
3. Calculate government contributions (SSS, PhilHealth, Pag-IBIG)
4. Apply tier multipliers
5. Factor in workspace costs
6. Generate a formatted breakdown
Total time: 8-15 seconds depending on complexity.
The original implementation? Synchronous. User clicks, browser spins for 15 seconds, Vercel times out at 10 seconds, user sees error, user clicks again, now TWO quotes are generating, server is sweating.
This is why background jobs exist.
## The Synchronous Trap
When I first audited ShoreAgents, I found this pattern everywhere:
```typescript
// Original quote generation - synchronous nightmare
app.post('/api/quote', async (req, res) => {
  // All of this happens while the user waits
  const parsed = await parseInputWithAI(req.body.input); // 2-3s
  const salaries = await searchSalaryData(parsed.roles); // 3-4s
  const costs = await calculateAllCosts(salaries);       // 1-2s
  const breakdown = await formatBreakdown(costs);        // 1s

  // Finally respond after 8-10 seconds
  res.json({ quote: breakdown });
});
```
Problems:

- Vercel functions time out at 10 seconds (default)
- User might click again, spawning duplicate work
- If any step fails, the entire request fails
- No visibility into progress
- No retry on partial failure
## The Background Job Pattern
Here's how I restructured it:
```typescript
// Step 1: Accept request, return immediately
app.post('/api/quote', async (req, res) => {
  // Validate input
  const validation = validateQuoteInput(req.body);
  if (!validation.ok) {
    return res.status(400).json({ error: validation.error });
  }

  // Create job record
  const job = await db.jobs.create({
    type: 'generate_quote',
    input: req.body,
    status: 'pending',
    created_at: new Date()
  });

  // Queue for processing
  await queue.add('quote', { jobId: job.id });

  // Return immediately with a job ID
  res.json({
    jobId: job.id,
    status: 'pending',
    checkUrl: '/api/quote/' + job.id
  });
});

// Step 2: Process in background
queue.process('quote', async (task) => {
  const job = await db.jobs.get(task.data.jobId);

  try {
    await db.jobs.update(job.id, { status: 'processing' });

    // Do the actual work
    const parsed = await parseInputWithAI(job.input);
    const salaries = await searchSalaryData(parsed.roles);
    const costs = await calculateAllCosts(salaries);
    const breakdown = await formatBreakdown(costs);

    // Save result
    await db.jobs.update(job.id, {
      status: 'completed',
      result: breakdown,
      completed_at: new Date()
    });
  } catch (error) {
    await db.jobs.update(job.id, {
      status: 'failed',
      error: error.message,
      failed_at: new Date()
    });
  }
});

// Step 3: Client polls for result
app.get('/api/quote/:jobId', async (req, res) => {
  const job = await db.jobs.get(req.params.jobId);
  res.json({
    status: job.status,
    result: job.result,
    error: job.error
  });
});
```
User experience now:

1. Click "Generate Quote"
2. Instantly see "Generating your quote..." with a spinner
3. Frontend polls every 2 seconds (see the sketch below)
4. Quote appears when ready (typically 8-10 seconds)
5. No timeout errors, no duplicates
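For reference, here's a minimal sketch of that client-side polling loop. It assumes the `/api/quote/:jobId` endpoint shape above; the interval and timeout values are illustrative defaults, not prescriptions.

```typescript
// Minimal polling sketch against the status endpoint above.
async function pollQuote(jobId: string, intervalMs = 2000, timeoutMs = 60_000) {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const res = await fetch('/api/quote/' + jobId);
    const job = await res.json();

    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error ?? 'Quote generation failed');

    // Still pending or processing - wait before the next check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error('Timed out waiting for quote');
}
```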
## When to Use Background Jobs
Not everything needs to be async. Here's my decision framework:
Use background jobs when:

- Operation takes more than 2-3 seconds
- Operation can fail and needs retry
- Operation involves external services that might be slow
- User doesn't need the immediate result
- Operation should continue even if the user leaves the page
Keep synchronous when:

- Operation is fast (<1 second)
- User needs immediate feedback (authentication, validation)
- Failure means the whole request should fail
- No retry semantics make sense
ShoreAgents examples:
| Operation | Sync/Async | Why |
|-----------|------------|-----|
| User login | Sync | Need immediate yes/no |
| Quote generation | Async | 8-15 seconds, can retry |
| Lead capture | Sync | Simple DB insert, fast |
| Email sending | Async | External service, can retry |
| PDF report | Async | Heavy generation, can poll |
| Candidate search | Sync | Should be fast with good indexes |
## Idempotency: The Critical Rule
Jobs will retry. Network blips happen. Workers crash. When a job runs twice, it should produce the same result - not duplicate data.
Bad (not idempotent):
```typescript
// If this runs twice, the user gets charged twice
async function processPayment(orderId) {
  const order = await db.orders.get(orderId);

  await stripe.charges.create({
    amount: order.total,
    customer: order.customerId
  });

  await db.orders.update(orderId, { status: 'paid' });
}
```
Good (idempotent):
```typescript
// Check if already processed before doing anything
async function processPayment(orderId) {
  const order = await db.orders.get(orderId);

  // Already paid? Skip.
  if (order.status === 'paid') {
    return { skipped: true, reason: 'already_paid' };
  }

  // Use an idempotency key so Stripe dedupes retried charges
  // (passed as a request option in the Node SDK, not a charge param)
  await stripe.charges.create(
    { amount: order.total, customer: order.customerId },
    { idempotencyKey: 'order_' + orderId }
  );

  await db.orders.update(orderId, { status: 'paid' });
}
```
For quote generation, idempotency means caching:
```typescript
// Same input = return cached result
const cacheKey = hashInput(job.input);
const cached = await cache.get(cacheKey);
if (cached) {
  return cached;
}

// ... generate quote ...

await cache.set(cacheKey, result, { ttl: '1h' });
```
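`hashInput` isn't shown above; a minimal sketch using Node's built-in crypto. One caveat worth noting: `JSON.stringify` is key-order sensitive, so normalize inputs first if the same logical input can arrive in different shapes.

```typescript
import { createHash } from 'crypto';

// Hash the job input into a stable cache key.
// Caveat: JSON.stringify is key-order sensitive; sort keys first
// if equivalent inputs may serialize differently.
function hashInput(input: unknown): string {
  return createHash('sha256').update(JSON.stringify(input)).digest('hex');
}
```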
## Our Queue Setup
For ShoreAgents, we use a simple Postgres-backed queue (no Redis needed for our scale):
```sql
-- jobs table in Supabase
create table jobs (
  id uuid primary key default gen_random_uuid(),
  type text not null,
  input jsonb not null,
  status text default 'pending',
  result jsonb,
  error text,
  attempts int default 0,
  max_attempts int default 3,
  created_at timestamptz default now(),
  started_at timestamptz,
  completed_at timestamptz,
  run_after timestamptz default now()
);

-- Index for polling
create index jobs_pending on jobs(type, status, run_after)
  where status in ('pending', 'retry');
```
Worker polls every second:
```typescript
while (true) {
  // Atomically claim the oldest runnable job
  const job = await db.query(`
    UPDATE jobs SET
      status = 'processing',
      started_at = now(),
      attempts = attempts + 1
    WHERE id = (
      SELECT id FROM jobs
      WHERE status IN ('pending', 'retry')
        AND run_after <= now()
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING *
  `);

  if (job) {
    await processJob(job);
  } else {
    await sleep(1000);
  }
}
```
The "FOR UPDATE SKIP LOCKED" is key - multiple workers can run without grabbing the same job.
## Failure Handling
When jobs fail, we don't just give up:
```typescript
async function processJob(job) {
  try {
    // Look up the handler for this job type and run it
    const result = await handlers[job.type](job);
    await markCompleted(job.id, result);
  } catch (error) {
    if (job.attempts >= job.max_attempts) {
      // Send to dead letter queue
      await markFailed(job.id, error);
      await alertOnFailure(job, error);
    } else {
      // Schedule retry with exponential backoff
      const delay = Math.pow(2, job.attempts) * 1000; // 2s, 4s, 8s
      await scheduleRetry(job.id, delay);
    }
  }
}
```
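`scheduleRetry` isn't shown above. Since the worker only claims jobs where `run_after <= now()`, a retry is just a status flip plus a bumped `run_after`. A minimal sketch against the jobs table, assuming the `db.query` wrapper accepts parameterized SQL:

```typescript
// Push run_after into the future so the polling loop
// picks the job up again once the backoff delay has passed.
async function scheduleRetry(jobId: string, delayMs: number) {
  await db.query(
    `UPDATE jobs
     SET status = 'retry',
         run_after = now() + ($2::numeric / 1000) * interval '1 second'
     WHERE id = $1`,
    [jobId, delayMs]
  );
}
```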
Dead letter queue: Jobs that fail all retries go to a separate table for manual investigation. I get a Telegram alert when this happens.
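The alert itself can be a single Bot API call. A sketch of what `alertOnFailure` might look like, assuming `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` environment variables:

```typescript
// Sketch: notify Telegram when a job exhausts its retries.
// TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are assumed env vars.
async function alertOnFailure(job: { id: string; type: string }, error: Error) {
  const text = `Job ${job.type} (${job.id}) failed permanently: ${error.message}`;
  await fetch(`https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: process.env.TELEGRAM_CHAT_ID, text })
  });
}
```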
## The Maya Chat Queue
Maya (our AI salesperson) uses this pattern extensively. Every tool Maya calls is a background job:
- generate_quote -> quote_queue
- search_candidates -> search_queue
- send_candidate_matches -> email_queue
- capture_lead -> crm_queue
Why? Because if Claude decides to call generate_quote and the pricing engine is slow, we don't want the chat to hang. Maya queues the job, tells the user "Generating your quote now...", and continues the conversation while it processes.
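In sketch form, a tool handler just creates the job, enqueues it, and hands the model something to say. The `toolHandlers` registry name is hypothetical; `db` and `queue` follow the earlier examples:

```typescript
// Sketch: a Maya tool handler that enqueues instead of blocking the chat.
toolHandlers['generate_quote'] = async (input) => {
  const job = await db.jobs.create({
    type: 'generate_quote',
    input,
    status: 'pending',
    created_at: new Date()
  });
  await queue.add('quote', { jobId: job.id });

  // Returned to the model so the conversation keeps moving
  return { message: 'Generating your quote now...', jobId: job.id };
};
```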
## Visibility: The Dashboard
A queue without visibility is a black hole. We built a simple admin view:
```sql
-- Queue health at a glance
SELECT
  type,
  COUNT(*) FILTER (WHERE status = 'pending') AS pending,
  COUNT(*) FILTER (WHERE status = 'processing') AS processing,
  COUNT(*) FILTER (WHERE status = 'completed') AS completed_24h,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed_24h,
  AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))
    FILTER (WHERE status = 'completed') AS avg_duration_seconds
FROM jobs
WHERE created_at > now() - interval '24 hours'
GROUP BY type;
```
If pending count grows faster than completed count, we're falling behind. If failed count spikes, something's broken. If avg_duration increases, we're getting slower.
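Those checks are easy to automate. A rough sketch of a periodic backlog alarm using node-postgres; the threshold and interval are arbitrary examples:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings from PG* env vars

// Rough backlog alarm: warn if pending jobs pile up.
// The 100-job threshold and 60s interval are arbitrary examples.
setInterval(async () => {
  const { rows } = await pool.query(
    `SELECT count(*)::int AS pending FROM jobs WHERE status = 'pending'`
  );
  if (rows[0].pending > 100) {
    console.warn(`Queue backlog: ${rows[0].pending} pending jobs`);
  }
}, 60_000);
```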
This data goes to a simple dashboard that Stephen can check anytime.
## Key Takeaways
After rebuilding ShoreAgents' async processing:
1. If it takes more than 2 seconds, make it async. Return immediately with a job ID.
2. Idempotency is non-negotiable. Jobs will retry. Handle it gracefully.
3. Dead letter queues save debugging time. Failed jobs need investigation, not deletion.
4. Visibility prevents surprises. Know your queue depth, failure rate, and processing time.
5. Keep it simple. Postgres queues work fine for most use cases. Don't reach for Redis/RabbitMQ until you need them.
If a user action might take more than 2 seconds, it's a background job. Return immediately with a job ID and let them check status. They'll thank you for not showing them a loading spinner for 15 seconds.

