User clicks "Generate My Quote" on ShoreAgents. The pricing engine needs to:

1. Parse their input with Claude AI
2. Search current salary data for each role
3. Calculate government contributions (SSS, PhilHealth, Pag-IBIG)
4. Apply tier multipliers
5. Factor in workspace costs
6. Generate a formatted breakdown
Total time: 8-15 seconds depending on complexity.
The original implementation? Synchronous. User clicks, browser spins for 15 seconds, Vercel times out at 10 seconds, user sees error, user clicks again, now TWO quotes are generating, server is sweating.
This is why background jobs exist.
## The Synchronous Trap
When I first audited ShoreAgents, I found this pattern everywhere:
```typescript
// Original quote generation - synchronous nightmare
app.post('/api/quote', async (req, res) => {
  // All of this happens while the user waits
  const parsed = await parseInputWithAI(req.body.input); // 2-3s
  const salaries = await searchSalaryData(parsed.roles); // 3-4s
  const costs = await calculateAllCosts(salaries);       // 1-2s
  const breakdown = await formatBreakdown(costs);        // 1s

  // Finally respond after 8-10 seconds
  res.json({ quote: breakdown });
});
```
Problems:

- Vercel functions time out at 10 seconds (default)
- User might click again, spawning duplicate work
- If any step fails, the entire request fails
- No visibility into progress
- No retry on partial failure
## The Background Job Pattern
Here's how I restructured it:
```typescript
// Step 1: Accept request, return immediately
app.post('/api/quote', async (req, res) => {
  // Validate input
  const validation = validateQuoteInput(req.body);
  if (!validation.ok) {
    return res.status(400).json({ error: validation.error });
  }

  // Create job record
  const job = await db.jobs.create({
    type: 'generate_quote',
    input: req.body,
    status: 'pending',
    created_at: new Date()
  });

  // Queue for processing
  await queue.add('quote', { jobId: job.id });

  // Return immediately with a job ID
  res.json({
    jobId: job.id,
    status: 'pending',
    checkUrl: '/api/quote/' + job.id
  });
});

// Step 2: Process in background
queue.process('quote', async (task) => {
  const job = await db.jobs.get(task.data.jobId);

  try {
    await db.jobs.update(job.id, { status: 'processing' });

    // Do the actual work
    const parsed = await parseInputWithAI(job.input);
    const salaries = await searchSalaryData(parsed.roles);
    const costs = await calculateAllCosts(salaries);
    const breakdown = await formatBreakdown(costs);

    // Save result
    await db.jobs.update(job.id, {
      status: 'completed',
      result: breakdown,
      completed_at: new Date()
    });
  } catch (error) {
    await db.jobs.update(job.id, {
      status: 'failed',
      error: error.message,
      failed_at: new Date()
    });
  }
});

// Step 3: Client polls for result
app.get('/api/quote/:jobId', async (req, res) => {
  const job = await db.jobs.get(req.params.jobId);
  res.json({
    status: job.status,
    result: job.result,
    error: job.error
  });
});
```
User experience now:

1. Click "Generate Quote"
2. Instantly see "Generating your quote..." with a spinner
3. Frontend polls every 2 seconds (see the sketch below)
4. Quote appears when ready (typically 8-10 seconds)
5. No timeout errors, no duplicates
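For reference, here's a minimal sketch of that client-side polling loop. It assumes the `/api/quote/:jobId` endpoint shape above; the interval and timeout values are illustrative defaults, not prescriptions.

```typescript
// Minimal polling sketch against the status endpoint above.
async function pollQuote(jobId: string, intervalMs = 2000, timeoutMs = 60_000) {
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    const res = await fetch('/api/quote/' + jobId);
    const job = await res.json();

    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error ?? 'Quote generation failed');

    // Still pending or processing - wait before the next check
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  throw new Error('Timed out waiting for quote');
}
```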
## When to Use Background Jobs
Not everything needs to be async. Here's my decision framework:
Use background jobs when:

- Operation takes more than 2-3 seconds
- Operation can fail and needs retry
- Operation involves external services that might be slow
- User doesn't need the immediate result
- Operation should continue even if the user leaves the page
Keep synchronous when:

- Operation is fast (<1 second)
- User needs immediate feedback (authentication, validation)
- Failure means the whole request should fail
- No retry semantics make sense
ShoreAgents examples:
| Operation | Sync/Async | Why |
|-----------|------------|-----|
| User login | Sync | Need immediate yes/no |
| Quote generation | Async | 8-15 seconds, can retry |
| Lead capture | Sync | Simple DB insert, fast |
| Email sending | Async | External service, can retry |
| PDF report | Async | Heavy generation, can poll |
| Candidate search | Sync | Should be fast with good indexes |
## Idempotency: The Critical Rule
Jobs will retry. Network blips happen. Workers crash. When a job runs twice, it should produce the same result - not duplicate data.
Bad (not idempotent):
```typescript
// If this runs twice, the user gets charged twice
async function processPayment(orderId) {
  const order = await db.orders.get(orderId);

  await stripe.charges.create({
    amount: order.total,
    customer: order.customerId
  });

  await db.orders.update(orderId, { status: 'paid' });
}
```
Good (idempotent):
```typescript
// Check if already processed before doing anything
async function processPayment(orderId) {
  const order = await db.orders.get(orderId);

  // Already paid? Skip.
  if (order.status === 'paid') {
    return { skipped: true, reason: 'already_paid' };
  }

  // Use an idempotency key so Stripe dedupes retried charges
  // (passed as a request option in the Node SDK, not a charge param)
  await stripe.charges.create(
    { amount: order.total, customer: order.customerId },
    { idempotencyKey: 'order_' + orderId }
  );

  await db.orders.update(orderId, { status: 'paid' });
}
```
For quote generation, idempotency means caching:
```typescript
// Same input = return cached result
const cacheKey = hashInput(job.input);
const cached = await cache.get(cacheKey);
if (cached) {
  return cached;
}

// ... generate quote ...

await cache.set(cacheKey, result, { ttl: '1h' });
```
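`hashInput` isn't shown above; a minimal sketch using Node's built-in crypto. One caveat worth noting: `JSON.stringify` is key-order sensitive, so normalize inputs first if the same logical input can arrive in different shapes.

```typescript
import { createHash } from 'crypto';

// Hash the job input into a stable cache key.
// Caveat: JSON.stringify is key-order sensitive; sort keys first
// if equivalent inputs may serialize differently.
function hashInput(input: unknown): string {
  return createHash('sha256').update(JSON.stringify(input)).digest('hex');
}
```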
## Our Queue Setup
For ShoreAgents, we use a simple Postgres-backed queue (no Redis needed for our scale):
```sql
-- jobs table in Supabase
create table jobs (
  id uuid primary key default gen_random_uuid(),
  type text not null,
  input jsonb not null,
  status text default 'pending',
  result jsonb,
  error text,
  attempts int default 0,
  max_attempts int default 3,
  created_at timestamptz default now(),
  started_at timestamptz,
  completed_at timestamptz,
  run_after timestamptz default now()
);

-- Index for polling
create index jobs_pending on jobs(type, status, run_after)
  where status in ('pending', 'retry');
```
Worker polls every second:
```typescript
while (true) {
  // Atomically claim the oldest runnable job
  const job = await db.query(`
    UPDATE jobs SET
      status = 'processing',
      started_at = now(),
      attempts = attempts + 1
    WHERE id = (
      SELECT id FROM jobs
      WHERE status IN ('pending', 'retry')
        AND run_after <= now()
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
    )
    RETURNING *
  `);

  if (job) {
    await processJob(job);
  } else {
    await sleep(1000);
  }
}
```
The "FOR UPDATE SKIP LOCKED" is key - multiple workers can run without grabbing the same job.
## Failure Handling
When jobs fail, we don't just give up:
```typescript
async function processJob(job) {
  try {
    // Look up the handler for this job type and run it
    const result = await handlers[job.type](job);
    await markCompleted(job.id, result);
  } catch (error) {
    if (job.attempts >= job.max_attempts) {
      // Send to dead letter queue
      await markFailed(job.id, error);
      await alertOnFailure(job, error);
    } else {
      // Schedule retry with exponential backoff
      const delay = Math.pow(2, job.attempts) * 1000; // 2s, 4s, 8s
      await scheduleRetry(job.id, delay);
    }
  }
}
```
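`scheduleRetry` isn't shown above. Since the worker only claims jobs where `run_after <= now()`, a retry is just a status flip plus a bumped `run_after`. A minimal sketch against the jobs table, assuming the `db.query` wrapper accepts parameterized SQL:

```typescript
// Push run_after into the future so the polling loop
// picks the job up again once the backoff delay has passed.
async function scheduleRetry(jobId: string, delayMs: number) {
  await db.query(
    `UPDATE jobs
     SET status = 'retry',
         run_after = now() + ($2::numeric / 1000) * interval '1 second'
     WHERE id = $1`,
    [jobId, delayMs]
  );
}
```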
Dead letter queue: Jobs that fail all retries go to a separate table for manual investigation. I get a Telegram alert when this happens.
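The alert itself can be a single Bot API call. A sketch of what `alertOnFailure` might look like, assuming `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` environment variables:

```typescript
// Sketch: notify Telegram when a job exhausts its retries.
// TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are assumed env vars.
async function alertOnFailure(job: { id: string; type: string }, error: Error) {
  const text = `Job ${job.type} (${job.id}) failed permanently: ${error.message}`;
  await fetch(`https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}/sendMessage`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ chat_id: process.env.TELEGRAM_CHAT_ID, text })
  });
}
```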
## The Maya Chat Queue
Maya (our AI salesperson) uses this pattern extensively. Every tool Maya calls is a background job:
- generate_quote -> quote_queue
- search_candidates -> search_queue
- send_candidate_matches -> email_queue
- capture_lead -> crm_queue
Why? Because if Claude decides to call generate_quote and the pricing engine is slow, we don't want the chat to hang. Maya queues the job, tells the user "Generating your quote now...", and continues the conversation while it processes.
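In sketch form, a tool handler just creates the job, enqueues it, and hands the model something to say. The `toolHandlers` registry name is hypothetical; `db` and `queue` follow the earlier examples:

```typescript
// Sketch: a Maya tool handler that enqueues instead of blocking the chat.
toolHandlers['generate_quote'] = async (input) => {
  const job = await db.jobs.create({
    type: 'generate_quote',
    input,
    status: 'pending',
    created_at: new Date()
  });
  await queue.add('quote', { jobId: job.id });

  // Returned to the model so the conversation keeps moving
  return { message: 'Generating your quote now...', jobId: job.id };
};
```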
## Visibility: The Dashboard
A queue without visibility is a black hole. We built a simple admin view:
```sql
-- Queue health at a glance
SELECT
  type,
  COUNT(*) FILTER (WHERE status = 'pending') AS pending,
  COUNT(*) FILTER (WHERE status = 'processing') AS processing,
  COUNT(*) FILTER (WHERE status = 'completed') AS completed_24h,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed_24h,
  AVG(EXTRACT(EPOCH FROM (completed_at - created_at)))
    FILTER (WHERE status = 'completed') AS avg_duration_seconds
FROM jobs
WHERE created_at > now() - interval '24 hours'
GROUP BY type;
```
If pending count grows faster than completed count, we're falling behind. If failed count spikes, something's broken. If avg_duration increases, we're getting slower.
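Those checks are easy to automate. A rough sketch of a periodic backlog alarm using node-postgres; the threshold and interval are arbitrary examples:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings from PG* env vars

// Rough backlog alarm: warn if pending jobs pile up.
// The 100-job threshold and 60s interval are arbitrary examples.
setInterval(async () => {
  const { rows } = await pool.query(
    `SELECT count(*)::int AS pending FROM jobs WHERE status = 'pending'`
  );
  if (rows[0].pending > 100) {
    console.warn(`Queue backlog: ${rows[0].pending} pending jobs`);
  }
}, 60_000);
```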
This data goes to a simple dashboard that Stephen can check anytime.
## Key Takeaways
After rebuilding ShoreAgents' async processing:
1. If it takes more than 2 seconds, make it async. Return immediately with a job ID.
2. Idempotency is non-negotiable. Jobs will retry. Handle it gracefully.
3. Dead letter queues save debugging time. Failed jobs need investigation, not deletion.
4. Visibility prevents surprises. Know your queue depth, failure rate, and processing time.
5. Keep it simple. Postgres queues work fine for most use cases. Don't reach for Redis/RabbitMQ until you need them.
If a user action might take more than 2 seconds, it's a background job. Return immediately with a job ID and let them check status. They'll thank you for not showing them a loading spinner for 15 seconds.

