"The API is down." Those words hit different when you're the one who ran the migration.
ShoreAgents. 2am deployment. I thought I was being clever with a migration that renamed a column from "user_email" to "contact_email". What I forgot: the old API code was still running while the new code deployed.
For about 4 minutes, every query referencing "user_email" failed. 4 minutes of 500 errors. 4 minutes of users seeing broken pages.
This is the story of why database migrations are not deployments - and the patterns that prevent 2am incidents.
The Incident: Column Rename Gone Wrong
The migration was simple:
`sql
ALTER TABLE leads RENAME COLUMN user_email TO contact_email;
`
The deployment order: 1. Run migration (column renamed) 2. Deploy new code (uses "contact_email") 3. Old code still running during deploy (uses "user_email")
The gap between steps 1 and 2 was 4 minutes. During those 4 minutes:
`
ERROR: column "user_email" does not exist
ERROR: column "user_email" does not exist
ERROR: column "user_email" does not exist
(400+ more of these)
`
The Safe Pattern: Expand-Contract
The fix is a pattern called expand-contract (or parallel change). Never remove, always add first.
Phase 1: Expand Add the new column alongside the old one.
`sql
-- Migration 1: Add new column
ALTER TABLE leads ADD COLUMN contact_email TEXT;
-- Backfill existing data UPDATE leads SET contact_email = user_email WHERE contact_email IS NULL;
-- Keep them in sync with trigger CREATE OR REPLACE FUNCTION sync_email_columns() RETURNS TRIGGER AS $$ BEGIN IF NEW.user_email IS DISTINCT FROM OLD.user_email THEN NEW.contact_email = NEW.user_email; END IF; IF NEW.contact_email IS DISTINCT FROM OLD.contact_email THEN NEW.user_email = NEW.contact_email; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql;
CREATE TRIGGER leads_sync_email
BEFORE UPDATE ON leads
FOR EACH ROW EXECUTE FUNCTION sync_email_columns();
`
Now both columns exist and stay in sync. Old code works. New code works.
Phase 2: Migrate code Deploy the new code that reads/writes contact_email. Old code still works because user_email is still there and synced.
Phase 3: Contract After all code is using the new column (days or weeks later), remove the old one.
`sql
-- Migration 2: Remove old column (only after code is fully migrated)
DROP TRIGGER leads_sync_email ON leads;
DROP FUNCTION sync_email_columns();
ALTER TABLE leads DROP COLUMN user_email;
`
Total downtime: zero. Total errors: zero.
Common Migration Patterns
Here are the patterns I use for different schema changes:
Renaming a column As above: add new, sync both, deploy code, drop old.
Adding a NOT NULL column Can't add NOT NULL to existing data without a default:
`sql
-- Wrong: fails if table has data
ALTER TABLE users ADD COLUMN verified BOOLEAN NOT NULL;
-- Right: add with default, then optionally remove default
ALTER TABLE users ADD COLUMN verified BOOLEAN NOT NULL DEFAULT false;
-- Later if you want to require explicit values:
ALTER TABLE users ALTER COLUMN verified DROP DEFAULT;
`
Removing a column Remove from code first, then remove from database.
`sql
-- 1. Deploy code that no longer reads/writes legacy_field
-- 2. Wait for all instances to be on new code
-- 3. Then run migration
ALTER TABLE users DROP COLUMN legacy_field;
`
Changing column type Add new column with new type, migrate data, switch code, drop old.
`sql
-- Changing status from TEXT to INTEGER
ALTER TABLE orders ADD COLUMN status_code INTEGER;
UPDATE orders SET status_code = CASE status WHEN 'pending' THEN 1 WHEN 'processing' THEN 2 WHEN 'completed' THEN 3 WHEN 'cancelled' THEN 4 END;
-- Deploy code using status_code
-- Then later:
ALTER TABLE orders DROP COLUMN status;
ALTER TABLE orders RENAME COLUMN status_code TO status;
`
Migration Checklist
Before running any migration, I ask myself:
1. Can I rollback? What happens if this migration fails halfway? Can I undo it? Do I have a rollback script?
2. Will old code still work? If I run this migration and old code is still deployed, will it break? If yes, I need expand-contract.
3. Have I tested on prod-like data? Tested on empty DB? Cool. Now test on a copy of production data. That 1M row table might make your migration take 30 minutes.
4. Is there a maintenance window? For truly breaking changes (rare), schedule downtime. Communicate clearly.
5. What's my monitoring plan? How will I know if something breaks? What queries am I watching? What error rates?
Supabase Specifics
We use Supabase for ShoreAgents and BPOC. Some things I've learned:
Migrations via CLI
`bash
# Generate migration file
supabase migration new add_verified_column
# Edit the generated file
# Then push to remote
supabase db push
`
Check migration status
`bash
supabase migration list
`
Beware of RLS during migrations If you're adding columns, existing RLS policies might need updates. Plan for this.
Large table migrations
Supabase connections have timeouts. For big backfills, batch them:
`sql
-- Instead of one massive UPDATE
UPDATE leads SET contact_email = user_email;
-- Do it in batches
DO $$
DECLARE
batch_size INTEGER := 10000;
affected INTEGER;
BEGIN
LOOP
UPDATE leads
SET contact_email = user_email
WHERE id IN (
SELECT id FROM leads
WHERE contact_email IS NULL
LIMIT batch_size
);
GET DIAGNOSTICS affected = ROW_COUNT;
EXIT WHEN affected = 0;
COMMIT;
END LOOP;
END $$;
`
The Timeline Rule
Here's my rule of thumb for migration timing:
Migration runs: Hours or days before the code deploy Code deploys: Using the new schema alongside the old Cleanup migration: Days or weeks after all code is updated
Example timeline for the email rename:
- Monday 10am: Run expand migration (add contact_email, sync trigger)
- Monday 2pm: Deploy new code (reads/writes contact_email)
- Tuesday: Monitor, ensure no issues
- Friday: Run contract migration (drop user_email)
Plenty of buffer. No 2am emergencies.
Lessons Learned
After the 2am incident:
1. Migrations are not deployments They should run separately, with buffer time between.
2. Always expand first Adding is safe. Removing is dangerous. Add first, remove later.
3. Test with production data volume That migration that took 10ms on your test DB? It might take 10 minutes on prod.
4. Have a rollback plan Before you run UP, know what DOWN looks like.
5. Monitor after migration Watch error rates, slow queries, unexpected nulls. Have dashboards ready.
Migrations are not deployments. They should run hours or days before the code that depends on them. Give yourself a buffer.

