Zero-Downtime Database Migrations at Scale

Futureaiit
Dec 15, 2025
13 min read

Database migrations are terrifying. One wrong move and you have downtime, data loss, or corrupted state. Yet every growing application eventually needs to change its schema: add columns, split tables, change data types. At Futureaiit, we have executed hundreds of zero-downtime migrations for companies processing billions of dollars in transactions. Here is how we do it, with battle-tested patterns you can apply to your own databases.

Why Migrations Cause Downtime

The naive approach to schema changes is simple but disastrous:

  1. Take the application offline
  2. Run ALTER TABLE statements
  3. Deploy new application code
  4. Bring the application back online

This works for small databases but fails at scale. An ALTER TABLE on a billion-row table can take hours, and your application is offline the entire time. For a SaaS business, that is unacceptable.

Zero-downtime migrations require a different approach: make changes incrementally, maintaining backward compatibility at every step.

The Core Principle: Expand and Contract

The expand-and-contract pattern is the foundation of zero-downtime migrations. It works in three phases:

Phase 1: Expand (Add New Schema)

Add the new column, table, or index without removing anything. The old schema and new schema coexist.

Example: You want to rename user_name to full_name.

Step 1: Add the new column:

ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

The application still uses user_name. Nothing breaks.

Phase 2: Migrate (Dual Writes)

Deploy application code that writes to both old and new columns. Read from the old column for now.

// Write to both columns
user.user_name = name;
user.full_name = name;
user.save();

// Read from old column
return user.user_name;

Backfill existing data:

UPDATE users SET full_name = user_name WHERE full_name IS NULL;

Run this in batches to avoid locking the table.
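
A minimal batched version (a sketch, assuming PostgreSQL and an integer primary key id; the batch size is illustrative):

-- Update one batch at a time; repeat until zero rows are affected
UPDATE users
SET full_name = user_name
WHERE id IN (
    SELECT id FROM users
    WHERE full_name IS NULL
    LIMIT 10000
);

Each batch holds row locks only briefly, so normal traffic keeps flowing between iterations.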

Phase 3: Contract (Remove Old Schema)

Once all data is migrated and the application is writing to both columns, switch reads to the new column:

// Write to both columns (still)
user.user_name = name;
user.full_name = name;
user.save();

// Read from new column
return user.full_name;

After verifying everything works, stop writing to the old column and eventually drop it:

ALTER TABLE users DROP COLUMN user_name;

At no point did the application go offline. Each step was backward compatible.

Pattern 1: Adding a Column

This is the simplest migration. Add the column with a default value or make it nullable.

Safe Approach

ALTER TABLE orders ADD COLUMN status VARCHAR(50) DEFAULT 'pending';

Deploy application code that uses the new column. Old code ignores it; new code writes to it. (One caveat: on PostgreSQL versions before 11, adding a column with a default rewrote the entire table; PostgreSQL 11+ handles constant defaults as a metadata-only change.)

Unsafe Approach

ALTER TABLE orders ADD COLUMN status VARCHAR(50) NOT NULL;

This fails if existing rows cannot satisfy the NOT NULL constraint. Always add columns as nullable or with defaults first, then add constraints later.
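
If the column ultimately must be NOT NULL, get there in stages (a sketch, assuming PostgreSQL): add it nullable with a default as above, backfill, then enforce the constraint once no NULLs remain:

-- Backfill existing rows in batches (see the batching pattern above)
UPDATE orders SET status = 'pending' WHERE status IS NULL;

-- Enforce NOT NULL only once no NULLs remain. This takes a brief
-- exclusive lock while PostgreSQL verifies the column; see the
-- Common Pitfalls section for the add-then-validate technique.
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;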

Pattern 2: Removing a Column

Never drop a column immediately. Follow the expand-and-contract pattern.

Step 1: Stop Writing

Deploy code that no longer writes to the column. Verify the column is unused.

Step 2: Monitor

Wait at least one deployment cycle (days or weeks) to ensure no old code is still running.

Step 3: Drop

ALTER TABLE users DROP COLUMN deprecated_field;

If you discover you still need the column, you can roll back before this step.
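
One extra safeguard before the final drop (a sketch; the suffix is illustrative): rename the column first, so any forgotten reader fails loudly and the change reverts in seconds:

-- Rename instead of dropping; any straggling reader errors immediately,
-- and the rename is trivially reversible
ALTER TABLE users RENAME COLUMN deprecated_field TO deprecated_field_unused;

-- After another quiet monitoring period, drop for real
ALTER TABLE users DROP COLUMN deprecated_field_unused;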

Pattern 3: Changing a Column Type

Changing a column type can lock the table for hours on large datasets. Not every change is expensive (PostgreSQL, for example, widens VARCHAR(50) to VARCHAR(255) as a metadata-only change), but conversions such as INT to BIGINT rewrite every row.

Safe Approach: Add a New Column

  1. Add a new column with the desired type
  2. Dual write to both columns
  3. Backfill data from old to new column
  4. Switch reads to the new column
  5. Drop the old column

Example: Changing user_id from INT to BIGINT:

-- Step 1: Add new column
ALTER TABLE orders ADD COLUMN user_id_bigint BIGINT;

-- Step 2: Dual write (in application code)
order.user_id = user_id;
order.user_id_bigint = user_id;

-- Step 3: Backfill (in batches, as shown earlier)
UPDATE orders SET user_id_bigint = user_id WHERE user_id_bigint IS NULL;

-- Step 4: Switch application reads to user_id_bigint

-- Step 5: Drop the old column and reclaim its name. On PostgreSQL,
-- where DDL is transactional, run both statements in one transaction,
-- coordinated with an application deploy that references user_id again.
BEGIN;
ALTER TABLE orders DROP COLUMN user_id;
ALTER TABLE orders RENAME COLUMN user_id_bigint TO user_id;
COMMIT;

Pattern 4: Splitting a Table

You have a monolithic users table and want to split it into users and user_profiles.

Step 1: Create the New Table

CREATE TABLE user_profiles (
    user_id BIGINT PRIMARY KEY,
    bio TEXT,
    avatar_url VARCHAR(255),
    FOREIGN KEY (user_id) REFERENCES users(id)
);

Step 2: Dual Write

Application code writes to both users and user_profiles.

Step 3: Backfill

INSERT INTO user_profiles (user_id, bio, avatar_url)
SELECT id, bio, avatar_url FROM users
WHERE id NOT IN (SELECT user_id FROM user_profiles);
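
On a large table the NOT IN anti-join is slow, and the backfill can race with dual writes inserting the same rows. A more robust variant (a sketch, assuming PostgreSQL) is idempotent and safe to re-run:

-- ON CONFLICT makes the backfill idempotent: rows already created
-- by dual writes are simply skipped
INSERT INTO user_profiles (user_id, bio, avatar_url)
SELECT id, bio, avatar_url FROM users
ON CONFLICT (user_id) DO NOTHING;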

Step 4: Switch Reads

Application reads from user_profiles instead of users.

Step 5: Drop Old Columns

ALTER TABLE users DROP COLUMN bio, DROP COLUMN avatar_url;

Pattern 5: Adding an Index

Creating an index on a large table can lock it for hours. Use concurrent index creation.

PostgreSQL

CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

This builds the index without locking the table for writes.
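
One caveat: if CREATE INDEX CONCURRENTLY fails partway (a deadlock, a duplicate key), it leaves behind an INVALID index that must be dropped before retrying:

-- Find indexes left invalid by a failed concurrent build
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the broken index (also without locking) and retry the build
DROP INDEX CONCURRENTLY idx_users_email;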

MySQL

MySQL 5.6+ supports online DDL:

ALTER TABLE users ADD INDEX idx_users_email (email), ALGORITHM=INPLACE, LOCK=NONE;

Blue-Green Deployment for Databases

For extremely risky migrations, use blue-green deployment at the database level.

How It Works

  1. Create a replica: Set up a read replica of your production database
  2. Apply migrations: Run migrations on the replica (now the "green" database)
  3. Test thoroughly: Verify the green database works with new application code
  4. Switch traffic: Update connection strings to point to the green database
  5. Monitor: Watch for errors, ready to switch back to blue if needed

This requires careful planning for data consistency during the switch, but it gives you an instant rollback path.
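
Before the cutover, confirm the green database has fully replayed everything from blue (a sketch for a PostgreSQL streaming replica; briefly pausing writes on blue makes the check exact):

-- Run on the green (replica) side: approximate replay lag
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;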

Common Pitfalls

1. Not Testing on Production-Scale Data

A migration that takes 10 seconds on your development database with 1000 rows might take 10 hours on production with 100 million rows. Always test on a production-sized dataset.
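
The best rehearsal target is a restored copy of a recent production backup. Failing that, synthetic data gets you most of the way (a sketch, assuming PostgreSQL; the table, columns, and row count are illustrative):

-- Generate 100 million synthetic rows to rehearse the migration against
INSERT INTO orders (user_id, status)
SELECT (random() * 1000000)::int, 'pending'
FROM generate_series(1, 100000000);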

2. Forgetting About Indexes

Adding a NOT NULL constraint or foreign key can trigger a full table scan to validate existing data. This locks the table. Add constraints without validation first, then validate in a separate step.
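
In PostgreSQL, for example, a foreign key can be added without validating existing rows, then validated separately under a weaker lock that does not block reads or writes (a sketch; the constraint name is illustrative):

-- Add the constraint without scanning existing rows
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
    FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Validate in a separate step; this scans the table but allows
-- concurrent reads and writes
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_user;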

3. Ignoring Replication Lag

If you use read replicas, schema changes can cause replication lag. Monitor lag during migrations and slow down if necessary.
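
On PostgreSQL 10+, the primary reports per-replica lag directly (a sketch):

-- Run on the primary: lag for each streaming replica
SELECT client_addr, state, replay_lag
FROM pg_stat_replication;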

4. No Rollback Plan

Every migration should have a rollback plan. What if the new schema causes performance issues? Can you revert without data loss?

How Futureaiit Can Help

At Futureaiit, we specialize in high-risk database migrations. We can help you:

  • Plan zero-downtime migrations: Design expand-and-contract strategies for complex schema changes
  • Execute migrations safely: Run migrations with monitoring and rollback plans
  • Optimize database performance: Add indexes, partition tables, and tune queries
  • Migrate between databases: Move from MySQL to PostgreSQL, or from on-premises to the cloud
  • Set up replication and failover: Build resilient database architectures
  • Recover from failed migrations: Fix corrupted schemas and restore data integrity

We have migrated databases with billions of rows and terabytes of data, all without downtime. Our team knows the edge cases and has battle-tested playbooks for every scenario.

Conclusion

Zero-downtime database migrations are not magic. They require careful planning, incremental changes, and backward compatibility at every step. The expand-and-contract pattern is your foundation: add new schema, migrate data, switch reads, remove old schema.

At Futureaiit, we have executed hundreds of these migrations. The key is discipline: never skip steps, always test at scale, and have a rollback plan. With the right approach, even the most complex schema changes can be done safely, without a single second of downtime.

Planning a risky database migration? Contact Futureaiit to discuss how we can help you execute it safely and without downtime.
