
Backup & Recovery Strategies

Your database is only as reliable as your ability to restore it. A backup you have never tested is not a backup - it is a hope. This guide covers the tools, techniques, and strategies for protecting MySQL and PostgreSQL data against every failure mode from accidental DELETE to total disk loss.


Logical vs Physical Backups

[Figure: comparison of logical and physical backup strategies, with RTO and RPO timelines]

Every backup method falls into one of two categories, and understanding the trade-offs determines which tools you reach for.

A logical backup exports data as SQL statements or delimited text. You get a portable, human-readable file that can be loaded into any compatible database version. The trade-off is speed - logical backups read every row through the SQL layer, and restoring means re-executing every INSERT. For a 500 GB database, that can take hours or days.

A physical backup copies the raw data files (tablespace files, WAL segments, redo logs) at the filesystem or storage level. These are fast to create and fast to restore because they skip the SQL layer entirely. The trade-off is portability - a physical backup is tied to the exact database version, architecture, and often the same configuration.

| Factor | Logical | Physical |
|---|---|---|
| Speed (backup) | Slow - reads through SQL | Fast - copies raw files |
| Speed (restore) | Slow - re-executes SQL | Fast - copies files back |
| Size | Smaller (text, compresses well) | Larger (raw pages, internal fragmentation) |
| Portability | High - works across versions and platforms | Low - same version and architecture |
| Granularity | Per-table, per-database, or full | Usually full instance (some tools support incremental) |
| Consistency | Snapshot via transactions or locking | Requires coordination (flush, freeze, or hot backup tool) |
| PITR support | No (without binary log/WAL) | Yes (with log replay) |

When to use each:

  • Logical: migrating between versions, exporting specific tables, small-to-medium databases, cross-platform transfers
  • Physical: large databases where restore speed matters, disaster recovery, setting up replicas, when you need point-in-time recovery

Most production environments use both - nightly physical backups for fast disaster recovery, plus periodic logical backups for portability and schema versioning.


MySQL Backup Tools

mysqldump

mysqldump is the standard logical backup tool shipped with MySQL. It produces SQL output that recreates the schema and data.

A basic full backup:

mysqldump --single-transaction --routines --triggers --events \
  --all-databases > full_backup.sql

The flags that matter:

| Flag | Purpose |
|---|---|
| --single-transaction | Takes a consistent snapshot inside a transaction (InnoDB only, no table locks) |
| --routines | Includes stored procedures and functions |
| --triggers | Includes triggers (on by default in MySQL 5.7+, but be explicit) |
| --events | Includes scheduled events |
| --all-databases | Dumps every database |
| --flush-logs | Rotates the binary logs before the dump starts, giving a clean starting point for PITR |
| --source-data=2 | Records the binary log position as a comment (MySQL 8.0.26+, replaces --master-data) |
| --set-gtid-purged=OFF | Prevents GTID-related errors when restoring to a non-GTID server |
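Putting those flags together, a PITR-ready dump might look like the following sketch (paths and filenames are illustrative):

```shell
# Consistent dump that also rotates the binary logs and records the
# binlog position, so logs from this point forward can be replayed for PITR
mysqldump --single-transaction --routines --triggers --events \
  --flush-logs --source-data=2 \
  --all-databases > full_backup_pitr.sql

# The recorded position appears near the top of the dump as a comment
grep -m1 'CHANGE REPLICATION SOURCE' full_backup_pitr.sql
```

The grep line simply surfaces the coordinates you will need later when replaying binary logs.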

MyISAM and --single-transaction

--single-transaction only provides consistency for InnoDB tables. If your database contains MyISAM tables, you need --lock-all-tables instead, which blocks all writes during the dump. On modern MySQL, you should be using InnoDB for everything.

Restoring a mysqldump backup:

mysql < full_backup.sql

For large restores, temporarily relax durability settings to speed up the import:

mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit=2; SET GLOBAL sync_binlog=0;"
mysql < full_backup.sql
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit=1; SET GLOBAL sync_binlog=1;"
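To restore just one database out of an --all-databases dump, the client's --one-database filter can help; this sketch assumes a hypothetical database named mydb and relies on the USE statements the dump contains:

```shell
# Replay only the statements that apply to mydb, skipping all other databases
mysql --one-database mydb < full_backup.sql
```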

mydumper and myloader

mydumper is a third-party tool that parallelizes logical backups. Where mysqldump uses a single thread, mydumper can dump multiple tables simultaneously and splits large tables into chunks. The companion tool myloader restores in parallel.

# Parallel backup with 4 threads
mydumper --threads 4 --compress --outputdir /backup/mysql/daily

# Parallel restore
myloader --threads 4 --directory /backup/mysql/daily

For a 100 GB database, mydumper/myloader can be 4-8x faster than mysqldump/mysql depending on hardware and table count.
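mydumper can also restrict the dump to specific schemas or tables with its --regex filter; the pattern below is a sketch matching a hypothetical mydb schema:

```shell
# Dump only tables in mydb, four at a time, compressed
mydumper --threads 4 --compress --regex '^mydb\.' \
  --outputdir /backup/mysql/mydb_only
```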

mysqlpump

mysqlpump is MySQL's built-in parallel dump tool (introduced in 5.7). It supports parallel processing and progress reporting but saw less adoption than mydumper and was deprecated in MySQL 8.0.34 in favor of MySQL Shell's dump utilities.

mysqlpump --default-parallelism=4 --all-databases > backup.sql

PostgreSQL Backup Tools

pg_dump

pg_dump is the standard logical backup tool for PostgreSQL. Unlike mysqldump, it supports multiple output formats.

| Format | Flag | Description |
|---|---|---|
| Plain SQL | -Fp | SQL text output (default), restored with psql |
| Custom | -Fc | Compressed, supports selective restore, restored with pg_restore |
| Directory | -Fd | One file per table, supports parallel dump and restore |
| Tar | -Ft | Tar archive, restored with pg_restore |

The custom format (-Fc) is the best default choice - it compresses automatically, supports selective table restore, and allows parallel restore with pg_restore -j.

# Custom format (recommended)
pg_dump -Fc mydb > mydb.dump

# Directory format with parallel dump (4 jobs)
pg_dump -Fd -j 4 mydb -f /backup/mydb_dir

# Plain SQL (human-readable)
pg_dump -Fp mydb > mydb.sql

Restoring:

# From custom or tar format
pg_restore -d mydb mydb.dump

# Parallel restore (4 jobs, custom or directory format only)
pg_restore -d mydb -j 4 mydb.dump

# From plain SQL
psql -d mydb -f mydb.sql

# Restore specific tables only
pg_restore -d mydb -t orders -t customers mydb.dump
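For finer control than -t, pg_restore can emit a table of contents that you edit and feed back; this sketch assumes the mydb.dump archive from above:

```shell
# List the archive's contents as an editable table of contents
pg_restore -l mydb.dump > restore.list

# Comment out unwanted lines in restore.list with ';', then restore
# only the remaining objects
pg_restore -d mydb -L restore.list mydb.dump
```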

pg_dumpall

pg_dumpall dumps the entire cluster including roles, tablespaces, and other global objects that pg_dump does not capture.

# Full cluster dump (roles + all databases)
pg_dumpall > cluster_backup.sql

# Globals only (roles and tablespaces, no data)
pg_dumpall --globals-only > globals.sql

A common pattern is to use pg_dumpall --globals-only for roles and pg_dump -Fc per database for data. This gives you the flexibility of custom format for data while still capturing global objects.
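That pattern can be scripted as follows (a sketch: it assumes passwordless access as the postgres superuser, and the backup path is illustrative):

```shell
#!/bin/bash
set -euo pipefail

BACKUP_DIR=/backup/postgres
DATE=$(date +%Y%m%d)

# Global objects (roles, tablespaces)
pg_dumpall --globals-only > "${BACKUP_DIR}/globals_${DATE}.sql"

# One custom-format dump per database, skipping templates
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
  pg_dump -Fc "$db" -f "${BACKUP_DIR}/${db}_${DATE}.dump"
done
```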

pg_basebackup

pg_basebackup creates a physical backup by streaming a copy of the entire PostgreSQL data directory. This is the foundation for setting up streaming replicas and for physical backup strategies.

# Stream a physical backup to a directory
pg_basebackup -D /backup/pg_base -Fp -Xs -P -R

# Create a compressed tar backup
pg_basebackup -D /backup/pg_base -Ft -Xf -z

| Flag | Purpose |
|---|---|
| -D | Destination directory |
| -Fp | Plain format (data directory copy) |
| -Ft | Tar format |
| -Xs | Stream WAL during backup (recommended) |
| -Xf | Fetch WAL after backup completes |
| -z | Compress output (tar format only) |
| -P | Show progress |
| -R | Write standby.signal and connection settings for replica setup |
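Since PostgreSQL 13, pg_basebackup also writes a backup_manifest file, which can be checked later with pg_verifybackup; a quick integrity check on the plain-format backup above:

```shell
# Recalculate file checksums against the backup_manifest
pg_verifybackup /backup/pg_base
```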

Percona XtraBackup

Percona XtraBackup creates hot physical backups of InnoDB without locking tables or interrupting transactions. It copies the InnoDB data files while tracking the redo log to capture changes made during the copy. This makes it the standard tool for MySQL physical backups in production.

Full Backup

# Create a full backup
xtrabackup --backup --target-dir=/backup/base

# Prepare the backup (apply redo log for consistency)
xtrabackup --prepare --target-dir=/backup/base

# Restore (MySQL must be stopped)
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/base
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql

The prepare step is critical. A raw XtraBackup is not consistent - it contains data files from different points in time plus redo log entries. The prepare step replays the redo log to bring all data files to a consistent state, just like InnoDB crash recovery.

Incremental Backups

XtraBackup supports incremental backups that only copy pages changed since the last backup. This dramatically reduces backup time and storage for large databases.

# Full base backup (Sunday)
xtrabackup --backup --target-dir=/backup/base

# Incremental backup (Monday) - only changed pages
xtrabackup --backup --target-dir=/backup/inc1 \
  --incremental-basedir=/backup/base

# Incremental backup (Tuesday)
xtrabackup --backup --target-dir=/backup/inc2 \
  --incremental-basedir=/backup/inc1

Restoring an incremental chain requires preparing each increment in order:

# Prepare base (with --apply-log-only to preserve uncommitted transactions)
xtrabackup --prepare --apply-log-only --target-dir=/backup/base

# Apply incremental backups in order
xtrabackup --prepare --apply-log-only --target-dir=/backup/base \
  --incremental-dir=/backup/inc1

xtrabackup --prepare --apply-log-only --target-dir=/backup/base \
  --incremental-dir=/backup/inc2

# Final prepare (without --apply-log-only)
xtrabackup --prepare --target-dir=/backup/base

# Restore
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/base
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql

The --apply-log-only flag

Use --apply-log-only during intermediate prepare steps. Without it, XtraBackup rolls back uncommitted transactions, which makes subsequent incremental applies fail. Only omit --apply-log-only on the final prepare step.
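The prepare sequence above can be scripted; this sketch assumes increments live under /backup/inc1, /backup/inc2, and so on:

```shell
#!/bin/bash
set -euo pipefail

BASE=/backup/base

# Intermediate prepares: always use --apply-log-only
xtrabackup --prepare --apply-log-only --target-dir="$BASE"

# NOTE: glob order is lexical - zero-pad directory names (inc01, inc02)
# if you keep more than nine increments
for inc in /backup/inc*/; do
  xtrabackup --prepare --apply-log-only --target-dir="$BASE" \
    --incremental-dir="$inc"
done

# Final prepare rolls back uncommitted transactions
xtrabackup --prepare --target-dir="$BASE"
```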


Point-in-Time Recovery (PITR)

Backups give you a snapshot. Point-in-time recovery fills the gap between the snapshot and the moment of failure (or the moment before an accidental change). PITR replays transaction logs from the backup's position forward to a specific timestamp or log position.

flowchart LR
    FB[Full Backup<br/>Sunday 02:00] --> WAL1[WAL Segment 1]
    WAL1 --> WAL2[WAL Segment 2]
    WAL2 --> WAL3[WAL Segment 3]
    WAL3 --> F[Failure<br/>Wednesday 14:30]

    FB --> R[Restore Base Backup]
    R --> RP[Replay WAL/Binlog<br/>to Target Time]
    RP --> REC[Recovered Database<br/>Wednesday 14:29]

MySQL: Binary Log Replay

MySQL's binary log records every data-modifying statement or row change. PITR replays these logs with mysqlbinlog.

Step 1: Restore from the most recent full backup.

mysql < /backup/mysql_full_20260218.sql

Step 2: Identify the binary logs created after the backup.

ls -la /var/lib/mysql/binlog.*

Step 3: Replay binary logs up to the target time (just before the accidental operation).

mysqlbinlog --stop-datetime="2026-02-19 14:30:00" \
  /var/lib/mysql/binlog.000042 \
  /var/lib/mysql/binlog.000043 | mysql

To skip a specific bad transaction, use --start-position and --stop-position to replay around it:

# Replay everything up to the bad statement
mysqlbinlog --stop-position=12345 /var/lib/mysql/binlog.000042 | mysql

# Skip the bad statement and continue from after it
mysqlbinlog --start-position=12400 /var/lib/mysql/binlog.000042 | mysql

Binary logging must be enabled BEFORE disaster

PITR only works if log_bin is enabled and binary logs exist from the backup point forward. Verify with SHOW VARIABLES LIKE 'log_bin'. If binary logging was not enabled, you can only restore to the backup snapshot - everything after it is lost.
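Enabling it is a my.cnf change; the values below are an illustrative sketch, not mandatory settings:

```ini
[mysqld]
log_bin = /var/lib/mysql/binlog
server_id = 1
# Keep 7 days of binary logs (MySQL 8.0+)
binlog_expire_logs_seconds = 604800
```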

PostgreSQL: WAL Archiving and Recovery

PostgreSQL's equivalent is Write-Ahead Log (WAL) archiving. Every change is written to WAL segments before it reaches the data files. By archiving these segments, you can replay them to recover to any point in time.

Step 1: Configure WAL archiving in postgresql.conf:

wal_level = replica
archive_mode = on
# 'test ! -f' refuses to overwrite an existing archive file
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'

Step 2: Take a base backup:

pg_basebackup -D /backup/pg_base -Fp -Xs -P

Step 3: When recovery is needed, copy the base backup into position and configure recovery:

# Stop PostgreSQL
systemctl stop postgresql

# Replace the data directory with the backup
rm -rf /var/lib/postgresql/16/main
cp -a /backup/pg_base /var/lib/postgresql/16/main

# Configure recovery target
cat >> /var/lib/postgresql/16/main/postgresql.auto.conf <<EOF
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_time = '2026-02-19 14:30:00'
EOF

# Create the recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal

# Start PostgreSQL - it will replay WAL to the target time
chown -R postgres:postgres /var/lib/postgresql/16/main
systemctl start postgresql

PostgreSQL replays archived WAL segments using restore_command until it reaches recovery_target_time, then pauses (the default recovery_target_action). After verifying the data, end recovery and promote to normal operation:

SELECT pg_wal_replay_resume();

Backup Verification

A backup that has never been restored is an untested assumption. Schedule regular verification to ensure your backups actually work.

Restore to a Test Instance

The most reliable verification is a full restore to a separate server or container:

# MySQL: restore to a test instance
mysql -h test-db-host < /backup/mysql_full_20260219.sql
mysql -h test-db-host -e "SELECT COUNT(*) FROM mydb.orders"

# PostgreSQL: restore to a test database
createdb -h test-db-host mydb_test
pg_restore -d mydb_test -h test-db-host /backup/pg_myapp_20260219.dump
psql -h test-db-host -d mydb_test -c "SELECT COUNT(*) FROM orders"

Checksum Validation

Verify backup file integrity before and after transfer:

# Generate checksums at backup time
sha256sum /backup/mysql_full_20260219.sql.gz > /backup/mysql_full_20260219.sha256

# Verify after transfer to offsite storage
sha256sum -c /backup/mysql_full_20260219.sha256

For XtraBackup, the --prepare step itself validates consistency. If prepare succeeds, the backup is restorable.

Automated Verification Script

#!/bin/bash
# verify_backup.sh - restore and validate the latest backup

BACKUP_FILE=$(ls -t /backup/mysql_full_*.sql.gz | head -1)
TEST_DB="backup_verify_$(date +%Y%m%d)"

# Create test database and restore
# (assumes a single-database dump; an --all-databases dump carries its own
# CREATE DATABASE/USE statements and would bypass ${TEST_DB})
mysql -e "CREATE DATABASE ${TEST_DB}"
zcat "${BACKUP_FILE}" | mysql "${TEST_DB}"

# Run validation queries (table_rows is an estimate for InnoDB - fine
# for a sanity check, not an exact count)
TABLES=$(mysql -NBe "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='${TEST_DB}'")
ROWS=$(mysql -NBe "SELECT SUM(table_rows) FROM information_schema.tables WHERE table_schema='${TEST_DB}'")

echo "Backup: ${BACKUP_FILE}"
echo "Tables: ${TABLES}"
echo "Total rows: ${ROWS}"

# Compare against expected minimums
if [ "${TABLES}" -lt 10 ]; then
  echo "ALERT: Table count below threshold" >&2
  exit 1
fi

# Cleanup
mysql -e "DROP DATABASE ${TEST_DB}"
echo "Verification passed"

Automate verification weekly

Run backup verification on a schedule, not just when you suspect a problem. A cron job that restores to a test instance and checks row counts catches silent backup failures before they become disasters.
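Wiring the verify_backup.sh script above into cron might look like:

```
# Sundays at 04:00, after the weekly full backup window
0 4 * * 0 /usr/local/bin/verify_backup.sh >> /var/log/backup_verify.log 2>&1
```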


Scheduling and Rotation

Cron-Based Backup Scripts

A production backup script handles compression, logging, error reporting, and cleanup:

#!/bin/bash
# mysql_backup.sh - daily MySQL backup with rotation

set -euo pipefail

BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${BACKUP_DIR}/backup_${DATE}.log"
RETENTION_DAYS=30

exec > >(tee -a "${LOG_FILE}") 2>&1
echo "Backup started: $(date)"

# Full logical backup
mysqldump --single-transaction --routines --triggers --events \
  --all-databases | gzip > "${BACKUP_DIR}/full_${DATE}.sql.gz"

# Generate checksum
sha256sum "${BACKUP_DIR}/full_${DATE}.sql.gz" > "${BACKUP_DIR}/full_${DATE}.sha256"

# Remove backups older than retention period
find "${BACKUP_DIR}" -name "full_*.sql.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "full_*.sha256" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "backup_*.log" -mtime +${RETENTION_DAYS} -delete

echo "Backup completed: $(date)"
echo "Size: $(du -h "${BACKUP_DIR}/full_${DATE}.sql.gz" | cut -f1)"

Schedule with cron:

# Daily at 2:00 AM
0 2 * * * /usr/local/bin/mysql_backup.sh

# Weekly XtraBackup full + daily incrementals
0 3 * * 0 /usr/local/bin/xtrabackup_full.sh
0 3 * * 1-6 /usr/local/bin/xtrabackup_incremental.sh

Grandfather-Father-Son Rotation

The GFS rotation strategy balances retention depth with storage costs:

| Level | Frequency | Retention | Purpose |
|---|---|---|---|
| Son (daily) | Every day | 7-14 days | Recent recovery, accidental changes |
| Father (weekly) | Every Sunday | 4-5 weeks | Weekly recovery points |
| Grandfather (monthly) | First of month | 12 months | Monthly recovery, compliance |
| Yearly | January 1 | 3-7 years | Regulatory, legal holds |

A 500 GB database with GFS rotation:

  • 14 daily backups: 14 x 500 GB = 7 TB (or ~700 GB compressed)
  • 4 weekly: 4 x 500 GB = 2 TB (or ~200 GB compressed)
  • 12 monthly: 12 x 500 GB = 6 TB (or ~600 GB compressed)
  • Total: ~1.5 TB compressed storage for a full year of recovery points
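Deciding which tier a given backup belongs to is simple date arithmetic. The helper below is a hypothetical sketch (classify_backup is not a standard tool) that mirrors the table above - first of the month is a grandfather, Sunday a father, anything else a son - and assumes GNU date for the day-of-week lookup:

```shell
#!/bin/sh
# classify_backup: print the GFS tier for a backup dated YYYYMMDD
# (hypothetical helper; assumes GNU date)
classify_backup() {
  day=$(printf '%s' "$1" | cut -c7-8)   # day of month
  dow=$(date -d "$1" +%u)               # 1=Mon .. 7=Sun
  if [ "$day" = "01" ]; then
    echo grandfather                    # monthly tier
  elif [ "$dow" = "7" ]; then
    echo father                         # weekly tier
  else
    echo son                            # daily tier
  fi
}

# Example: a Sunday that is not the first of the month
classify_backup 20260208                # prints "father"
```

A rotation script can call this on each backup's date and apply the matching retention window.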

Cloud and Snapshot Backups

Filesystem Snapshots

Filesystem snapshots (LVM, ZFS, EBS) create near-instant point-in-time copies of the underlying storage. They are the fastest backup method because they operate below the filesystem level.

See also

Filesystem-level backups often use LVM snapshots. For LVM concepts, snapshot creation, and filesystem management, see Disk and Filesystem.

LVM snapshots:

# FLUSH TABLES WITH READ LOCK lasts only as long as the client session,
# so take the snapshot from inside that session via the client's SYSTEM
# shell escape; the lock is released automatically when mysql exits
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM lvcreate -L 10G -s -n mysql_snap /dev/vg0/mysql_data"

# Mount and copy the snapshot
mkdir /mnt/snap
mount /dev/vg0/mysql_snap /mnt/snap
rsync -a /mnt/snap/ /backup/mysql_snapshot/
umount /mnt/snap
lvremove -f /dev/vg0/mysql_snap

ZFS snapshots:

# FLUSH TABLES WITH READ LOCK is session-scoped, so run the snapshot
# from within the locked session (ZFS snapshots are atomic and instant)
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM zfs snapshot tank/mysql@backup_$(date +%Y%m%d)"

# Send snapshot to remote for offsite storage
zfs send tank/mysql@backup_20260219 | ssh backup-host zfs recv backup/mysql

AWS EBS snapshots:

# FLUSH TABLES WITH READ LOCK is session-scoped, so initiate the EBS
# snapshot from inside the locked session; the snapshot is point-in-time
# as of the API call, so the lock can end as soon as the call returns
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description 'MySQL backup $(date +%Y%m%d)'"

Consistency during snapshots

Without flushing tables and acquiring a lock (or using xtrabackup), a filesystem snapshot captures data files in an inconsistent state - some pages may be half-written. InnoDB's crash recovery can fix this on startup (similar to recovering from a power failure), but it is safer to flush first. For PostgreSQL, use pg_backup_start() and pg_backup_stop() (or pg_start_backup()/pg_stop_backup() on versions before 15) to ensure WAL consistency.

Consistency Considerations

| Database | Snapshot Method | Consistency Approach |
|---|---|---|
| MySQL/InnoDB | LVM/ZFS/EBS | FLUSH TABLES WITH READ LOCK, then snapshot, or use XtraBackup |
| PostgreSQL | LVM/ZFS/EBS | pg_backup_start() / pg_backup_stop() + archive WAL |
| Any | Application-level | Quiesce writes at the application layer before the snapshot |
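For PostgreSQL, pg_backup_start() and pg_backup_stop() must be called from the same session (PostgreSQL 15 removed the older exclusive-backup mode), so the snapshot is taken via psql's \! shell escape. This sketch assumes a hypothetical ZFS dataset named tank/pgdata:

```shell
psql <<'EOF'
SELECT pg_backup_start('zfs_snapshot', true);
-- take the snapshot while the backup session is still open
\! zfs snapshot tank/pgdata@backup
-- returns the backup label and WAL position; archive WAL from here on
SELECT * FROM pg_backup_stop();
EOF
```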

Retention Policies

How long you keep backups depends on three factors: recovery requirements, regulatory obligations, and storage budget.

Recovery Requirements

Define your Recovery Point Objective (RPO) and Recovery Time Objective (RTO):

  • RPO: How much data loss is acceptable? An RPO of 1 hour means you need backups or log archiving at least every hour.
  • RTO: How quickly must you restore service? An RTO of 30 minutes rules out logical backups for large databases.

Regulatory Requirements

Some industries mandate specific retention periods:

| Regulation | Retention Requirement |
|---|---|
| PCI DSS | 1 year minimum for audit logs |
| HIPAA | 6 years for health records |
| SOX | 7 years for financial records |
| GDPR | Only as long as necessary (data minimization) |

GDPR is the outlier - it requires you to delete data you no longer need, which can conflict with long backup retention. Consult your compliance team.

Storage Cost Optimization

Strategies for managing backup storage costs:

  • Tiered storage: keep recent backups on fast storage (SSD/NVMe), move older backups to cheaper storage (HDD, object storage like S3 Glacier)
  • Incremental chains: store one full backup per week and incrementals daily (reduces storage by 60-80%)
  • Compression: gzip reduces logical backups by 80-90%; zstd achieves comparable or better ratios at several times the speed
  • Deduplication: tools like borgbackup or ZFS deduplication eliminate redundant blocks across backup versions

# Example: tiered storage with S3
# Recent backups on local disk
/backup/mysql/daily/     # last 7 days

# Weekly backups pushed to S3 Standard
aws s3 cp /backup/mysql/weekly/ s3://mycompany-db-backups/weekly/ --recursive

# Monthly backups moved to S3 Glacier
aws s3 cp /backup/mysql/monthly/ s3://mycompany-db-backups/monthly/ \
  --storage-class GLACIER --recursive



