
Backup & Recovery Strategies

Your database is only as reliable as your ability to restore it. A backup you have never tested is not a backup - it is a hope. This guide covers the tools, techniques, and strategies for protecting MySQL and PostgreSQL data against every failure mode from accidental DELETE to total disk loss.


Logical vs Physical Backups

[Figure: comparison of logical and physical backup strategies, with RTO and RPO timelines]

Every backup method falls into one of two categories, and understanding the trade-offs determines which tools you reach for.

A logical backup exports data as SQL statements or delimited text. You get a portable, human-readable file that can be loaded into any compatible database version. The trade-off is speed - logical backups read every row through the SQL layer, and restoring means re-executing every INSERT. For a 500 GB database, that can take hours or days.

A physical backup copies the raw data files (tablespace files, WAL segments, redo logs) at the filesystem or storage level. These are fast to create and fast to restore because they skip the SQL layer entirely. The trade-off is portability - a physical backup is tied to the exact database version, architecture, and often the same configuration.

| Factor | Logical | Physical |
|---|---|---|
| Speed (backup) | Slow - reads through SQL | Fast - copies raw files |
| Speed (restore) | Slow - re-executes SQL | Fast - copies files back |
| Size | Smaller (text, compresses well) | Larger (raw pages, internal fragmentation) |
| Portability | High - works across versions and platforms | Low - same version and architecture |
| Granularity | Per-table, per-database, or full | Usually full instance (some tools support incremental) |
| Consistency | Snapshot via transactions or locking | Requires coordination (flush, freeze, or hot backup tool) |
| PITR support | No (without binary log/WAL) | Yes (with log replay) |

When to use each:

  • Logical: migrating between versions, exporting specific tables, small-to-medium databases, cross-platform transfers
  • Physical: large databases where restore speed matters, disaster recovery, setting up replicas, when you need point-in-time recovery

Most production environments use both - nightly physical backups for fast disaster recovery, plus periodic logical backups for portability and schema versioning.


MySQL Backup Tools

mysqldump

mysqldump is the standard logical backup tool shipped with MySQL. It produces SQL output that recreates the schema and data.

A basic full backup:

mysqldump --single-transaction --routines --triggers --events \
  --all-databases > full_backup.sql

The flags that matter:

| Flag | Purpose |
|---|---|
| --single-transaction | Takes a consistent snapshot inside a transaction (InnoDB only, no table locks) |
| --routines | Includes stored procedures and functions |
| --triggers | Includes triggers (on by default in MySQL 5.7+, but be explicit) |
| --events | Includes scheduled events |
| --all-databases | Dumps every database |
| --flush-logs | Rotates the binary logs before the dump starts, giving a clean starting point for PITR |
| --source-data=2 | Records the binary log position as a comment (MySQL 8.0.26+, replaces --master-data) |
| --set-gtid-purged=OFF | Prevents GTID-related errors when restoring to a non-GTID server |
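Putting those flags together, a PITR-ready dump might look like the following sketch (paths and filenames are illustrative):

```shell
# Consistent dump that also rotates the binary logs and records the
# binlog position, so logs from this point forward can be replayed for PITR
mysqldump --single-transaction --routines --triggers --events \
  --flush-logs --source-data=2 \
  --all-databases > full_backup_pitr.sql

# The recorded position appears near the top of the dump as a comment
grep -m1 'CHANGE REPLICATION SOURCE' full_backup_pitr.sql
```

The grep line simply surfaces the coordinates you will need later when replaying binary logs.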

MyISAM and --single-transaction

--single-transaction only provides consistency for InnoDB tables. If your database contains MyISAM tables, you need --lock-all-tables instead, which blocks all writes during the dump. On modern MySQL, you should be using InnoDB for everything.

Restoring a mysqldump backup:

mysql < full_backup.sql

For large restores, temporarily relax durability settings to speed up the import:

mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit=2; SET GLOBAL sync_binlog=0;"
mysql < full_backup.sql
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit=1; SET GLOBAL sync_binlog=1;"
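To restore just one database out of an --all-databases dump, the client's --one-database filter can help; this sketch assumes a hypothetical database named mydb and relies on the USE statements the dump contains:

```shell
# Replay only the statements that apply to mydb, skipping all other databases
mysql --one-database mydb < full_backup.sql
```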

mydumper and myloader

mydumper is a third-party tool that parallelizes logical backups. Where mysqldump uses a single thread, mydumper can dump multiple tables simultaneously and splits large tables into chunks. The companion tool myloader restores in parallel.

# Parallel backup with 4 threads
mydumper --threads 4 --compress --outputdir /backup/mysql/daily

# Parallel restore
myloader --threads 4 --directory /backup/mysql/daily

For a 100 GB database, mydumper/myloader can be 4-8x faster than mysqldump/mysql depending on hardware and table count.
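mydumper can also restrict the dump to specific schemas or tables with its --regex filter; the pattern below is a sketch matching a hypothetical mydb schema:

```shell
# Dump only tables in mydb, four at a time, compressed
mydumper --threads 4 --compress --regex '^mydb\.' \
  --outputdir /backup/mysql/mydb_only
```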

mysqlpump

mysqlpump is MySQL's built-in parallel dump tool (introduced in 5.7). It supports parallel processing and progress reporting but saw less adoption than mydumper and was deprecated in MySQL 8.0.34 in favor of MySQL Shell's dump utilities.

mysqlpump --default-parallelism=4 --all-databases > backup.sql

PostgreSQL Backup Tools

pg_dump

pg_dump is the standard logical backup tool for PostgreSQL. Unlike mysqldump, it supports multiple output formats.

| Format | Flag | Description |
|---|---|---|
| Plain SQL | -Fp | SQL text output (default), restored with psql |
| Custom | -Fc | Compressed, supports selective restore, restored with pg_restore |
| Directory | -Fd | One file per table, supports parallel dump and restore |
| Tar | -Ft | Tar archive, restored with pg_restore |

The custom format (-Fc) is the best default choice - it compresses automatically, supports selective table restore, and allows parallel restore with pg_restore -j.

# Custom format (recommended)
pg_dump -Fc mydb > mydb.dump

# Directory format with parallel dump (4 jobs)
pg_dump -Fd -j 4 mydb -f /backup/mydb_dir

# Plain SQL (human-readable)
pg_dump -Fp mydb > mydb.sql

Restoring:

# From custom or tar format
pg_restore -d mydb mydb.dump

# Parallel restore (4 jobs, custom or directory format only)
pg_restore -d mydb -j 4 mydb.dump

# From plain SQL
psql -d mydb -f mydb.sql

# Restore specific tables only
pg_restore -d mydb -t orders -t customers mydb.dump
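For finer control than -t, pg_restore can emit a table of contents that you edit and feed back; this sketch assumes the mydb.dump archive from above:

```shell
# List the archive's contents as an editable table of contents
pg_restore -l mydb.dump > restore.list

# Comment out unwanted lines in restore.list with ';', then restore
# only the remaining objects
pg_restore -d mydb -L restore.list mydb.dump
```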

pg_dumpall

pg_dumpall dumps the entire cluster including roles, tablespaces, and other global objects that pg_dump does not capture.

# Full cluster dump (roles + all databases)
pg_dumpall > cluster_backup.sql

# Globals only (roles and tablespaces, no data)
pg_dumpall --globals-only > globals.sql

A common pattern is to use pg_dumpall --globals-only for roles and pg_dump -Fc per database for data. This gives you the flexibility of custom format for data while still capturing global objects.
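That pattern can be scripted as follows (a sketch: it assumes passwordless access as the postgres superuser, and the backup path is illustrative):

```shell
#!/bin/bash
set -euo pipefail

BACKUP_DIR=/backup/postgres
DATE=$(date +%Y%m%d)

# Global objects (roles, tablespaces)
pg_dumpall --globals-only > "${BACKUP_DIR}/globals_${DATE}.sql"

# One custom-format dump per database, skipping templates
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
  pg_dump -Fc "$db" -f "${BACKUP_DIR}/${db}_${DATE}.dump"
done
```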

pg_basebackup

pg_basebackup creates a physical backup by streaming a copy of the entire PostgreSQL data directory. This is the foundation for setting up streaming replicas and for physical backup strategies.

# Stream a physical backup to a directory
pg_basebackup -D /backup/pg_base -Fp -Xs -P -R

# Create a compressed tar backup
pg_basebackup -D /backup/pg_base -Ft -Xf -z

| Flag | Purpose |
|---|---|
| -D | Destination directory |
| -Fp | Plain format (data directory copy) |
| -Ft | Tar format |
| -Xs | Stream WAL during backup (recommended) |
| -Xf | Fetch WAL after backup completes |
| -z | Compress output (tar format only) |
| -P | Show progress |
| -R | Write standby.signal and connection settings for replica setup |
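Since PostgreSQL 13, pg_basebackup also writes a backup_manifest file, which can be checked later with pg_verifybackup; a quick integrity check on the plain-format backup above:

```shell
# Recalculate file checksums against the backup_manifest
pg_verifybackup /backup/pg_base
```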

Percona XtraBackup

Percona XtraBackup creates hot physical backups of InnoDB without locking tables or interrupting transactions. It copies the InnoDB data files while tracking the redo log to capture changes made during the copy. This makes it the standard tool for MySQL physical backups in production.

Full Backup

# Create a full backup
xtrabackup --backup --target-dir=/backup/base

# Prepare the backup (apply redo log for consistency)
xtrabackup --prepare --target-dir=/backup/base

# Restore (MySQL must be stopped)
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/base
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql

The prepare step is critical. A raw XtraBackup is not consistent - it contains data files from different points in time plus redo log entries. The prepare step replays the redo log to bring all data files to a consistent state, just like InnoDB crash recovery.

Incremental Backups

XtraBackup supports incremental backups that only copy pages changed since the last backup. This dramatically reduces backup time and storage for large databases.

# Full base backup (Sunday)
xtrabackup --backup --target-dir=/backup/base

# Incremental backup (Monday) - only changed pages
xtrabackup --backup --target-dir=/backup/inc1 \
  --incremental-basedir=/backup/base

# Incremental backup (Tuesday)
xtrabackup --backup --target-dir=/backup/inc2 \
  --incremental-basedir=/backup/inc1

Restoring an incremental chain requires preparing each increment in order:

# Prepare base (with --apply-log-only to preserve uncommitted transactions)
xtrabackup --prepare --apply-log-only --target-dir=/backup/base

# Apply incremental backups in order
xtrabackup --prepare --apply-log-only --target-dir=/backup/base \
  --incremental-dir=/backup/inc1

xtrabackup --prepare --apply-log-only --target-dir=/backup/base \
  --incremental-dir=/backup/inc2

# Final prepare (without --apply-log-only)
xtrabackup --prepare --target-dir=/backup/base

# Restore
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/base
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql

The --apply-log-only flag

Use --apply-log-only during intermediate prepare steps. Without it, XtraBackup rolls back uncommitted transactions, which makes subsequent incremental applies fail. Only omit --apply-log-only on the final prepare step.
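The prepare sequence above can be scripted; this sketch assumes increments live under /backup/inc1, /backup/inc2, and so on:

```shell
#!/bin/bash
set -euo pipefail

BASE=/backup/base

# Intermediate prepares: always use --apply-log-only
xtrabackup --prepare --apply-log-only --target-dir="$BASE"

# NOTE: glob order is lexical - zero-pad directory names (inc01, inc02)
# if you keep more than nine increments
for inc in /backup/inc*/; do
  xtrabackup --prepare --apply-log-only --target-dir="$BASE" \
    --incremental-dir="$inc"
done

# Final prepare rolls back uncommitted transactions
xtrabackup --prepare --target-dir="$BASE"
```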


Point-in-Time Recovery (PITR)

Backups give you a snapshot. Point-in-time recovery fills the gap between the snapshot and the moment of failure (or the moment before an accidental change). PITR replays transaction logs from the backup's position forward to a specific timestamp or log position.

flowchart LR
    FB[Full Backup<br/>Sunday 02:00] --> WAL1[WAL Segment 1]
    WAL1 --> WAL2[WAL Segment 2]
    WAL2 --> WAL3[WAL Segment 3]
    WAL3 --> F[Failure<br/>Wednesday 14:30]

    FB --> R[Restore Base Backup]
    R --> RP[Replay WAL/Binlog<br/>to Target Time]
    RP --> REC[Recovered Database<br/>Wednesday 14:29]

MySQL: Binary Log Replay

MySQL's binary log records every data-modifying statement or row change. PITR replays these logs with mysqlbinlog.

Step 1: Restore from the most recent full backup.

mysql < /backup/mysql_full_20260218.sql

Step 2: Identify the binary logs created after the backup.

ls -la /var/lib/mysql/binlog.*

Step 3: Replay binary logs up to the target time (just before the accidental operation).

mysqlbinlog --stop-datetime="2026-02-19 14:30:00" \
  /var/lib/mysql/binlog.000042 \
  /var/lib/mysql/binlog.000043 | mysql

To skip a specific bad transaction, use --start-position and --stop-position to replay around it:

# Replay everything up to the bad statement
mysqlbinlog --stop-position=12345 /var/lib/mysql/binlog.000042 | mysql

# Skip the bad statement and continue from after it
mysqlbinlog --start-position=12400 /var/lib/mysql/binlog.000042 | mysql

Binary logging must be enabled BEFORE disaster

PITR only works if log_bin is enabled and binary logs exist from the backup point forward. Verify with SHOW VARIABLES LIKE 'log_bin'. If binary logging was not enabled, you can only restore to the backup snapshot - everything after it is lost.
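Enabling it is a my.cnf change; the values below are an illustrative sketch, not mandatory settings:

```ini
[mysqld]
log_bin = /var/lib/mysql/binlog
server_id = 1
# Keep 7 days of binary logs (MySQL 8.0+)
binlog_expire_logs_seconds = 604800
```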

PostgreSQL: WAL Archiving and Recovery

PostgreSQL's equivalent is Write-Ahead Log (WAL) archiving. Every change is written to WAL segments before it reaches the data files. By archiving these segments, you can replay them to recover to any point in time.

Step 1: Configure WAL archiving in postgresql.conf:

wal_level = replica
archive_mode = on
# 'test ! -f' refuses to overwrite an existing archive file
archive_command = 'test ! -f /backup/wal_archive/%f && cp %p /backup/wal_archive/%f'

Step 2: Take a base backup:

pg_basebackup -D /backup/pg_base -Fp -Xs -P

Step 3: When recovery is needed, copy the base backup into position and configure recovery:

# Stop PostgreSQL
systemctl stop postgresql

# Replace the data directory with the backup
rm -rf /var/lib/postgresql/16/main
cp -a /backup/pg_base /var/lib/postgresql/16/main

# Configure recovery target
cat >> /var/lib/postgresql/16/main/postgresql.auto.conf <<EOF
restore_command = 'cp /backup/wal_archive/%f %p'
recovery_target_time = '2026-02-19 14:30:00'
EOF

# Create the recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal

# Start PostgreSQL - it will replay WAL to the target time
chown -R postgres:postgres /var/lib/postgresql/16/main
systemctl start postgresql

PostgreSQL replays archived WAL segments using restore_command until it reaches recovery_target_time, then pauses (the default recovery_target_action). After verifying the data, end recovery and promote to normal operation:

SELECT pg_wal_replay_resume();

Backup Verification

A backup that has never been restored is an untested assumption. Schedule regular verification to ensure your backups actually work.

Restore to a Test Instance

The most reliable verification is a full restore to a separate server or container:

# MySQL: restore to a test instance
mysql -h test-db-host < /backup/mysql_full_20260219.sql
mysql -h test-db-host -e "SELECT COUNT(*) FROM mydb.orders"

# PostgreSQL: restore to a test database
createdb -h test-db-host mydb_test
pg_restore -d mydb_test -h test-db-host /backup/pg_myapp_20260219.dump
psql -h test-db-host -d mydb_test -c "SELECT COUNT(*) FROM orders"

Checksum Validation

Verify backup file integrity before and after transfer:

# Generate checksums at backup time
sha256sum /backup/mysql_full_20260219.sql.gz > /backup/mysql_full_20260219.sha256

# Verify after transfer to offsite storage
sha256sum -c /backup/mysql_full_20260219.sha256

For XtraBackup, the --prepare step itself validates consistency. If prepare succeeds, the backup is restorable.

Automated Verification Script

#!/bin/bash
# verify_backup.sh - restore and validate the latest backup

BACKUP_FILE=$(ls -t /backup/mysql_full_*.sql.gz | head -1)
TEST_DB="backup_verify_$(date +%Y%m%d)"

# Create test database and restore
# (assumes a single-database dump; an --all-databases dump carries its own
# CREATE DATABASE/USE statements and would bypass ${TEST_DB})
mysql -e "CREATE DATABASE ${TEST_DB}"
zcat "${BACKUP_FILE}" | mysql "${TEST_DB}"

# Run validation queries (table_rows is an estimate for InnoDB - fine
# for a sanity check, not an exact count)
TABLES=$(mysql -NBe "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='${TEST_DB}'")
ROWS=$(mysql -NBe "SELECT SUM(table_rows) FROM information_schema.tables WHERE table_schema='${TEST_DB}'")

echo "Backup: ${BACKUP_FILE}"
echo "Tables: ${TABLES}"
echo "Total rows: ${ROWS}"

# Compare against expected minimums
if [ "${TABLES}" -lt 10 ]; then
  echo "ALERT: Table count below threshold" >&2
  exit 1
fi

# Cleanup
mysql -e "DROP DATABASE ${TEST_DB}"
echo "Verification passed"

Automate verification weekly

Run backup verification on a schedule, not just when you suspect a problem. A cron job that restores to a test instance and checks row counts catches silent backup failures before they become disasters.
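Wiring the verify_backup.sh script above into cron might look like:

```
# Sundays at 04:00, after the weekly full backup window
0 4 * * 0 /usr/local/bin/verify_backup.sh >> /var/log/backup_verify.log 2>&1
```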


Scheduling and Rotation

Cron-Based Backup Scripts

A production backup script handles compression, logging, error reporting, and cleanup:

#!/bin/bash
# mysql_backup.sh - daily MySQL backup with rotation

set -euo pipefail

BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${BACKUP_DIR}/backup_${DATE}.log"
RETENTION_DAYS=30

exec > >(tee -a "${LOG_FILE}") 2>&1
echo "Backup started: $(date)"

# Full logical backup
mysqldump --single-transaction --routines --triggers --events \
  --all-databases | gzip > "${BACKUP_DIR}/full_${DATE}.sql.gz"

# Generate checksum
sha256sum "${BACKUP_DIR}/full_${DATE}.sql.gz" > "${BACKUP_DIR}/full_${DATE}.sha256"

# Remove backups older than retention period
find "${BACKUP_DIR}" -name "full_*.sql.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "full_*.sha256" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "backup_*.log" -mtime +${RETENTION_DAYS} -delete

echo "Backup completed: $(date)"
echo "Size: $(du -h "${BACKUP_DIR}/full_${DATE}.sql.gz" | cut -f1)"

Schedule with cron:

# Daily at 2:00 AM
0 2 * * * /usr/local/bin/mysql_backup.sh

# Weekly XtraBackup full + daily incrementals
0 3 * * 0 /usr/local/bin/xtrabackup_full.sh
0 3 * * 1-6 /usr/local/bin/xtrabackup_incremental.sh

Grandfather-Father-Son Rotation

The GFS rotation strategy balances retention depth with storage costs:

| Level | Frequency | Retention | Purpose |
|---|---|---|---|
| Son (daily) | Every day | 7-14 days | Recent recovery, accidental changes |
| Father (weekly) | Every Sunday | 4-5 weeks | Weekly recovery points |
| Grandfather (monthly) | First of month | 12 months | Monthly recovery, compliance |
| Yearly | January 1 | 3-7 years | Regulatory, legal holds |

A 500 GB database with GFS rotation:

  • 14 daily backups: 14 x 500 GB = 7 TB (or ~700 GB compressed)
  • 4 weekly: 4 x 500 GB = 2 TB (or ~200 GB compressed)
  • 12 monthly: 12 x 500 GB = 6 TB (or ~600 GB compressed)
  • Total: ~1.5 TB compressed storage for a full year of recovery points
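Deciding which tier a given backup belongs to is simple date arithmetic. The helper below is a hypothetical sketch (classify_backup is not a standard tool) that mirrors the table above - first of the month is a grandfather, Sunday a father, anything else a son - and assumes GNU date for the day-of-week lookup:

```shell
#!/bin/sh
# classify_backup: print the GFS tier for a backup dated YYYYMMDD
# (hypothetical helper; assumes GNU date)
classify_backup() {
  day=$(printf '%s' "$1" | cut -c7-8)   # day of month
  dow=$(date -d "$1" +%u)               # 1=Mon .. 7=Sun
  if [ "$day" = "01" ]; then
    echo grandfather                    # monthly tier
  elif [ "$dow" = "7" ]; then
    echo father                         # weekly tier
  else
    echo son                            # daily tier
  fi
}

# Example: a Sunday that is not the first of the month
classify_backup 20260208                # prints "father"
```

A rotation script can call this on each backup's date and apply the matching retention window.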

Cloud and Snapshot Backups

Filesystem Snapshots

Filesystem snapshots (LVM, ZFS, EBS) create near-instant point-in-time copies of the underlying storage. They are the fastest backup method because they operate below the filesystem level.

See also

Filesystem-level backups often use LVM snapshots. For LVM concepts, snapshot creation, and filesystem management, see Disk and Filesystem.

LVM snapshots:

# FLUSH TABLES WITH READ LOCK lasts only as long as the client session,
# so take the snapshot from inside that session via the client's SYSTEM
# shell escape; the lock is released automatically when mysql exits
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM lvcreate -L 10G -s -n mysql_snap /dev/vg0/mysql_data"

# Mount and copy the snapshot
mkdir /mnt/snap
mount /dev/vg0/mysql_snap /mnt/snap
rsync -a /mnt/snap/ /backup/mysql_snapshot/
umount /mnt/snap
lvremove -f /dev/vg0/mysql_snap

ZFS snapshots:

# FLUSH TABLES WITH READ LOCK is session-scoped, so run the snapshot
# from within the locked session (ZFS snapshots are atomic and instant)
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM zfs snapshot tank/mysql@backup_$(date +%Y%m%d)"

# Send snapshot to remote for offsite storage
zfs send tank/mysql@backup_20260219 | ssh backup-host zfs recv backup/mysql

AWS EBS snapshots:

# FLUSH TABLES WITH READ LOCK is session-scoped, so initiate the EBS
# snapshot from inside the locked session; the snapshot is point-in-time
# as of the API call, so the lock can end as soon as the call returns
mysql -e "FLUSH TABLES WITH READ LOCK; SYSTEM aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description 'MySQL backup $(date +%Y%m%d)'"

Consistency during snapshots

Without flushing tables and acquiring a lock (or using xtrabackup), a filesystem snapshot captures data files in an inconsistent state - some pages may be half-written. InnoDB's crash recovery can fix this on startup (similar to recovering from a power failure), but it is safer to flush first. For PostgreSQL, use pg_backup_start() and pg_backup_stop() (or pg_start_backup()/pg_stop_backup() on versions before 15) to ensure WAL consistency.

Consistency Considerations

| Database | Snapshot Method | Consistency Approach |
|---|---|---|
| MySQL/InnoDB | LVM/ZFS/EBS | FLUSH TABLES WITH READ LOCK, then snapshot, or use XtraBackup |
| PostgreSQL | LVM/ZFS/EBS | pg_backup_start() / pg_backup_stop() + archive WAL |
| Any | Application-level | Quiesce writes at the application layer before the snapshot |
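For PostgreSQL, pg_backup_start() and pg_backup_stop() must be called from the same session (PostgreSQL 15 removed the older exclusive-backup mode), so the snapshot is taken via psql's \! shell escape. This sketch assumes a hypothetical ZFS dataset named tank/pgdata:

```shell
psql <<'EOF'
SELECT pg_backup_start('zfs_snapshot', true);
-- take the snapshot while the backup session is still open
\! zfs snapshot tank/pgdata@backup
-- returns the backup label and WAL position; archive WAL from here on
SELECT * FROM pg_backup_stop();
EOF
```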

Retention Policies

How long you keep backups depends on three factors: recovery requirements, regulatory obligations, and storage budget.

Recovery Requirements

Define your Recovery Point Objective (RPO) and Recovery Time Objective (RTO):

  • RPO: How much data loss is acceptable? An RPO of 1 hour means you need backups or log archiving at least every hour.
  • RTO: How quickly must you restore service? An RTO of 30 minutes rules out logical backups for large databases.

Regulatory Requirements

Some industries mandate specific retention periods:

| Regulation | Retention Requirement |
|---|---|
| PCI DSS | 1 year minimum for audit logs |
| HIPAA | 6 years for health records |
| SOX | 7 years for financial records |
| GDPR | Only as long as necessary (data minimization) |

GDPR is the outlier - it requires you to delete data you no longer need, which can conflict with long backup retention. Consult your compliance team.

Storage Cost Optimization

Strategies for managing backup storage costs:

  • Tiered storage: keep recent backups on fast storage (SSD/NVMe), move older backups to cheaper storage (HDD, object storage like S3 Glacier)
  • Incremental chains: store one full backup per week and incrementals daily (reduces storage by 60-80%)
  • Compression: gzip reduces logical backups by 80-90%; zstd achieves comparable or better ratios at several times the speed
  • Deduplication: tools like borgbackup or ZFS deduplication eliminate redundant blocks across backup versions

# Example: tiered storage with S3
# Recent backups on local disk
/backup/mysql/daily/     # last 7 days

# Weekly backups pushed to S3 Standard
aws s3 cp /backup/mysql/weekly/ s3://mycompany-db-backups/weekly/ --recursive

# Monthly backups moved to S3 Glacier
aws s3 cp /backup/mysql/monthly/ s3://mycompany-db-backups/monthly/ \
  --storage-class GLACIER --recursive



