MySQL Replication & High Availability¶
A single MySQL server is a single point of failure. Every production database needs a plan for scaling reads, surviving hardware failures, and minimizing downtime during maintenance. MySQL provides several replication mechanisms - from basic binary log replication to fully automated clustering - and the right choice depends on your availability requirements, data consistency needs, and operational complexity budget.
This guide covers the replication landscape from the ground up: how binary log replication works, modern GTID-based topologies, semi-synchronous durability guarantees, Group Replication for automatic failover, and the tooling that ties it all together.
Binary Log Replication¶
Binary log replication is the foundation of all MySQL replication. Every other replication method builds on this mechanism. The concept is straightforward: the source server writes every data-modifying event to a binary log, and one or more replica servers read those events and replay them locally.
How It Works¶
The replication process involves three threads:
- Binlog dump thread (on the source) - reads the binary log and sends events to the replica
- I/O receiver thread (on the replica) - connects to the source, receives events, and writes them to the relay log
- SQL applier thread (on the replica) - reads events from the relay log and executes them against the local database
flowchart LR
subgraph Source["Source Server"]
direction TB
C["Client Writes"]
BL["Binary Log"]
BD["Binlog Dump Thread"]
end
subgraph Replica["Replica Server"]
direction TB
IO["I/O Receiver Thread"]
RL["Relay Log"]
SQL["SQL Applier Thread"]
D["Local Data"]
end
C --> BL
BL --> BD
BD -->|"Network"| IO
IO --> RL
RL --> SQL
SQL --> D
The relay log acts as a buffer. If the SQL applier thread falls behind, the I/O thread can continue receiving events without blocking the source. This separation means a slow replica does not slow down the source server.
Binary Log Formats¶
MySQL supports three binlog formats, controlled by the binlog_format variable:
| Format | How It Logs | Pros | Cons |
|---|---|---|---|
| `STATEMENT` | The SQL statement itself | Small log size, human-readable | Non-deterministic functions (`UUID()`, `SYSDATE()`) can produce different results on replicas |
| `ROW` | The before/after image of each modified row | Deterministic, safe for all queries | Larger log size, especially for bulk updates |
| `MIXED` | Statement-based by default, switches to row-based when needed | Balances size and safety | Harder to predict which format is used |
Use ROW format
ROW is the default since MySQL 5.7.7 and the recommended format for all new deployments. The deterministic behavior eliminates an entire class of replication bugs. The increased log size is a worthwhile trade-off for reliability.
Configuring the Source¶
On the source server, enable the binary log and assign a unique server ID in my.cnf:
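A minimal source configuration might look like this; the server ID and log base name are illustrative:

```ini
[mysqld]
server-id     = 1            # must be unique across the topology
log-bin       = mysql-bin    # enable the binary log
binlog_format = ROW          # recommended format (default since 5.7.7)
sync_binlog   = 1            # flush the binlog to disk on every commit
```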
Setting sync_binlog = 1 forces the binary log to disk on every commit. This prevents data loss if the source crashes, at the cost of additional disk I/O.
Create a replication user with the required privilege:
CREATE USER 'repl'@'10.0.0.%' IDENTIFIED BY 'strong_password_here';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%';
Configuring the Replica¶
On the replica, set a unique server ID and configure the connection to the source:
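A minimal replica configuration sketch, with illustrative values:

```ini
[mysqld]
server-id = 2            # unique, different from the source
relay-log = relay-bin    # base name for the relay log files
read_only = ON           # reject writes from ordinary clients
                         # (replication threads are exempt)
```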
Then point the replica at the source using the binary log file and position:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.0.1',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'strong_password_here',
SOURCE_LOG_FILE = 'mysql-bin.000001',
SOURCE_LOG_POS = 154;
START REPLICA;
The SOURCE_LOG_FILE and SOURCE_LOG_POS values come from SHOW MASTER STATUS on the source (renamed SHOW BINARY LOG STATUS in MySQL 8.2) or from a backup's metadata. Getting these wrong means the replica starts reading from the wrong position - it will either skip transactions or try to replay events that have already been applied.
File-and-position pitfalls
Binary log file names and positions are fragile. They change after log rotation, are specific to a single source, and make it difficult to re-point a replica to a new source after failover. GTID replication (covered next) eliminates these problems.
GTID Replication¶
Global Transaction Identifiers (GTIDs) replace file-and-position tracking with a globally unique identifier for every transaction. Each GTID has the format server_uuid:transaction_id - for example, 3E11FA47-71CA-11E1-9E33-C80AA9429562:42.
The server_uuid is automatically generated when MySQL initializes and stored in auto.cnf. The transaction_id is a monotonically increasing integer per server. Together, they guarantee that every transaction across the entire replication topology has a unique identity.
Why GTIDs Matter¶
With file-and-position replication, each replica tracks where it is reading from a specific source's binary log. If that source fails and you need to point the replica at a different server, you have to manually calculate the equivalent position in the new server's binary log. This is error-prone and often requires tools like mysqlbinlog to correlate positions.
With GTIDs, every server knows exactly which transactions it has executed (tracked in gtid_executed). When you point a replica at a new source, it simply says "I've executed these GTIDs, send me everything I'm missing." The source computes the difference and starts streaming.
Enabling GTID Replication¶
On both source and replica, add to my.cnf:
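A configuration sketch using the standard GTID variables:

```ini
[mysqld]
gtid_mode                = ON
enforce_gtid_consistency = ON
log_bin                  = mysql-bin
log_replica_updates      = ON   # replicas log applied events to their own binlog
```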
enforce_gtid_consistency blocks statements that cannot be safely represented as a single GTID (such as CREATE TABLE ... SELECT or transactions mixing InnoDB and non-transactional engines). This restriction is the trade-off for GTID's reliability.
log_replica_updates makes replicas write received events to their own binary log, which is required for chained replication (replica of a replica) and for any replica to serve as a failover candidate.
Configuring a GTID Replica¶
With GTIDs, the CHANGE REPLICATION SOURCE command is simpler because you do not need file names or positions:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '10.0.0.1',
SOURCE_USER = 'repl',
SOURCE_PASSWORD = 'strong_password_here',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
SOURCE_AUTO_POSITION = 1 tells the replica to use GTID-based auto-positioning. The replica sends its gtid_executed set to the source, and the source determines what to send.
Inspecting GTID State¶
Check which transactions have been executed:
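```sql
SELECT @@GLOBAL.gtid_executed;
```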
Check which transactions have been received but not yet applied:
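One way to see this on a replica is to subtract the executed set from the received set tracked in performance_schema:

```sql
-- Received minus applied = sitting in the relay log, not yet executed
SELECT GTID_SUBTRACT(
  RECEIVED_TRANSACTION_SET,
  @@GLOBAL.gtid_executed
) AS pending_gtids
FROM performance_schema.replication_connection_status;
```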
Semi-Synchronous Replication¶
Standard MySQL replication is asynchronous - the source commits a transaction and returns success to the client without waiting for any replica to acknowledge receipt. If the source crashes immediately after committing, committed transactions may be lost because no replica received them.
Semi-synchronous replication addresses this gap. With semi-sync enabled, the source waits after each commit until at least one replica acknowledges that it has received and written the event to its relay log. Only then does the source return success to the client.
The Durability Guarantee¶
Semi-sync does not guarantee that replicas have applied the transaction - only that they have received and persisted it in the relay log. This is enough to guarantee that the transaction survives a source crash, because at least one replica has the data and can be promoted.
The Performance Trade-off¶
Every transaction now includes a network round trip to at least one replica. On a local network, this adds sub-millisecond latency. Across data centers, it can add tens of milliseconds per commit. For write-heavy workloads, this overhead can be significant.
If no replica acknowledges within rpl_semi_sync_source_timeout milliseconds (default: 10000), the source falls back to asynchronous replication. This prevents a replica failure from hanging the source indefinitely.
Configuration¶
Install and enable the semi-sync plugins on the source:
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 5000; -- 5 seconds before fallback
On each replica:
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
Make these persistent by adding to my.cnf:
# Source
[mysqld]
rpl_semi_sync_source_enabled = 1
rpl_semi_sync_source_timeout = 5000
# Replica
[mysqld]
rpl_semi_sync_replica_enabled = 1
Monitor semi-sync status:
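```sql
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync%';
```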
Key variables to watch:
| Variable | Meaning |
|---|---|
| `Rpl_semi_sync_source_status` | `ON` if semi-sync is active, `OFF` if fallen back to async |
| `Rpl_semi_sync_source_no_tx` | Transactions committed without semi-sync acknowledgment |
| `Rpl_semi_sync_source_yes_tx` | Transactions committed with semi-sync acknowledgment |
| `Rpl_semi_sync_source_avg_wait_time` | Average time waiting for replica acknowledgment |
MySQL 8.0.26+ naming
MySQL 8.0.26 renamed the semi-sync plugin from rpl_semi_sync_master/rpl_semi_sync_slave to rpl_semi_sync_source/rpl_semi_sync_replica. The old names still work as aliases but are deprecated.
Group Replication¶
MySQL Group Replication goes beyond traditional source-replica topologies. It provides a fault-tolerant replication mechanism where a group of servers coordinate through a consensus protocol (based on Paxos). If a server fails, the remaining members automatically reconfigure the group - no external failover tool needed.
How It Works¶
Every server in a Group Replication cluster is a full member of the group. When a member commits a transaction, the transaction is broadcast to all members through the group communication system. The group uses a consensus protocol to agree on the ordering of transactions. Only after a majority of members acknowledge the transaction is it considered committed.
This majority-based consensus means a group of N members can tolerate up to (N-1)/2 failures. A 3-member group tolerates 1 failure; a 5-member group tolerates 2.
Single-Primary vs Multi-Primary¶
Group Replication supports two modes:
Single-primary mode (default and recommended): One member is elected as the primary and accepts writes. All other members are read-only secondaries. If the primary fails, the group automatically elects a new primary from the remaining members.
Multi-primary mode: All members accept writes simultaneously. The group uses certification-based conflict detection to identify conflicting transactions. If two members modify the same row concurrently, one transaction is rolled back. This mode is useful for write-heavy workloads distributed across multiple locations, but conflict handling adds complexity.
| Feature | Single-Primary | Multi-Primary |
|---|---|---|
| Write endpoint | One server | Any server |
| Conflict handling | Not needed | Certification-based rollback |
| Complexity | Lower | Higher |
| Best for | Most workloads | Geographically distributed writes |
Basic Configuration¶
Each member needs Group Replication configured in my.cnf:
[mysqld]
server-id = 1
gtid-mode = ON
enforce-gtid-consistency = ON
log-bin = mysql-bin
log-replica-updates = ON
binlog-checksum = NONE
relay-log-info-repository = TABLE
transaction-write-set-extraction = XXHASH64
# Group Replication settings
plugin-load-add = group_replication.so
group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_start_on_boot = OFF
group_replication_local_address = "10.0.0.1:33061"
group_replication_group_seeds = "10.0.0.1:33061,10.0.0.2:33061,10.0.0.3:33061"
group_replication_single_primary_mode = ON
The group_replication_group_name is a UUID that identifies the group. All members must use the same value. Generate one with SELECT UUID().
Bootstrap the group on the first member:
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;
Then join additional members:
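On each additional member (with its own server-id and group_replication_local_address already configured), supply credentials for the distributed recovery channel and start the plugin. The repl account here is assumed to be the replication user created earlier:

```sql
-- Credentials for the group_replication_recovery channel
CHANGE REPLICATION SOURCE TO
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'strong_password_here'
  FOR CHANNEL 'group_replication_recovery';

-- Join the existing group (do NOT bootstrap again)
START GROUP_REPLICATION;
```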
Never bootstrap twice
Only bootstrap the group once, on the first member. Bootstrapping a second time creates a split-brain situation - two separate groups with the same name, each believing it is authoritative.
InnoDB Cluster¶
InnoDB Cluster is MySQL's integrated high-availability solution. It combines three components into a managed stack:
- Group Replication - provides the underlying data replication and automatic failover
- MySQL Shell - provides the administration interface for creating and managing the cluster
- MySQL Router - provides transparent connection routing so applications do not need to know which server is the primary
flowchart TB
subgraph App["Application Layer"]
A1["App Server 1"]
A2["App Server 2"]
end
subgraph Router["MySQL Router"]
RW["Read-Write Port 6446"]
RO["Read-Only Port 6447"]
end
subgraph Cluster["InnoDB Cluster (Group Replication)"]
P["Primary (R/W)"]
S1["Secondary (R/O)"]
S2["Secondary (R/O)"]
P <--> S1
P <--> S2
S1 <--> S2
end
A1 --> RW
A1 --> RO
A2 --> RW
A2 --> RO
RW --> P
RO --> S1
RO --> S2
Setup Workflow¶
The typical setup uses MySQL Shell's AdminAPI:
Step 1: Prepare each server instance:
// In MySQL Shell (mysqlsh)
dba.configureInstance('root@10.0.0.1:3306');
dba.configureInstance('root@10.0.0.2:3306');
dba.configureInstance('root@10.0.0.3:3306');
This checks and applies the configuration requirements (GTID mode, binary logging, etc.).
Step 2: Create the cluster on the first instance:
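Continuing in MySQL Shell; the cluster name `prodCluster` is an arbitrary illustrative choice:

```js
// In MySQL Shell, connected to the first instance
shell.connect('root@10.0.0.1:3306');
var cluster = dba.createCluster('prodCluster');
```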
Step 3: Add the remaining instances:
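In the same Shell session (addInstance prompts for a recovery method; clone is the usual default):

```js
cluster.addInstance('root@10.0.0.2:3306');
cluster.addInstance('root@10.0.0.3:3306');
```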
Step 4: Check cluster status:
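```js
cluster.status();
```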
Step 5: Bootstrap MySQL Router on each application server:
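A bootstrap sketch; the install directory and Router account name are illustrative:

```bash
# Bootstrap against any cluster member; Router discovers the topology itself
mysqlrouter --bootstrap root@10.0.0.1:3306 \
  --user=mysqlrouter --directory /opt/myrouter
/opt/myrouter/start.sh
```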
MySQL Router automatically discovers the cluster topology and routes read-write connections to the primary and read-only connections to the secondaries. When the primary fails and Group Replication elects a new one, Router detects the change and reroutes traffic - no application changes needed.
Checking Cluster Health¶
// In MySQL Shell
shell.connect('root@10.0.0.1:3306');
var cluster = dba.getCluster();
cluster.status();
The output shows the status of each member, its role (PRIMARY or SECONDARY), and any replication issues.
ProxySQL¶
ProxySQL is a high-performance proxy that sits between your application and MySQL. It provides connection pooling, read/write splitting, query routing, and query caching without modifying application code.
Connection Routing¶
ProxySQL maintains two hostgroups: one for writes (the primary) and one for reads (the replicas). Applications connect to ProxySQL on a single port, and ProxySQL routes queries to the appropriate hostgroup based on configurable rules.
Basic Setup¶
ProxySQL uses its own admin interface (default port 6032) for configuration. All configuration is stored in a SQLite database and applied at runtime.
Add your MySQL servers:
-- Connect to ProxySQL admin interface
-- mysql -u admin -padmin -h 127.0.0.1 -P 6032
-- Hostgroup 0 = writes (primary), Hostgroup 1 = reads (replicas)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
(0, '10.0.0.1', 3306),
(1, '10.0.0.2', 3306),
(1, '10.0.0.3', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
Read/Write Splitting¶
Configure query rules to route SELECT queries to the read hostgroup:
-- Route all SELECTs to hostgroup 1 (replicas)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (1, 1, '^SELECT.*', 1);
-- Route SELECT ... FOR UPDATE to hostgroup 0 (primary) since it acquires locks
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup)
VALUES (2, 1, '^SELECT.*FOR UPDATE', 0);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
Rules are evaluated in rule_id order. More specific rules should have lower rule_id values if you want them evaluated first, or use the flagIN/flagOUT chaining mechanism for complex routing logic.
User Configuration¶
Add MySQL users that ProxySQL will use:
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('app_user', 'app_password', 0);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
The default_hostgroup is used when no query rule matches (typically set to the write hostgroup so unclassified queries go to the primary).
ProxySQL vs MySQL Router
MySQL Router is the simpler choice when using InnoDB Cluster - it integrates tightly and requires minimal configuration. ProxySQL is more powerful for custom routing, query caching, connection multiplexing, and mixed environments where you need fine-grained control over how traffic reaches your databases.
Failover Patterns¶
When a source server fails, you need a plan to promote a replica. The complexity of this process depends on your replication topology and tooling.
Manual Failover¶
With basic binary log replication, failover is a manual process:
1. Identify the most up-to-date replica (the one with the highest executed GTID set, or the most recent relay log position)
2. Ensure the replica has applied all events in its relay log
3. Promote the replica: `STOP REPLICA; RESET REPLICA ALL;`
4. Disable read-only mode: `SET GLOBAL read_only = OFF;`
5. Re-point remaining replicas to the new source
6. Update application connection strings or DNS records
With GTID replication, step 5 is straightforward - each replica uses SOURCE_AUTO_POSITION = 1 and automatically catches up from the new source. Without GTIDs, you must calculate the correct binary log position on the new source for each remaining replica.
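Re-pointing a remaining replica might look like this, assuming GTIDs are enabled and 10.0.0.2 is the newly promoted source:

```sql
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '10.0.0.2',     -- the newly promoted source
  SOURCE_AUTO_POSITION = 1;     -- let GTIDs find the right position
START REPLICA;
```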
Automatic Failover with InnoDB Cluster¶
InnoDB Cluster handles failover automatically. When the primary fails:
- Group Replication detects the failure (typically within seconds)
- The remaining members elect a new primary based on a configurable weight
- MySQL Router detects the topology change and reroutes write traffic
- Applications continue without intervention
Automatic Failover with Orchestrator¶
Orchestrator is an external topology manager for MySQL replication. It continuously monitors the replication topology, detects failures, and can perform automated failover. Orchestrator is widely used in environments that run traditional (non-Group Replication) topologies and need automated recovery.
Orchestrator works by:
- Periodically polling each MySQL instance for its replication status
- Maintaining a topology map in its own backend database
- Running configurable recovery hooks (promotion scripts, DNS updates, VIP failover) when a failure is detected
Split-Brain Prevention¶
Split-brain occurs when two servers both believe they are the primary and accept writes simultaneously. This causes divergent data that is difficult or impossible to reconcile.
Prevention strategies:
| Strategy | How It Works |
|---|---|
| Quorum-based consensus | Group Replication requires a majority of members to agree before accepting writes. A partitioned minority cannot form quorum and stops accepting writes. |
| Fencing | Before promoting a new primary, ensure the old primary is truly down (STONITH - "Shoot The Other Node In The Head"). This can mean powering off the old server via IPMI/iLO or revoking its network access. |
| Lease-based leadership | The primary holds a time-limited lease (in a consensus store like etcd). If it cannot renew the lease, it stops accepting writes. Only a server holding a valid lease can serve as primary. |
Monitoring Replication Lag¶
Replication lag - the delay between a transaction committing on the source and being applied on the replica - is the most important replication health metric. A lagging replica serves stale data, which can cause application bugs that are difficult to diagnose.
Seconds_Behind_Master¶
The built-in metric from SHOW REPLICA STATUS:
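```sql
SHOW REPLICA STATUS\G
```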
Look for the Seconds_Behind_Source field (historically Seconds_Behind_Master). This value represents the difference between the replica's clock and the timestamp of the event it is currently applying.
The problem: this metric is unreliable. It resets to 0 when the SQL thread is idle (even if the I/O thread is behind), jumps erratically during large transactions, and does not account for clock drift between servers.
Heartbeat Tables¶
A more reliable approach: write a timestamp to a table on the source at regular intervals and measure the difference on the replica.
Create the heartbeat table:
CREATE TABLE IF NOT EXISTS replication_heartbeat (
id INT PRIMARY KEY,
ts TIMESTAMP(6) NOT NULL,
server_id INT NOT NULL
);
On the source, update the timestamp periodically (via a cron job or daemon):
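For example, a statement run once per second by a scheduler:

```sql
REPLACE INTO replication_heartbeat (id, ts, server_id)
VALUES (1, NOW(6), @@server_id);
```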
On the replica, measure the lag:
SELECT TIMESTAMPDIFF(MICROSECOND, ts, NOW(6)) / 1000000 AS lag_seconds
FROM replication_heartbeat
WHERE id = 1;
pt-heartbeat¶
pt-heartbeat from Percona Toolkit automates the heartbeat approach. It is the industry standard for measuring replication lag:
Start the daemon on the source:
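The host, user, and database name below are illustrative:

```bash
pt-heartbeat --daemonize --update \
  --database percona --create-table \
  h=10.0.0.1,u=heartbeat,p=heartbeat_password
```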
Monitor lag from the replica:
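```bash
pt-heartbeat --monitor --database percona \
  h=10.0.0.2,u=heartbeat,p=heartbeat_password
```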
Or get a single reading:
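```bash
pt-heartbeat --check --database percona \
  h=10.0.0.2,u=heartbeat,p=heartbeat_password
```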
Choosing a Topology¶
The right replication topology depends on your requirements:
| Requirement | Recommended Topology |
|---|---|
| Read scaling for a single application | Source with 1-3 async replicas |
| Zero committed-transaction loss | Semi-synchronous replication |
| Automatic failover without external tools | Group Replication or InnoDB Cluster |
| Application-transparent routing | InnoDB Cluster (Router) or ProxySQL |
| Fine-grained query routing and caching | ProxySQL |
| Multi-data-center writes | Group Replication in multi-primary mode (with careful conflict handling) |
| Maximum simplicity | InnoDB Cluster (manages Group Replication, Router, and administration in one stack) |
For most production deployments, start with InnoDB Cluster if you are running MySQL 8.0+. It provides automatic failover, transparent routing, and a management interface out of the box. Add ProxySQL if you need advanced query routing, connection multiplexing, or query caching on top.
Further Reading¶
- MySQL Replication Documentation - Official reference covering all replication modes, configuration, and troubleshooting
- MySQL Group Replication - Detailed documentation on consensus protocol, conflict detection, and operational procedures
- MySQL InnoDB Cluster - MySQL Shell AdminAPI reference for cluster setup and management
- ProxySQL Documentation - Configuration reference for query rules, hostgroups, and connection pooling
- Percona Toolkit: pt-heartbeat - Accurate replication lag measurement tool documentation
- Orchestrator - Topology management and automated failover for MySQL replication