What is database replication?
MySQL has a feature known as replication where data from one database server (referred to as the source or primary) is copied to other database servers (replicas). Conceptually, it’s pretty straight forward. All of your data mutations (INSERT, UPDATE, DELETE, ALTER, etc.) are done against the primary database. Those commands are then copied and applied to the replicas.
Why is database replication useful?
There are a number of reasons replication like this may be desirable.
- Scaling out : When the majority of operations in your database are reads, then you can spread the reads across databases while keeping the primary available for handling the writes. For example, imagine you have an application that performs 100k operations per second (OPS): 10k writes and 90k reads. We can configure a replication setup of 2 replicas so that each database now takes on 30k of those reads. However, because we still need to perform writes on the primary database, and each replica needs to copy them, we are effectively doing 40k OPS on each server (10k writes + 30k reads). This results in a net reduction of stress on the primary database and allows us to scale horizontally on the basis that our reads will grow more than our writes.
- Analytics: We can have a replica database dedicated to handling large, and possibly very slow, queries without disrupting the availability of the primary for writes.
- Security: A replica can be paused, have a backup performed on it, then resume replication without fear of any data corruption occurring on the primary.
How does database replication work in MySQL?
The replication feature in MySQL is achieved with a handful of threads and a few resources.
Resources:
- Binary log - The binary log contains a record of all the events which caused data in the primary database to change (INSERT, DELETE, UPDATE, DROP TABLE, etc.). These are the events that get sent to replicas to be run. The format of the events in the binary log can be configured to be the SQL statements themselves, row based changes, or a mixture of the two. The default is row based. The binary log is made up of possibly many files.
- Relay log - This is the replica server’s view of the binary log. In fact the binary log and the relay log are just different instances of the same class,
MYSQL_BIN_LOG
. The only difference is that on the replica side ais_relay_log
property will be set to true on it. The relay log contains all the mutations from the binary log that have been sent from the primary to the replica. Its contents are in the same format as the binary log. Like the binary log, the relay log can also be made up of many files.
Threads:
- Binary log dump thread - This thread is created on the primary database server. It’s responsible for reading the binary log and sending the updates it reads to the replica when it requests them. One of these threads is created for each replica server connected, meaning 2 replicas == 2 binary log dump threads.
- Replication I/O receiver thread - This thread is created on the replica. It is responsible for connecting to the primary database and requesting updates in the binary log that it hasn’t seen. It’ll then write those updates to the relay log. One of these threads is created on each replica server
- Replication SQL applier thread - This thread is created on the replica. It reads the update events in the relay log and then applies the events in it to it’s database. The number of SQL applier threads by default is 1 per replica, but this can be configured to be more based on the
replica_parallel_workers
value.
The fetching and storing of binary log events into the relay log is separated from applying the events. This helps in preventing slow to apply queries from holding up the syncing of the binary log to the replica. It also allows for a local copy of events the applier thread can resume from if the replica was shut down then started up later.
On the primary database, when a write transaction is completed or the commit is done, the binary log is updated and flushed to disk. There is also a mutex around the binary log that allows only the session holding it to read or write to it. You can see the contention around this mutex by checking the performance schema tables for wait events on the following resources:
wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_commit
wait/synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log
A wait is simply an event that takes time. These events are formed as a hierarchy divided by the /
separator. It should be read from left to right, general to specific (the naming convention is described in more detail on the MySQL docs here). Remember how the binary log and relay log are different instances of the same class, MYSQL_BIN_LOG
? That’s the resource that is being shown on the far right of these events! Any of these events mean that there is time being spent (either on the replica or the primary depending what server you query the performance metrics from) where a thread is waiting to access the binary/relay log.