This is the annex to Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production.
There is no introduction or conclusion to this post, only supporting sections: reading this post without its context will probably be very hard. You should start with the main post and come back here for more details.
Implementation Details of MariaDB Optimistic Parallel Replication
Rollbacks and Retries
When transactions T1 to T20 are run concurrently by optimistic parallel replication, if T19 blocks T2, T19 will be killed (rolled back) to unblock T2 (T2 must commit before T19). In the current implementation, T19 will be retried once T18 has completed. It looks like this could be optimized.
I thought that retrying T19 as soon as T2 completes could improve optimistic replication speed. Kristian Nielsen, the implementer of parallel replication in MariaDB, was kind enough to implement a patch with more aggressive retries.
However, with quicker retries, I got slower results than with delayed retries. So it looks like once a conflict is detected (T19 blocks T2), the probability of another conflict is high, and the gain from retrying T19 earlier is outweighed by the cost of additional rollbacks of T19.
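As a side note on observing this behavior: the retries performed by the applier can be watched while replication is running. A minimal sketch is shown below; it assumes a MariaDB slave and, to my understanding, the Slave_retried_transactions status counter includes rollbacks caused by optimistic conflicts (the main post also reports the percentage of retried transactions in its graphs).

```sql
-- Sketch: observing retries on a MariaDB slave (assumes the SQL/worker threads are running).
-- Slave_retried_transactions counts transactions the applier had to retry,
-- which should include transactions rolled back due to optimistic conflicts.
SHOW GLOBAL STATUS LIKE 'Slave_retried_transactions';

-- Sampling the counter twice gives a retry rate for the interval.
SELECT SLEEP(60);
SHOW GLOBAL STATUS LIKE 'Slave_retried_transactions';
```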
DML vs DDL and Non-Transactional Storage Engines
The assumption for optimistic parallel replication to work is that a transaction that causes a conflict can be killed and retried. This is the case for InnoDB DML (Data Manipulation Language: INSERT, UPDATE, DELETE, ...) but it is not the case with MyISAM.
As a transaction involving a MyISAM table (or another non-transactional storage engine) cannot be rolled back, it is not safe to run such transactions optimistically. When such a transaction enters the optimistic parallel replication pipeline, the replication applier waits for all previous transactions to complete before starting the transaction that cannot be rolled back. The following transactions can still be run optimistically as long as they exclusively use a transactional storage engine (i.e. they can be rolled back). This means that DML that cannot be rolled back acts as a pre-barrier in the parallel replication pipeline.
In MariaDB, DDL (Data Definition Language: [CREATE | ALTER | TRUNCATE | DROP | ...] TABLE, ...) is also (still) impossible to roll back, so it will also act as a pre-barrier in the parallel replication pipeline. Moreover, DDL also prevents all following transactions from being optimistically applied, because a DML is not safe to run at the same time as a DDL on the same table. So not only does DDL act as a pre-barrier, it also acts as a post-barrier.
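To make the two kinds of barriers concrete, the statements below are purely illustrative (the table names are hypothetical):

```sql
-- DML on a non-transactional (MyISAM) table: cannot be rolled back,
-- so it acts as a pre-barrier. The applier waits for all previous transactions
-- before starting it, but later InnoDB-only transactions may still be run optimistically.
INSERT INTO myisam_log (msg) VALUES ('not transactional');

-- DDL: cannot be rolled back either, and it is not safe to run concurrently
-- with DML on the same table, so it acts as both a pre-barrier and a post-barrier.
ALTER TABLE innodb_orders ADD COLUMN note VARCHAR(64);
```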
Different Optimistic Parallel Replication Modes
MariaDB 10.1 optimistic parallel replication can be run with two values of slave_parallel_mode: optimistic and aggressive. In the optimistic mode, some heuristics are used to avoid needless conflicts. In the aggressive mode, those heuristics are disabled.
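For readers who want to experiment with these modes, the commands below are a minimal sketch; they assume a MariaDB 10.1 slave, and slave_parallel_mode can only be changed while the slave SQL thread is stopped.

```sql
-- Sketch: switching a MariaDB 10.1 slave between optimistic and aggressive modes.
STOP SLAVE;
SET GLOBAL slave_parallel_mode = 'aggressive';  -- or 'optimistic'
SET GLOBAL slave_parallel_threads = 40;         -- illustrative value
START SLAVE;

-- Verify the current settings.
SHOW GLOBAL VARIABLES LIKE 'slave_parallel_%';
```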
One of the heuristics of the optimistic mode is the following: if a transaction executed a row-lock wait on the master, it will not be run in parallel on the slave. The behavior is unclear when intermediate masters are used:

- An intermediate master with slave_parallel_mode=none (single-threaded) will not have any row-lock waits. So it looks like for a slave of such an intermediate master, the optimistic mode would behave the same way as the aggressive mode.
- An intermediate master with slave_parallel_mode=minimal (slave group committing) will have a row-lock wait for each group commit. So it looks like for a slave of such an intermediate master, the optimistic mode would behave the same as the conservative mode.
- An intermediate master with slave_parallel_mode=conservative should generate very few row-lock waits (only for conflicts that will generate retries). So it looks like for a slave of such an intermediate master, the optimistic mode will behave mostly the same as the aggressive mode.
- The number of row-lock waits is hard to predict on an intermediate master in optimistic or aggressive mode. So the behavior of a slave of such an intermediate master is hard to predict.
As we are doing tests on a slave of an intermediate master, the optimistic mode is not very interesting to test. It would generate results similar to the aggressive mode if the intermediate master was running in single-threaded or conservative mode, or similar to the conservative mode if the intermediate master was running in minimal mode. Without a true master running MariaDB 10.1, the only tests that we think make sense are with slave_parallel_mode=aggressive.
This is a good opportunity to repeat that intermediate masters are bad for parallel replication. As shown in Part 1, intermediate masters do a poor job of transmitting parallelism information from their master to their slaves. The solution presented in the previous post still applies: use Binlog Servers.
Environments
As in the previous posts (Part 1, Part 2 and Part 3), we are using the same four environments. Each environment is composed of five servers. For slave_parallel_mode=none and slave_parallel_mode=conservative, only four of the five servers are needed, organized as shown below:
```
+---+     +---+     +---+     +---+
| A | --> | B | --> | C | --> | D |
+---+     +---+     +---+     +---+
```
The A to C servers are strictly the same as before. The D server has the same hardware specification as before, but it is now running MariaDB 10.1.8 [1]. This means that the conservative results will use the same parallelism information (group commit) as for the tests from Part 3 (we are re-using the same binary logs as the previous tests).
For optimistic parallel replication to work, a MariaDB 10.1 slave must be connected to a MariaDB 10.1 master [2], hence the introduction of a fifth (E) server. For slave_parallel_mode=aggressive, D is replicating from E as shown below:
```
+---+     +---+     +---+     +---+     +---+
| A | --> | B | --> | C | --> | E | --> | D |
+---+     +---+     +---+     +---+     +---+
```
The hardware specifications of E are not important because it is only serving binary logs. It was built as a clone of D that was upgraded to MariaDB 10.1. Replication was then started from C with slave_parallel_mode=none. This way, we produced 10.1 binary logs so slave_parallel_mode=aggressive will work on D.
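For illustration, wiring E between C and D could look roughly like the commands below. The hostnames and credentials are hypothetical, and the cloning and positioning steps are not detailed here; this is only a sketch of the idea, not the exact procedure used.

```sql
-- On E (a clone of D upgraded to MariaDB 10.1): replicate from C single-threaded,
-- so that E writes MariaDB 10.1 binary logs (log_bin and log_slave_updates must be enabled on E).
CHANGE MASTER TO
  MASTER_HOST = 'server-c.example.com',  -- hypothetical hostname
  MASTER_USER = 'repl',                  -- hypothetical replication user
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;
SET GLOBAL slave_parallel_mode = 'none';
START SLAVE;

-- On D: replicate from E with optimistic (aggressive) parallel replication.
CHANGE MASTER TO
  MASTER_HOST = 'server-e.example.com',  -- hypothetical hostname
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;
SET GLOBAL slave_parallel_mode = 'aggressive';
SET GLOBAL slave_parallel_threads = 40;  -- one of the tested values
START SLAVE;
```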
The full test methodology is the same as for the previous tests and can be found in Part 3. The server and database configurations are mostly the same as in the previous tests with the following modifications:
| Property | E1 | E2 | E3 | E4 |
|---|---|---|---|---|
| InnoDB Buffer Pool Size | | | | |
| InnoDB Log Size | | | | |
The motivations for the above changes are the following:
- The InnoDB Buffer Pool Size was reduced for E3 and E4 because we were missing RAM to increase slave_parallel_threads to the number we wanted to test (more threads need more available RAM).
- The InnoDB Log Size was increased because checkpointing was a bottleneck during our tests [3] (the relevant settings are shown in the sketch after this list).
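For reference, the two settings involved can be inspected at runtime with the queries below; innodb_buffer_pool_size and innodb_log_file_size are the underlying variables, and in these versions changing the log size requires a restart.

```sql
-- Check the current InnoDB Buffer Pool and redo log sizes (values are in bytes).
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';
```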
Results
In the main post, speedup graphs are presented for each of the four environments. Here, the underlying data for those graphs is presented.
The SB, HD and ND notations are explained in the main post.
The first line of the table below shows the time taken for the single-threaded execution with slave_parallel_mode (SPM) set to none. Then, for slave_parallel_threads (SPT) values of 5, 10, 20 and 40, we have results with both non-optimistic (slave_parallel_mode=conservative) and optimistic (slave_parallel_mode=aggressive) executions. Then, for slave_parallel_threads values of 80, 160, 320, 640, 1280, 2560 and 5120, we have results only for optimistic executions. Note that we cannot have meaningful results for non-optimistic runs with slave_parallel_threads greater than 40 because the maximum group size on C was 35 (see Part 3 for more details).
The times presented below are in the format hours:minutes.seconds and they represent the delay needed to process 24 hours of transactions. The number in bold is the speedup achieved from the single-threaded run.
| SPT | SPM | E1 SB-HD | E1 SB-ND | E2 SB-HD | E2 SB-ND | E3 SB-HD | E4 SB-HD |
|---|---|---|---|---|---|---|---|
| 1 | none | | | | | | |
| 5 | conservative | **1.48** | **1.04** | **1.85** | **1.09** | **1.18** | **1.41** |
| 5 | aggressive | **1.54** | **1.09** | **1.90** | **1.13** | **1.18** | **1.22** |
| 10 | conservative | **1.69** | **1.11** | **2.18** | **1.15** | **1.24** | **1.48** |
| 10 | aggressive | **1.80** | **1.24** | **2.27** | **1.25** | **1.27** | **1.34** |
| 20 | conservative | **1.85** | **1.18** | **2.36** | **1.16** | **1.28** | **1.53** |
| 20 | aggressive | **2.13** | **1.41** | **2.67** | **1.45** | **1.35** | **1.50** |
| 40 | conservative | **1.89** | **1.20** | **2.42** | **1.18** | **1.30** | **1.55** |
| 40 | aggressive | **2.38** | **1.63** | **3.05** | **1.68** | **1.44** | **1.86** |
| 80 | aggressive | **2.60** | **1.84** | **3.30** | **1.92** | **1.52** | **2.39** |
| 160 | aggressive | **2.81** | **1.96** | **3.36** | **2.04** | **1.63** | **2.91** |
| 320 | aggressive | **2.83** | **2.05** | **3.17** | **1.93** | **1.75** | **3.36** |
| 640 | aggressive | **2.80** | **2.05** | **2.73** | **1.53** | **1.97** | **3.78** |
| 1280 | aggressive | **2.80** | **1.99** | **2.02** | **1.01** | **2.15** | **3.69** |
| 2560 | aggressive | **2.74** | **1.93** | **1.28** | **0.59** | **2.29** | **3.52** |
| 5120 | aggressive | **2.75** | **1.90** | **0.64** | **0.29** | **2.27** | **3.28** |
Graphs during Tests
If you spot something we might have missed in the graphs below, please post a comment. Those graphs include the number of commits per second, CPU stats, Read IOPS and the percentage of Retried Transactions for all tests.
[1] At the time of the publication of this post, the latest release of MariaDB 10.1 is 10.1.17. Our tests were done with MariaDB 10.1.8 because they were run a long time ago (I am a little embarrassed to be that late in my blog post editing).
[2] In the implementation of optimistic parallel replication in MariaDB 10.1, the master is responsible for flagging DDL and non-transactional DML and for passing this information to slaves via the binary logs. This is why a MariaDB 10.1 master is needed to enable optimistic parallel replication on a slave. This also means that for optimistic parallel replication to work, master and slaves must have compatible storage engines for DML: if a DML is transactional on the master, it must be transactional on the slave. So a master using InnoDB and a slave using MyISAM will not work.
[3] Because the InnoDB Log Size was too small in our previous tests, those tests were run in non-optimal conditions. The results presented in this post should be considered more accurate.