This article discusses about the approach taken by InnoDB Storage Engine of MySQL to provide the
repeatable read isolation level. First, an example is
presented to demonstrate the two different designs that are
possible. Then the design used in InnoDB is presented
followed by a short discussion about the advantages and
disadvantages of this design choice. As part of this
discussion, we also present a performance optimization done in
MySQL 5.6.
An Example Scenario
I used MySQL 5.5 for this purpose. Let us create the
following tables t1 and t2 in the test database that is available
by default. Even though the default storage engine in MySQL 5.5
is InnoDB, I explicitly specify it for clarity.
mysql> use test;
mysql> create table t1 (f1 int) engine=innodb;
mysql> create table t2 (f1 int) engine=innodb;
mysql> insert into t1 values (1);
mysql> insert into t2 values (1);
Consider the following scenario (transactions are started with
REPEATABLE READ isolation level):
-
S. No.
Session 1
Session 2
1
start transaction;
2
select f1 from t1;
3
start transaction;
4
update t2 set f1 = f1+1;
5
commit;
6
select f1 from t2;
Please go through the above scenario and find out what would be
the result set in Session 1, line 6, query “select f1 from
t2”? Proceed once you know what you would expect.
By default, the isolation level in InnoDB is repeatable
read. So in line 1 and 3 the transactions would be started
with repeatable read isolation level. The output of the
query in line 6 above will depend on the design approach taken to
implement this isolation level. The two different design
approaches possible are the traditional locking approach and the
snapshot isolation technique.
In traditional locking approach, a transaction running in
REPEATABLE READ isolation level would acquire a row lock on all
the rows that has been read. No gap locks or range locks
will be taken. These transactional row locks will be used
to provide the semantics of the REPEATABLE READ isolation
level. There are two drawbacks in this
approach. One, the number of row locks taken can become
very high and hence will be a performance problem. Two,
this approach will not prevent phantom reads.
In InnoDB, a REPEATABLE READ isolation level is provided by
making use of the snapshot isolation technique. This
technique will be explained in the following sections.
If the traditional locking approach is taken then the output would be 2 (the new value). But in InnoDB the output of query “select f1 from t2” in line 6 would be 1 (the old value).
Creating a Consistent Read View
InnoDB creates a consistent read view or a consistent snapshot
either when the statement
mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;
is executed or when the first select query is executed in the
transaction. In the example given above, the consistent
read view is created in Session 1 when the first select query in
line 2 is executed. This consistent read view is at the
database level, so further select queries in Session 1 will not
see the modifications done by other transactions in any of the
tables in the database. This is the reason why the “select
f1 from t2” in Session 1 sees the old tuple instead of the new
tuple as updated by Session 2. But Session 1 can see the
modifications done by itself.
When a transaction creates a read view, it creates a read view
(read_view_t) object and copies the list of all the active
transactions into it. It uses this information later to
provide a consistent view of the database as it existed when the
read view was created. How this is done is detailed
in the sections below. But before we proceed further, there
is a small detail that we need to be aware of to understand how
the read view works.
Hidden Fields of a InnoDB Table
For all InnoDB tables, 3 fields are internally added to the table – DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR (refer to InnoDB Multi-Versioning). They are explained below:
S. No. |
System Column |
Description |
Length (bytes) |
---|---|---|---|
1 |
DB_ROW_ID |
The row identifier |
6 |
2 |
DB_TRX_ID |
The identifier of the transaction identifier that created the tuple |
6 |
3 |
DB_ROLL_PTR |
The rollback data pointer, pointing to previous versions of the tuple in the undo logs. |
7 |
Update: For tables, with explicit PRIMARY KEY or UNIQUE
NOT NULL key, the DB_ROW_ID will not be stored in the row, even
though it will be listed as one of the columns in the data
dictionary object for the table (of type
dict_table_t).
For our current discussion the focus is on DB_TRX_ID and the DB_ROLL_PTR. Each tuple or row contains the identifier of the transaction that created the tuple, and a list of previous versions of the tuple.
In InnoDB, the transaction identifier is an integer value and it
is a monotonically increasing value. So using
the transaction identifier one can determine which transaction
started earlier and which started later.
How Consistent Read View Works
With the help of read view object and the hidden fields in the
tuple, a transaction can construct the database as it existed at
the time the consistent read view was created. For this
purpose, the transaction using the read view has to determine
whether it can "see a particular transaction". This
is done using the following 2 rules:
Rule 1: When the read view object is created it
notes down the smallest transaction identifier that is not yet
used as a transaction identifier
(trx_sys_t::max_trx_id). The read view calls it the
low limit. So the transaction using the read view must not see
any transaction with identifier greater than or equal to this low
limit.
Rule 2: The transaction using the read view must
not see a transaction that was active when the read view was
created.
Whenever a tuple is accessed, the transaction using the read view
will check if it can see the transaction that created the tuple
(using the two rules mentioned above). If not, then check
if the tuple has a previous version. If there are no
previous versions then ignore the tuple and proceed to the next
tuple. If there are previous versions, then repeat the
procedure to find and build the correct version of the tuple to
access from the undo log by following the DB_ROLL_PTR.
This avoids the phantom read problem in the REPEATABLE READ
isolation in InnoDB.
Purpose of Read View
Why create the read view in the first place? To provide the
repeatable read isolation level, creating a read view and later
accessing the various versions of the row without locking has a
performance benefit. The alternative would be to take a
read lock on all the rows that was read in the transaction with
repeatable read isolation level. Accessing a large table
with millions of records would generate that many shared row
locks. For performance gain a read view is created.
Performance Problem in MySQL 5.5
A transaction using the read view must decide whether to see the
rows created by a particular transaction. This is done with
the help of the two rules given above in the section “How
Consistent Read View Works.” The first rule is simple and
doesn't involve much copying. But the 2nd rule is
expensive. Because of the 2nd rule, when the read view
object is created, the complete list of active transaction
identifiers are copied. The number of bytes copied depends
on the number of active transactions in the database system when
the read view is created.
The performance problem is not only because of the number of
bytes copied during the read view creation, but also because of
the actual traversal of the list of all the active
transactions. This traversal is done under the protection
of a mutex. So when there are many transactions trying to
create a read view concurrently, the contention for this mutex
becomes high.
The Read Only Optimization in MySQL 5.6
A recent discussion in InnoDB developers mailing list suggested
that the read view creation can be a very expensive operation if
the number of active concurrent transactions are very high (say
in thousands). One of the optimizations that was discussed
involved identifying the read-only transactions and ignoring them
during read view creation.
Since read-only transactions do not make any changes that change
the state of the tuple they are not required to be part of the
read view. If the read-only transactions are ignored, then the
number of transaction identifiers that needs to be copied during
the read view creation can be reduced significantly.
This will also reduce the mutex contention discussed
earlier. This idea is based on the reasonable assumption
that at any point of time, there will be a mix of read-only and
read-write active transactions in a database system and that the
read-only transactions will dominate the work load.
To facilitate this the Transaction Subsystem of InnoDB in MySQL
5.6 contains two lists of transactions, one for the read-write
transactions (trx_sys_t::rw_trx_list) and the other for read-only
transactions (trx_sys_t::ro_trx_list). For creating the
read view only the list of read-write transactions are
used. The length of transaction list to be traversed to
create a read view is now smaller and the number of bytes copied
is also less. This makes the read view creation faster and
more scalable.
Refer to Optimizations for Read-Only Transactions in
MySQL 5.6 Reference Manual for further details.
Conclusion
In this article, we saw how InnoDB provides the repeatable read
isolation level using a consistent read view or consistent
snapshot. Because of this approach, InnoDB is able to avoid
the creation of many shared row locks and hence is able to
achieve better performance. The phantom read problem is
also avoided. We also saw how the consistent read view
works.
We discussed a particular performance problem in MySQL 5.5 and
presented an optimization done in MySQL 5.6, which makes the read
view creation faster and more scalable. In certain highly
concurrent situations, the current design of read view does have
some performance challenges and MySQL team at Oracle is actively
working to find solutions.
For further reading refer to the Section “Consistent Nonlocking Reads” of the MySQL Reference
Manual. Send your feedback to
annamalai.gurusami@oracle.com.