This article discusses about the approach taken by InnoDB Storage Engine of MySQL to provide the
repeatable read isolation level. First, an example is
presented to demonstrate the two different designs that are
possible. Then the design used in InnoDB is presented
followed by a short discussion about the advantages and
disadvantages of this design choice. As part of this
discussion, we also present a performance optimization done in
An Example Scenario
I used MySQL 5.5 for this purpose. Let us create the
following tables t1 and t2 in the test database that is available
by default. Even though the default storage engine in MySQL 5.5
is InnoDB, I explicitly specify it for clarity.
mysql> use test;
mysql> create table t1 (f1 int) engine=innodb;
mysql> create table t2 (f1 int) engine=innodb;
mysql> insert into t1 values (1);
mysql> insert into t2 values (1);
Consider the following scenario (transactions are started with REPEATABLE READ isolation level):
select f1 from t1;
update t2 set f1 = f1+1;
select f1 from t2;
Please go through the above scenario and find out what would be
the result set in Session 1, line 6, query “select f1 from
t2”? Proceed once you know what you would expect.
By default, the isolation level in InnoDB is repeatable read. So in line 1 and 3 the transactions would be started with repeatable read isolation level. The output of the query in line 6 above will depend on the design approach taken to implement this isolation level. The two different design approaches possible are the traditional locking approach and the snapshot isolation technique.
In traditional locking approach, a transaction running in REPEATABLE READ isolation level would acquire a row lock on all the rows that has been read. No gap locks or range locks will be taken. These transactional row locks will be used to provide the semantics of the REPEATABLE READ isolation level. There are two drawbacks in this approach. One, the number of row locks taken can become very high and hence will be a performance problem. Two, this approach will not prevent phantom reads.
In InnoDB, a REPEATABLE READ isolation level is provided by making use of the snapshot isolation technique. This technique will be explained in the following sections.
If the traditional locking approach is taken then the output would be 2 (the new value). But in InnoDB the output of query “select f1 from t2” in line 6 would be 1 (the old value).
Creating a Consistent Read View
InnoDB creates a consistent read view or a consistent snapshot
either when the statement
mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;
is executed or when the first select query is executed in the transaction. In the example given above, the consistent read view is created in Session 1 when the first select query in line 2 is executed. This consistent read view is at the database level, so further select queries in Session 1 will not see the modifications done by other transactions in any of the tables in the database. This is the reason why the “select f1 from t2” in Session 1 sees the old tuple instead of the new tuple as updated by Session 2. But Session 1 can see the modifications done by itself.
When a transaction creates a read view, it creates a read view (read_view_t) object and copies the list of all the active transactions into it. It uses this information later to provide a consistent view of the database as it existed when the read view was created. How this is done is detailed in the sections below. But before we proceed further, there is a small detail that we need to be aware of to understand how the read view works.
Hidden Fields of a InnoDB Table
For all InnoDB tables, 3 fields are internally added to the table – DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR (refer to InnoDB Multi-Versioning). They are explained below:
The row identifier
The identifier of the transaction identifier that created the tuple
The rollback data pointer, pointing to previous versions of the tuple in the undo logs.
Update: For tables, with explicit PRIMARY KEY or UNIQUE
NOT NULL key, the DB_ROW_ID will not be stored in the row, even
though it will be listed as one of the columns in the data
dictionary object for the table (of type
For our current discussion the focus is on DB_TRX_ID and the DB_ROLL_PTR. Each tuple or row contains the identifier of the transaction that created the tuple, and a list of previous versions of the tuple.
In InnoDB, the transaction identifier is an integer value and it
is a monotonically increasing value. So using
the transaction identifier one can determine which transaction
started earlier and which started later.
How Consistent Read View Works
With the help of read view object and the hidden fields in the
tuple, a transaction can construct the database as it existed at
the time the consistent read view was created. For this
purpose, the transaction using the read view has to determine
whether it can "see a particular transaction". This
is done using the following 2 rules:
Rule 1: When the read view object is created it notes down the smallest transaction identifier that is not yet used as a transaction identifier (trx_sys_t::max_trx_id). The read view calls it the low limit. So the transaction using the read view must not see any transaction with identifier greater than or equal to this low limit.
Rule 2: The transaction using the read view must
not see a transaction that was active when the read view was
Whenever a tuple is accessed, the transaction using the read view
will check if it can see the transaction that created the tuple
(using the two rules mentioned above). If not, then check
if the tuple has a previous version. If there are no
previous versions then ignore the tuple and proceed to the next
tuple. If there are previous versions, then repeat the
procedure to find and build the correct version of the tuple to
access from the undo log by following the DB_ROLL_PTR.
This avoids the phantom read problem in the REPEATABLE READ isolation in InnoDB.
Purpose of Read View
Why create the read view in the first place? To provide the
repeatable read isolation level, creating a read view and later
accessing the various versions of the row without locking has a
performance benefit. The alternative would be to take a
read lock on all the rows that was read in the transaction with
repeatable read isolation level. Accessing a large table
with millions of records would generate that many shared row
locks. For performance gain a read view is created.
Performance Problem in MySQL 5.5
A transaction using the read view must decide whether to see the
rows created by a particular transaction. This is done with
the help of the two rules given above in the section “How
Consistent Read View Works.” The first rule is simple and
doesn't involve much copying. But the 2nd rule is
expensive. Because of the 2nd rule, when the read view
object is created, the complete list of active transaction
identifiers are copied. The number of bytes copied depends
on the number of active transactions in the database system when
the read view is created.
The performance problem is not only because of the number of bytes copied during the read view creation, but also because of the actual traversal of the list of all the active transactions. This traversal is done under the protection of a mutex. So when there are many transactions trying to create a read view concurrently, the contention for this mutex becomes high.
The Read Only Optimization in MySQL 5.6
A recent discussion in InnoDB developers mailing list suggested
that the read view creation can be a very expensive operation if
the number of active concurrent transactions are very high (say
in thousands). One of the optimizations that was discussed
involved identifying the read-only transactions and ignoring them
during read view creation.
Since read-only transactions do not make any changes that change the state of the tuple they are not required to be part of the read view. If the read-only transactions are ignored, then the number of transaction identifiers that needs to be copied during the read view creation can be reduced significantly. This will also reduce the mutex contention discussed earlier. This idea is based on the reasonable assumption that at any point of time, there will be a mix of read-only and read-write active transactions in a database system and that the read-only transactions will dominate the work load.
To facilitate this the Transaction Subsystem of InnoDB in MySQL 5.6 contains two lists of transactions, one for the read-write transactions (trx_sys_t::rw_trx_list) and the other for read-only transactions (trx_sys_t::ro_trx_list). For creating the read view only the list of read-write transactions are used. The length of transaction list to be traversed to create a read view is now smaller and the number of bytes copied is also less. This makes the read view creation faster and more scalable.
Refer to Optimizations for Read-Only Transactions in MySQL 5.6 Reference Manual for further details.
In this article, we saw how InnoDB provides the repeatable read
isolation level using a consistent read view or consistent
snapshot. Because of this approach, InnoDB is able to avoid
the creation of many shared row locks and hence is able to
achieve better performance. The phantom read problem is
also avoided. We also saw how the consistent read view
We discussed a particular performance problem in MySQL 5.5 and presented an optimization done in MySQL 5.6, which makes the read view creation faster and more scalable. In certain highly concurrent situations, the current design of read view does have some performance challenges and MySQL team at Oracle is actively working to find solutions.
For further reading refer to the Section “Consistent Nonlocking Reads” of the MySQL Reference Manual. Send your feedback to email@example.com.