Planet MySQL :: Planet MySQL - Archives - Repeatable Read Isolation Level in InnoDB

This article discusses about the approach taken by InnoDB Storage Engine of MySQL to provide the repeatable read isolation level. First, an example is presented to demonstrate the two different designs that are possible. Then the design used in InnoDB is presented followed by a short discussion about the advantages and disadvantages of this design choice. As part of this discussion, we also present a performance optimization done in MySQL 5.6.
An Example Scenario

I used MySQL 5.5 for this purpose. Let us create the following tables t1 and t2 in the test database that is available by default. Even though the default storage engine in MySQL 5.5 is InnoDB, I explicitly specify it for clarity.

mysql> use test;
mysql> create table t1 (f1 int) engine=innodb;
mysql> create table t2 (f1 int) engine=innodb;
mysql> insert into t1 values (1);
mysql> insert into t2 values (1);

Consider the following scenario (transactions are started with REPEATABLE READ isolation level):

S. No.	Session 1	Session 2
1	start transaction;
2	select f1 from t1;
3		start transaction;
4		update t2 set f1 = f1+1;
5		commit;
6	select f1 from t2;

Please go through the above scenario and find out what would be the result set in Session 1, line 6, query “select f1 from t2”? Proceed once you know what you would expect.

By default, the isolation level in InnoDB is repeatable read. So in line 1 and 3 the transactions would be started with repeatable read isolation level. The output of the query in line 6 above will depend on the design approach taken to implement this isolation level. The two different design approaches possible are the traditional locking approach and the snapshot isolation technique.

In traditional locking approach, a transaction running in REPEATABLE READ isolation level would acquire a row lock on all the rows that has been read. No gap locks or range locks will be taken. These transactional row locks will be used to provide the semantics of the REPEATABLE READ isolation level. There are two drawbacks in this approach. One, the number of row locks taken can become very high and hence will be a performance problem. Two, this approach will not prevent phantom reads.

In InnoDB, a REPEATABLE READ isolation level is provided by making use of the snapshot isolation technique. This technique will be explained in the following sections.

If the traditional locking approach is taken then the output would be 2 (the new value). But in InnoDB the output of query “select f1 from t2” in line 6 would be 1 (the old value).

Creating a Consistent Read View

InnoDB creates a consistent read view or a consistent snapshot either when the statement

mysql> START TRANSACTION WITH CONSISTENT SNAPSHOT;

is executed or when the first select query is executed in the transaction. In the example given above, the consistent read view is created in Session 1 when the first select query in line 2 is executed. This consistent read view is at the database level, so further select queries in Session 1 will not see the modifications done by other transactions in any of the tables in the database. This is the reason why the “select f1 from t2” in Session 1 sees the old tuple instead of the new tuple as updated by Session 2. But Session 1 can see the modifications done by itself.

When a transaction creates a read view, it creates a read view (read_view_t) object and copies the list of all the active transactions into it. It uses this information later to provide a consistent view of the database as it existed when the read view was created. How this is done is detailed in the sections below. But before we proceed further, there is a small detail that we need to be aware of to understand how the read view works.

Hidden Fields of a InnoDB Table

For all InnoDB tables, 3 fields are internally added to the table – DB_ROW_ID, DB_TRX_ID and DB_ROLL_PTR (refer to InnoDB Multi-Versioning). They are explained below:

S. No.	System Column	Description	Length (bytes)
1	DB_ROW_ID	The row identifier	6
2	DB_TRX_ID	The identifier of the transaction identifier that created the tuple	6
3	DB_ROLL_PTR	The rollback data pointer, pointing to previous versions of the tuple in the undo logs.	7

Update: For tables, with explicit PRIMARY KEY or UNIQUE NOT NULL key, the DB_ROW_ID will not be stored in the row, even though it will be listed as one of the columns in the data dictionary object for the table (of type dict_table_t).

For our current discussion the focus is on DB_TRX_ID and the DB_ROLL_PTR. Each tuple or row contains the identifier of the transaction that created the tuple, and a list of previous versions of the tuple.

In InnoDB, the transaction identifier is an integer value and it is a monotonically increasing value. So using the transaction identifier one can determine which transaction started earlier and which started later.

How Consistent Read View Works

With the help of read view object and the hidden fields in the tuple, a transaction can construct the database as it existed at the time the consistent read view was created. For this purpose, the transaction using the read view has to determine whether it can "see a particular transaction". This is done using the following 2 rules:

Rule 1: When the read view object is created it notes down the smallest transaction identifier that is not yet used as a transaction identifier (trx_sys_t::max_trx_id). The read view calls it the low limit. So the transaction using the read view must not see any transaction with identifier greater than or equal to this low limit.

Rule 2: The transaction using the read view must not see a transaction that was active when the read view was created.

Whenever a tuple is accessed, the transaction using the read view will check if it can see the transaction that created the tuple (using the two rules mentioned above). If not, then check if the tuple has a previous version. If there are no previous versions then ignore the tuple and proceed to the next tuple. If there are previous versions, then repeat the procedure to find and build the correct version of the tuple to access from the undo log by following the DB_ROLL_PTR.

This avoids the phantom read problem in the REPEATABLE READ isolation in InnoDB.

Purpose of Read View

Why create the read view in the first place? To provide the repeatable read isolation level, creating a read view and later accessing the various versions of the row without locking has a performance benefit. The alternative would be to take a read lock on all the rows that was read in the transaction with repeatable read isolation level. Accessing a large table with millions of records would generate that many shared row locks. For performance gain a read view is created.

Performance Problem in MySQL 5.5

A transaction using the read view must decide whether to see the rows created by a particular transaction. This is done with the help of the two rules given above in the section “How Consistent Read View Works.” The first rule is simple and doesn't involve much copying. But the 2nd rule is expensive. Because of the 2nd rule, when the read view object is created, the complete list of active transaction identifiers are copied. The number of bytes copied depends on the number of active transactions in the database system when the read view is created.

The performance problem is not only because of the number of bytes copied during the read view creation, but also because of the actual traversal of the list of all the active transactions. This traversal is done under the protection of a mutex. So when there are many transactions trying to create a read view concurrently, the contention for this mutex becomes high.

The Read Only Optimization in MySQL 5.6

A recent discussion in InnoDB developers mailing list suggested that the read view creation can be a very expensive operation if the number of active concurrent transactions are very high (say in thousands). One of the optimizations that was discussed involved identifying the read-only transactions and ignoring them during read view creation.

Since read-only transactions do not make any changes that change the state of the tuple they are not required to be part of the read view. If the read-only transactions are ignored, then the number of transaction identifiers that needs to be copied during the read view creation can be reduced significantly. This will also reduce the mutex contention discussed earlier. This idea is based on the reasonable assumption that at any point of time, there will be a mix of read-only and read-write active transactions in a database system and that the read-only transactions will dominate the work load.

To facilitate this the Transaction Subsystem of InnoDB in MySQL 5.6 contains two lists of transactions, one for the read-write transactions (trx_sys_t::rw_trx_list) and the other for read-only transactions (trx_sys_t::ro_trx_list). For creating the read view only the list of read-write transactions are used. The length of transaction list to be traversed to create a read view is now smaller and the number of bytes copied is also less. This makes the read view creation faster and more scalable.

Refer to Optimizations for Read-Only Transactions in MySQL 5.6 Reference Manual for further details.

Conclusion

In this article, we saw how InnoDB provides the repeatable read isolation level using a consistent read view or consistent snapshot. Because of this approach, InnoDB is able to avoid the creation of many shared row locks and hence is able to achieve better performance. The phantom read problem is also avoided. We also saw how the consistent read view works.

We discussed a particular performance problem in MySQL 5.5 and presented an optimization done in MySQL 5.6, which makes the read view creation faster and more scalable. In certain highly concurrent situations, the current design of read view does have some performance challenges and MySQL team at Oracle is actively working to find solutions.

For further reading refer to the Section “Consistent Nonlocking Reads” of the MySQL Reference Manual. Send your feedback to annamalai.gurusami@oracle.com.

MySQL Links