About MySQL 5.6

I am very excited and thrilled to use the latest release of MySQL 5.6 in production. This is probably the most notable and innovative release in many years, if not ever.

During the last year, we had the chance to work with many new features and test the fixes to old issues. To be honest, I was expecting to have MySQL 5.6 GA before now, and I even wagered with my colleague Francisco that it would be out before the end of 2012. It was nothing special, just a beer in the Santa Clara Hyatt lounge. Unfortunately for me, MySQL 5.6 is now in GA and given that it happened in 2013, I lost the bet and now have to pay for that beer. But I have also lost the full list of things that we saw as relevant, interesting, or really innovative for MySQL.

So I took a step back, took some time, and reviewed what Oracle delivered in this new MySQL release.

Short premise

Oracle developer teams did great work for MySQL 5.6, so we have to give Oracle credit for that. I have followed many discussions and sometimes arguments between Oracle and others about who was doing what better, and as a final user, I was loving that. It’s not because I love to see people arguing, but because diversity and competition push actors to find new and better solutions. This is exactly what happened in the MySQL core area like the Optimizer, between Oracle and the MariaDB team.

But the work that headed in different areas like replication, InnoDB optimization, and performance schema were mainly coming from MySQL/Oracle developers. It is true that there were others pushing, like Codership Galera for semi-syncronous replication, but this solution is a complementary alternative to standard MySQL replication.

So my conclusion there is kudos to the people who did the work. They are often the ones who never speak at conferences or get publicly acknowledged.

Thank you to all of you. GREAT JOB!

It is better to be beautiful than to be good, but it is better to be good than to be ugly. (Oscar Wilde)

Back to what has been done in MySQL 5.6. I would like to split my comments into two sections: 1) what has been well advertised and correctly proposed as flagship, and 2) what is less evident, but will have an impact on our everyday lives.

So what do we know about MySQL 5.6?

Following the MySQL documentation, the first topic is “Better performance and scalability” (but only for InnoDB).

In short, they optimized a few areas of contention like:

  • Redesigning the kernel mutex split to remove a single point of contention
  • Implementing a different thread for flushing operations
  • Implementing multi-threaded purges
  • Implementing adaptive hashing algorithms
  • Less buffer pool contention
  • Improving consistent behavior in query execution via persistent optimizer statistics that are collected at more regular, predictable intervals

An additional change in InnoDB is how the “read only” requests will be executed. In fact, read-only transactions have now been optimized, so there won’t be the overhead for versioning control and requests will bypass the transaction handling. In short, InnoDB will act like MyISAM. A simple way to do it:


SET autocommit = 0;
START_TRANSACTION_READ_ONLY;
SELECT c FROM T1 WHERE id=N;
COMMIT

InnoDB will now have a few features that were present in Percona Xtradb and allow better fine tuning for SSD usage. In fact, we now have the possibility to easily change the page dimensions to better fit the storage algorithm. More undo log is now configurable, and as such, we will be able to place it on the highest performing storage device(s), which will help a lot in cases of high levels of transactions/second. Another area where MySQL 5.6 has a lot of improvements, and a lot of discussions, is the query handling/optimizer. A lot of virtual ink has been spilled to determine who did this or that first.

As the final consumers, I think that the most important thing is that the functionalities are now there. Sub-query optimization and semi-join are now available and customizable (manual section). Condition push down and Batched Key Access Joins are going to provide huge improvements, and we must now give special attention to configuring the join_buffer size, given that BKA join algorithm employs a join buffer to accumulate the interesting columns of the rows produced by the first operand of the join operation (manual section).

In Innodb Plugin in 5.1 and 5.5, we have seen the introduction of the Fast Creation Index, which allowed us to create secondary indexes without duplicating the table. In MySQL 5.6, additional DDL operations can be done without even stopping inserts or updates, and the list of actions that can be performed without affecting the DML operations is quite long. Nevertheless, something that seems harmless is instead disruptive. Operations like changing data type of a column, if not in primary key, will cause the table rebuild and lock any DML. Changing a character set will do the same. So it would be a very good idea to memorize that list to prevent possible mistakes.

Some side effects are still present when ONLINE changes apply, so the data is still written in a physical location, like the temp dir, which needs to be dimensioned correctly to prevent out of space. Or if using ALTER TABLE .. LOCK=NONE, there will be a period of time at the start of the operation that will require an EXCLUSIVE LOCK, which means that the DDL modification will remain on hold waiting for a long transaction to finish or other operations like SELECT .. FOR UPDATE.

Finally, InnoDB has full text search. How many times would we have to manage a situation where we would have to remove MyISAM if not for full text search? Many!

Now that we have it, is it solid enough? Is it fast enough? The community is still debating this. I have NOT personally tested it yet, but I will soon, given it is in my road map. For the moment, we have some good statistics from Dimitri Kravtchuk .

Honestly, we need to recognize that this is a new feature, so it is subject to many possible bugs and unexpected bad behaviour. So test it, but be very conservative before using it in production. In short, wait and use in test; do not use in production yet.

Replication is another area of major recoding, not only because of the introduction of the Global Transaction Identifier per se, but because of the whole mechanism behind it. Implementing replication with the new Coordinator/Workers concept was not an easy task and deserves a better discussion and description.

For the moment, I will limit myself to saying that my review of the initial design and implementation shows that many third-party solutions will no longer be valid. This is very important because if we can have the right functionalities in the core, we will not have to build architecture/solutions using additional components, which will decrease the percentage of possible failures.

The new available functionalities that I consider the most relevant are:

  • GTID – This will allow all of us to track the transaction from master to last slave in the chain consistently.
  • Parallel replication on the slave – This is still quite far from what we have in Galera Codership solution, but it is a step forward and reduces the bottleneck of a single thread. It is important to understand that this only works for different schemas, so you can “parallelize” as long as you design your data accordingly.
    I see this as wrong bounding, and I hope that the replication team at Oracle will be able to fix it in the next release.
    Slave will now store the last binary log position successfully, which will allow the Slave to perform an automatic rollback in case of slave crash and restart replicating from the correct position as well. I hope that this will work as expected. We could finally get back so many lost hours of sleep.
  • Row replication – this is already supposed to reduce the amount of data written on the binary log. The row replication will still contain the “image” of the row before and after, which is what variable binlog-row-image is supposed to help manage. In short, only the copy of the modified row is present or the value before it, which reduces the data written.
  • Drums roll please… Replication checksum at your service, finally! (binlog-checksum={NONE|CRC32}; master-verify-checksum={0|1}; slave-sql-verify-checksum={0|1}). Slave will check the statement before applying it, which should reduce the possibility of a corrupt relay log. But there is a possible issue with mixing New Master using checksum with an old slave that isn’t. So the golden rule is that the slave should always be the same or newer than the master. As a final note, mysqlbinlog is now modified to manage the checksum as well.
  • Do you remember the trick to have a slave delay of X seconds/minutes/hours? Archive it. Replication delay is now a fact in the core as well (manual section).
  • Portable table space – I already wrote about this in two blogs before 5.6 was released. I did several tests and have tested ways to manage them in parallel with partitions and backup/recovery procedures. You can read them at Article1 and Article2.

There is a very simple feature that I am very happy to have in 5.6. MySQL can now set a password expiration and define validation rules. This means that Admins can set and manage the password for interactive users, which will enforce security.

If a user connects to MySQL, they will get :

shell> mysql -u myuser -p
Password: ******
ERROR 1862 (HY000): Your password has expired. To log in you must
change it using a client that supports expired passwords.

This means that users need to change their passwords, but they must KNOW which client to use to do it. From the latest information available for: “MYSQL_OPT_CAN_HANDLE_EXPIRED_PASSWORDS is enabled for mysqltest unconditionally, for mysql in interactive mode, and for mysqladmin if the first command is password.”

TRAP:  Remember to not include application users or your application will stop working!!

Other security improvements are:

  • Password in master.info – MySQL 5.6 extends the replication START SLAVE command to enable DBAs to specify master users and passwords to authenticate the account used to connect using the authentication plugin.
  • New encryption for passwords in general query logs, slow query logs, and binary logs. Passwords in statements written to these logs are no longer recorded in plain text.
  • New password hashing is now done using the SHA-256 password hashing algorithm using a random salt value.
  • New options for passwords on the command line – MySQL 5.6 introduces a new “scrambled” option/config file (.mylogin.cnf) that can be used to securely store user passwords that are used for command line operations.

Other areas of major improvement that will see a lot of work from all of us are the Performance Schema and Innodb information tables.

Both information tools will give us a better understanding and insight into what is happening inside MySQL. Some of the new instruments include:

Statements/Stages

What are my most resource intensive queries? Where do they spend time?

Table/Index I/O, Table Locks

Which application tables/indexes cause the most load or contention?

Users/Hosts/Accounts

Which application users, hosts, and accounts are consuming the most resources?

Network I/O

What is the network load like? How long do sessions idle?

Summaries

Aggregated statistics grouped by statement, thread, user, host, account, or object.

What needs to be noted is that we have recently seen performance degradation over 15% on production systems. This underlines the fact that information collection should be done only when needed and not kept on all the time.

After so many discussions and polemics, MySQL 5.6 has microsecond as well. This precision allows you to define in the table a time_type(6) attribute. One reason more to stay on the mainstream.

So far I have mentioned the more well known new features, but I have found some very interesting features that may be considered minor but that I think are relevant and will facilitate our work in one way or another.

They are:

  • Explicit Partition selection. Yes! You can do it. SELECT * FROM t1 PARTITION(p0,p1) WHERE … I hope you see the unbelievable value here. Having the option to declare what you want from where you want it is a huge step in reducing the load on the server and will dramatically reduce the locks.
  • Extend mysqlbinlog to support remote backup of binary logs. This feature will change the way we manage backups.
  • After Signal was introduced in 5.5, and store procedure can now also call GET DIAGNOSTICS.
    This provides a mechanism for (a) an application to ask questions such as “Did anything go wrong when the last SQL statement was executed and, if so, what went wrong?” and (b) getting a reply from the server in a standardized way.
    That is, GET DIAGNOSTICS is used to obtain information about the previously executed SQL statement.

So what?

The list above is not complete, and if you check the work log, you will see details on how many features have been implemented.
Nevertheless, it gives us an idea of the effort that has been taken. Many features were discussed or “initialized” years ago, and some WL entries are really old (2005) or older. What is relevant here is that MySQL5.6 is a release that has moved a lot of “pending” desired features into production, substantially reducing, if not completely removing, any reason to use patched version stream.

This version is interesting because it has seriously improved areas like:

  • Security and audit features;
  • Performance on InnoDB internal operations, thread handling, and SQL execution;
  • Scalability, which is more robust and improves replication. It may not be perfect, but we are moving in the right direction. I am expecting further improvement on replication consistency check and replication performance.

For us DBAs who have to work with it, MySQL 5.6 offers a lot of additional information and tools that we have been asking for and dreaming of for years. Oracle is obviously in the position to do this and more. If not Oracle, who has the “power” to put so much effort into making the product better?

Let’s be honest. We have other actors proposing good ideas and doing some good development work here and there. For me, these people are crucial in keeping the “biodiversity” alive and helping the community participate in the process. But when we have to consolidate everything in a single, consistent way that is stable, solid, and maintainable, the MySQL mainstream Oracle team is and will remain the real referring point.

 

Conclusion

MySQL 4.1 is still out there, and we also have customers that have recently migrated from MySQL 3.x. to 5.0. Comparing these versions of MySQL with MySQL 5.6 is like comparing a knob stick to a light saber. However, many enterprises see MySQL as a good platform for their relational data either in conjunction with Oracle or on its own. Our task is to help all of our clients move out of the stone age to a trustworthy, scalable, high performance version of MySQL that is more secure and easier to manage.

Oracle MySQL 5.6 is the one.