On learning InnoDB: A journey to the core

I’ve been using InnoDB for about a decade now, and up to now have understood it well enough to make it do what I wanted, most of the time. However in order to achieve some goals related to efficiency, I’ve found it necessary to take my understanding to the next level. Unfortunately, the InnoDB documentation was pretty lacking in clear explanations of InnoDB’s internal data structures. Reading the code turned out to be the only way to find the information I needed.

However I quickly found that the structures and their usage (and especially their inter-relationships) are way too complex to keep in your head just based on reading the code. Additionally it’s only really possible to hope you’ve understood the structure correctly just based on reading (and for me, there were a lot of misunderstandings along the way).

An approach I’ve long taken to understanding something that is complex and poorly documented is the following three steps:

  1. Read the existing documentation and the existing code, until a basic understanding is reached. Often there are serious misunderstandings or incorrect factorization at this step.
  2. Write my own implementation, even a very basic and broken one, preferably in a completely different language (which avoids the tendency to cut and paste anything). Revise my understanding based on what works and what doesn’t.
  3. Create new documentation and diagrams based on my new understanding. Refactor my implementation as necessary (the act of reviewing everything in order to document it often reveals incorrect factorizations). Correct documentation based on new understanding from refactoring code. Repeat until correct.

Implementing InnoDB’s on-disk data structures

I started the innodb_ruby project to implement InnoDB’s on-disk data structures in Ruby. I chose Ruby because it’s very flexible, extremely fast for prototyping, and it’s my current favorite language. Any language would do, however, and performance is not really an issue (although we don’t want it to be too atrociously slow as that will make testing iteratively annoying).

Once starting the project, I had very basic parsing of the FIL header (common to all page types in InnoDB) on 16KiB pages working within minutes. After a couple more hours I had implemented the INDEX page header and could answer very basic questions like how many records were in each index page—an immediately useful result.

I went ahead and implemented each of the other critical data structures that I needed one at a time and in the order I needed each of them to be able to go deeper into each level of understanding InnoDB’s storage. Davi also jumped in and wrote some of the hairy bits such as dealing with variable-width field types in records.

We now have a basically working read-only implementation of InnoDB’s major data structures.

Documenting InnoDB’s on-disk data structures

Once enough of InnoDB’s secrets had been uncovered that I felt I could start making diagrams without them being completely wrong, I set out to build clear and approachable diagrams for all major InnoDB on-disk data structures. I started the innodb_diagrams project, and I chose to build them in OmniGraffle.

At this point most of the on-disk storage format for tablespace files (ibdataX and *.ibd files) is documented for Barracuda format tables (with COMPACT records). Much still remains to build corresponding documents for Antelope format (with REDUNDANT records) and for InnoDB-compressed tables. The log file format also remains.

Making use of the code and diagrams

Now that we have working code that is useful for interactive demos, and diagrams which will make great supporting material, I intend to write a few posts about some of the more interesting and undocumented structures. Keep an eye out!