Earlier this week we pushed the ninth iteration of Twitter MySQL
to GitHub. Here are some of the highlights.
Bug#67718: InnoDB drastically under-fills pages in
certain conditions [48996cad34]. InnoDB's B+ tree page split
algorithm attempts to optimize for sequential inserts, but it
can cause poor space utilization depending on how key values
are distributed throughout the index. For example, if the
insert that causes a page to be split carries a key value that
is the immediate successor or predecessor of the last key value
inserted into the same page, the insertion point is used as the
split point irrespective of the actual distribution of values
in the page.
The solution is to use the standard B+ tree split algorithm while still preserving some form of optimization for sequential inserts. When a page needs to be split, the median key value in a page is used as the split point so that the data is distributed in a more symmetric fashion. A new variable named innodb_index_page_split_mode is introduced to provide a way to control the page split behavior.
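To make the difference between the two split strategies concrete, here is a toy model in Python. It is only a sketch: a page is modeled as a sorted list of keys, and the function names, the page capacity of 8 keys and the choice to keep the new key on the left page are illustrative assumptions, not InnoDB's actual implementation.

```python
from bisect import bisect_left, insort


def split_at_insertion_point(page, key):
    """Toy model: split the full page at the position where `key` belongs.

    Assumption: the new key stays on the left page, next to its predecessor.
    """
    pos = bisect_left(page, key)
    left = page[:pos] + [key]
    right = page[pos:]
    return left, right


def split_at_median(page, key):
    """Toy model: insert `key`, then split the keys at the median."""
    keys = list(page)
    insort(keys, key)
    mid = len(keys) // 2
    return keys[:mid], keys[mid:]


# A full page (hypothetical capacity: 8 keys) whose last insert was key 10.
full_page = [10, 20, 30, 40, 50, 60, 70, 80]

# Insert 11, the immediate successor of the last inserted key.
skewed = split_at_insertion_point(full_page, 11)
balanced = split_at_median(full_page, 11)

print(skewed)    # ([10, 11], [20, 30, 40, 50, 60, 70, 80])
print(balanced)  # ([10, 11, 20, 30], [40, 50, 60, 70, 80])
```

In the first result the left page holds only 2 of 8 keys. If later inserts do not continue the sequence on that page, it stays mostly empty, which is the kind of under-filling the bug report describes.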
Bug#67963: InnoDB wastes 62 out of every 16384
pages [400f4f3bf4]. Once the segments (indexes) of a
tablespace are bigger than 32 pages, fragment pages are no
longer allocated for use, yet they are still reserved whenever
a new fragment extent is allocated (usually every 16384 pages).
This limitation exists because a segment can allocate at most
32 fragment pages: the array used to track the fragment pages
belonging to a segment is limited to 32 entries.
The solution is to allow for fragment extents to be leased to segments whenever there are free fragment extents available. A fragment extent is considered available if the only used pages in the extent are the extent descriptor and ibuf bitmap pages. A new extent state is used to tag leased extents and to ensure that they are returned to the space free fragment list once no longer being used by a segment.
See "Page management in InnoDB space files" for an in-depth description of extents, extent descriptors and fragments.
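The figure in the bug title can be reproduced with a little arithmetic. The sketch below assumes 64 pages per extent, which corresponds to the common configuration of 16 KiB pages and 1 MiB extents; that number is an assumption and is not stated in the text above. The other two constants come from the description.

```python
# Assumption (not stated in the text): 64 pages per extent, which matches
# 16 KiB pages and 1 MiB extents.
PAGES_PER_EXTENT = 64
# From the text: only the extent descriptor and ibuf bitmap pages are in use.
BOOKKEEPING_PAGES = 2
# From the text: a new fragment extent is usually allocated every 16384 pages.
FRAGMENT_EXTENT_INTERVAL = 16384

wasted_pages = PAGES_PER_EXTENT - BOOKKEEPING_PAGES
wasted_fraction = wasted_pages / FRAGMENT_EXTENT_INTERVAL

print(wasted_pages)                     # 62
print(f"{wasted_fraction * 100:.3f}%")  # 0.378%
```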
Bug#68023: InnoDB reserves an excessive amount of
disk space for write operations [52c63a6cbe]. When performing
operations that are expected to expand a table (for example,
allocating new pages because a page is being split), InnoDB
currently preallocates and reserves up to 1% of the total size
of the tablespace. This ensures that enough free extents (that
is, disk space) are available for the operation. It also
ensures that, if disk space is running out, these operations
fail preemptively so that any remaining free space is kept for
operations that end up freeing space (that is, deleting data).
The percentage is reasonable for tables smaller than a few gigabytes, but not for tables sized at tens of gigabytes or more, at which point the percentage won't correctly estimate the free space needed to perform operations and may cause an excessive amount of free extents to be preallocated.
This change introduces two new system variables. The variable innodb_reserve_free_extents enables or disables free extents reservation, and innodb_free_extents_reservation_factor controls what percentage of a tablespace's size is reserved for operations that may cause more space to be used.
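To see why a fixed 1% becomes excessive for large tables, it helps to compute the upper bound for a few tablespace sizes. The helper below is a hypothetical illustration only: it applies a plain percentage and does not reproduce InnoDB's actual reservation formula or the exact semantics of innodb_free_extents_reservation_factor.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def reserved_bytes(tablespace_bytes, percent=1.0):
    """Upper bound on reserved space for a given reservation percentage."""
    return int(tablespace_bytes * percent / 100)


# Print the 1% upper bound for a range of tablespace sizes.
for size_gib in (1, 10, 100, 500):
    reserved_mib = reserved_bytes(size_gib * GIB) / MIB
    print(f"{size_gib:>4} GiB tablespace -> up to {reserved_mib:,.0f} MiB reserved")
```

For a 100 GiB tablespace the bound is a full 1 GiB, far more than a single page split is likely to need.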
Functionality Added or Changed
Counters for successful page merges and page
discards. Currently Innodb_page_merges counts only
merge attempts, and there is no metric for successful merges.
This change introduces a new status variable named Innodb_page_merges_succeeded, which
indicates the number of successful page merge operations (that
is, the number of pages successfully merged into another page).
This change also introduces a new status variable named Innodb_page_discards, which represents the number of pages that have become empty and were therefore discarded.
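With both counters available, a merge success ratio can be derived. The following sketch assumes the status values have already been fetched (for example with SHOW GLOBAL STATUS) into a dictionary of strings; the function name and the sample numbers are hypothetical.

```python
def merge_success_ratio(status):
    """Return succeeded / attempted page merges, or None if none were attempted."""
    attempts = int(status["Innodb_page_merges"])
    succeeded = int(status["Innodb_page_merges_succeeded"])
    if attempts == 0:
        return None
    return succeeded / attempts


# Hypothetical sample values, as they might come back from SHOW GLOBAL STATUS.
sample = {
    "Innodb_page_merges": "200",
    "Innodb_page_merges_succeeded": "150",
    "Innodb_page_discards": "12",
}

print(merge_success_ratio(sample))  # 0.75
```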
Support for floating-point system variables using the plugin interface. This change augments the server plugin interface so that plugins can define and expose floating-point system variables of type double. The convenience macros MYSQL_SYSVAR_DOUBLE and MYSQL_THDVAR_DOUBLE are introduced for declaring such variables.
The fractional part of the def, min and max
values of system variables is ignored. Since the
command-line option parsing interface (my_getopt) uses fields
of type unsigned long long to store these values, the
double values were being stored in a lossy way that
discards the fractional part.
This change allows the default, minimum and maximum values of system variables of type double to have a meaningful fractional part by storing the raw bits of the double value in the unsigned long long field, so that the binary representation remains the same. Hence, the actual value can be passed back and forth between the two types without loss.
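The idea of carrying a double through a 64-bit unsigned integer field can be illustrated with Python's struct module. This is only a sketch of the bit-preserving technique; the helper names are hypothetical and the server itself does this in C/C++ rather than as shown here.

```python
import struct


def double_to_ull(value):
    """Reinterpret the 8 bytes of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def ull_to_double(bits):
    """Reinterpret an unsigned 64-bit integer as the double it encodes."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# A plain numeric conversion discards the fractional part.
print(int(0.75))                          # 0

# Storing the raw bits instead keeps the value intact.
bits = double_to_ull(0.75)
print(ull_to_double(bits))                # 0.75
print(hex(double_to_ull(1.0)))            # 0x3ff0000000000000
```

The integer held in the field is not meaningful as a number; it is only a container for the bits, so both sides must agree that the field holds an encoded double.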