Optimizer Enhancements in MySQL 5.7

The MySQL optimizer is getting better. MySQL 5.6 introduced:

  • File sort optimizations with small limit
  • Index Condition Pushdown
  • Batched Key Access and Multi Range Read
  • Postponed Materialization
  • Improved Subquery execution
  • EXPLAIN for Insert, Update, and Delete
  • Optimizer Traces
  • Structured EXPLAIN in JSON format

This was in addition to the InnoDB storage engine now offering improved statistics collection, leading to more stable query plans.

In Evgeny Potemkin’s session at MySQL Connect titled ”MySQL’s EXPLAIN Command New Features”, two new features for 5.7 were announced. They are both incredibly useful, so I wanted to write a little about them.

EXPLAIN FOR CONNECTION

Normally with EXPLAIN, what you would be doing is finding the execution plan of a query you are intending to run, and then interpreting the output how you see fit.

What MySQL 5.7 will do, is give you the ability to see the execution plan of a running query in another connection. i.e.

EXPLAIN FORMAT=JSON FOR CONNECTION 2;

Why it’s useful:

  • Plans can change depending on input parameters. i.e. WHERE mydate BETWEEN '2013-01-01' and '2013-01-02' may use an index, but WHERE mydate BETWEEN '2001-01-01' and '2013-10-17' may not.
  • Plans can change as data changes.
  • Plans can also change depending on the context of a transaction, with InnoDB offering multi-version concurrency control.
  • Optimizer statistics can change, and it’s not impossible that the reason for the executing query being slow has something to do with it. It’s great to have conclusive proof and be able to rule this out.

Execution cost in EXPLAIN

MySQL uses cost based optimization to pick the best query execution plan when there are multiple choices available. It is very similar to how a GPS navigator adds up estimated time and picks the best route to a destination.

What this feature does is exposes the cost as a numeric value when running EXPLAIN FORMAT=JSON. To take an example using the world sample database:

mysql [localhost] {msandbox} (world) > EXPLAIN FORMAT=JSON SELECT City.* 
FROM City INNER JOIN Country ON City.countrycode=Country.code 
ORDER BY City.NAME ASC LIMIT 100\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4786.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2151.00"
      },
      "nested_loop": [
        {
          "table": {
            "table_name": "country",
            "access_type": "index",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "Code"
            ],
            "key_length": "3",
            "rows_examined_per_scan": 239,
            "rows_produced_per_join": 239,
            "filtered": 100,
            "using_index": true,
            "cost_info": {
              "read_cost": "6.00",
              "eval_cost": "47.80",
              "prefix_cost": "53.80",
              "data_read_per_join": "61K"
            },
            "used_columns": [
              "Code"
            ]
          }
        },
        {
          "table": {
            "table_name": "City",
            "access_type": "ref",
            "possible_keys": [
              "CountryCode"
            ],
            "key": "CountryCode",
            "used_key_parts": [
              "CountryCode"
            ],
            "key_length": "3",
            "ref": [
              "world.country.Code"
            ],
            "rows_examined_per_scan": 9,
            "rows_produced_per_join": 2151,
            "filtered": 100,
            "cost_info": {
              "read_cost": "2151.00",
              "eval_cost": "430.20",
              "prefix_cost": "2635.00",
              "data_read_per_join": "151K"
            },
            "used_columns": [
              "ID",
              "Name",
              "CountryCode",
              "District",
              "Population"
            ]
          }
        }
      ]
    }
  }
}

Why it’s useful:

  • This exposes more transparency into optimizer decisions. DBAs can better understand what part of a query is considered expensive, and try to optimize. I think this is important, because I have heard a lot of DBAs make blanket recommendations like “joins are bad” or “sorting is bad”, but there needs to be context on how much data needs to be sorted. It makes us all speak the same language: estimated cost.
  • Cost refinement is an ongoing effort. As well as the introduction of new fast SSD storage, MySQL is introducing new optimizations (such as index-condition pushdown). Not all of these optimizations will be the best choice every time, and MySQL should ideally be able to make a right choice for all situations.