My old post (http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for) on the performance gains from rewritten batch statements gets regurgitated in various forums and conference sessions, and people often miss its nuances.
Under the hood, what is happening with this feature is the following:
(1) The driver attempts to detect that the SQL being prepared is an INSERT. We deliberately don't ship a full-fledged SQL parser in the driver, so this works about 95% of the time. For the other 5%, you're out of luck unless you can simplify your query text.
(2) If the statement is an INSERT, the driver attempts to determine if it can be rewritten as a multi-value INSERT. From the code itself, the conditions are:
// Needs to be INSERT, can't have INSERT ... SELECT or
// INSERT ... ON DUPLICATE KEY UPDATE with an id=LAST_INSERT_ID(...)
If the statement can't be rewritten as a multi-value INSERT, the driver determines whether it is more cost-effective, round-trip-wise, to enable multi-statements on the connection and send the batch as chunks of semicolon-separated statements. This means that any kind of batch DML can benefit from this feature; INSERTs just benefit more because of the way MySQL handles multi-value INSERTs.
(3) The driver will send “chunks” that are as close to max_allowed_packet size in bytes as possible, until the batch has been completely sent.
(4) The driver does not re-order the batch.
(5) You can use Statement.getGeneratedKeys() with this feature.
(6) You can use set*Stream() with this feature.
(7) You can use server-side prepared statements with this feature as well, and if the batch contains streams (CLOBs or BLOBs), they will be streamed to the server, using less memory.
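To make the rewriting and chunking in (2) and (3) concrete, here is a small sketch of the idea, not the actual driver code: given the constant prefix of a prepared INSERT and the already-bound parameter rows, build multi-value INSERT strings no larger than a chosen packet limit (standing in for max_allowed_packet). The class and method names are mine, purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiValueRewrite {
    // Sketch (not the real driver logic) of rewriting a batch for
    // "INSERT INTO t (a, b) VALUES (?, ?)" into multi-value INSERT
    // chunks, each kept within maxPacketBytes.
    static List<String> rewrite(String prefix, List<String> valueRows, int maxPacketBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String row : valueRows) {
            if (current.length() == 0) {
                // Start a new chunk with the constant INSERT prefix.
                current.append(prefix).append(row);
            } else if (current.length() + 1 + row.length() <= maxPacketBytes) {
                // Row still fits: append it as another VALUES tuple.
                current.append(',').append(row);
            } else {
                // Chunk is full: flush it and start the next one.
                chunks.add(current.toString());
                current = new StringBuilder(prefix).append(row);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("(1,'a')", "(2,'b')", "(3,'c')");
        // A roomy limit: all rows collapse into one multi-value INSERT.
        System.out.println(rewrite("INSERT INTO t (a, b) VALUES ", rows, 1024));
        // A tiny limit: each row ends up in its own chunk.
        System.out.println(rewrite("INSERT INTO t (a, b) VALUES ", rows, 40));
    }
}
```

Note that the batch order is preserved, as per (4): rows are flushed in exactly the order they were added.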
As you can see, this is almost a win-win feature to enable, especially if your application or ORM can do DML in batches (most do). The only reason it's not enabled by default is (1); perhaps as this feature sees more use, we can feel safe enabling it by default.
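Since it isn't on by default, enabling it is a one-line change: set rewriteBatchedStatements=true either on the JDBC URL or in the Properties you pass to DriverManager.getConnection(). The host, database name, and credentials below are placeholders.

```java
import java.util.Properties;

public class EnableRewrite {
    // Option 1: put the property on the JDBC URL (placeholder host/db).
    static final String URL =
        "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true";

    // Option 2: pass it via Properties to
    // DriverManager.getConnection(url, connectionProperties()).
    static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("user", "appuser");      // placeholder credentials
        props.setProperty("password", "secret");
        props.setProperty("rewriteBatchedStatements", "true");
        return props;
    }
}
```

From there, the usual PreparedStatement addBatch()/executeBatch() pattern picks up the rewriting automatically; no application code changes are needed.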