Creating JDBC Connections Doesn't Have To Be Slow (or "not the reason to be using a pool")

Hanging out in #mysql on freenode the other day, I overheard someone saying that the reason to use connection pools with MySQL is because JDBC connections are expensive to create. That is true out of the box, but mostly because the out of the box behavior of MySQL's JDBC driver is to be standards-compliant. If you know that your DBA and your developers aren't doing crazy things with the database (changing configurations without letting the developers know, going around the "standard" API calls to start/end transactions, etc), then you can get to the point where connection setup is no slower than any other API. Does this mean you shouldn't use a connection pool? NO! (more on this next week).

Here's an iterative overview of the changes made in configuration, and how they affect what queries the driver does on initialization.

First, asking the driver to connect with default configuration results in the following statements being issued for every connection:

SHOW VARIABLES WHERE Variable_name ='language' OR Variable_name = 'net_write_timeout' 
  OR Variable_name = 'interactive_timeout' 
  OR Variable_name = 'wait_timeout' 
  OR Variable_name = 'character_set_client' 
  OR Variable_name = 'character_set_connection' 
  OR Variable_name = 'character_set' 
  OR Variable_name = 'character_set_server' 
  OR Variable_name = 'tx_isolation' 
  OR Variable_name = 'transaction_isolation' 
  OR Variable_name = 'character_set_results' 
  OR Variable_name = 'timezone' 
  OR Variable_name = 'time_zone' 
  OR Variable_name = 'system_time_zone' 
  OR Variable_name = 'lower_case_table_names' 
  OR Variable_name = 'max_allowed_packet' 
  OR Variable_name = 'net_buffer_length' 
  OR Variable_name = 'sql_mode'
  OR Variable_name = 'query_cache_type' 
  OR Variable_name = 'query_cache_size' 
  OR Variable_name = 'init_connect'

SELECT @@session.auto_increment_increment
SHOW COLLATION
SET NAMES utf8
SET character_set_results = NULL
SET autocommit=1
SET sql_mode='STRICT_TRANS_TABLES'

WIth this out of the box configuration, my development rig is topping out around 115 connections/second.

Now, let's tell the driver that we're not going to change any of those variables above while the application is deployed, so that it may cache them by adding "cacheServerConfiguration=true" to the connection string. The first connection will take the same amount of time, but subsequent connections using the same URL in the same JVM will not issue the "SHOW VARIABLES..." statement. We now end up with the following SQL statements on all connections after the first one:

SET NAMES utf8
SET character_set_results = NULL
SET autocommit=1
SET sql_mode='STRICT_TRANS_TABLES'

Next, let's make sure that we've set 'character_set_client' and 'character_set_server' on mysql to match what we're requesting, which for this test was UTF8. Once that is done, the driver no longer has to issue "SET NAMES utf8":

SET character_set_results = NULL
SET autocommit=1
SET sql_mode='STRICT_TRANS_TABLES'

We can get rid of the "SET character_set_results = NULL" if we set the server to match what the JDBC driver is requesting. The default (NULL), is so that character set conversions on result sets is pushed out to the clients, rather than burning CPU at the mysql server, since Java can convert between all character sets MySQL supports on its own. In this example, we assume that the database data is always in UTF-8, so setting by setting 'character_set_results' in MySQL to "utf8", and adding "characterSetResults=UTF-8" to the connection string, we're now at:

SET autocommit=1
SET sql_mode='STRICT_TRANS_TABLES'

I generally prefer all applications to be using a transactional storage engine like InnoDB, and strict mode to keep the data clean and consistent and catch programmer errors, so by adding "sql_mode=STRICT_TRANS_TABLES" to the server's my.cnf, the JDBC driver only issues these statements on connect:

SET autocommit=1

Java developers shouldn't be doing end-runs around Connection.commit(), Connection.rollback() and Connection.setAutoCommit(boolean) by issuing those as regular statements, so in almost all cases it's then safe to add "useLocalSessionState=true" to the connection string, and you end up with zero statements issued by the JDBC driver for connection setup after the first to grab the server configuration to be cached.

After those few small tweaks to the connection string, we're now able to create and close 667 connections/second from a single thread on my old beater of a Macbook Pro to a MySQL server on the same box (this value might be somewhat lower if the connection actually has to go over the wire). For comparison, PHP-5.3.6 using mysqlnd and the following script can create and close 500 connections/second on the same machine:

<?php
date_default_timezone_set("America/Chicago");

$now = new DateTime;

for ($i = 0; $i diff($now);

echo $elapsed->format('%s seconds');

?>

Stay tuned next week, when we see why you still might want to use a connection pool, even though you can create connections more than fast enough :)

Want to know where to read up on all of these configuration parameters? Check out our fine manual at http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-configuration-properties.html