Test driving the Spider storage engine - sharding for the masses


At the MySQL Conference 2009 I attended a session about the Spider storage engine, an engine with built-in sharding features.
The talk was notable for the speaker wearing a Spiderman costume, and for a language barrier that made the talk less enjoyable than it should have been. That's a pity, because the engine is very intriguing, and deserves some exploration.


What is the Spider engine, then? In short, it's an extension to the partitioning engine, with the ability to connect to remote servers. Basically, partitions + Federated, except that the Federated engine is explicitly removed during compilation. Additionally, the Spider engine should remove some current limitations, such as the single replication thread and single-source replication, but due to the lack of specific documentation, I will limit my current experiment to the sharding features.
The only documentation available is the slide deck from the presentation, and a very sparse parameter reference that comes with the source code. I show here what I found by practical inspection.
Building the engine
To compile the engine, you need the source code for MySQL 5.1.31 (as required by the documentation, but it works fine with later versions as well).
Download the source code from the launchpad repository and expand it. You will get a ./spider directory, which you need to move under the ./storage directory in the source. Then you compile, with these instructions:

autoconf
automake
./configure \
--prefix=/usr/local/mysql \
--localstatedir=/usr/local/mysql/data \
--libexecdir=/usr/local/mysql/bin \
--enable-thread-safe-client \
--enable-local-infile --with-pic \
--with-fast-mutexes \
--with-client-ldflags=-static \
--with-mysqld-ldflags=-static \
--with-zlib-dir=bundled \
--with-big-tables --with-ssl \
--with-readline --with-embedded-server \
--with-partition --without-innodb \
--without-ndbcluster \
--without-archive-storage-engine \
--without-blackhole-storage-engine \
--with-csv-storage-engine \
--without-example-storage-engine \
--without-federated-storage-engine \
--with-extra-charsets=complex
make
./scripts/make_binary_distribution


Now we will use the MySQL Sandbox to create a testing environment. Let's start with a simple case: 1 main server and 3 remote ones.

make_sandbox $PWD/mysql-5.1.31-osx10.5-i386.tar.gz


This creates a sandbox under $HOME/sandboxes/msb_5_1_31, which will be our main server. Before using it, we need to create some additional tables and load the plugin. (These queries are in the Spider instructions, but they are hard to cut and paste from there. The listing below should be easier to use for that purpose.)


create table if not exists mysql.spider_xa(
format_id int not null default 0,
gtrid_length int not null default 0,
bqual_length int not null default 0,
data char(128) not null default '',
status char(8) not null default '',
primary key (data, format_id, gtrid_length),
key idx1 (status)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_xa_member(
format_id int not null default 0,
gtrid_length int not null default 0,
bqual_length int not null default 0,
data char(128) not null default '',
scheme char(64) not null default '',
host char(64) not null default '',
port char(5) not null default '',
socket char(64) not null default '',
username char(64) not null default '',
password char(64) not null default '',
primary key (data, format_id, gtrid_length, host, port, socket)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_tables(
db_name char(64) not null default '',
table_name char(64) not null default '',
priority bigint not null default 0,
server char(64) default null,
scheme char(64) default null,
host char(64) default null,
port char(5) default null,
socket char(64) default null,
username char(64) default null,
password char(64) default null,
tgt_db_name char(64) default null,
tgt_table_name char(64) default null,
primary key (db_name, table_name),
key idx1 (priority)
) engine=MyISAM default charset=utf8 collate=utf8_bin;

install plugin spider soname 'ha_spider.so';
select engine,support,transactions,xa
from information_schema.engines;
+------------+---------+--------------+------+
| engine     | support | transactions | xa   |
+------------+---------+--------------+------+
| SPIDER     | YES     | YES          | YES  |
| MRG_MYISAM | YES     | NO           | NO   |
| CSV        | YES     | NO           | NO   |
| MyISAM     | DEFAULT | NO           | NO   |
| MEMORY     | YES     | NO           | NO   |
+------------+---------+--------------+------+


Spider is now enabled.
Preparing the remote servers
The servers used as remote shards can be conveniently replaced, for this experiment, by sandboxes. Let's create three of them, to illustrate the concept.

make_multiple_sandbox --group_directory=spider_dir \
--sandbox_base_port=6000 --check_base_port 5.1.31


Now we have three sandboxes under $HOME/sandboxes/spider_dir, with ports ranging from 6001 to 6003.
What we need to do is to create, in each server, a table with the same structure as the one that is being sharded in the main server.

$ cd $HOME/sandboxes/spider_dir
$ cat tablea.sql
drop schema if exists myspider;
create schema myspider;
use myspider;

Create table tbl_a(
col_a int,
col_b int,
primary key(col_a)
);

$ ./use_all "source tablea.sql"


The main server
Finally, we have all the components in place, and we can create the table on the main server.

drop schema if exists   myspider;
create schema myspider;
use myspider;

Create table tbl_a(
col_a int,
col_b int,
primary key(col_a)
) engine = Spider
Connection ' table "tbl_a", user "msandbox", password "msandbox" '
partition by range( col_a )
(
partition pt1 values less than (1000)
comment 'host "127.0.0.1", port "6001"',
partition pt2 values less than (2000)
comment 'host "127.0.0.1", port "6002"',
partition pt3 values less than (MAXVALUE)
comment 'host "127.0.0.1", port "6003"'
);


Compared to classic partitions, there is some new ground to cover. The "CONNECTION" clause defines the table name in the remote server, together with the credentials used to connect. The schema name is assumed to be the same as the one in the main server.
For each partition, we can add a "COMMENT" clause, with the connection parameters. Since we are using sandboxes in the same host, we connect to 127.0.0.1, and use the port corresponding to each sandbox.
From now on, we can use this table almost transparently.
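Partition pruning still happens in the partitioning layer above Spider, so we can check which shard a given statement will touch. Here is a minimal check, assuming pruning works as it does with any other partitioned table (the value 1500 is just an example):

-- the "partitions" column of the EXPLAIN output should list only pt2,
-- since 1500 falls in the range handled by the second shard
explain partitions select * from tbl_a where col_a = 1500;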
Using the spider engine

# on the main server
./use myspider

insert into tbl_a values (1,1);
Query OK, 1 row affected (0.00 sec)

insert into tbl_a values (1001,2);
Query OK, 1 row affected (0.01 sec)

insert into tbl_a values (2001,3);
Query OK, 1 row affected (0.00 sec)

select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
|     1 |     1 |
|  1001 |     2 |
|  2001 |     3 |
+-------+-------+
3 rows in set (0.01 sec)


So far, so good. No surprises, at least on the surface.
Now, where is the data? A close inspection of the files in the data directory shows that the data is not in the main server.
The data is stored in the "remote" servers, as we can check easily:


# in the spider_dir path
./use_all "select * from myspider.tbl_a"

# server: 1:
col_a col_b
1 1
# server: 2:
col_a col_b
1001 2
# server: 3:
col_a col_b
2001 3


Now, let's apply some curiosity. What happens in the remote server when I insert a row in the main server? Probably the general log can give me an answer.

# spider_dir
./use_all 'set global general_log=1'

# main server
insert into tbl_a values (2002,3);
Query OK, 1 row affected (0.00 sec)

# spider_dir
$ tail node3/data/mysql_sandbox6003.log
090429 17:27:28 299 Connect msandbox@localhost on
090429 17:27:42 299 Query set session sql_log_off = 1

Ah! No curious people allowed.
Well, this can stop a casual user, but not a determined one.
MySQL Proxy to the rescue! There is a Lua script that handles query logging.

./node2/proxy_start --proxy-lua-script=$PWD/logs.lua --log-level=warning


Let's change the main table definition:

...
partition pt3 values less than (MAXVALUE)
comment 'host "127.0.0.1", port "4040"'


And now we can see what happens.

# main server
insert into tbl_a values (2004,3);
Query OK, 1 row affected (0.00 sec)

$ tail /tmp/mysql.log
2009-04-29 18:03:04 305 -- set session sql_log_off = 1 >{0}
2009-04-29 18:03:04 305 -- set session autocommit = 1 >{0}
2009-04-29 18:03:04 305 -- start transaction >{0}
2009-04-29 18:03:04 305 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:03:04 305 -- insert into `myspider`.`tbl_a`(`col_a`,`col_b`)values(2004,3) >{0}
2009-04-29 18:03:04 305 -- commit >{0}


Hmmm. I don't like the look of this: autocommit=1, and then start transaction, set session transaction isolation level, and commit. At the very least, it's wasting three queries per statement. This needs some explanation from the author, I guess. Let's try some data retrieval.

# main server
select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
|     1 |     1 |
|  1001 |     2 |
|  2003 |     3 |
|  2004 |     3 |
|  1001 |     2 |
|  2003 |     3 |
|  2004 |     3 |
+-------+-------+
7 rows in set (0.01 sec)

$ tail /tmp/mysql.log
2009-04-29 18:01:07 303 -- set session sql_log_off = 1 >{0}
2009-04-29 18:01:07 303 -- set session autocommit = 1 >{0}
2009-04-29 18:01:07 303 -- start transaction >{0}
2009-04-29 18:01:07 303 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:01:07 303 -- show table status from `myspider` like 'tbl_a' >{0}
2009-04-29 18:01:07 303 -- select `col_a`,`col_b` from `myspider`.`tbl_a` limit 0,9223372036854775807 >{0}
2009-04-29 18:01:07 303 -- commit >{0}


Scarier than the previous one. The LIMIT clause spells trouble.
And this latest experiment made me try something more ambitious.
I installed a group of 20 sandboxes and loaded the employees test database (4.2 million records), spreading two partitioned tables across the backend servers.
Performance is better than using a single table, but slower than a normal partitioned table on a single server. And I think I know why.

# main server
select count(*) from salaries where from_date between '1995-01-01' and '1995-12-31';
+----------+
| count(*) |
+----------+
|   201637 |
+----------+
1 row in set (0.76 sec)

$ tail /tmp/mysql.log
2009-04-29 18:09:51 307 -- set session sql_log_off = 1 >{0}
2009-04-29 18:09:51 307 -- set session autocommit = 1 >{0}
2009-04-29 18:09:51 307 -- start transaction >{0}
2009-04-29 18:09:51 307 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:09:51 307 -- show table status from `employees` like 'salaries' >{0}
2009-04-29 18:09:51 307 -- select `emp_no`,`from_date` from `employees`.`salaries` order by `emp_no`,`from_date` limit 0,9223372036854775807 >{0}


This is definitely a waste. It's a problem similar to the one that affects the Federated engine. But here we also get an "ORDER BY" clause that is, to say the least, unnecessary.
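Just to put the waste in perspective, here is a sketch of what each remote server could run if the engine pushed the condition and the aggregation down to the shards, instead of pulling every row back to the main server (this is not what Spider does today, as the log above shows):

-- run directly on one of the remote shards: the filtering and counting
-- happen locally, and only one number travels back over the network
select count(*) from employees.salaries
where from_date between '1995-01-01' and '1995-12-31';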
Bugs
During the tests, I spotted at least two serious bugs.
When you drop a partition, the data in the remote server is not removed. If you recreate the partition and insert something, you get a "duplicate key" error.

When you drop a database, the mysql.spider_tables table does not get updated, with the result that you can't recreate the table with the same name, unless you remove the corresponding entries manually.
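Until these bugs are fixed, the only workaround I can see is a manual cleanup, along these lines (adjust the schema and table names to your case; this is just a sketch):

-- on each remote server: remove the orphaned rows left behind by the
-- dropped partition, so that recreating it does not cause duplicate keys
truncate table myspider.tbl_a;

-- on the main server: remove the stale metadata, so that a table with
-- the same name can be created again
delete from mysql.spider_tables
 where db_name = 'myspider' and table_name = 'tbl_a';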

That was a lot of information for one session. Please try it and comment. Don't expect me to provide answers about the inner workings of the Spider engine. I hope the author chimes in and clarifies these muddy matters.