In my previous rounds with DataWarehouse oriented engines I used single table without joins, and with small (as for DW) datasize (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). Addressing these issues, I took Star Schema Benchmark, which is TPC-H modification, and tried run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, will explain later why. Again primary goal for test was not to get just numbers, but understand specifics of each engine and their ability to handle amount of data and execute queries.
All details I have are available on our Wiki http://www.percona.com/docs/wiki/benchmark:ssb:start and the specification of benchmarks you can get there http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf.
I generated data with scale factor = 1000, which gave me 610GB of data in raw format and loaded into each engine.
There difference in engines gets into play. While InfoBright and
InfiniDB does not need indexes at all (you actually can't create
indexes here), they needed for LucidDB and MonetDB, and it
changes load time and datasize after load significantly. The
numbers
I put in results do not include indexing time, but it also should
be considered.
And indexes are exactly reason why I could not get results for
MonetDB, there I faced issue
I was not prepared for. MonetDB requires that index fits into
memory during indexing procedure, and for 610GB the index may get
to 120GB size, and I did not have that amount of memory ( the box
is only 16GB of RAM). MonetDB experts recommended me to
extended
swap partition to 128GB, but my partitions layout was not really
prepared for, I just did not expect I need big swap partition.
Loading
So load time.
InfiniDB can really utilize all available cores/cpus in systems (
I run benchmark on 8 cores box), and it allowed to load data
faster than other engines. Though LucidDB and MonetDB
are also have multi-thread loaders, only InfoBright ICE used
single core.
InfiniDB: 24 010 sec
MonetDB: 42 608 sec (without indexes)
InfoBright: 51 779 sec
LucidDB: 140 736 sec (without indexes)
I should note that time to create indexes in LucidDB was also significant and exceeded loading time. Full report on indexes are available here http://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start
Data size
Size after load is also interesting factor. InfoBright is
traditionally good with compression,
though compression rate is less than in case with AirTraffic
table. I was told this is because
lineorder table comes not in sorted order, which one would expect
in real life. Actually
the same complain I heard from InfiniDB experts - if put
lineorder data in sorted order, loading
time can decrease significantly.
Datasize after load:
InfoBright: 112G
LucidDB: 120GB (without indexes)
InfiniDB: 626GB
MonetDB: 650GB (without indexes)
Queries time
Now on queries time.
Full results you can find on page http://www.percona.com/docs/wiki/benchmark:ssb:start,
and graph is below. There couple comments from me.
InfoBright was fully 1 CPU bound during all queries. I think the
problem
that engine can use only single cpu/core is getting significant
limitation
for them. For query 3.1 I got the surprising result, after 36h of
work I got
error that query can't be resolved by InfoBright optimizer and I
need
to enable MySQL optimizer.
InfiniDB is otherwise was IO-bound, and processed data fully
utilizing
sequential reads and reading data with speed 120MB/s. I think it
allowed
InfiniDB to get the best time in the most queries.
LucidDB on this stage is also can utilize only singe thread with
results sometime better,
sometime worse than InfoBright.
Results:
Query | InfoBright | InfiniDB | LucidDB |
---|---|---|---|
Q1.1 | 48 min 21.67 sec (2901.67 sec) | 24 min 26.05 sec (1466.05 sec) | 3503.792 sec |
Q1.2 | 44 min 55.37 sec (2695.37 sec) | 24 min 25.83 sec (1465.83 sec) | 2889.903 sec |
Q1.3 | 45 min 53.49 sec (2753.49 sec) | 24 min 27.25 sec (1467.25 sec) | 2763.464 sec |
Q2.1 | 1 hour 54 min 27.74 sec (6867.74) | 19 min 44.35 sec (1184.35 sec) | 9694.534 sec |
Q2.2 | 1 hour 13 min 33.15 sec (4413.15) | 19 min 49.56 sec (1189.56 sec) | 9399.965 sec |
Q2.3 | 1 hour 8 min 23.41 sec (4103.41) | 19 min 52.27 sec (1192.25 sec) | 8875.349 sec |
Q3.1 | NA | 19 min 11.23 sec (1151.23 sec) | 16376.93 sec |
Q3.2 | 3 hours 30 min 17.64 sec (12617.64 sec) | 19 min 28.55 sec (1168.55 sec) | 5560.977 sec |
Q3.3 | 2 hours 58 min 18.87 sec (10698.87 sec) | 19 min 58.29 sec (1198.29 sec) | 2517.621 sec |
Q3.4 | 1 hour 41 min 41.29 sec (6101.29 sec) | 12 min 57.96 sec (777.96 sec) | 686.202 sec |
Q4.1 | 8 hours 53 min 52.55 sec (32032.55 sec) | 32 min 57.49 sec (1977.49 sec ) | 19843.213 sec |
Q4.2 | 5 hours 38 min 7.60 sec / 5 hours 36 min 35.69 sec (20195.69 sec) | 33 min 35.45 sec (2015.45 sec) | 15292.648 sec |
Q4.3 | 12 hours 58 min 4.27 sec (46684.27 sec) | 33 min 47.32 sec (2027.32 sec) | 7241.791 sec |
Graph with results (time in sec, less time is better)
Conclusions
- InfiniDB is doing just great using available CPU cores full IO bandwidth reading from disk. You can see more details on InfiniDB scalability on InfiniDB's blog http://infinidb.org/infinidb-blog/mysql-parallel-query-processing-of-ssb-queries-via-infinidb-.html
- SSB benchmark may be not good for InfoBright, the synthetic
nature of benchmark
does not allow InfoBright to show better results. But I hope InfoBright will be able to reuse multi-cores / multi-disks soon. - I'd like MonetDB is able to use disk to build indexes, not only rely on available memory
- Taking complains on SSB I am looking to get another more
realistic dataset and
compare bigger set of available DW solutions
Entry posted by Vadim | 30 comments