Showing entries 1 to 2
Displaying posts with tag: bulk insert
Bulk insert into tables in sorted order to avoid deadlocks

Shard-Query inserts data into a “coordinator” table when answering queries. When the original query has a GROUP BY, the coordinator table contains a UNIQUE KEY over the GROUP BY attributes. Shard-Query uses INSERT ... ON DUPLICATE KEY UPDATE in combination with bulk insert (INSERT INTO … VALUES (),(),()) when inserting into the table.

For what would normally be efficiency's sake, Shard-Query sends queries to the shards using ORDER BY NULL, which disables the filesort operation. Of course, this often results in the rows being sent back from the shards in arbitrary order.

Because the results arrive in arbitrary order, the bulk insert that each worker performs into the coordinator table can deadlock with the inserts of other worker threads when the coordinator table uses InnoDB or TokuDB. Right now I’ve just been using MyISAM for the coordinator table, which serializes queries at the bulk insert stage. Having to insert the …
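The idea in the post title, inserting in sorted order, can be sketched in a few lines. This is only an illustration, not Shard-Query's actual code: the function name, the table and column names, and the summing ON DUPLICATE KEY UPDATE clause are all assumptions, and key values are assumed to be non-NULL and mutually comparable. Each worker sorts its batch by the UNIQUE KEY columns before building the bulk statement, so concurrent workers touch the keys in the same order.

```python
def build_sorted_bulk_upsert(table, key_cols, agg_col, rows):
    """Build one bulk INSERT ... ON DUPLICATE KEY UPDATE statement.

    Hypothetical helper: `rows` is a list of dicts holding the key columns
    and one aggregate column. Rows are sorted by the unique-key columns so
    every worker inserts keys in the same order.
    """
    # Sort by the unique key so all workers lock rows in a consistent order.
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in key_cols))

    cols = list(key_cols) + [agg_col]
    placeholders = "(" + ", ".join(["%s"] * len(cols)) + ")"

    # Assumed merge rule: add the incoming partial aggregate to the stored one.
    sql = (
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES "
        + ", ".join([placeholders] * len(ordered))
        + f" ON DUPLICATE KEY UPDATE {agg_col} = {agg_col} + VALUES({agg_col})"
    )

    # Flatten the parameters in the same order as the placeholders.
    params = [r[c] for r in ordered for c in cols]
    return sql, params
```

The returned statement and parameter list can be handed to any MySQL driver that uses `%s` placeholders. Sorting costs some CPU on the worker, but that cost is paid outside the coordinator table's locks.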

Speeding Up TPCC Table Loads by 8x with TokuDB v5.0

Percona’s TPCC for MySQL toolset lets you measure query performance for an OLTP workload on various MySQL storage engines. The toolset includes a program to load the database tables and a program to run queries and measure performance. We have found Percona’s TPCC toolset to be extremely useful for tuning our software. However, we wanted to take advantage of TokuDB’s bulk load capability when loading the database.

We created a new tool, a simple variant of the existing code, that generates CSV files for the TPCC database. These CSV files can be bulk loaded into TokuDB with a LOAD DATA INFILE statement. TokuDB’s bulk loader uses a parallel merge sort algorithm that is implemented in Cilk, an extension to the C language that …
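As a rough illustration of the CSV-then-load workflow described above (not the actual tool, which is a variant of the TPCC loader), the sketch below writes rows as CSV and builds a matching LOAD DATA INFILE statement. The function names, the file path, and the two-column rows are hypothetical; `warehouse` is used only as an example table name.

```python
import csv
import io


def write_rows_csv(rows, out):
    """Write an iterable of row tuples to a file-like object as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)


def load_data_statement(csv_path, table):
    """Return a LOAD DATA INFILE statement matching the CSV written above."""
    return (
        f"LOAD DATA INFILE '{csv_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


# Usage: generate a tiny CSV in memory and print the statement that loads it.
buf = io.StringIO()
write_rows_csv([(1, "W1"), (2, "W2")], buf)
print(buf.getvalue(), end="")
print(load_data_statement("/tmp/warehouse.csv", "warehouse"))
```

In practice the CSV would be written to a file the MySQL server can read, and the statement would be run once per table.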
