Shard-Query blog

The only open source MPP database engine for MySQL

Bulk insert into tables in sorted order to avoid deadlocks

Shard-Query inserts data into a “coordinator” table when answering queries.   When there is a GROUP BY on the original query, the coordinator table contains a UNIQUE KEY over the GROUP BY attributes.   Shard-Query uses INSERT .. ON DUPLICATE KEY UPDATE in combination with bulk insert (insert into … values (),(),() ) when inserting into the table.

For what would normally be efficiency sake, Shard-Query sends queries to the shards using ORDER BY NULL which disables the filesort operation. Of course, this often results in the rows being sent back from the shards in random order.

Because the results are in random order, the bulk insertion that the worker does into the coordinator table can deadlock with other worker threads when using InnoDB or TokuDB as the coordinator table. Right now I’ve just been using MyISAM for the coordinator table, which serializes queries at the bulk insert stage.  Having to insert the rows more than once due to the deadlock is not efficient.

I am going to test removing ORDER BY NULL which will distribute the sort.  The deadlocks should go away and concurrency should go up. I’ll have to test it out and if it improves performance it will be included in beta 2 of Shard-Query 2.0.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 162 other followers

%d bloggers like this: