Shard-Query is now much faster for some aggregate functions
February 13, 2014
I checked in some improvements to Shard-Query.
STD, STD_SAMP, VAR and VAR_SAMP can now be orders of magnitude faster on large data sets, because they are distributed like COUNT, AVG and other fully distributable MySQL aggregate functions. Before this change, my test query created a 22GB (530M row) temporary table. It now creates a one-row temporary table. This reduces network traffic and temporary storage space, and it improves performance.
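To illustrate why these functions can be distributed at all: variance and standard deviation can be rebuilt from three small per-shard values (the row count, the sum, and the sum of squares), so each shard only has to return one row. The Python sketch below shows that general idea. It is not Shard-Query's actual code, and the function names (shard_partial, combine, var_pop, var_samp) are made up for the example.

```python
import math
from typing import Iterable, List, Tuple

# One row per shard: (row count, sum of values, sum of squared values).
Partial = Tuple[int, float, float]


def shard_partial(values: Iterable[float]) -> Partial:
    """What a single shard would compute locally (hypothetical helper)."""
    n, s, ss = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    return n, s, ss


def combine(partials: List[Partial]) -> Partial:
    """What the coordinator does: add up the one-row results from each shard."""
    n, s, ss = 0, 0.0, 0.0
    for pn, ps, pss in partials:
        n += pn
        s += ps
        ss += pss
    return n, s, ss


def var_pop(n: int, s: float, ss: float) -> float:
    """Population variance: E[x^2] - (E[x])^2."""
    return ss / n - (s / n) ** 2


def var_samp(n: int, s: float, ss: float) -> float:
    """Sample variance: divides by n - 1 instead of n."""
    return (ss - s * s / n) / (n - 1)


# Three "shards" holding different slices of the same column.
shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]
n, s, ss = combine([shard_partial(rows) for rows in shards])

print("VAR_POP :", var_pop(n, s, ss))
print("VAR_SAMP:", var_samp(n, s, ss))
print("STD_POP :", math.sqrt(var_pop(n, s, ss)))
```

One caveat with this formulation: subtracting two large, nearly equal numbers can lose precision on floating-point data with a large mean, so a production implementation may prefer a more numerically stable way of merging partial results.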
Shard-Query now reports initialization errors better. In practice this mostly means that if gearmand is not running, you get a clear error message instead of cryptic PHP output.
You can change the storage engine for the repo by changing only one line in shard_query.sql. This makes it easier to install on Infobright, which needs MyISAM tables rather than InnoDB tables in the repo.