Shard-Query blog

The only open source MPP database engine for MySQL

Monthly Archives: April 2014

If you downloaded Shard-Query 2.5, please redownload to remove a PHP warning

There was some code in Shard-Query 2.5 that was not protected by an if() clause, and as a result queries without a GROUP BY generated a PHP warning. The warnings didn’t cause the test suite to fail, so I missed them. I have updated the Shard-Query 2.5 binary, so please redownload it if you get a warning about GROUP BY when running a query that does not use GROUP BY.

Shard-Query 2.5 is now released

Shard-Query 2.5 has been a long time coming, but the release is finally officially out the door.

There are numerous changes from the last major release including:

  •  Improved parser – fully handles complex expressions
  •  LOAD DATA INFILE support and S3 support
  •  Semi-join materialization for IN and NOT IN subqueries
  •  Improved support for subqueries in the FROM clause
  •  INSERT .. SELECT and CREATE TABLE .. SELECT support
  •  Ability to do range lookups on the shard key (IN/BETWEEN/etc)
  •  Improved proxy – supports SHOW commands too
  •  Support for the full MySQL SELECT dialect, including WITH ROLLUP
  •  Custom aggregate function support
  •  Asynchronous query support
  •  Numerous bug fixes

You can find it here

Shard-Query loader gets a facelift and now Amazon S3 support too

Shard-Query (source) now supports the MySQL “LOAD DATA INFILE” command.

When you use LOAD DATA LOCAL INFILE, a single-threaded load is performed from the current process. You can specify a path to a file anywhere readable by the PHP script. This allows loading without using the Gearman workers and without using a shared filesystem.

If you do not specify LOCAL, then the Gearman-based loader is used. You must not specify a path to the file when you omit the LOCAL keyword, because the shared path will be prepended to the filename automatically. The shared path must be on a shared or network filesystem (NFS, CIFS, etc.), and the files to be loaded must be placed on that filesystem for the Gearman-based loader to work. This is because workers may run on multiple nodes, and all workers have to be able to read the files to be loaded.
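The path rules above can be summarized in a short sketch. This is not Shard-Query's code (Shard-Query itself is written in PHP); the function name `resolve_load_path`, the `shared_path` default, and the error message are all hypothetical and only illustrate the described behavior.

```python
import posixpath


def resolve_load_path(filename, local, shared_path="/mnt/shared"):
    """Return the location a loader would read from (illustrative only).

    shared_path is a made-up default standing in for the configured
    shared or network filesystem directory.
    """
    if filename.startswith("s3://"):
        # S3 sources are named by their full URL, with or without LOCAL.
        return filename
    if local:
        # LOCAL: any path readable by the current process is acceptable.
        return filename
    # Without LOCAL only a bare filename is allowed, because the shared
    # path is prepended automatically.
    if filename == "" or posixpath.basename(filename) != filename:
        raise ValueError("do not specify a path when LOCAL is omitted")
    return posixpath.join(shared_path, filename)
```

For example, `resolve_load_path("data.txt", local=False)` yields a path under the shared directory, while the same call with `"/tmp/data.txt"` raises an error.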

S3 is supported as a source of data

As an alternative to a shared filesystem, S3 is now supported as a data source. You must specify an AWS access key and secret key when setting up Shard-Query. After those are set up, simply use LOAD DATA INFILE ‘s3://bucket/filename’ to load from an S3 bucket using Gearman workers. The file will be split automatically into 16MB chunks, and each chunk will be loaded individually.

If you use LOAD DATA LOCAL INFILE ‘s3://bucket/filename’ then Gearman will not be used and the file will be loaded from the local process instead.

Important: When the Gearman loader is used (recommended), the S3 load will be split over the workers, with each worker loading a 16MB chunk of the file in parallel.
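To make the chunked, parallel load concrete, here is a minimal sketch of the idea. It rests on several assumptions: the post does not say how chunk boundaries are chosen, so extending each chunk to the next newline is a guess at what keeps rows intact; a thread pool stands in for the Gearman workers; and `chunk_ranges`, `load_chunk`, and `parallel_load` are hypothetical names, not part of Shard-Query.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 16 * 1024 * 1024  # 16MB, the chunk size mentioned in the post


def chunk_ranges(data, chunk_size=CHUNK_SIZE):
    """Split data into (start, end) byte ranges of roughly chunk_size bytes.

    Assumption: each range is extended to the next newline so that no
    row is cut in half.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges = []
    start, size = 0, len(data)
    while start < size:
        end = min(start + chunk_size, size)
        if end < size:
            # Look for the first newline at or after the last byte of
            # the nominal chunk, and end the chunk just past it.
            newline = data.find(b"\n", end - 1)
            end = size if newline == -1 else newline + 1
        ranges.append((start, end))
        start = end
    return ranges


def load_chunk(data, byte_range):
    """Stand-in for a worker: count the newline-terminated rows in a chunk."""
    start, end = byte_range
    return data[start:end].count(b"\n")


def parallel_load(data, chunk_size=CHUNK_SIZE, workers=4):
    """Hand each chunk to a pool of workers and total the rows handled."""
    ranges = chunk_ranges(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: load_chunk(data, r), ranges))
```

In a real deployment each range would be fetched from S3 or the shared filesystem by a worker on some node, rather than sliced from an in-memory buffer as it is here.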