Shard-Query blog

The only open source MPP database engine for MySQL

Shard-Query loader gets a facelift and now Amazon S3 support too

Shard-Query (source) now supports the MySQL “LOAD DATA INFILE” command.

When you use LOAD DATA LOCAL INFILE, a single-threaded load is performed from the current process. You can specify a path to a file anywhere readable by the PHP script. This allows loading without Gearman workers and without a shared filesystem.

If you do not specify LOCAL, then the Gearman-based loader is used. You must not specify a path to the file when you omit the LOCAL keyword, because the shared path will be prepended to the filename automatically. The shared path must be on a shared or network filesystem (NFS, CIFS, etc.), and the files to be loaded must be placed there for the Gearman-based loader to work. This is because workers may run on multiple nodes, and every worker has to be able to read the files to be loaded.
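
The rule above can be sketched in a few lines of Python. This is only an illustration of the behaviour described in this post, not Shard-Query's actual code (which is PHP): the function name `plan_load`, the default `/mnt/shared` mount point, and the choice to raise an error on a path are all assumptions made for the example.

```python
import posixpath

def plan_load(filename, local, shared_path="/mnt/shared"):
    """Return (loader, path) for a LOAD DATA [LOCAL] INFILE request.

    Hypothetical helper: names and the default shared path are assumptions.
    """
    if local:
        # LOCAL: single-threaded load in the current process, path used as given.
        return ("local", filename)
    if filename.startswith("s3://"):
        # S3 sources (see the next section) do not live on the shared filesystem.
        return ("gearman", filename)
    if "/" in filename:
        # Without LOCAL only a bare file name is expected.
        raise ValueError("omit the path when LOCAL is not specified")
    # The shared path is prepended automatically.
    return ("gearman", posixpath.join(shared_path, filename))
```

For example, `plan_load("data.csv", local=False)` yields the Gearman loader with the path `/mnt/shared/data.csv`, while `plan_load("/tmp/data.csv", local=True)` keeps the path untouched and loads from the current process.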

S3 is supported as a source of data

As an alternative to a shared filesystem, S3 is now supported as a data source. You must specify an AWS access key and secret key when setting up Shard-Query. After those are set up, simply use LOAD DATA INFILE ‘s3://bucket/filename’ to load from an S3 bucket using Gearman workers. The file will be split into smaller chunks automatically, and each 16MB chunk will be loaded individually.
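
To make the chunking idea concrete, here is a minimal Python sketch of splitting a file into chunks of roughly 16MB. The post does not say how chunk boundaries are chosen; extending each chunk to the next newline, so that no row is cut in half, is an assumption of this example, and `chunk_ranges` is a hypothetical name rather than part of Shard-Query.

```python
import io

CHUNK_SIZE = 16 * 1024 * 1024  # 16MB, the chunk size mentioned in the post

def chunk_ranges(stream, chunk_size=CHUNK_SIZE):
    """Yield (start, end) byte offsets of chunks that end on a row boundary.

    `stream` must be a seekable binary file object. Aligning on newlines
    is an assumption of this sketch.
    """
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    start = 0
    while start < total:
        # Jump to the target boundary, then read to the end of the current row.
        stream.seek(min(start + chunk_size, total))
        stream.readline()
        end = stream.tell()
        yield (start, end)
        start = end
```

Each chunk may run slightly past `chunk_size`, because it is extended to the end of the row it lands in. The byte ranges cover the file exactly once, so they can be handed out to separate workers.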

If you use LOAD DATA LOCAL INFILE ‘s3://bucket/filename’, then Gearman will not be used and the file will be loaded by the local process instead.

Important: When the Gearman loader is used (recommended), the S3 load is split across the workers, with each worker loading a 16MB chunk of the file in parallel.
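
The fan-out can be pictured with a small Python sketch. A thread pool stands in for the Gearman workers here, and `load_chunk` is a hypothetical placeholder that only counts rows instead of inserting them, so this shows the shape of the parallel load rather than how Shard-Query implements it.

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(chunk):
    """Hypothetical stand-in for one worker's job: count the rows in a chunk."""
    return chunk.count(b"\n")

def parallel_load(chunks, workers=4):
    """Hand each chunk to a worker; results come back in chunk order."""
    # The thread pool is an assumption standing in for Gearman workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_chunk, chunks))
```

Calling `sum(parallel_load(chunks))` gives the total number of rows processed across all chunks, which is a convenient sanity check against the row count of the source file.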