AC::MrGamoo::FileList - get list of files
    emacs /myperldir/Local/MrGamoo/FileList.pm
    copy. paste. edit.

    use lib '/myperldir';
    my $m = AC::MrGamoo::D->new(
        class_filelist => 'Local::MrGamoo::FileList',
    );
You can fire up the system, and get the servers talking to each other, and perform some limited tests without this file.
But you must provide this file in order to actually run map/reduce jobs.
MrGamoo only runs map/reduce jobs. It is up to you to get the files onto the servers, to keep track of where they are, and to tell MrGamoo.
Some people keep the file meta-information in an SQL database. Some people keep it in a yenta map. Some people keep it in the filesystem.
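For the filesystem case, a minimal sketch might look like the following. It is only an illustration: scan_basedir and the $server_id argument are hypothetical names, not part of MrGamoo, and the sketch assumes each server only knows about its own files.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Walk $basedir and describe every plain file found there.
# scan_basedir and $server_id are hypothetical; $server_id stands in
# for the persistent-id of the server doing the scan.
sub scan_basedir {
    my ($basedir, $server_id) = @_;
    my @files;

    File::Find::find({
        no_chdir => 1,                       # keep full paths in $File::Find::name
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;          # skip directories and special files
            push @files, {
                filename => File::Spec->abs2rel($path, $basedir),
                location => [ $server_id ],
                size     => (-s $path) || 0, # -s is false for empty files
            };
        },
    }, $basedir);

    # Sort so that the result is stable from run to run.
    return [ sort { $a->{filename} cmp $b->{filename} } @files ];
}
```

A real deployment would still need some way to merge the per-server lists, so that a replicated file ends up with every server in its location list.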
When a new job starts, your get_file_list function will be called with the job config, and should return an arrayref of matching files along with meta-info.
Each element of the returned arrayref should be a hashref containing at least the following fields:
filename: the name of the file, relative to the basedir in your config file.

    filename => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi'
location: an arrayref of servers where this file is located. The locations should be the persistent-ids of the servers (see MySelf). If the same file is replicated on multiple servers, MrGamoo will be able both to determine intelligently which servers will process which files and to recover from failures.

    location => [ 'email@example.com', 'firstname.lastname@example.org' ]
size: the size of the file, in bytes. MrGamoo considers file sizes when determining which servers will process which files.

    size => 10843
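Putting the three fields together, a FileList module could be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped implementation: it assumes get_file_list receives the job config as a hashref in its first argument, and the 'start' and 'end' config keys, the 'time' catalog field, the in-memory catalog, the filenames, and the server ids are all hypothetical.

```perl
package Local::MrGamoo::FileList;
use strict;
use warnings;

# Hypothetical in-memory catalog. A real module would look this up in
# a database, a yenta map, or the filesystem.
my @CATALOG = (
    { filename => 'www/2010/01/17/21/example_a', location => [ 'server-a', 'server-b' ], size => 10843, time => 100 },
    { filename => 'www/2010/01/17/22/example_b', location => [ 'server-b' ],             size => 2048,  time => 200 },
    { filename => 'www/2010/01/17/23/example_c', location => [ 'server-a', 'server-c' ], size => 512,   time => 300 },
);

# Return an arrayref of { filename, location, size } hashrefs for the
# files that match the job config. The 'start' and 'end' keys are
# assumptions made for this sketch.
sub get_file_list {
    my $config = shift;
    my $start  = $config->{start};
    my $end    = $config->{end};

    my @files = map {
        +{
            filename => $_->{filename},
            location => [ @{ $_->{location} } ],   # copy, so callers cannot alter the catalog
            size     => $_->{size},
        }
    } grep {
           (!defined $start || $_->{time} >= $start)
        && (!defined $end   || $_->{time} <= $end)
    } @CATALOG;

    return \@files;
}

1;
```

With this catalog, a config of { start => 150, end => 350 } selects the second and third entries, and an empty config selects all three.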
Bugs: none. You write this yourself.