CDB_File::Generator - generate massive sorted CDB files simply.
    use CDB_File::Generator;
    $gen = new CDB_File::Generator "my.cdb";
    $gen->add("Fred", "Martha");
    $gen->add("Fred", "Olivia");
    $gen->add("Fred", "Jenny");
    $gen->add("Roger", "Joe");
    $gen->add("Roger", "Jenny");
    $gen = undef;
    use CDB_File;
This is a class which makes it very easy to generate large sorted CDB files on the fly. The files can be much bigger than memory, but the speed will depend on the efficiency of your sort command. If you haven't got one, it won't work at all.
The new function creates a generator for a given filename, optionally specifying where it should put its temporary files.
Adds a value, stored under the given key, to the CDB being created.
This is not normally called by the user. It is triggered when the CDB file has been written out and that block of the program is exited, or when the program completes. When it is run, it calls the finish method, which ends the CDB creation. See below.
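The following is a minimal sketch of how such an implicit call can work in Perl, using a destructor. The package `Demo::Generator`, its `@log` array and the guard against finishing twice are all hypothetical and exist only for illustration; this is not the module's own code.

```perl
use strict;
use warnings;

package Demo::Generator;    # hypothetical stand-in, not CDB_File::Generator

our @log;                   # records calls so the effect is visible

sub new {
    my ($class) = @_;
    return bless { finished => 0 }, $class;
}

sub finish {
    my ($self) = @_;
    return if $self->{finished};    # assumption: finishing twice is harmless
    $self->{finished} = 1;
    push @log, "finish";
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    $self->finish;
}

package main;

{
    my $gen = Demo::Generator->new;
    # ... records would be added here ...
}   # $gen goes out of scope here, so DESTROY runs and calls finish

print "@Demo::Generator::log\n";    # prints "finish"
```

Setting the variable to undef, as in `$gen = undef;`, drops the last reference in the same way and has the same effect as leaving the block.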
Finish ends the CDB creation. First it closes the temporary output file, then it sorts it to another file, and finally it calls cdbmake to complete the creation job.
In the current implementation this uses sort -u and deletes repeats of the same key with the same value.
In order to increase database portability, by default all sorting is done in the 'C' locale, even if the current program is working in another locale. This is "the right thing" in many cases. Where you are dealing with real word keys, it won't be; in that case, use the locale function to set the locale.
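To illustrate the effect of sorting in the 'C' locale with duplicates removed, here is a small in-memory sketch. It does not call the external sort program; the helper `sort_unique_c` and the sample lines are hypothetical, and the real layout of the temporary file is not assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic `sort -u` in the 'C' locale on a list of lines.
sub sort_unique_c {
    my @lines = @_;
    my %seen;
    # Keep only the first occurrence of each identical line.
    my @unique = grep { !$seen{$_}++ } @lines;
    # Without "use locale", cmp compares character codes, which for plain
    # ASCII data gives the same order as the 'C' locale.
    my @sorted = sort { $a cmp $b } @unique;
    return @sorted;
}

my @result = sort_unique_c("Roger Joe", "fred Ann", "Fred Martha", "Fred Martha");
print "$_\n" for @result;
# prints:
#   Fred Martha
#   Roger Joe
#   fred Ann
```

Note that the repeated "Fred Martha" line appears once, and that all upper-case initials sort before lower-case ones. A language-aware locale would typically place "fred" next to "Fred" instead, which is the kind of case where a different locale may be wanted.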
If you decide not to create the CDB file you were creating, you have to call this method. Otherwise, the file will be created as your program exits (or possibly earlier).
This is a little utility function which formats a cdbmake input line.
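For reference, cdbmake reads records of the form `+klen,dlen:key->data`, one per line, with an empty line marking the end of the input. The sketch below shows one way to build such a line; the name `cdbmake_line` is hypothetical and is not necessarily how the module's own function is named or written.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record for cdbmake input.
# cdbmake expects lengths in bytes, so this assumes $key and $value are
# byte strings (encode character strings before calling it).
sub cdbmake_line {
    my ($key, $value) = @_;
    return sprintf "+%d,%d:%s->%s\n",
        length($key), length($value), $key, $value;
}

print cdbmake_line("Fred", "Martha");    # prints "+4,6:Fred->Martha"
```

Because the lengths are given explicitly, the key and value may themselves contain characters such as `:` or `->` without confusing cdbmake.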
We use the external programs sort and cdbmake. These almost certainly improve our performance on large databases (and those are all we care about), but they make portability difficult. Possibly system-independent alternatives should be written and used where needed.
We should write out to the sort file with some encoding that gets rid of newlines, and then read back, decoding that to feed it to cdbmake.
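One possible encoding, sketched here under the assumption that a simple backslash escape is acceptable, replaces each newline with the two characters `\n` and each backslash with `\\`. The function names `encode_field` and `decode_field` are hypothetical; nothing like this exists in the current implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to store on a single line.
sub encode_field {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;    # escape backslashes first, so they stay unambiguous
    $s =~ s/\n/\\n/g;     # then turn each newline into backslash + "n"
    return $s;
}

# Hypothetical helper: reverse encode_field.
sub decode_field {
    my ($s) = @_;
    # Each backslash introduces a two-character escape: "\n" is a newline,
    # anything else (in practice only "\\") stands for the character itself.
    $s =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
    return $s;
}

my $raw     = "line one\nline two";
my $encoded = encode_field($raw);
print "$encoded\n";                                   # prints "line one\nline two" on one line
print "round trip ok\n" if decode_field($encoded) eq $raw;
```

The encoded text never contains a newline, so each record occupies exactly one line in the sort file, and decoding restores the original bytes before they are handed to cdbmake.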