NAME
Sort::Key::Merger - Perl extension for merging sorted things
SYNOPSIS
    use Carp;
    use Sort::Key::Merger qw(keymerger);

    sub line_key_value {
        # $_[0] is available as a scratchpad that persists
        # between calls for the same $_;
        unless (defined $_[0]) {
            # so we use it to cache the file handle when we
            # open a file on the first read
            open $_[0], "<", $_
                or croak "unable to open $_";
        }

        # don't get confused by this while loop, it's only
        # used to ignore empty lines
        my $fh = $_[0];
        local $_; # break $_ aliasing;
        while (<$fh>) {
            next if /^\s*$/;
            chomp;
            if (my ($key, $value) = /^(\S+)\s+(.*)$/) {
                return ($value, $key)
            }
            warn "bad line $_"
        }

        # signals the end of the data by returning an
        # empty list
        ()
    }

    # create a merger object; @_ is passed along so that $_[0]
    # inside line_key_value stays aliased to the scratchpad:
    my $merger = keymerger { line_key_value(@_) } @ARGV;

    # sort and write the values:
    my $value;
    while (defined($value = $merger->())) {
        print "value: $value\n"
    }
WARNING!!!
Several backward-incompatible changes were introduced in version 0.10:
- filekeymerger callbacks are now called in list context
- the order of the return values of the keymerger callback has changed
- in list context only the next value is returned by default
instead of all the remaining ones
DESCRIPTION
Sort::Key::Merger merges presorted collections of data based on some (calculated) keys.
FUNCTIONS
The following functions are available from this module:
- keymerger { GENERATE_VALUE_KEY_PAIR($_) } @sources;
Creates a merger object for the given @sources collections.
Every item in @sources is aliased to $_ and then the user-defined subroutine GENERATE_VALUE_KEY_PAIR is called. The result from that callback should be a (value, key) pair. Keys are used to determine the order in which the values are sorted.
GENERATE_VALUE_KEY_PAIR can return an empty list to indicate that a source has become exhausted.
The result from keymerger is another subroutine that works as a generator. It can be called as:

    my $next = $merger->();
    my @next = $merger->($n);

In scalar context it returns the next value, or undef if all the sources have been exhausted. In list context it returns the next $n values (1 is used as the default value for $n).
If your data can contain undef values, you should iterate over the sorted values as follows:

    my $merger = keymerger ...;
    while (my ($next) = $merger->()) {
        # do whatever with $next
        # ...
    }

Passing -1 makes the function return all the remaining values:

    my @remaining = $merger->(-1);

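The three calling styles described above can be tried on in-memory data. The following is a minimal sketch, not taken from the module's own documentation: the array names and their contents are hypothetical, and each source is assumed to be an array reference that the callback consumes from the front.

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(keymerger);

# Two sources, each already sorted by string comparison (hypothetical data).
my @fruits = qw(apple cherry mango);
my @vegs   = qw(beet carrot leek);

# Each source is an array reference aliased to $_. The callback takes the
# next element off the front and returns it as a (value, key) pair, or an
# empty list once the source is exhausted.
my $merger = keymerger {
    return unless @$_;
    my $item = shift @$_;
    ($item, $item);
} \@fruits, \@vegs;

my $first = $merger->();      # scalar context: the next value
my @two   = $merger->(2);     # list context: the next two values
my @rest  = $merger->(-1);    # list context: everything that is left

print join(' ', $first, @two, @rest), "\n";
```

Note that the callback empties the source arrays as it goes. If the originals must be preserved, pass copies or keep an index in the scratchpad instead.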
NOTE: an additional argument is passed to the GENERATE_VALUE_KEY_PAIR callback in $_[0]. It is meant to be used as a scratchpad: its value is associated with the current source and persists between calls from the same generator, e.g.:

    # croak comes from the Carp module
    my $merger = keymerger {
        # use $_[0] to cache an open file handle:
        $_[0] or open $_[0], '<', $_
            or croak "unable to open $_";
        my $fh = $_[0];
        local $_;
        while (<$fh>) {
            chomp;
            return $_ => $_;
        }
        ();
    } ('/tmp/foo', '/tmp/bar');

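The scratchpad is not limited to file handles. As a hedged illustration, the sketch below keeps a per-source read position in $_[0] so that the source arrays are left untouched. The data and variable names are hypothetical; the only behaviour relied upon is the one described above, namely that $_[0] starts out undefined and keeps its value between calls for the same source.

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(keymerger);

# Two presorted sources that should not be modified (hypothetical data).
my @left  = qw(a c e);
my @right = qw(b d f);

my $merger = keymerger {
    # $_[0] is undef on the first call for a source; post-increment
    # then yields 0 and leaves 1 behind for the next call.
    my $i = $_[0]++;
    return if $i >= @$_;        # source exhausted
    my $item = $_->[$i];
    ($item, $item);             # (value, key)
} \@left, \@right;

my @merged = $merger->(-1);
print "@merged\n";
```

Because only the index changes, @left and @right can be merged again later with a fresh merger.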
This function honours the use locale pragma.
- nkeymerger { GENERATE_VALUE_KEY_PAIR($_) } @sources
Is like keymerger but compares the keys numerically.
This function honours the use integer pragma.
- ikeymerger
Similar to keymerger but compares the keys as integers.
- ukeymerger
Similar to keymerger but compares the keys as unsigned integers.
- rkeymerger
- rnkeymerger
- rikeymerger
- rukeymerger
These functions are like keymerger, nkeymerger, ikeymerger and ukeymerger respectively, but perform the sorting in reverse order.
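A short sketch of a reverse numeric merge follows. It assumes, as any merge does, that each source is already sorted in the order being produced, here descending. The data is hypothetical.

```perl
use strict;
use warnings;
use Sort::Key::Merger qw(rnkeymerger);

# Sources presorted in descending numeric order (hypothetical data).
my @high = (90, 40, 10);
my @low  = (75, 30, 5);

my $merger = rnkeymerger {
    return unless @$_;          # empty list: source exhausted
    my $n = shift @$_;
    ($n, $n);                   # (value, key)
} \@high, \@low;

my @merged = $merger->(-1);
print "@merged\n";
```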
- filekeymerger { generate_key } @files;
Returns a merger subroutine that returns lines read from @files sorted by the keys that generate_key generates.
@files can contain file names or handles for already open files.
generate_key is called with the line just read in $_ and has to return the sorting key for it. If its return value is undef the line is ignored.
The line can be modified inside generate_key by changing $_, e.g.:

    my $merger = filekeymerger {
        chomp($_);    # <== here
        return undef if /^\s*$/;
        substr($_, -1, 10)
    } @ARGV;

Finally, $/ can be changed from its default value to read the files in chunks other than lines.
The return value from this function is a subroutine reference that on successive calls returns the sorted elements in the same fashion as the iterator returned from keymerger.

    my $merger = filekeymerger { (split)[0] } @ARGV;
    while (my ($next) = $merger->(1)) {
        ...
    }

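For a self-contained trial run, the sketch below writes two small presorted files with File::Temp and merges them by the first field of each line. The helper write_tmp and the sample data are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Sort::Key::Merger qw(filekeymerger);

# Hypothetical helper: write the given lines to a temporary file and
# return its name. The file is removed when the program exits.
sub write_tmp {
    my @lines = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @lines;
    close $fh or die "unable to close $name: $!";
    return $name;
}

# Each file is already sorted by its first field.
my @files = (
    write_tmp('alice 31', 'carol 27'),
    write_tmp('bob 45',   'dave 19'),
);

# The sorting key is the first whitespace-separated field of the line.
my $merger = filekeymerger { (split)[0] } @files;

my @lines;
while (my ($line) = $merger->()) {
    chomp $line;
    push @lines, $line;
}
print "$_\n" for @lines;
```

Taking the value through a list assignment, as in the while condition, keeps the loop correct even when a returned value is false or undefined.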
This function honours the use locale pragma.
- nfilekeymerger { generate_key } @files;
Is like filekeymerger but the keys are compared numerically.
This function honours the use integer pragma.
- ifilekeymerger
Similar to filekeymerger but compares the keys as integers.
- ufilekeymerger
Similar to filekeymerger but compares the keys as unsigned integers.
- rfilekeymerger
- rnfilekeymerger
- rifilekeymerger
- rufilekeymerger
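These functions are like filekeymerger, nfilekeymerger, ifilekeymerger and ufilekeymerger respectively, but perform the sorting in reverse order.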
- multikeymerger { GENERATE_VALUE_KEYS_LIST($_) } \@types, @sources
This function generates a multikey merger.
GENERATE_VALUE_KEYS_LIST should return a list with the next value from the source passed in $_ followed by its sorting keys.
@types is an array with the key sorting types (see the Sort::Key multikey sorting documentation for a discussion of the supported types).
For instance:

    my $merger = multikeymerger {
        # each source is an array reference; stop when it is empty
        my $v = shift @$_ or return;
        my $name = $v->name;
        my $age = $v->age;
        ($v, $age, $name)
    } [qw(-integer string)], @data_sources;

    while (my ($next) = $merger->()) {
        print "$next\n";
    }

SEE ALSO
Sort::Key, Sort::Key::External, locale, integer, perl core sort function.
COPYRIGHT AND LICENSE
Copyright (C) 2005, 2007 by Salvador Fandiño, <sfandino@yahoo.com>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.