Iterator::Merger - an iterator to efficiently mergesort iterators
use Iterator::Merger qw(imerge); # $ite1 and $ite2 are iterators generating sorted output # These can be code references or filehandlers my $merger = imerge($ite1, $ite2); while (my $next = $merger->()) { print $next, "\n"; } # example with two files and a sub: open(my $fh1, '<', 'foo.sorted') or die $!; open(my $fh2, '<', 'bar.sorted') or die $!; my $sub = do { my $i = 0; sub { ++$i<100_000_000 && sprintf("%08d", $i) } }; my $merger = imerge($fh1, $fh2, $sub); while (my $next = $merger->()) { print $next, "\n"; }
Iterator::Merger generates the fastest possible iterator to merge sorted iterators (merge sort).
Iterator::Merger
Source iterators can be code references or filehandlers, and must produce an ascending order sorted output. The generated iterator is always a code reference.
See the IMPLEMENTATION section for details.
ITERATORS is a list of either CODE or GLOB references that must generate an ascending sorted output. This function returns a CODE reference that generates the sort-merged output of all ITERATORS. Sorting is done using the lt comparison operator.
lt
CODE iterators are evaluated in scalar context until undef is returned.
GLOB iterators are evaluated in scalar context, relying on the $/ separator, until EOF.
Same as imerge, but sorting is done using the < comparison operator.
imerge
<
Same as imerge, but no sorting is done, nor expected in source iterators. Iteratos are read in order until undef is returned or EOF is reached.
Nothing by default. Functions can be imported explicitly
use Iterator::Merger qw(imerge imerge_num imerge_raw);
imerge and imerge_num are very fast as the Perl comparison code is generated on the fly, but the price to pay is that any first call with a given number of iterators will take some time and memory to generate and evaluate the code. Subsequent calls with the same number of source iterators will reuse the same code.
imerge_num
The cost of this overhead has to be evaluated in comparison to the number of sorted elements to merge.
The size of the generated code doubles for each additional source iterator, up to a limit.
For example the comparison code for 9 iterators represents about 15KiB of minified Perl code.
When the number of source iterators exceeds 9, imerge and imerge_num resort to using the very fast Array::Heap module, when present. This limit was empirically chosen as a safe point where using a heap is almost as efficient as using "brute-force" generated comparison, without any generation overhead.
When Array::Heap is not present, this limit is raised to 12.
This limit can be changed using the $Iterator::Merger::Max_generate variable.
Sort::Merge, generally much slower but can be an interesting alternative when a small number of elements is to be merged, as it does not have any generation overhead.
Thomas Drugeon, <tdrugeon@cpan.org>
Copyright (C) 2008-2021 by Thomas Drugeon
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.32 or, at your option, any later version of Perl 5 you may have available.
To install Iterator::Merger, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Iterator::Merger
CPAN shell
perl -MCPAN -e shell install Iterator::Merger
For more information on module installation, please visit the detailed CPAN module installation guide.