The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

List::MergeSorted::XS - Fast merger of presorted lists

SYNOPSIS

  use List::MergeSorted::XS 'merge';

  # merge plain integer lists
  @lists = ([1, 3, 5], [2, 6, 8], [4, 7, 9]);
  merge(\@lists); # [1..9]

  # return only beginning of union
  merge(\@lists, limit => 4); # [1..4]

  # remove duplicates
  @lists = ([1, 2], [0, 2, 3], [3, 4]);
  merge(\@lists, uniq_cb => sub { $_[0] }); # [0..4]

  # merge complicated objects based on accompanying integer keys
  @lists = ([
              [1 => 'x'], [3 => {t => 1}]
            ],
            [
              [2 => bless {}, 'C'], [4 => 5]
            ]);
  $sorted = merge(\@lists, key_cb => sub { $_[0][0] });

DESCRIPTION

This module takes a set of presorted lists and returns the sorted union of those lists.

To maximize speed, an appropriate algorithm is chosen heuristically based on the size of the input data; additionally, efficient C implementations of the algorithms are used.

FUNCTIONS

$merged = merge(\@list_of_lists, %opts)

Computes the sorted union of a set of lists of integers. The first parameter must be an array reference which itself contains a number of array references. The result set is returned in an array reference.

The constituent lists must meet these preconditions for correct behavior:

  • either each element of each list is an integer or an integer may be computed from the element using the key_cb parameter below

  • each list is pre-sorted in ascending order

merge's behavior may be modified by additional options passed after the list:

  • limit

    Specifies a maximum number of elements to return. By default all elements are returned.

  • key_cb

    Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value by which the element will be sorted. In effect, the default callback is sub {$_[0]}. This allows more complicated structures to be merged.

  • uniq_cb

    Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value which identifies the element in some sense. Elements with the same identity value will not be duplicated in the output. Elements with the same identity must also have the same key.

    If no uniq_cb is passed, duplicates are allowed in the output.

  • method

    Specifies the algorithm to use to merge the lists. If provided, the value must be one of the constants listed below under ALGORITHM.

    If no method is given, one is chosen automatically based upon the input data. This is generally recommended.

NOTES

Only ascending order is supported. To merge lists which are sorted in descending order, use key_cb => sub { -$_[0] }.

EXPORTS

None by default, merge at request.

ALGORITHM

The algorithm used to merge the lists is heuristically chosen based on the number of lists (N), the total number of elements in the lists (M), and the requested limit (L). (The heuristic constants were determined by analysis of benchmarks on a 2.5GHz Intel Xeon where all data fit in memory.)

When there are many lists and the element limit is a significant fraction of the total element count (L/M > 1/4), perl's built-in sort is used to order the concatenated lists. The time complexity is O(M log M). Since this method always processes the full list, it cannot short-circuit in the highly-limited case (as the priority queue methods do).

When L is a smaller fraction of M, a priority queue is used to track the list heads. For small N, this is implemented as a linked list kept in sorted order (using linear-time insertion), yielding time complexity of O(L N). For large N, a Fibonacci heap is used, for a time complexity of O(L log N). The linked list has less bookkeeping overhead than the heap, so it is more efficient for fewer lists.

To force a particular implementation, pass the method parameter to merge with one of these constants:

  • List::MergeSorted::XS::SORT

  • List::MergeSorted::XS::PRIO_LINEAR

  • List::MergeSorted::XS::PRIO_FIB

TODO

* Support comparative orderings, where no mapping from elements to integers exists but a well-defined ordering exists for which a two-element comparison callback can be provided.

* Allow modification of the heuristics (perhaps based on local benchmarks).

SEE ALSO

John-Mark Gurney's Fibonacci heap library fib

AUTHOR

Adam Thomason, <athomason@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Say Media Inc <cpan@saymedia.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.9 or, at your option, any later version of Perl 5 you may have available.