The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

List::MergeSorted::XS - Fast merger of presorted lists

SYNOPSIS

  use List::MergeSorted::XS 'merge';

  # merge plain integer lists
  @lists = ([1, 3, 5], [2, 6, 8], [4, 7, 9]);
  merge(\@lists); # [1..9]

  # return only beginning of union
  merge(\@lists, limit => 4); # [1..4]

  # remove duplicates
  @lists = ([1, 2], [0, 2, 3], [3, 4]);
  merge(\@lists, uniq_cb => sub { $_[0] }); # [0..4]

  # merge complicated objects based on accompanying integer keys
  @lists = ([
              [1 => 'x'], [3 => {t => 1}]
            ],
            [
              [2 => bless {}, 'C'], [4 => 5]
            ]);
  $sorted = merge(\@lists, key => sub { $_[0][0] });

DESCRIPTION

This module takes a set of presorted lists and returns the sorted union of those lists.

To maximize speed, an appropriate algorithm is chosen heuristically based on the size of the input data; additionally, efficient C implementations of the algorithms are used.

FUNCTIONS

$merged = merge(\@list_of_lists, %opts)

Computes the sorted union of a set of lists of integers. The first parameter must be an array reference which itself contains a number of array references. The result set is returned in an array reference.

The constituent lists must meet these preconditions for correct behavior:

  • either each element of each list is an integer or an integer may be computed from the element using the keyer parameter below

  • each list is pre-sorted in ascending order

merge's behavior may be modified by additional options passed after the list:

  • limit

    Specifies a maximum number of elements to return. By default all elements are returned.

  • key_cb

    Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value by which the element will be sorted. In effect, the default callback is sub {$_[0]}. This allows more complicated structures to be merged.

  • uniq_cb

    Specifies a callback routine which will be passed an element of an inner list through @_. The routine must return the integer value which identifies the element in some sense. Elements with the same identity value will not be duplicated in the output. Elements with the same identity must also have the same key.

    If no uniq_cb is passed, duplicates are allowed in the output.

NOTES

Only ascending order is supported. To merge lists which sorted in descending order, use key_cb = sub { -$_[0] }>.

EXPORT

None by default, merge at request.

ALGORITHM

The algorithm used to merge the lists is heurisitcally chosen based on the number of lists (N), the total number of elements in the lists (M), and the requested limit (L). (The heuristic constants were determined by analysis of benchmarks on a 2.5GHz Intel Xeon where all data fit in memory.)

When there are many lists and the element limit is a significant fraction of the total element count (L/M > 1/4), perl's built-in sort is used to order the concatenated lists. The time complexity is O(M log M). Since this method always processes the full list it cannot short-circuit in the highly-limited case, as do the priority queue methods.

When L is a smaller fraction of M, a priority queue is used to track the list heads. For small N, this is implemented as a linked list kept in sorted order (using linear-time insertion), yielding time complexity of O(L N). For large N, a Fibonacci heap is used, for a time complexity of O(L log N). The linked list has less bookkeeping overhead than the heap, so it is more efficient for fewer lists.

To force a particular implementation, set the package variable $MERGE_METHOD to one of these constants:

  • List::MergeSorted::XS::SORT

  • List::MergeSorted::XS::PRIO_LINEAR

  • List::MergeSorted::XS::PRIO_FIB

TODO

* Support comparitive orderings, where no mapping from elements to integers exists but a well-defined ordering exists for which a comparison callback callback can be provided.

* Allow modification of the heuristics based on local benchmarks.

AUTHOR

Adam Thomason, <athomason@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Say Media Inc <cpan@saymedia.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.