- Date: 04 Sep 2008 02:50:35 UTC
- Distribution: Sort-External
- Module version: 0.18
- Author: Marvin Humphrey <marvin at rectangular dot com>
- Dependencies: File::Temp, Test::More, and possibly others
NAME
Sort::External - Sort huge lists.
SYNOPSIS
    my $sortex = Sort::External->new( mem_threshold => 1024**2 * 16 );
    while (<HUGEFILE>) {
        $sortex->feed($_);
    }
    $sortex->finish;
    while ( defined( $_ = $sortex->fetch ) ) {
        do_stuff_with($_);
    }
DESCRIPTION
Problem: You have a list which is too big to sort in-memory.
Solution: "feed, finish, and fetch" with Sort::External, the closest thing to a drop-in replacement for Perl's sort() function when dealing with unmanageably large lists.
How it works
Cache sortable items in memory. Periodically sort the cache and flush it to disk, creating a sorted "run". Complete the sort by sorting the remaining input cache and merging it with any existing runs into an output stream.
Note that if Sort::External hasn't yet flushed the cache to disk when finish() is called, the whole operation completes in-memory.
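The steps above can be sketched in plain Perl. This is a minimal illustration of an external merge sort, not Sort::External's own code, and it rests on stated assumptions: the hypothetical external_sort() caps the cache by item count rather than by bytes, assumes items contain no embedded newlines, writes each run to an anonymous File::Temp handle, and merges by linearly scanning the run heads instead of using a priority queue.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Minimal external merge sort sketch; NOT Sort::External's implementation.
# Assumptions: items are strings without embedded newlines, order is plain
# lexical (like Perl's default sort), and the cache is capped by item count.
sub external_sort {
    my ( $items, $cache_limit ) = @_;
    my ( @cache, @runs );

    # Sort the cache, write it to an anonymous temp file as one sorted "run",
    # and rewind the handle so the run can be read back during the merge.
    my $flush = sub {
        return unless @cache;
        my $fh = tempfile();    # scalar context: handle only, deleted on close
        print {$fh} "$_\n" for sort @cache;
        seek( $fh, 0, 0 ) or die "seek failed: $!";
        push @runs, $fh;
        @cache = ();
    };

    for my $item (@$items) {
        push @cache, $item;
        $flush->() if @cache >= $cache_limit;
    }

    # Never flushed: the whole sort completes in memory.
    return [ sort @cache ] unless @runs;

    # One reader per run on disk, plus the sorted leftover cache.
    my @sources;
    for my $run (@runs) {
        my $fh = $run;
        push @sources, sub {
            my $line = <$fh>;
            chomp $line if defined $line;
            return $line;
        };
    }
    my @rest = sort @cache;
    push @sources, sub { return shift @rest };

    # Merge: repeatedly emit the smallest current head (linear scan, no heap).
    my @heads = map { $_->() } @sources;
    my @out;
    while (1) {
        my $min;
        for my $i ( 0 .. $#heads ) {
            next unless defined $heads[$i];
            $min = $i if !defined $min || $heads[$i] lt $heads[$min];
        }
        last unless defined $min;
        push @out, $heads[$min];
        $heads[$min] = $sources[$min]->();
    }
    return \@out;
}

my $sorted = external_sort( [qw(pear apple fig banana cherry date)], 2 );
print "@$sorted\n";    # apple banana cherry date fig pear
```

With a cache limit of 2, the six names are flushed as three sorted runs and merged back in order; with a limit larger than the input, nothing touches the disk, which mirrors the in-memory shortcut described above.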
In the CompSci world, "internal sorting" refers to sorting data in RAM, while "external sorting" refers to sorting data which is stored on disk, tape, punchcards, or any storage medium except RAM -- hence, this module's name.
Stringification
Items fed to Sort::External will be returned in stringified form (assuming that the cache gets flushed at least once):
    $foo = "$foo";
Since this is unlikely to be desirable when objects or deep data structures are involved, Sort::External throws an error if you feed it anything other than simple scalars.
Expert note: Sort::External does a little extra bookkeeping to sustain each item's taint and UTF-8 flags through the journey to disk and back.
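To make the stringification rule concrete, here is a small sketch. stringify_for_disk() is a hypothetical helper, not part of Sort::External's API; it only mimics the documented behavior of rejecting references and returning the "$foo" form an item would have after a trip to disk. Note how numbers come back as their string representation, and how a reference would otherwise degrade to something like "HASH(0x...)".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Sort::External's API: reject anything
# that is not a simple scalar, then return the stringified copy that a
# disk round trip would hand back.
sub stringify_for_disk {
    my $item = shift;
    die "can't sort a reference: $item\n" if ref $item;
    return "$item";
}

print stringify_for_disk(1e3),  "\n";    # 1000
print stringify_for_disk(0.50), "\n";    # 0.5
eval { stringify_for_disk( { a => 1 } ) };
print "rejected: $@" if $@;    # rejected: can't sort a reference: HASH(0x...)
```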
METHODS
new()
    my $sortscheme = sub { $Sort::External::b <=> $Sort::External::a };
    my $sortex = Sort::External->new(
        mem_threshold => 1024**2 * 16,     # default: 1024**2 * 8 (8 MiB)
        cache_size    => 100_000,          # default: undef (disabled)
        sortsub       => $sortscheme,      # default sort: standard lexical
        working_dir   => $temp_directory,  # default: see below
    );
Construct a Sort::External object.
- mem_threshold - Allow the input cache to consume approximately mem_threshold bytes before sorting it and flushing to disk. Experience suggests that the optimum setting is somewhere in the range of 1-16 MiB.
- cache_size - Specify a hard limit for the input cache in terms of sortable items. If set, overrides mem_threshold.
- sortsub - A sorting subroutine. Be advised that you MUST use $Sort::External::a and $Sort::External::b instead of $a and $b in your sub. Before deploying a sortsub, consider using a GRT (Guttman-Rosler Transform) instead, as described in the Sort::External::Cookbook; it's probably a lot faster.
- working_dir - The directory where the temporary sortfile will reside. By default, the location of the sortfile is determined by the behavior of File::Temp's constructor.
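The GRT advice can be illustrated with the Guttman-Rosler Transform in its general form: prepend a packed key so that Perl's default lexical sort already yields the desired order, then strip the key afterward. This is an assumption-laden sketch, not the Cookbook's recipe: the hypothetical grt_sort_by_score_desc() expects "name score" records with integer scores from 0 to 2**32 - 1, and reproduces a descending numeric order like the $sortscheme example while sorting in memory.

```perl
use strict;
use warnings;

# GRT sketch (hypothetical helper): order "name score" records by score,
# highest first, using only Perl's default lexical sort.
# Assumption: scores are integers in 0 .. 2**32 - 1.
sub grt_sort_by_score_desc {
    my @records = @_;

    # Decorate: a 4-byte big-endian key; subtracting from 0xFFFF_FFFF makes
    # higher scores produce lexically smaller keys.
    my @decorated = map {
        my ($score) = /(\d+)$/ or die "no score in '$_'\n";
        pack( 'N', 0xFFFF_FFFF - $score ) . $_;
    } @records;

    # Sort plainly (no comparison sub), then strip the 4-byte key.
    return map { substr( $_, 4 ) } sort @decorated;
}

print "$_\n" for grt_sort_by_score_desc( 'alice 42', 'bob 7', 'carol 1000' );
# carol 1000
# alice 42
# bob 7
```

Because the decorated strings sort correctly under the default lexical order, strings built this way could in principle be fed to Sort::External without a sortsub and undecorated with substr() after fetch().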
feed()
$sortex->feed(@items);
Feed one or more sortable items to your Sort::External object. It is normal for occasional pauses to occur during feeding as caches are flushed.
finish()
    # if you intend to call fetch...
    $sortex->finish;
    # otherwise...
    use Fcntl;
    $sortex->finish(
        outfile => 'sorted.txt',
        flags   => ( O_CREAT | O_WRONLY ),
    );
Prepare to output items in sorted order.
If you specify the parameter outfile, Sort::External will attempt to write your sorted list to that location. By default, Sort::External will refuse to overwrite an existing file; if you want to override that behavior, you can pass Fcntl flags to finish() using the optional flags parameter.
Note that you can either finish() to an outfile, or finish() then fetch()... but not both.
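The overwrite rule maps naturally onto ordinary sysopen() semantics. The sketch below is hypothetical: open_outfile() is not Sort::External's code, and a default of O_CREAT | O_EXCL | O_WRONLY is only assumed here as one way to refuse existing files. It also shows why O_TRUNC matters when overriding: O_CREAT | O_WRONLY alone opens an existing file without emptying it, so a shorter output could leave stale bytes at the end.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY O_TRUNC);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper, not Sort::External's code: by default O_EXCL makes
# the open fail if the file already exists; callers may pass their own flags.
sub open_outfile {
    my ( $path, $flags ) = @_;
    $flags = O_CREAT | O_EXCL | O_WRONLY unless defined $flags;
    sysopen( my $fh, $path, $flags ) or die "can't open '$path': $!\n";
    return $fh;
}

my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'sorted.txt' );

my $fh = open_outfile($path);
print {$fh} "first run\n";
close $fh;

# A second open with the default flags is refused: the file now exists.
my $ok = eval { open_outfile($path); 1 };
print $ok ? "overwrote\n" : "refused: $@";

# Explicit override; O_TRUNC discards the old, longer contents.
$fh = open_outfile( $path, O_CREAT | O_WRONLY | O_TRUNC );
print {$fh} "second\n";
close $fh;
```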
fetch()
    while ( defined( $_ = $sortex->fetch ) ) {
        do_stuff_with($_);
    }
Fetch the next sorted item.
BUGS
Please report any bugs or feature requests to bug-sort-external@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sort-External.
SEE ALSO
File::Sort, File::MergeSort, and Sort::Merge as possible alternatives.
AUTHOR
Marvin Humphrey <marvin at rectangular dot com> http://www.rectangular.com
COPYRIGHT AND LICENSE
Copyright 2005-2008 Marvin Humphrey. All rights reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.