NAME
Archive::Heritrix - Perl extension for processing Heritrix archive (.arc) files
SYNOPSIS
use Archive::Heritrix;
my $arc;

#open a single .arc.gz archive
$arc = Archive::Heritrix->new( file => 'a.arc.gz' );
while ( my $rec = $arc->next_record() ) {
  #it's a HTTP::Response object
}

#open a directory of .arc.gz archives. matches recursively on file extension
$arc = Archive::Heritrix->new( directory => 'eg' );
while ( my $rec = $arc->next_record() ) {
  #it's a HTTP::Response object
}
DESCRIPTION
Archive::Heritrix processes Heritrix archive (.arc) files as a stream of HTTP::Response objects.
Heritrix is the archival-grade crawler used by the Internet Archive.
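Because each record is an HTTP::Response object, the usual accessors such as code() and content_type() can be used on it. The following is a minimal sketch, not part of this module: summarize_records is a hypothetical helper that tallies status codes and media types from any iterator yielding HTTP::Response objects. The hand-built stand-in records are assumptions made so that the sketch runs without an archive on disk.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of Archive::Heritrix): tally status codes
# and media types from an iterator that yields HTTP::Response objects and
# returns undef when it is exhausted.
sub summarize_records {
    my ($next) = @_;
    my ( %by_status, %by_type );
    while ( my $rec = $next->() ) {
        $by_status{ $rec->code }++;

        # In scalar context content_type() returns only the media type,
        # lowercased and without parameters such as charset.
        my $type = $rec->content_type || 'unknown';
        $by_type{$type}++;
    }
    return ( \%by_status, \%by_type );
}

# Stand-in records (assumed data, not read from a real archive).
my @records = (
    HTTP::Response->new( 200, 'OK',
        [ 'Content-Type' => 'text/html; charset=utf-8' ], '<html></html>' ),
    HTTP::Response->new( 404, 'Not Found',
        [ 'Content-Type' => 'text/plain' ], 'missing' ),
    HTTP::Response->new( 200, 'OK',
        [ 'Content-Type' => 'text/html' ], '<p>hi</p>' ),
);

my ( $by_status, $by_type ) = summarize_records( sub { shift @records } );

printf "%s: %d\n", $_, $by_status->{$_} for sort keys %$by_status;
printf "%s: %d\n", $_, $by_type->{$_}   for sort keys %$by_type;
```

With a real archive, the same helper could presumably be driven by passing sub { $arc->next_record() } as the iterator, where $arc is constructed as in the SYNOPSIS.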
SEE ALSO
Heritrix project homepage, http://crawler.archive.org
AUTHOR
Allen Day, <allenday@ucla.edu>
COPYRIGHT AND LICENSE
Copyright (C) 2008 by Allen Day
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.