WARC::Collection - Interface to a group of WARC files
    use WARC::Collection;

    $collection = assemble WARC::Collection ($index_1, $index_2, ...);
    $collection = assemble WARC::Collection from => ($index_1, ...);

    $yes_or_no = $collection->searchable( $key );

    $record = $collection->search(url => $url, time => $when);
    @records = $collection->search(url => $url, time => $when);
The WARC::Collection class is the primary means by which user code is expected to use the WARC library. This class uses indexes to efficiently search for records in one or more WARC files.
The search method accepts a list of parameters as key => value pairs. Each pair narrows the search, sorts the results, or both; the list of keys marks these cases with "[N ]", "[ S]", and "[NS]", respectively.
Supplying an array reference as a value indicates a search where any of the values in the array are acceptable. This does not affect sorting.
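As a rough illustration of the array-reference form, the sketch below models the "any of these values" rule in plain Perl. The matches_key helper and the sample entries are hypothetical and are not part of the WARC library; in the library itself the matching is done by the index support modules.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $value satisfies $wanted, where $wanted is
# either a single value or an array reference of acceptable values.
sub matches_key {
    my ($value, $wanted) = @_;
    my @acceptable = ref($wanted) eq 'ARRAY' ? @$wanted : ($wanted);
    my $count = grep { $value eq $_ } @acceptable;
    return $count > 0;
}

# Made-up index entries, for illustration only.
my @entries = (
    { url => 'http://example.com/' },
    { url => 'http://example.com/about' },
    { url => 'http://example.org/' },
);

# Keep every entry whose URL equals any one of the listed values.
my @hits = grep {
    matches_key($_->{url}, ['http://example.com/', 'http://example.org/'])
} @entries;

print "$_->{url}\n" for @hits;
```

With the real class, the equivalent request would presumably pass the array reference directly as the value of a key, as in search(url => [$url_1, $url_2]).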
The same search keys documented here are used for searching indexes, since WARC::Collection is a wrapper around one or more indexes, but index support modules do not sort their results. Only WARC::Collection sorts the returned entries, so keys listed below as "sort-only" are ignored by the index support modules.
The keys supported are:
An exact match for a URL.
A prefix match for a URL. Prefers records with shorter URLs.
Prefer records collected nearer to the requested time.
An exact match for a (presumably unique) WARC-Record-ID.
Exact match for continuation records for a WARC-Record-ID that identifies a logical record stored using WARC record segmentation. Searching on this key returns only the continuation records.
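The two sorting preferences described above (shorter URLs for a prefix match, nearer collection times for a time request) can be sketched in plain Perl. This is only a model of the stated behavior, not the library's implementation: the by_prefix and by_time helpers, the entry layout, and the use of epoch seconds for times are all assumptions made for the example.

```perl
use strict;
use warnings;

# Made-up entries; times are plain epoch seconds purely for illustration.
my @entries = (
    { url => 'http://example.com/a/b', time => 1_500_000_000 },
    { url => 'http://example.com/a',   time => 1_600_000_000 },
    { url => 'http://example.com/',    time => 1_550_000_000 },
);

# Narrow and sort: keep URLs starting with $prefix, shortest URL first.
sub by_prefix {
    my ($prefix, @candidates) = @_;
    my @matching = grep { index($_->{url}, $prefix) == 0 } @candidates;
    my @sorted = sort { length($a->{url}) <=> length($b->{url}) } @matching;
    return @sorted;
}

# Sort only: order entries by distance from the requested time.
sub by_time {
    my ($when, @candidates) = @_;
    my @sorted = sort {
        abs($a->{time} - $when) <=> abs($b->{time} - $when)
    } @candidates;
    return @sorted;
}

my @prefixed = by_prefix('http://example.com/a', @entries);
my @nearest  = by_time(1_540_000_000, @entries);

print "prefix: $_->{url}\n" for @prefixed;
print "time:   $_->{url}\n" for @nearest;
```

Note that by_time never discards an entry, which is what distinguishes a sort-only key from a narrowing one.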
Assemble a collection of WARC files from one or more indexes, each specified either as an object derived from WARC::Index or as a filename.
While multiple indexes can be used in a collection, note that searching a collection requires individually searching every index in the collection.
Return true if at least one index in the collection can search for the requested key, and false otherwise.
Search the indexes for records matching the parameters and return the best match in scalar context or a list of all matches in list context. The returned values are WARC::Record objects.
See "Search Keys" for more information about the parameters.
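The following sketch shows one way calling code might combine searchable and search, and how the result depends on calling context. To stay runnable without any WARC files it uses a hypothetical stand-in class, My::StubCollection, which only imitates the documented method names; real code would obtain $collection from the assemble method instead. The captures_of helper and the hash-based records are likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collection; not part of the WARC library.
{
    package My::StubCollection;

    sub new {
        my ($class, @records) = @_;
        return bless { records => [@records] }, $class;
    }

    # The stub claims support for the two keys shown in the synopsis.
    sub searchable {
        my ($self, $key) = @_;
        return $key eq 'url' || $key eq 'time';
    }

    sub search {
        my ($self, %params) = @_;
        my @found = grep { $_->{url} eq $params{url} } @{ $self->{records} };
        # Best match in scalar context, every match in list context.
        return wantarray ? @found : $found[0];
    }
}

# Hypothetical helper: check the key first, then search.
sub captures_of {
    my ($collection, $url, $when) = @_;
    # Skip the search when no index in the collection can handle the key.
    return unless $collection->searchable('url');
    # The caller's context (scalar or list) is passed through to search.
    return $collection->search(url => $url, time => $when);
}

my $collection = My::StubCollection->new(
    { url => 'http://example.com/', id => 1 },
    { url => 'http://example.com/', id => 2 },
);

# The stub ignores the time value; the format the library expects for it
# is not specified in this section.
my $when = 0;

my $best = captures_of($collection, 'http://example.com/', $when);
my @all  = captures_of($collection, 'http://example.com/', $when);

print "best id: $best->{id}, matches: ", scalar(@all), "\n";
```

Checking searchable first avoids issuing a search that no index in the collection can answer.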
Jacob Bachmeyer, <jcb@cpan.org>
WARC
Copyright (C) 2019 by Jacob Bachmeyer
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.