
NAME

WARC::Index - base class for WARC index classes

SYNOPSIS

  use WARC::Index::File::CDX;   # or ...
  use WARC::Index::File::SDBM;
  # or some other WARC::Index::File::* implementation

  $index = attach WARC::Index::File::CDX (...); # or ...
  $index = attach WARC::Index::File::SDBM (...);

  $record = $index->search(url => $url, time => $when);
  @records = $index->search(url => $url, time => $when);

  build WARC::Index::File::CDX (...);   # or ...
  build WARC::Index::File::SDBM (...);

DESCRIPTION

WARC::Index is an abstract base class for indexes on WARC files and WARC-alike files. This class establishes the expected interface and provides a simple interface for building indexes.

Methods

$index = attach WARC::Index::File::* (...)

Construct an index object using the indicated technology and whatever parameters the index implementation needs.

Typically, indexes are file-based and take a single parameter: the name of an index file, which in turn contains the names of the indexed WARC files.

$yes_or_no = $index->searchable( $key )

Return true or false to indicate whether the index can search for the requested key. Indexes may be able to search for keys that are not present in entries returned from those indexes.

See the "Search Keys" section of the WARC::Collection page for details on the implemented search keys.

$record = $index->search( ... )
@records = $index->search( ... )

Search an index for records matching parameters. The WARC::Collection class uses this method to search each index in a collection.

If none of the requested search keys are searchable, returns an undefined value in scalar context and the empty list in list context.

The details of the parameters for this method are documented in the "Search Keys" section of the WARC::Collection page.
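
The context-dependent return convention described above can be illustrated with a small, self-contained sketch. The Demo::Index package and its in-memory rows are hypothetical stand-ins and are not part of the WARC distribution. Treating only the url key as searchable and returning the first match in scalar context are simplifying assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an index implementation; not part of WARC.
package Demo::Index;

sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows] }, $class;
}

# Assumption for this sketch: only the "url" key is searchable.
sub searchable {
    my ($self, $key) = @_;
    return $key eq 'url' ? 1 : 0;
}

sub search {
    my ($self, %query) = @_;
    my @keys = grep { $self->searchable($_) } sort keys %query;

    # No searchable keys: a bare return yields undef in scalar context
    # and the empty list in list context.
    return unless @keys;

    # Keep rows that agree with the query on every searchable key.
    my @found = grep {
        my $row = $_;
        !grep { !defined $row->{$_} || $row->{$_} ne $query{$_} } @keys
    } @{ $self->{rows} };

    # Simplification: scalar context gets the first match.
    return wantarray ? @found : $found[0];
}

package main;

my $index = Demo::Index->new(
    { url => 'http://example.com/', time => '20200101000000' },
    { url => 'http://example.com/', time => '20200201000000' },
);

my $record  = $index->search(url => 'http://example.com/');   # one row
my @records = $index->search(url => 'http://example.com/');   # both rows
my @none    = $index->search(bogus => 1);                     # empty list
```

A caller that needs to tell "no searchable keys" apart from "no matches" could consult searchable for each key before calling search.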

build WARC::Index::File::* (into => $dest, from => ...)
build WARC::Index::File::* (from => [...], into => $dest)

Unlike the methods above, which index implementations supply, this method is provided by the WARC::Index base class itself. The build method works by loading the corresponding index builder class and then either driving the build process or simply returning the newly constructed builder object.

The build method itself handles the from key for specifying the files to index. The from key can be given an array reference, after which more key => value pairs may follow, or can simply use the rest of the argument list as its value.

If the from key is given, the build method will read the indicated files, construct an index, and return nothing. If the from key is not given, the build method will construct and return an index builder.
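
The two accepted forms of the from key can be sketched as follows. The split_build_args helper and the file names are hypothetical and exist only for illustration; this is not the code used by WARC::Index, just one way the documented argument convention could be parsed.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the two documented forms of "from";
# not the actual WARC::Index implementation.
sub split_build_args {
    my @args = @_;
    my (%options, @from);
    my $have_from = 0;

    while (@args) {
        my $key = shift @args;
        if ($key eq 'from') {
            $have_from = 1;
            if (ref $args[0] eq 'ARRAY') {
                # Array reference form: more key => value pairs may follow.
                push @from, @{ shift @args };
            }
            else {
                # List form: the rest of the argument list is the value.
                push @from, splice(@args);
            }
        }
        else {
            $options{$key} = shift @args;
        }
    }
    return ($have_from, \@from, \%options);
}

# Array reference form, with "into" after "from".
my ($has_a, $from_a, $opt_a) =
    split_build_args(from => ['a.warc', 'b.warc'], into => 'out.cdx');

# List form: "from" must come last because it consumes what remains.
my ($has_b, $from_b, $opt_b) =
    split_build_args(into => 'out.cdx', from => 'a.warc', 'b.warc');

# No "from" key: the real build method would return a builder here.
my ($has_c, $from_c, $opt_c) = split_build_args(into => 'out.cdx');
```

With the list form, any key => value pairs placed after from would be taken as file names, which is why the array reference form is needed whenever from is not the last key.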

All index builders accept at least the into key for specifying where to store the index. See the documentation for WARC::Index::File::*::Builder for more information.

Optional Methods

Some index systems may also provide these methods:

$entry = $index->first_entry

An index that has a sequential ordering may provide this method to obtain the first entry in the index. Indexes that do not have a meaningful sequence amongst their entries do not provide this method.

$entry = $index->entry_at( $position )

An index that has a sequential ordering may provide this method to obtain an entry at a specified position in the index. The exact format of the position parameter is not specified in general, but should be a value previously obtained from the entry_position method on an entry from the same index. Valid positions may be sparse.
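
Because these methods are optional, a caller would normally check for them before use. The sketch below uses Perl's standard can method for that check and shows a position obtained from entry_position being passed back to entry_at. The Demo::* packages, their in-memory data, and the numeric positions are hypothetical stand-ins and are not part of the WARC distribution.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins; not part of the WARC distribution.
package Demo::Entry;
sub new { my ($class, %fields) = @_; return bless {%fields}, $class }
sub entry_position { return $_[0]{position} }

package Demo::SequentialIndex;
sub new {
    my ($class) = @_;
    # Positions are sparse here (0 and 57), as the text above allows.
    my %at = (
        0  => Demo::Entry->new(position => 0),
        57 => Demo::Entry->new(position => 57),
    );
    return bless { at => \%at }, $class;
}
sub first_entry { return $_[0]{at}{0} }
sub entry_at    { my ($self, $pos) = @_; return $self->{at}{$pos} }

# An index with no meaningful ordering simply omits both methods.
package Demo::UnorderedIndex;
sub new { return bless {}, shift }

package main;

my $index = Demo::SequentialIndex->new;
my ($first, $again);
if ($index->can('first_entry') && $index->can('entry_at')) {
    $first = $index->first_entry;
    # Save the position, then use it later to return to the same entry.
    my $saved = $first->entry_position;
    $again = $index->entry_at($saved);
}

# True only if the unordered index offers first_entry; it does not.
my $unordered_has_first = Demo::UnorderedIndex->new->can('first_entry') ? 1 : 0;
```

Treating the position as an opaque token, rather than computing new positions from it, keeps the caller independent of how a particular index encodes positions.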

Index system registration

The WARC::Index package also provides a registry of loaded index support. The register function adds the calling package to the list.

WARC::Index::register( filename => $filename_re )

Add the calling package to an internal list of available index handlers. The calling package must be a subclass of WARC::Index or this function will croak().

The filename key indicates that the calling package expects to handle index files with names matching the provided regex.

WARC::Index::find_handler( $filename )

Return the registered handler for $filename or undef if none match. If multiple handlers match, which one is returned is unspecified.
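
The registration mechanism can be pictured with a miniature registry written in the same spirit. Everything in the Demo::Registry namespace is hypothetical, including the .cdx filename pattern; this is a sketch of the idea, not the actual WARC::Index implementation. In particular, returning the first matching handler is a choice made here, whereas WARC::Index leaves the choice unspecified.

```perl
use strict;
use warnings;

# Hypothetical miniature registry; not the actual WARC::Index code.
package Demo::Registry;
use Carp;

my @Handlers;    # list of [ filename regex, handler package ]

sub register {
    my (%opts) = @_;
    my $caller = caller;    # package that called register()
    croak "$caller is not a subclass of " . __PACKAGE__
        unless $caller->isa(__PACKAGE__);
    croak "filename regex required" unless defined $opts{filename};
    push @Handlers, [ $opts{filename}, $caller ];
    return;
}

sub find_handler {
    my ($filename) = @_;
    foreach my $handler (@Handlers) {
        return $handler->[1] if $filename =~ $handler->[0];
    }
    return;    # undef in scalar context when nothing matches
}

# A hypothetical index class registers itself when it is loaded.
package Demo::Registry::CDX;
our @ISA = ('Demo::Registry');
Demo::Registry::register(filename => qr/\.cdx$/);

package main;

my $found   = Demo::Registry::find_handler('crawl-2020.cdx');
my $missing = Demo::Registry::find_handler('crawl-2020.txt');
```

Here $found holds the package name Demo::Registry::CDX, which a caller could then use as the class for a class method call, and $missing is undef.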

AUTHOR

Jacob Bachmeyer, <jcb@cpan.org>

SEE ALSO

WARC, WARC::Index::Entry

COPYRIGHT AND LICENSE

Copyright (C) 2019, 2020 by Jacob Bachmeyer

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.