
NAME

Net::Z3950::IndexMARC - Comprehensive but inefficient index for MARC records

SYNOPSIS

 use MARC::File::USMARC;
 use Net::Z3950::IndexMARC;

 $file = MARC::File::USMARC->in($filename);
 $index = new Net::Z3950::IndexMARC();
 while ($marc = $file->next()) {
     $index->add($marc);
 }
 $index->dump(\*STDOUT);
 $hashref = $index->find('@attr 1=4 dinosaur');
 foreach $i (keys %$hashref) {
     $rec = $index->fetch($i);
     print $rec->as_formatted();
 }

DESCRIPTION

This module provides a comprehensive inverted index across a set of MARC records, allowing simple keyword retrieval down to the level of individual fields and subfields. However, it does this by building a big Perl data structure (hash of hashes of arrays) in memory, and makes no effort whatsoever towards optimisation. It is therefore only appropriate for small collections of records.
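The shape of such an index can be sketched in plain Perl. This is a minimal illustration only: the layout of %index, the toy records and the word-splitting rule are all assumptions made for the example, and are not taken from the module's actual internals.

```perl
use strict;
use warnings;

# Assumed layout (not the module's real internals):
#   $index{$word}{$recno} = [ [$tag, $subfield, $wordno], ... ]
my %index;

# Toy records: each is a list of [tag, subfield code, text] triples.
my @records = (
    [ [ '245', 'a', 'The dinosaur heresies' ] ],
    [ [ '245', 'a', 'Dinosaur tracks' ], [ '650', 'a', 'Fossil tracks' ] ],
);

for my $recno (0 .. $#records) {
    for my $field (@{ $records[$recno] }) {
        my ($tag, $sub, $text) = @$field;
        my $wordno = 0;
        for my $word (grep { length } split /\W+/, lc $text) {
            $wordno++;    # word numbers start at 1
            push @{ $index{$word}{$recno} }, [ $tag, $sub, $wordno ];
        }
    }
}

# Look up a single keyword and list where it occurs.
my $hits = $index{dinosaur} || {};
for my $recno (sort { $a <=> $b } keys %$hits) {
    for my $hit (@{ $hits->{$recno} }) {
        printf "record %d: field %s, subfield %s, word %d\n", $recno, @$hit;
    }
}
```

Every word of every record gets its own entry, which is why memory use grows quickly with the size of the collection.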

METHODS

new()

 $index = new Net::Z3950::IndexMARC();

Creates a new IndexMARC object. Takes no parameters, and returns the new object.

add()

 $record = new MARC::Record();
 $record->append_fields(...);
 $index->add($record);

Adds a single MARC record to the specified index. A reference to the record itself is also added, so the record object will not be garbage collected until (at least) the index goes out of scope. The record passed in must be of the type MARC::Record.

dump()

 $index->dump(\*STDOUT);

Dumps the contents of the specified index to the specified stream in human-readable form. Takes a single argument, the stream to write to. Should only be used for debugging.

find()

 $hithash = $index->find('@and fruit fish');

Finds records satisfying the specified PQF query, and returns a reference to a hash consisting of one element for each matching record.

Each key in the returned hash is a record number, and the corresponding value contains details of the hits in that record. The record number is an integer counting the records in the order in which they were added to the index, starting at zero. It can subsequently be used to retrieve the record itself.

The hit details consist of an array of arbitrary length, one element per occurrence in the record of the searched-for term. Each element of this array is itself an array of three elements: the tag of the field in which the term exists [0], the tag of the subfield [1], and the word-number within the field, starting from word 1 [2].
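As a rough sketch of how this structure might be walked, the following uses a hand-built stand-in for the value returned by find(). The record numbers and hit triples are invented for illustration; only the overall shape follows the description above.

```perl
use strict;
use warnings;

# Hand-built stand-in for a find() result: record number => list of
# [field tag, subfield tag, word number] triples. Values are invented.
my $hithash = {
    0 => [ [ '245', 'a', 2 ] ],
    3 => [ [ '245', 'a', 1 ], [ '650', 'a', 4 ] ],
};

my @lines;
for my $recno (sort { $a <=> $b } keys %$hithash) {
    for my $hit (@{ $hithash->{$recno} }) {
        my ($tag, $subfield, $wordno) = @$hit;
        push @lines, sprintf 'record %d: %s$%s, word %d',
            $recno, $tag, $subfield, $wordno;
    }
}
print "$_\n" for @lines;
```

Sorting the keys numerically visits the records in the order in which they were added to the index.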

PQF is Prefix Query Format, as described in the "Tools" section of the YAZ manual. However, this module does not perform field-specific searching, since doing so would require a mapping between Type-1 query access points and MARC fields, and we want to avoid assuming anything about that mapping. Accordingly, all attributes are ignored. Further, at present boolean operators are also ignored, and only the last term in the query is used as a single lookup point.
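The practical effect of this limitation can be approximated with a small sketch. The last_term() helper below is hypothetical and is not part of this module; it assumes that terms are either bare words or double-quoted phrases, and makes no claim about how the module itself parses queries.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Net::Z3950::IndexMARC: return the last
# term of a PQF query, treating a double-quoted phrase as a single term.
sub last_term {
    my ($pqf) = @_;
    # Each match yields two captures (quoted phrase, bare word);
    # keep whichever one is defined.
    my @tokens = grep { defined } $pqf =~ /"([^"]*)"|(\S+)/g;
    return $tokens[-1];
}

for my $query ('@attr 1=4 dinosaur', '@and fruit fish') {
    printf "%-20s => %s\n", $query, last_term($query);
}
```

Under these assumptions, '@and fruit fish' behaves like a search for "fish" alone. Note the single quotes around the queries: inside a double-quoted Perl string, @and and @attr would be interpolated as arrays.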

fetch()

 $marc = $index->fetch($recordNumber);

Returns the MARC::Record object corresponding to the specified record number, as returned from find().

PROVENANCE

This module is part of the Net::Z3950::RadioMARC distribution. The copyright, authorship and licence are all as for the distribution.