NAME

PICA::Parser - Parse PICA+ data

SYNOPSIS

PICA::Parser->parsefile( $filename_or_handle ,
    Field => \&field_handler,
    Record => \&record_handler
);

PICA::Parser->parsedata( $string_or_function ,
    Field => \&field_handler,
    Record => \&record_handler
);

DESCRIPTION

This module can be used to parse normalized PICA+ and PICA+ XML. The concrete parsers are implemented in PICA::PlainParser and PICA::XMLParser.

CONSTRUCTOR

new (params)

Creates a parser object that stores common parameters (see below). These parameters are used as defaults when calling parsefile or parsedata. You do not have to use the constructor to use PICA::Parser; the following two calls are equivalent:

my $parser = PICA::Parser->new( %params );
$parser->parsefile( $file );

PICA::Parser->parsefile( $file, %params );
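The constructor only stores defaults. Assuming that parameters given to parsefile or parsedata take precedence over those defaults (the text above implies this but does not spell it out), the combination can be pictured as a plain Perl hash merge. The parameter values below are only illustrative:

```perl
use strict;
use warnings;

# Illustration only: when two hashes are flattened into one list, later
# keys win, so per-call parameters override the defaults given to new().
my %defaults  = ( Strict => 0, Format => 'xml' );    # as passed to new()
my %call      = ( Strict => 1 );                     # as passed to parsefile()
my %effective = ( %defaults, %call );

print "Strict=$effective{Strict} Format=$effective{Format}\n";
# prints "Strict=1 Format=xml"
```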

METHODS

parsefile (filename, params)

Parses PICA+ data from a file, specified by a filename or a filehandle. The default parser is PICA::PlainParser. If the filename extension is .xml or .xml.gz, or if the 'Format' parameter is set to 'xml', PICA::XMLParser is used instead.

PICA::Parser->parsefile( "data.picaplus", Field => \&field_handler );
PICA::Parser->parsefile( \*STDIN, Field => \&field_handler, Format => 'xml' );
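The selection rule can be restated as a small function. choose_parser is a hypothetical helper, not part of PICA::Parser, and treating the Format value case-insensitively is an assumption of this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rule for choosing the
# concrete parser class (not the module's actual code).
sub choose_parser {
    my ( $source, %params ) = @_;
    # Filehandles are references and have no filename to inspect.
    my $xml_name = !ref($source) && $source =~ /\.xml(?:\.gz)?$/;
    # Case-insensitive comparison of Format is an assumption.
    my $xml_format = lc( $params{Format} // '' ) eq 'xml';
    return ( $xml_name || $xml_format ) ? 'PICA::XMLParser' : 'PICA::PlainParser';
}

my @results = (
    choose_parser('data.picaplus'),               # PICA::PlainParser
    choose_parser('dump.xml.gz'),                 # PICA::XMLParser
    choose_parser( \*STDIN, Format => 'xml' ),    # PICA::XMLParser
);
print "$_\n" for @results;
```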

Common parameters that are passed to the specific parser are:

Field

Reference to a handler function for parsed PICA+ fields. The function is passed a PICA::Field object and should return it to the parser. You can use this function as a simple filter by returning a modified field. If the function does not return a PICA::Field object, the field is skipped.

Record

Reference to a handler function for parsed PICA+ records. The function is passed a PICA::Record object. If the function returns a record, that record is stored in an array that is passed to EndCollection. You can use this function as a filter by returning a modified record, but for performance reasons it is recommended to process the record directly instead of storing it.

Collection

Alias for EndCollection. Ignored if EndCollection is specified.

StartCollection

Reference to a handler function that is called before a collection of PICA+ records is parsed. Each file is treated as a collection, so this function is called before parsing a file.

EndCollection

Reference to a handler function for parsed PICA+ collections. An array of PICA::Record objects is passed to the function.
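The following self-contained sketch shows how the handlers described above might fit together. It does not use the real parser: run_collection is a hypothetical driver that imitates the documented order (StartCollection, Field for each field, Record for each record, then EndCollection, or its alias Collection, with the records that Record returned). Demo::Field is a hypothetical stand-in for PICA::Field with only a tag accessor, and records are plain array references instead of PICA::Record objects. Details such as what happens without a Record handler are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PICA::Field: just enough to carry a tag.
package Demo::Field;
sub new { my ( $class, $tag ) = @_; return bless { tag => $tag }, $class }
sub tag { return $_[0]{tag} }

package main;

# Hypothetical driver imitating the documented callback order for one file.
sub run_collection {
    my ( $records, %h ) = @_;
    my $end = $h{EndCollection} || $h{Collection};    # Collection is an alias
    $h{StartCollection}->() if $h{StartCollection};
    my @kept;
    for my $rec (@$records) {
        my @fields = @$rec;
        # Field handler: its return value replaces the field; undef drops it.
        @fields = grep { defined } map { scalar $h{Field}->($_) } @fields
            if $h{Field};
        next unless $h{Record};
        my $out = $h{Record}->( \@fields );
        push @kept, $out if defined $out;    # only returned records are stored
    }
    $end->(@kept) if $end;
    return;
}

my %wanted = ( '003@' => 1, '021A' => 1 );    # tags the filter keeps
my @log;
run_collection(
    [ [ map { Demo::Field->new($_) } qw(003@ 021A) ],
      [ map { Demo::Field->new($_) } qw(028A) ] ],
    StartCollection => sub { push @log, 'start' },
    Field           => sub { my $f = shift; return $wanted{ $f->tag } ? $f : undef },
    Record          => sub { my $r = shift; return @$r ? $r : undef },
    EndCollection   => sub { push @log, 'end:' . scalar(@_) },
);
print join( ' ', @log ), "\n";    # prints "start end:1"
```

The second record loses its only field to the Field filter, so the Record handler declines to return it and only one record reaches EndCollection.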

Additionally, most parsers accept the following parameters:

Strict

Stop on errors (default is false)

EmptyRecords

Skip empty records so that they are not passed to the record handler (default is false). Empty records easily occur, for instance, if your field handler does not return anything. This is useful for performance, but you should not forget to set the EmptyRecords parameter. In all cases, empty records are counted with a special counter that can be read with the empty_counter method. The normal counter (method counter) counts all records, whether empty or not.

parsedata (data, params)

Parses data from a string, array reference, or function. See parsefile and the parsedata method of PICA::PlainParser and PICA::XMLParser for a description of parameters.

By default PICA::PlainParser is used unless the 'Format' parameter is set to 'xml':

PICA::Parser->parsedata( $picastring, Field => \&field_handler );
PICA::Parser->parsedata( \@picalines, Field => \&field_handler );
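The three accepted input kinds are handled by the concrete parsers. Purely as an illustration, a hypothetical input_lines helper (not part of PICA::Parser) could normalize them into a list of lines. The calling convention assumed for the function case, one line per call and undef at the end, is an assumption; see the parsedata documentation of PICA::PlainParser for the actual contract.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string, an array reference, or a code
# reference into a list of lines.
sub input_lines {
    my $data = shift;
    return @$data if ref $data eq 'ARRAY';
    if ( ref $data eq 'CODE' ) {
        my @lines;
        # Assumed contract: one line per call, undef when exhausted.
        while ( defined( my $line = $data->() ) ) {
            push @lines, $line;
        }
        return @lines;
    }
    return split /\n/, $data;    # plain string
}

my @queue = ( 'first line', 'second line' );
my $next  = sub { return shift @queue };    # returns undef once empty
my @lines = input_lines($next);
print scalar(@lines), "\n";    # prints 2
```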

counter

Returns the number of records read so far.

empty_counter

Returns the number of empty records read so far. Empty records are counted but not passed to the record handler unless you specify the EmptyRecords parameter. The number of non-empty records is the difference between counter and empty_counter.
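With a real parser object, the number of non-empty records is simply $parser->counter - $parser->empty_counter. The sketch below simulates the two counters on made-up data so the relation can be checked without the module; each record is represented by the list of fields left after the field handler, and an empty list stands for an empty record.

```perl
use strict;
use warnings;

# Simulated run on made-up data; the two variables play the roles of
# counter and empty_counter.
my @records = ( ['003@'], [], [ '003@', '021A' ], [] );

my ( $counter, $empty_counter ) = ( 0, 0 );
for my $fields (@records) {
    $counter++;                          # every record is counted
    $empty_counter++ unless @$fields;    # empty records are also counted here
}
my $non_empty = $counter - $empty_counter;
print "$counter read, $empty_counter empty, $non_empty non-empty\n";
# prints "4 read, 2 empty, 2 non-empty"
```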

TODO

Better logging needs to be added, for instance a status message every n records. This may be implemented with multiple handlers per record (maybe piped). Handling of broken records should also be improved.
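Until such support exists, something similar can be approximated on the caller's side, since the Record parameter takes a single function that may itself call several others. pipe_handlers and progress_every are hypothetical helpers sketched here, not part of PICA::Parser:

```perl
use strict;
use warnings;

# Hypothetical helper: pipe several record handlers into one. Each handler
# receives the previous handler's result; undef stops the chain.
sub pipe_handlers {
    my @handlers = @_;
    return sub {
        my $record = shift;
        for my $h (@handlers) {
            return unless defined $record;
            $record = $h->($record);
        }
        return $record;
    };
}

# Hypothetical progress logger: notes every $n-th record, passes records on.
sub progress_every {
    my ( $n, $log ) = @_;
    my $seen = 0;
    return sub {
        my $record = shift;
        $seen++;
        push @$log, "$seen records" if $seen % $n == 0;
        return $record;
    };
}

my @messages;
my $handler = pipe_handlers( progress_every( 2, \@messages ), sub { $_[0] } );
$handler->($_) for 1 .. 5;    # plain numbers stand in for records
print join( '; ', @messages ), "\n";    # prints "2 records; 4 records"
```

Passing Record => $handler to parsefile or parsedata would then presumably log progress while still applying the downstream handler.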

AUTHOR

Jakob Voss <jakob.voss@gbv.de>

LICENSE

Copyright (C) 2007 by Verbundzentrale Göttingen (VZG) and Jakob Voss

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.

Please note that these modules are not a product of, or supported by, the employers of the various contributors to the code, nor by OCLC PICA.
