The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

PICA::Path - PICA path expression to match field and subfield values

SYNOPSIS

    use PICA::Path;
    use PICA::Parser::Plain;

    # extract URLs from PIC Records, given from STDIN
    my $urlpath = PICA::Path->new('009P$a');
    my $parser = PICA::Parser::Plain->new(\*STDIN);
    while ( my $record = $parser->next ) {
        print "$_\n" for $urlpath->record_subfields($record);
    }

DESCRIPTION

PICA path expressions can be used to match fields and subfields of PICA::Data records or equivalent record structures. An instance of PICA::Path is a blessed array reference, consisting of the following fields:

  • regular expression to match field tags against

  • regular expression to match occurrences against, or range of occurrences values given as array reference (from-to), or undefined

  • regular expression to match subfields against

  • substring start position

  • substring end position

Matching rules

Example record:

    use PICA::Data;
    use PICA::Path;
    
    # PICA::Data record
    my $record = PICA::Data->new(<<'PP');
    005A $01234-5678
    005A $01011-1213
    009Q $uhttp://example.org/$xA$zB$zC
    021A $aTitle$dSupplement
    031N $j1600$k1700$j1800$k1900$j2000
    045F/01 $a001
    045F/02 $a002
    045U $e003$e004
    045U $e005
    PP

    # create path
    my $path = PICA::Path->new('021A$ad');
    
    # match record
    my $match = $path->match($record);
    # $match = 'TitleSupplement'

Match single field with no subfield repetition

Field 021A has only unique subfield codes.

    # get all subfields
    $path = PICA::Path->new('021A');
    $match = $path->match($record);
    # $match = 'TitleSupplement'
    
    # get single subfield by code
    $path = PICA::Path->new('021A$a');
    $match = $path->match($record);
    # $match = 'Title'
    
    # get two subfields by code
    $path = PICA::Path->new('021A$ad');
    $match = $path->match($record);
    # $match = 'TitleSupplement');
    
    $path = PICA::Path->new('021A$da');
    $match = $path->match($record);
    # $match = 'TitleSupplement'
    
    # get two subfields by code in specific order
    $path = PICA::Path->new('021A$da');
    $match = $path->match($record, pluck => 1);
    # $match = 'SupplementTitle'
    
    # join subfields
    $path = PICA::Path->new('021A$da');
    $match = $path->match($record, pluck => 1, join => ' ');
    # $match = 'Supplement Title'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('021A$da');
    $match = $path->match($record, split => 1);
    # $match = ['Title', 'Supplement']

Option nested_arrays creates a list for every field found:

    # split fields to lists
    $path = PICA::Path->new('021A$da');
    $match = $path->match($record, split => 1, nested_arrays => 1);
    # $match = [['Title', 'Supplement']]

Match single field with subfield repetition

Field 009Q has repeated subfields.

    # get all subfields
    $path = PICA::Path->new('009Q');
    $match = $path->match($record);
    # $match = 'http://example.orgABC'
    
    # get repeated subfields
    $path = PICA::Path->new('009Q$z');
    $match = $path->match($record);
    # $match = 'BC'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('009Q');
    $match = $path->match($record, split => 1);
    # $match = ['http://example.org', 'A', 'B', 'C']
    
    # split subfields to list
    $path = PICA::Path->new('009Q$z');
    $match = $path->match($record, split => 1);
    # $match = ['B', 'C']

Option nested_arrays creates a list for every field found:

    # split fields to lists
    $path = PICA::Path->new('009Q$z');
    $match = $path->match($record, split => 1, nested_arrays => 1);
    # $match = [['B', 'C']]

Match repeated Field with no subfield repetition

Field 005A is repeated.

    # get all subfields
    $path = PICA::Path->new('009Q');
    $match = $path->match($record);
    # $match = '1234-56781011-1213'
    
    # get subfields by code
    $path = PICA::Path->new('009Q');
    $match = $path->match($record);
    # $match = '1234-56781011-1213'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('005A$0');
    $match = $path->match($record, split => 1);
    # $match = ['1234-5678', '1011-1213']
    ```

Option nested_arrays creates a list for every field found:

    # split fields to lists
    $path = PICA::Path->new('005A$0');
    $match = $path->match($record, split => 1, nested_arrays => 1);
    # $match = [['1234-5678'], ['1011-1213']]

Match repeated field with subfield repetition

Field 045U is repeated and has repeated subfields.

    # get all subfields
    $path = PICA::Path->new('045U');
    $match = $path->match($record);
    # $match = '003004005'
    
    # get subfields by code
    $path = PICA::Path->new('045U$e');
    $match = $path->match($record);
    # $match = '003004005'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('045U$e');
    $match = $path->match($record, split => 1);
    # $match = ['003', '004', '005']

Option nested_arrays creates a list for every field found:

    # split fields to lists
    $path = PICA::Path->new('045U$e');
    $match = $path->match($record, split => 1, nested_arrays => 1);
    # $match = [['003', '004'], ['005']]

Match repeated field with occurrence

Field 045F is repeated and has occurrences.

    # get subfield from field with specific occurrence
    $path = PICA::Path->new('045F/01$a');
    $match = $path->match($record);
    # $match = '001'
    
    # get subfield from field with wildcard for occurrence
    $path = PICA::Path->new('045F/0.$a');
    $match = $path->match($record);
    # $match = '001002'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('045F/0.$a');
    $match = $path->match($record, split => 1);
    # $match = ['001', '002']

Match the whole record with wildcards

The dot (.) is a wildcard for field tags, occurrence and subfield codes.

The path ..... means take any subfield from any field.

    # get all subfields from all fields
    $path = PICA::Path->new('..../*$.');
    $match = $path->match($record);
    # $match = '1234-56781011-1213http://example.org/ABCTitleSupplement16001700180019002000001002003004005'
    
    # get specific subfield from all fields
    $path = PICA::Path->new('..../*$a');
    $match = $path->match($record);
    # $match = 'Title001002'

Option split creates a list out of subfields:

    # split subfields to list
    $path = PICA::Path->new('..../*$a');
    $match = $path->match($record, split => 1);
    # $match = ['Title', '001', '002']

Option nested_arrays creates a list for every field found:

    # split fields to lists
    $path = PICA::Path->new('..../*');
    $match = $path->match($record, split => 1, nested_arrays => 1);
    # $match = [['1234-5678'], ['1011-1213'], [ 'http://example.org/', 'A', 'B', 'C', ], [ 'Title', 'Supplement' ], [ 1600, 1700, 1800, 1900, 2000, ], ['001'], ['002'], [ '003', '004' ], ['005']]

METHODS

new( $expression )

Create a PICA path by parsing the path expression. The expression consists of

  • A tag, consisting of three digits, the first 0 to 2, followed by a digit or @. The character . can be used as wildcard.

  • An optional occurrence, given by two or three digits (or . as wildcard) in brackets, e.g. [12], [0.] or [102] or following a slash (e.g. /12, /0....). Use a star for any occurrence (/*). The star can be omitted if the first character of the tag is 2 or ..

  • An optional list of subfields. Allowed subfield codes include A-Za-z0-9.

  • An optional position, preceded by /. Both single characters (e.g. /0 for the first), and character ranges (such as 2-4, -3, 2-...) are supported.

match( $record, %options )

Alias for match_record.

match_record( $record, %options )

Returns matched fields as string or array reference.

Optional parameter:

join STRING

By default all the matched values are joined into a string without a field separator. Use the join function to set the separator. Default: '' (empty string).

    my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] }
    my $path = PICA::Path->new( '021A' );
    my $match = $path->match_record( $record, join => ' - ' );
    # $match = 'Title - Supplement'
 
pluck 0|1

Be default, all subfields are added to the mapping in the order they are found in the record. Using the pluck option, one can select the required order of subfields to map. Default: 0.

    my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] }
    my $path = PICA::Path->new( '021A' );
    my $match = $path->match_record( $record, pluck => 1 );
    # $match = 'SupplementTitle'
split 0|1

When split is set to 1 then all mapped values will be joined into an array instead of a string. Default: 0.

    my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] }
    my $path = PICA::Path->new( '021A' );
    my $match = $path->match_record( $record, split => 1 );
    # $match = [ 'Title', 'Supplement' ]
nested_arrays 0|1

When the split option is specified the output of the mapping will always be an array of strings (one string for each subfield found). Using the nested_array option the output will be an array of array of strings (one array item for each matched field, one array of strings for each matched subfield). Default: 0.

    my $record = { _id => 123X, record => [[ '045U', '', 'e', '003', 'e', '004' ], [ '045U', '', 'e', '005' ]] }
    my $path = PICA::Path->new( '045U' );
    my $match = $path->match_record( $record, nested_arrays => 1 );
    # $match = [[ '003', '004'], ['005' ]]
force_array 0|1

Force array as return value. Default: 0.

    my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] }
    my $path = PICA::Path->new( '021A' );
    my $match = $path->match_record( $record, force_array => 1 );
    # $match = [ 'TitleSupplement' ]

match_field( $field )

Check whether a given PICA field matches the field and occurrence of this path. Returns the $field on success.

match_subfields( $field )

Returns a list of matching subfields (optionally trimmed by from and length) without inspection field and occurrence values.

stringify( [ $short ] )

Stringifies the PICA path to normalized form. Subfields are separated with $, unless called as stringify(1) or the first subfield is $.

fields

Return the stringified field expression or undefined.

subfields

Return the stringified subfields expression or undefined.

occurrences

Return the stringified occurrences expression or undefined.

positions

Return the stringified position or undefined.

FUNCTIONS

pica_field_matcher( $path [, $path...] )

Return a function that tells whether a field matches any of given PICA Path expressions:

  my $matcher = pica_field_matcher("012X","012Y");
  if ($matcher->($field)) { ... }

Subfields and positions in PICA Path expressions are ignored.

SEE ALSO

Catmandu::Fix::pica_map