PICA::Path - PICA path expression to match field and subfield values
use PICA::Path; use PICA::Parser::Plain; # extract URLs from PIC Records, given from STDIN my $urlpath = PICA::Path->new('009P$a'); my $parser = PICA::Parser::Plain->new(\*STDIN); while ( my $record = $parser->next ) { print "$_\n" for $urlpath->record_subfields($record); }
PICA path expressions can be used to match fields and subfields of PICA::Data records or equivalent record structures. An instance of PICA::Path is a blessed array reference, consisting of the following fields:
regular expression to match field tags against
regular expression to match occurrences against, or undefined
regular expression to match subfields against
substring start position
substring end position
Example record:
... use PICA::Path; # PICA::Data record my $record = { record => [ [ '005A', '', '0', '1234-5678' ], [ '005A', '', '0', '1011-1213' ], [ '009Q', '', 'u', 'http://example.org/', 'x', 'A', 'z', 'B', 'z', 'C' ], [ '021A', '', 'a', 'Title', 'd', 'Supplement' ], [ '031N', '', 'j', '1600', 'k', '1700', 'j', '1800', 'k', '1900', 'j', '2000' ], [ '045F', '01', 'a', '001' ], [ '045F', '02', 'a', '002' ], [ '045U', '', 'e', '003', 'e', '004' ], [ '045U', '', 'e', '005' ] ], _id => 1234 }; # create path my $path = PICA::Path->new('021A$ab'); # match record my $match = $path->match_record($record); # $match = 'TitleSupplement'
Field 021A has only unique subfield codes.
021A
# get all subfields $path = PICA::Path->new('021A'); $match = $path->match($record); # $match = 'TitleSupplement' # get single subfield by code $path = PICA::Path->new('021A$a'); $match = $path->match($record); # $match = 'Title' # get two subfields by code $path = PICA::Path->new('021A$ad'); $match = $path->match($record); # $match = 'TitleSupplement'); $path = PICA::Path->new('021A$da'); $match = $path->match($record); # $match = 'TitleSupplement' # get two subfields by code in specific order $path = PICA::Path->new('021A$da'); $match = $path->match($record, pluck => 1); # $match = 'SupplementTitle' # join subfields $path = PICA::Path->new('021A$da'); $match = $path->match($record, pluck => 1, join => ' '); # $match = 'Supplement Title'
Option split creates a list out of subfields:
split
# split subfields to list $path = PICA::Path->new('021A$da'); $match = $path->match($record, split => 1); # $match = ['Title', 'Supplement']
Option nested_arrays creates a list for every field found:
nested_arrays
# split fields to lists $path = PICA::Path->new('021A$da'); $match = $path->match($record, split => 1, nested_arrays => 1); # $match = [['Title', 'Supplement']]
Field 009Q has repeated subfields.
009Q
# get all subfields $path = PICA::Path->new('009Q'); $match = $path->match($record); # $match = 'http://example.orgABC' # get repeated subfields $path = PICA::Path->new('009Q$z'); $match = $path->match($record); # $match = 'BC'
# split subfields to list $path = PICA::Path->new('009Q'); $match = $path->match($record, split => 1); # $match = ['http://example.org', 'A', 'B', 'C'] # split subfields to list $path = PICA::Path->new('009Q$z'); $match = $path->match($record, split => 1); # $match = ['B', 'C']
# split fields to lists $path = PICA::Path->new('009Q$z'); $match = $path->match($record, split => 1, nested_arrays => 1); # $match = [['B', 'C']]
Field 005A is repeated.
005A
# get all subfields $path = PICA::Path->new('009Q'); $match = $path->match($record); # $match = '1234-56781011-1213' # get subfields by code $path = PICA::Path->new('009Q'); $match = $path->match($record); # $match = '1234-56781011-1213'
# split subfields to list $path = PICA::Path->new('005A$0'); $match = $path->match($record, split => 1); # $match = ['1234-5678', '1011-1213'] ```
# split fields to lists $path = PICA::Path->new('005A$0'); $match = $path->match($record, split => 1, nested_arrays => 1); # $match = [['1234-5678'], ['1011-1213']]
Field 045U is repeated and has repeated subfields.
045U
# get all subfields $path = PICA::Path->new('045U'); $match = $path->match($record); # $match = '003004005' # get subfields by code $path = PICA::Path->new('045U$e'); $match = $path->match($record); # $match = '003004005'
# split subfields to list $path = PICA::Path->new('045U$e'); $match = $path->match($record, split => 1); # $match = ['003', '004', '005']
# split fields to lists $path = PICA::Path->new('045U$e'); $match = $path->match($record, split => 1, nested_arrays => 1); # $match = [['003', '004'], ['005']]
Field 045F is repeated and has occurrences.
045F
# get subfield from field with specific occurrence $path = PICA::Path->new('045F[01]$a'); $match = $path->match($record); # $match = '001' # get subfield from field with wildcard for occurrence $path = PICA::Path->new('045F[0.]$a'); $match = $path->match($record); # $match = '001002'
# split subfields to list $path = PICA::Path->new('045F[0.]$a'); $match = $path->match($record, split => 1); # $match = ['001', '002']
The dot (.) is a wildcard for field tags, occurrence and subfield codes.
The path ..... means take any subfield from any field.
.....
# get all subfields from all fields $path = PICA::Path->new('....$.'); $match = $path->match($record); # $match = '1234-56781011-1213http://example.org/ABCTitleSupplement16001700180019002000001002003004005' # get specific subfield from all fields $path = PICA::Path->new('....$a'); $match = $path->match($record); # $match = 'Title001002'
# split subfields to list $path = PICA::Path->new('....$a'); $match = $path->match($record, split => 1); # $match = ['Title', '001', '002']
# split fields to lists $path = PICA::Path->new('....'); $match = $path->match($record, split => 1, nested_arrays => 1); # $match = [['1234-5678'], ['1011-1213'], [ 'http://example.org/', 'A', 'B', 'C', ], [ 'Title', 'Supplement' ], [ 1600, 1700, 1800, 1900, 2000, ], ['001'], ['002'], [ '003', '004' ], ['005']]
Create a PICA path by parsing the path expression. The expression consists of
A tag, consisting of three digits, the first 0 to 2, followed by a digit or @. The character . can be used as wildcard.
0
2
@
.
An optional occurrence, given by two or three digits (or . as wildcard) in brackets, e.g. [12], [0.] or [102].
[12]
[0.]
[102]
An optional list of subfields. Allowed subfield codes include _A-Za-z0-9.
_A-Za-z0-9
An optional position, preceded by /. Both single characters (e.g. /0 for the first), and character ranges (such as 2-4, -3, 2-...) are supported.
/
/0
2-4
-3
2-
Returns matched fields as string or array reference.
Optional parameter:
By default all the matched values are joined into a string without a field separator. Use the join function to set the separator. Default: '' (empty string).
my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] } my $path = PICA::Path->new( '021A' ); my $match = $path->match_record( $record, join => ' - ' ); # $match = 'Title - Supplement'
Be default, all subfields are added to the mapping in the order they are found in the record. Using the pluck option, one can select the required order of subfields to map. Default: 0.
my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] } my $path = PICA::Path->new( '021A' ); my $match = $path->match_record( $record, pluck => 1 ); # $match = 'SupplementTitle'
When split is set to 1 then all mapped values will be joined into an array instead of a string. Default: 0.
my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] } my $path = PICA::Path->new( '021A' ); my $match = $path->match_record( $record, split => 1 ); # $match = [ 'Title', 'Supplement' ]
When the split option is specified the output of the mapping will always be an array of strings (one string for each subfield found). Using the nested_array option the output will be an array of array of strings (one array item for each matched field, one array of strings for each matched subfield). Default: 0.
my $record = { _id => 123X, record => [[ '045U', '', 'e', '003', 'e', '004' ], [ '045U', '', 'e', '005' ]] } my $path = PICA::Path->new( '045U' ); my $match = $path->match_record( $record, nested_arrays => 1 ); # $match = [[ '003', '004'], ['005' ]]
Force array as return value. Default: 0.
my $record = { _id => 123X, record => [[ '021A', '', 'a', 'Title', 'd', 'Supplement' ]] } my $path = PICA::Path->new( '021A' ); my $match = $path->match_record( $record, force_array => 1 ); # $match = [ 'TitleSupplement' ]
Check whether a given PICA field matches the field and occurrence of this path. Returns the $field on success.
$field
Returns an array reference with fields of a PICA::Data that match the path. Subfield codes are ignored.
Returns a list of matching subfields (optionally trimmed by from and length) without inspection field and occurrence values.
Stringifies the PICA path to normalized form. Subfields are separated with $, unless called as stringify(1) or the first subfield is $.
$
stringify(1)
Return the stringified field expression or undefined.
Return the stringified subfields expression or undefined.
Return the stringified occurrences expression or undefined.
Return the stringified position or undefined.
Catmandu::Fix::pica_map
To install PICA::Data, copy and paste the appropriate command in to your terminal.
cpanm
cpanm PICA::Data
CPAN shell
perl -MCPAN -e shell install PICA::Data
For more information on module installation, please visit the detailed CPAN module installation guide.