Tutorial for MARC::MIR DSL

Warnings

For the moment, everything lives in the MARC::MIR namespace; this will change. Also, t/* is empty (writing tests is clearly the next step), but the scripts that use MARC::MIR do work.

What is MARC::MIR

I have dealt with a lot of MARC records in the past (mainly from/to ISO 2709 files) and was really annoyed by the existing libraries. A MARC record is a very simple structure. Every library I saw missed this point by wrapping records in a painful OO approach, which makes MARC manipulation annoying and slow. Perl is awesome for manipulating data structures: I wanted that power and simplicity back!

Simple datastructure

A MIR record is an array containing a leader and a MIR field_collection:

    [ $leader, [@fields] ]

A MIR field_collection is a collection of data fields and control fields.

A MIR control field is a tag and a value.

    [ '001', '1231313145' ]

A MIR data field is a tag, a MIR subfield_collection and an optional MIR indicator. The MIR indicator is either a 2-character string or a 2-element array, so all of these MIR data fields are valid:

    [ $tag, [@subfield] ]
    [ $tag, [@subfield], "  " ]
    [ $tag, [@subfield], [' ',' '] ]
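
Because the indicator can take three forms (absent, string, array), code that reads it in plain Perl has to cope with all of them. The following is a minimal sketch only: `indicators_of` is a hypothetical helper, not part of MARC::MIR, and treating a missing indicator as two blanks is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::MIR): return the indicator of a
# data field as a 2-element array, whatever form it was stored in.
sub indicators_of {
    my ($field) = @_;
    my $ind = $field->[2];
    return [ ' ', ' ' ] unless defined $ind;       # indicator omitted (assumed blank)
    return [@$ind]      if ref $ind eq 'ARRAY';    # already a 2-element array
    return [ split //, $ind ];                     # 2-character string
}

# The three valid shapes of a data field described above.
my @fields = (
    [ '856', [ [ u => 'http://localhost/img/' ] ] ],
    [ '856', [ [ u => 'http://localhost/img/' ] ], '4 ' ],
    [ '856', [ [ u => 'http://localhost/img/' ] ], [ '4', ' ' ] ],
);

for my $field (@fields) {
    my ( $i1, $i2 ) = @{ indicators_of($field) };
    print "$field->[0] [$i1][$i2]\n";
}
```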

A MIR subfield_collection is a list of tag/value pairs.

This is an example of a complete MIR record:

    [ "Header" => 
        [ [ '001' => '2344564564' ] # this is the ID
        ,   [ '856'
            , [ [ q => "jpeg" ]
              , [ z => "cover from original version" ] 
              , [ u => "http://localhost/img/" ] 
              ]
            ]
        ]
    ]
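
Since a MIR record is only nested arrays, it can be walked with ordinary Perl and no library at all. The sketch below uses no MARC::MIR function; it tells control fields from data fields by checking whether the second element is a reference, which follows from the definitions above.

```perl
use strict;
use warnings;

my $record =
    [ "Header" =>
        [ [ '001' => '2344564564' ]
        , [ '856'
          , [ [ q => "jpeg" ]
            , [ z => "cover from original version" ]
            , [ u => "http://localhost/img/" ]
            ]
          ]
        ]
    ];

my ( $leader, $fields ) = @$record;
my @lines;
for my $field (@$fields) {
    my ( $tag, $content ) = @$field;
    if ( ref $content ) {
        # data field: $content is a subfield_collection of [ code, value ] pairs
        push @lines, $tag . '$' . $_->[0] . ' ' . $_->[1] for @$content;
    }
    else {
        # control field: $content is a plain value
        push @lines, "$tag $content";
    }
}
print "$_\n" for @lines;
```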

the DSL

To make things more readable and less error-prone, MARC::MIR also provides a DSL. Every keyword of this DSL works the same way. FIXME: explain.

Also, iso2709_records_of is a helper that streams the records of an ISO 2709 formatted file.
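
To give an idea of what streaming an ISO 2709 file involves, here is a rough plain-Perl sketch. It is not the implementation of iso2709_records_of or from_iso2709: `mir_from_raw` is a hypothetical helper, the sample data is built in memory for the demonstration, and the parser assumes 12-character directory entries, fields stored in directory order, and 00X tags for control fields (the usual MARC 21 conventions).

```perl
use strict;
use warnings;

# ISO 2709 delimiters: record terminator, field terminator, subfield delimiter.
my ( $RT, $FT, $SD ) = ( "\x1d", "\x1e", "\x1f" );

# Hypothetical helper: turn one raw ISO 2709 record into a MIR structure.
sub mir_from_raw {
    my ($raw) = @_;
    $raw =~ s/\x1d\z//;                              # drop the record terminator
    my ( $head, @chunks ) = split /\x1e/, $raw;      # leader+directory, then fields
    my $leader = substr $head, 0, 24;
    # each directory entry is 12 characters long; the first 3 are the tag
    my @tags = map { substr $_, 0, 3 } unpack '(a12)*', substr( $head, 24 );
    my @fields;
    for my $i ( 0 .. $#tags ) {
        my ( $tag, $chunk ) = ( $tags[$i], $chunks[$i] );
        if ( $tag lt '010' ) {                       # control field: tag and value
            push @fields, [ $tag, $chunk ];
            next;
        }
        my ( $ind, @subs ) = split /\x1f/, $chunk;   # indicator, then subfields
        my @pairs = map { [ substr( $_, 0, 1 ), substr( $_, 1 ) ] } @subs;
        push @fields, [ $tag, \@pairs, $ind ];
    }
    return [ $leader, \@fields ];
}

# Build a small sample record so the sketch can run without any file.
my @tags = ( '001', '856' );
my @body = ( "2344564564$FT", "  ${SD}qjpeg${SD}uhttp://localhost/img/$FT" );
my ( $dir, $pos ) = ( '', 0 );
for my $i ( 0 .. $#body ) {
    $dir .= sprintf '%s%04d%05d', $tags[$i], length( $body[$i] ), $pos;
    $pos += length $body[$i];
}
$dir .= $FT;
my $base   = 24 + length $dir;
my $leader = sprintf '%05d%s%05d%s', $base + $pos + 1, 'nam a22', $base, '   4500';
my $data   = ( $leader . $dir . join( '', @body ) . $RT ) x 2;    # two records

# Stream the records: with $/ set to the record terminator, each read
# returns exactly one record.
open my $fh, '<', \$data or die $!;
my @records;
{
    local $/ = $RT;
    while ( my $raw = <$fh> ) { push @records, mir_from_raw($raw) }
}
printf "%s: %d fields\n", $_->[1][0][1], scalar @{ $_->[1] } for @records;
```

Setting the input record separator to the record terminator is what makes the reading lazy: only one record is held in memory at a time.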

some examples

the perfect boilerplate

    use autodie;
    use Modern::Perl;
    use Perlude;
    use MARC::MIR;

print all the ids of the records (assuming the id is in 001, the common case)

    now    { say record_id from_iso2709 } iso2709_records_of "biblio.marc";

or

    marawk { say $ID } "biblio.marc";

remove every 9.. fields

    now {
        $_ = from_iso2709;
        with_fields { @$_ = grep { (tag) !~ /^9/ } @$_ };
        print to_iso2709;
    } iso2709_records_of "biblio.marc";
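
For comparison, the same filtering can be written against the raw structure without the DSL. This is only a sketch on a hand-built record (the 995 field is invented for the example); it shows what the `with_fields`/`tag` pair above saves you from typing.

```perl
use strict;
use warnings;

my $record = [ "Header" => [
    [ '001' => '2344564564' ],
    [ '856', [ [ u => 'http://localhost/img/' ] ] ],
    [ '995', [ [ a => 'local holdings data' ] ] ],    # invented sample field
] ];

# $record->[1] is the field_collection; $_->[0] is the tag of each field.
@{ $record->[1] } = grep { $_->[0] !~ /^9/ } @{ $record->[1] };

my $remaining = join ' ', map { $_->[0] } @{ $record->[1] };
print "$remaining\n";
```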

every 856$q must be jpeg

    now {
        $_ = from_iso2709;
        map_fields {
            (tag) eq '856' and map_subfields {
                (tag) eq 'q' and with_value { $_ = 'jpeg' }
            }
        };
        print to_iso2709;
    } iso2709_records_of "biblio.marc";

or

    marawk { map_values { $_ = 'jpeg' } [qw< 856 q >] } "biblio.marc";

collect every 856$z by id

    use Modern::Perl;
    use YAML;
    use MARC::MIR;

    my %seen;
    marawk {
        map_values { push @{ $seen{$ID} }, $_ } [qw< 856 z >]
    } "data/*.RAW";
    say YAML::Dump \%seen;

marawk