Catmandu::MARC::Tutorial - A documentation-only module for new users of Catmandu::MARC
perldoc Catmandu::MARC::Tutorial
The command below converts file data.mrc into JSON:
$ catmandu convert MARC to JSON < data.mrc
$ catmandu convert MARC to MARC --type XML < data.mrc
To read UNIMARC records use the RAW parser to get the correct character encoding.
$ catmandu convert MARC --type RAW to JSON < data.mrc
$ catmandu convert MARC --type RAW to MARC --type XML < data.mrc
To extract data from a MARC record one needs a Fix routine. Fix is a small language for manipulating data. In the example below we extract all 245 fields from MARC:
$ catmandu convert MARC to CSV --fix 'marc_map(245,title); retain(title)' < data.mrc
The Fix marc_map puts the MARC 245 field in the title field. The Fix retain makes sure only the title field ends up in the CSV file.
The marc_map Fix can get one or more subfields to extract from MARC:
$ catmandu convert MARC to CSV --fix 'marc_map(245ac,title); retain(title)' < data.mrc
In the example below the 650a field can be repeated in some MARC records. We will join all the repetitions into a comma-delimited list for each record.
First we create a Fix file containing all the Fixes, then we execute the catmandu command.
Open a text editor and create the myfix.fix file with content:
marc_map(650a,subject.$append)
join_field(subject,",")
retain(subject)
And execute the command:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
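If you would rather stay on the command line than open a text editor, the same Fix file can be written with a shell here-document. This is only a sketch of one way to create the file. The quoted 'EOF' delimiter is there so the shell does not try to expand $append as a variable.

```shell
# Write the three Fixes to myfix.fix, one per line.
# Quoting 'EOF' keeps the shell from expanding $append.
cat > myfix.fix <<'EOF'
marc_map(650a,subject.$append)
join_field(subject,",")
retain(subject)
EOF

# Show the result; the catmandu command itself is unchanged:
#   catmandu convert MARC to CSV --fix myfix.fix < data.mrc
cat myfix.fix
```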
We will create a list of subjects (650a) and count the number of items in this list for each record. The CSV file will contain the _id (record identifier) and subject, which holds the number of 650a fields.
marc_map(650a,subject.$append)
count(subject)
retain(_id, subject)
First we will create a Fix script which selects only the records that contain an ISBN field (020$a). All the ISBNs found are printed inline using the add_to_exporter Fix.
marc_map(020a,isbn.$append)
select exists(isbn)

# Loop over the ISBNs and print them to a CSV exporter
do list(path:isbn,var:c)
    move_field(c,result.isbn)
    add_to_exporter(result,CSV)
end
Execute the following catmandu command. Notice that we discard the normal output with the help of the Null exporter (all output will be generated by the Fix script):
$ catmandu convert MARC to Null --fix myfix.fix < data.mrc
Here we can use the Fix script as in the previous example and use the UNIX "sort -u" command:
$ catmandu convert MARC to Null --fix myfix.fix < data.mrc | sort -u
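To see what "sort -u" contributes, here is a small sketch that feeds it a few placeholder lines instead of real exporter output. The values isbn-A, isbn-B and isbn-C are made up for illustration and are not valid ISBNs.

```shell
# Placeholder lines standing in for the exporter output; not real data.
# sort -u sorts the lines and keeps one copy of each distinct line.
unique=$(printf '%s\n' isbn-B isbn-A isbn-B isbn-C isbn-A | sort -u)
printf '%s\n' "$unique"
```

Five input lines collapse to three, because each repeated value is kept only once.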
In this example we need an extra condition to match the content of the 920a field against the string book.
marc_map(020a,isbn.$append)
marc_map(920a,type)

select all_match(type,"book")
select exists(isbn)

# Loop over the ISBNs and print them to a CSV exporter
do list(path:isbn,var:c)
    move_field(c,result.isbn)
    add_to_exporter(result,CSV)
end
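And run the command, again discarding the normal output with the Null exporter:
$ catmandu convert MARC to Null --fix myfix.fix < data.mrc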
First we need to create a list of keys that need to be matched against our MARC records. In the example below we create a CSV file with a key,value header and all the keys that are OK:
$ cat mylist.txt
key,value
book,OK
article,OK
journal,OK
Next we create a Fix script that maps the MARC 900a field to a field called type. We look up this type field in the mylist.txt file (the Fix below reads it from /tmp/mylist.txt). If a match is found, the type field will contain the value from the list (OK). When no match is found, type keeps its original value. We reject all records that have OK as type and keep only the ones that weren't matched in the file.
marc_map(900a,type)
lookup(type,'/tmp/mylist.txt')
reject all_match(type,OK)
retain(_id,type)
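The lookup behaviour described above (replace the value when the key is listed, keep the original value otherwise) can be mimicked outside Catmandu to check a list file by hand. The sketch below uses awk. The helper name lookup_type is hypothetical and not part of Catmandu, and the file is written to the current directory rather than /tmp.

```shell
# Stand-in for the lookup list shown earlier.
cat > mylist.txt <<'EOF'
key,value
book,OK
article,OK
journal,OK
EOF

# lookup_type is an illustrative helper, not part of Catmandu.
# It prints the mapped value when the key is in the list,
# and the original value when it is not.
lookup_type() {
    awk -F, -v key="$1" '
        NR > 1 && $1 == key { print $2; found = 1; exit }
        END { if (!found) print key }
    ' mylist.txt
}

lookup_type book      # prints: OK
lookup_type thesis    # prints: thesis
```

A record whose type comes back as OK would be rejected by the Fix script. A record whose type comes back unchanged would be kept in the report.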
And now run the command, for example with the CSV exporter:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
To process this information we need to create a Fix script like the one below (line numbers are added here to explain the working of this script):
01: marc_map('***',text.$append)
02:
03: filter(text,'(\b\d{4}-?\d{3}[\dxX]\b)')
04: replace_all(text.*,'.*(\b\d{4}-?\d{3}[\dxX]\b).*',$1)
05:
06: do list(path:text)
07:   unless is_valid_issn(.)
08:     reject()
09:   end
10: end
11:
12: vacuum()
13:
14: select exists(text)
15:
16: join_field(text,' ; ')
17:
18: retain(_id,text)
On line 01 all the text in the MARC record is mapped into a text array.
On line 03 we filter this array with a regular expression, keeping only the lines that contain an ISSN string.
On line 04 replace_all is used to delete everything in the text array that isn't an ISSN.
On lines 06-10 we go over every ISSN string, check whether it has a valid checksum, and erase it when it does not.
On line 12 we use the vacuum function to remove any remaining empty fields.
On line 14 we select only the records that contain a valid ISSN.
On line 16 the ISSNs are joined with a semicolon ';' into one long string.
On line 18 we keep only the record id and the ISSNs for the report.
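The checksum test on lines 06-10 is not spelled out in this tutorial. As a rough sketch, assuming it follows the standard ISSN modulus-11 rule (the first seven digits are weighted 8 down to 2, and a check value of 10 is written as X), the same test can be tried by hand in the shell. The helper name issn_checksum_ok is hypothetical and not part of Catmandu.

```shell
# issn_checksum_ok is an illustrative helper, not part of Catmandu.
# It succeeds (exit status 0) when the eighth character matches the
# modulus-11 check digit computed from the first seven digits.
issn_checksum_ok() {
    # Drop the hyphen and normalise a lowercase x.
    issn=$(printf '%s' "$1" | tr -d '-' | tr 'x' 'X')
    # Require exactly seven digits followed by a digit or X.
    case "$issn" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9X]) ;;
        *) return 1 ;;
    esac
    sum=0
    weight=8
    rest=$issn
    while [ "$weight" -ge 2 ]; do
        digit=${rest%"${rest#?}"}   # first character of rest
        rest=${rest#?}              # remaining characters
        sum=$((sum + digit * weight))
        weight=$((weight - 1))
    done
    check=$(( (11 - sum % 11) % 11 ))
    if [ "$check" -eq 10 ]; then
        check=X
    fi
    # After the loop, rest holds only the eighth character.
    [ "$rest" = "$check" ]
}

issn_checksum_ok 0378-5955 && echo "0378-5955 passes"
issn_checksum_ok 0378-5954 || echo "0378-5954 fails"
```

This is handy for spot-checking a value that the report unexpectedly drops or keeps.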
Run this Fix script (without the line numbers) using a command such as:
$ catmandu convert MARC to CSV --fix myfix.fix < data.mrc
For this example we need a Fix script that contains the validation rules we want to check. For instance, we require a 245 field and an 008 control field with a date filled in. This can be coded as follows:
# Check if a 245 field is present
unless marc_has('245')
    log("no 245 field",level:ERROR)
end

# Check if there is more than one 245 field
if marc_has_many('245')
    log("more than one 245 field?",level:ERROR)
end

# Check if in 008 position 7 to 10 contains a 4 digit number ('\d' means digit)
unless marc_match('008/07-10','\d{4}')
    log("no 4-digit year in 008 position 7 -> 10",level:ERROR)
end
Put this Fix script in a file myfix.fix and execute the Catmandu command with the "-D" option for logging and the Null exporter to discard the normal output:
$ catmandu -D convert MARC to Null --fix myfix.fix < data.mrc
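To make the 008 rule concrete, the sketch below pulls positions 07-10 out of a sample 008 value by hand. It assumes that positions are counted from zero, as is usual for MARC control fields, so positions 07-10 correspond to characters 8 to 11 for cut. The 008 value is made up for illustration.

```shell
# Made-up 008 value for illustration; only positions 07-10 matter here.
f008='850101s1985    xxu           000 0 eng d'

# Positions are zero-based, cut counts from one: 07-10 becomes 8-11.
year=$(printf '%s' "$f008" | cut -c8-11)

case "$year" in
    [0-9][0-9][0-9][0-9]) echo "year ok: $year" ;;
    *) echo "no 4-digit year in 008 position 7 -> 10" ;;
esac
```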
$ catmandu convert MARC to MARC < data.mrc > output.mrc
$ catmandu convert MARC to MARC --fix 'marc_add("900",a,"checked")' < data.mrc > output.mrc
$ catmandu convert MARC to MARC --fix 'marc_remove("024")' < data.mrc > output.mrc
$ catmandu convert MARC to MARC --fix 'marc_add("650p","test")' < data.mrc > output.mrc
$ catmandu convert MARC to MARC --fix 'marc_map(900a,type); select all_match(type,book)' < data.mrc > output.mrc
To install Catmandu::MARC, copy and paste the appropriate command into your terminal.
cpanm
cpanm Catmandu::MARC
CPAN shell
perl -MCPAN -e shell
install Catmandu::MARC
For more information on module installation, please visit the detailed CPAN module installation guide.