Patrick Hochstenbach
and 6 contributors


Catmandu::Importer::OAI - Package that imports OAI-PMH feeds


    # From the command line
    $ catmandu convert OAI --url

    $ catmandu convert OAI --url --metadataPrefix didl --handler raw

    # In Perl
    use Catmandu::Importer::OAI;

    my $importer = Catmandu::Importer::OAI->new(
        url            => "...",
        metadataPrefix => "...",
        from           => "...",
        until          => "...",
        set            => "...",
        handler        => "...",
    );

    my $n = $importer->each(sub {
        my $hashref = $_[0];
        # ...
    });





metadataPrefix

Metadata prefix to specify the metadata format. Set to oai_dc by default.

handler( sub {} | $object | 'NAME' | '+NAME' )

Handler to transform each record from XML DOM (XML::LibXML::Element) into a Perl hash.

Handlers can be provided as a function reference, as an instance of a Perl package that implements 'parse', or as a package NAME. Package names are either prepended with + or are prefixed with Catmandu::Importer::OAI::Parser. E.g. foobar will create a Catmandu::Importer::OAI::Parser::foobar instance.

By default the handler Catmandu::Importer::OAI::Parser::oai_dc is used for metadataPrefix oai_dc, Catmandu::Importer::OAI::Parser::marcxml for marcxml, Catmandu::Importer::OAI::Parser::mods for mods, and Catmandu::Importer::OAI::Parser::struct for other formats. In addition there is Catmandu::Importer::OAI::Parser::raw to return the XML as it is.
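A function-reference handler can be sketched as follows. This is only an illustration, not the module's own parser: it assumes the handler receives a single XML::LibXML::Element and returns a hash reference, and the sample XML and the variable names ($handler, $record) are made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical handler: collect the text of every child element,
# keyed by the element's local name (namespace prefix dropped).
my $handler = sub {
    my ($dom) = @_;
    my %record;
    for my $node ($dom->findnodes('./*')) {
        push @{ $record{ $node->localname } }, $node->textContent;
    }
    return \%record;
};

# Made-up sample input, standing in for one harvested metadata element.
my $xml = <<'XML';
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
</oai_dc:dc>
XML

my $dom    = XML::LibXML->load_xml(string => $xml)->documentElement;
my $record = $handler->($dom);

print "$_: @{ $record->{$_} }\n" for sort keys %$record;
```

Such a code reference would then be passed as the handler option when constructing the importer.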


set

An optional set for selective harvesting.


from

An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as lower bound for datestamp-based selective harvesting.


until

An optional datetime value (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ) as upper bound for datestamp-based selective harvesting.
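The two accepted datestamp formats can be produced from epoch times with core Perl. The following is a minimal sketch; the helper name oai_datestamp and its 'day'/'second' argument are hypothetical and not part of this module. OAI-PMH expects the lower and upper bound to use the same granularity, so both values below are formatted the same way.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time as a UTC datestamp,
# either YYYY-MM-DD ('day', the default) or YYYY-MM-DDThh:mm:ssZ ('second').
sub oai_datestamp {
    my ($epoch, $granularity) = @_;
    my $format = ($granularity // 'day') eq 'second'
        ? '%Y-%m-%dT%H:%M:%SZ'
        : '%Y-%m-%d';
    return strftime($format, gmtime($epoch));
}

# Bounds for "the last seven days", both with day granularity.
my $now        = time;
my $lowerbound = oai_datestamp($now - 7 * 24 * 60 * 60);
my $upperbound = oai_datestamp($now);

print "from => $lowerbound, until => $upperbound\n";
```

Not every repository supports the finer YYYY-MM-DDThh:mm:ssZ form, so day granularity is the safer default in this sketch.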


listIdentifiers

Harvest identifiers instead of full records.


resumptionToken

An optional resumptionToken to start harvesting from.


dry

Don't do any HTTP requests, but return the URLs that data would be queried from.


xslt

Preprocess XML records with XSLT script(s) given as a comma-separated list or an array reference. Requires Catmandu::XML.


max_retries

When an OAI request fails, the importer will retry this number of times. Set to '0' by default.

Internally, an exponential backoff algorithm is used for this: after the k-th consecutive failed request the importer sleeps for 2^k seconds. The maximum total time spent sleeping before the importer gives up can be calculated as follows:

 max_retries = 1 -> max = 2 seconds
 max_retries = 2 -> max = 2 + 4 = 6 seconds
 max_retries = 3 -> max = 2 + 4 + 8 = 14 seconds
 max_retries = n -> max = 2^(n + 1) - 2 seconds
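The table can be checked with a few lines of Perl. This sketch only reproduces the arithmetic, assuming a sleep of 2^k seconds after the k-th failed request; the function name max_backoff_seconds is made up for the example and is not part of the module.

```perl
use strict;
use warnings;

# Worst-case total sleep time in seconds for a given max_retries,
# assuming a sleep of 2**k seconds after the k-th failed request.
sub max_backoff_seconds {
    my ($max_retries) = @_;
    my $total = 0;
    $total += 2**$_ for 1 .. $max_retries;
    return $total;
}

for my $n (0 .. 5) {
    printf "max_retries = %d -> max = %d seconds\n", $n, max_backoff_seconds($n);
}
```

The sum 2 + 4 + ... + 2^n is a geometric series, which is where the closed form 2^(n + 1) - 2 comes from.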


If you are connected to the internet via a proxy server, you need to set the address of this proxy in your environment:

    export http_proxy="http://localhost:8080"

If you are connecting to an HTTPS server and don't want to verify the validity of the peer's certificates, you can set PERL_LWP_SSL_VERIFY_HOSTNAME to a false value (such as 0) in your environment. This may be required to connect to broken SSL servers.



Every Catmandu::Importer is a Catmandu::Iterable, so all of its methods are inherited. The Catmandu::Importer::OAI methods are not idempotent: OAI-PMH feeds can only be read once.


In addition to methods inherited from Catmandu::Iterable, this module provides the following public methods:

handle_record( $dom )

Process an XML DOM with the configured xslt and handler, and return the result.