NAME

REST::Client::CrossRef - Read data from CrossRef using its REST API

VERSION

Version 0.004

DESCRIPTION

This module uses the CrossRef REST API to read data from the CrossRef repository.

SYNOPSIS

   use Log::Any::Adapter( 'File', './log.txt', 'log_level'=> 'info');
   use REST::Client::CrossRef;
   use Data::Dumper;

   #the mail address is added in the request's header
   #return the data without transformation

   my $cr = REST::Client::CrossRef->new(
      mailto        => 'you@somewhere.com',
      spit_raw_data => 1,
   );

   #cache the data with HTTP::Cache::Transparent
   $cr->init_cache(
    {   BasePath => "./cache",
        NoUpdate => 60 * 60,
        verbose  => 0
    });

   my $data =  $cr->journal_from_doi('10.1088/0004-637X/722/2/971');

   print Dumper($data), "\n";   #$data is a hash ref of the json data converted to perl

   #unfold the data to something like
   # field1/subfield1/subfield2 : value 
   #add an undef value after each item's fields
   #output only the fields given with keys_to_keep, with the same ordering

   my $cr = REST::Client::CrossRef->new(
         mailto        => 'you@somewhere.com',
         add_end_flag  => 1,
         keys_to_keep => [
             ['author'], ['title'], ['container-title'],
             ['volume'],['issue'], ['page'],['issued/date-parts'], ['published-print/date-parts']
    ],);

    my $data = $cr->article_from_doi('10.1088/0004-637X/722/2/971');

    for my $row (@$data) {
        if (! $row) {
            print "\n";
            next;
         }
         while ( my ($f, $v) = each  %$row) {
            print "$f : $v \n";
        }
    }


    #display the item's fields in alphabetic order
    #add 'end of data' field after each item

    my $cr = REST::Client::CrossRef->new(
        mailto       => 'you@somewhere.com',
        add_end_flag => 1,
        sort_output => 1,
     );

    $cr->init_cache(
    {   BasePath => "C:\\Windows\\Temp\\perl",
        NoUpdate => 60 * 60,
        verbose  => 0
    });

    my @fields = (qw/author title/);
    my @values = (qw/allan electron/);

    #return 100 items per page

    $cr->rows(100);
    my $data = $cr->query_articles( \@fields, \@values );
    while () {
        last unless $data;

        for my $row (@$data) {
            for my $field (keys %$row) {
                print $field, ": ", $row->{$field}. "\n";
            }
        }
        $data = $cr->get_next();
    }

    Example output:

    author : Wilke, Ingrid;
    MacLeod, Allan M.;
    Gillespie, William A.;
    Berden, Giel;
    Knippels, Guido M. H.;
    van der Meer, Alexander F. G.;
    container-title : Optics and Photonics News
    issue : 12
    issued/date-parts : 2002, 12, 1, 
    page : 16
    published-online/date-parts : 2002, 12, 1, 
    published-print/date-parts : 2002, 12, 1, 
    title : Detectors: Time-Domain Terahertz Science Improves Relativistic Electron-Beam Diagnostics
    volume : 13
    end of data :  

    my $cr = REST::Client::CrossRef->new(
        mailto        => 'dokpe@unifr.ch',
        spit_raw_data => 0,
        add_end_flag  => 1,
        json_path     => [
            ['$.items[*].author[*]'],
            ['$.items[*].title'], 
            ['$.items[*].container-title'],
            ['$.items[*].volume'], ['$.items[*].issue'], ['$.items[*].page'], 
            ['$.items[*].issued..date-parts'],
            ['$.items[*].published-print..date-parts']
        ],
        json_path_callback => { '$.items[*].author[*]' => \&unfold_authors },
    
    );
    
    sub unfold_authors {
        my ($data_ar) = @_;
        my @res;
        for my $aut (@$data_ar) {
            my $line;
            if ( $aut->{affiliation} ) {
                my @aff;
                for my $hr ( @{$aut->{affiliation}} ) {
                    my @aff = values %$hr;
                    $aff[0] =~ s/\r/ /g;
                    $line .= " " . $aff[0];
                }
            }
            my $fn = (defined $aut->{given}) ?( ", " . $aut->{given} . "; " ): "; "; 
            push @res,  $aut->{family} . $fn . ($line // "");
    
        }
        return \@res;
    }

    # $doi holds the DOI of the article to look up
    my $data = $cr->article_from_doi($doi);
    for my $row ( @{ $data // [] } ) {
        if ( !$row ) {
            print "\n";
            next;
        }
        while ( my ( $f, $v ) = each %$row ) {
            print "$f : $v \n";
        }
    }

    Example of output:
    $.items[*].author[*] : Pelloni, Michelle;  University of Basel, Department of Chemistry, Mattenstrasse 24a, BPR 1096, CH 4002 Basel, Switzerland
    Cote, Paul;  School of Chemistry and Biochemistry, University of Geneva, Quai Ernest Ansermet 30, CH-1211 Geneva, Switzerland
    ....
    Warding, Tom.;  University of Basel, Department of Chemistry, Mattenstrasse 24a, BPR 1096, CH 4002 Basel, Switzerland
    $.items[*].title : Chimeric Artifact for Artificial Metalloenzymes
    $.items[*].container-title : ACS Catalysis
    $.items[*].volume : 8
    $.items[*].issue : 2
    $.items[*].page : 14-18
    $.items[*].issued..date-parts : 2018, 1, 24
    $.items[*].published-print..date-parts : 2018, 2, 2
      

     my $cr = REST::Client::CrossRef->new(
         mailto       => 'you@somewhere.com',
         keys_to_keep => [["breakdowns/id", "id"], ["location"], [ "primary-name", "breakdowns/primary-name", "name" ]],
     );

    $cr->init_cache(
        {   BasePath => "C:\\Windows\\Temp\\perl",
            NoUpdate => 60 * 60,
            verbose  => 0
        });

    $cr->rows(100);

    my $rs_ar = $cr->get_members;

    while () {
        last unless $rs_ar;
        for my $row_hr (@$rs_ar) {
             for my $k (keys  %$row_hr) {
                   print $k . " : " . $row_hr->{$k} . "\n";
             }
         } 
         $rs_ar = $cr->get_next();
     }

    Example of items in the output above

    id : 5007
    location : W. Struve 1 Tartu 50091 Estonia
    primary-name : University of Tartu Press

    id : 310
    location : 23 Millig Street Helensburgh Helensburgh Argyll G84 9LD United Kingdom
    primary-name : Westburn Publishers

    id : 183
    location : 9650 Rockville Pike Attn: Lynn Willis Bethesda MD 20814 United States
    primary-name : Society for Leukocyte Biology

$cr = REST::Client::CrossRef->new( ... mailto => your@email.here, ...)

The email address is sent in the header of the request. See https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service

$cr = REST::Client::CrossRef->new( ... sort_output =>1, ...)

Rows can be sorted by key name with sort_output => 1. Defaults to 0. In effect only if spit_raw_data is false.

$cr = REST::Client::CrossRef->new( ... spit_raw_data =>1, ...)

Return the data as a hash ref if 1, or as an array ref of hash refs if 0, where each hash ref is a row of key => value. Defaults to 0.

$cr = REST::Client::CrossRef->new( ... add_end_flag =>1, ...)

Add an 'end of data' key after the fields of each item. When keys_to_keep is defined, an undef value is added after each item's fields. Defaults to 1. In effect only if spit_raw_data is false.
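
To make the undef separator concrete, the sketch below walks a hand-written array ref shaped like the rows described above and regroups the rows into one hash per item. The data is mock data, not a real CrossRef response, and split_items is a hypothetical helper that is not part of this module.

```perl
use strict;
use warnings;

# Mock data (an assumption, not a real response): one hash ref per row,
# with an undef row marking the end of each item.
my $data = [
    { 'title'  => 'First title' },
    { 'volume' => '13' },
    undef,
    { 'title'  => 'Second title' },
    { 'volume' => '8' },
    undef,
];

# split_items is a hypothetical helper: it groups rows into one hash per item.
sub split_items {
    my ($rows) = @_;
    my ( @items, %current );
    for my $row (@$rows) {
        if ( !$row ) {    # undef row: the current item is complete
            push @items, { %current } if %current;
            %current = ();
            next;
        }
        @current{ keys %$row } = values %$row;
    }
    push @items, { %current } if %current;    # last item without a separator
    return \@items;
}

my $items = split_items($data);
print scalar(@$items), " items\n";
print "$_->{title} (vol. $_->{volume})\n" for @$items;
```

The same loop works on the result of article_from_doi or query_articles when keys_to_keep is set, since those calls are documented to return rows of this shape.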

$cr = REST::Client::CrossRef->new( ... keys_to_keep => [[key1, key1a, ...], [key2], ... ], ...)

An array ref of array refs. Each inner array ref gives a key name and the possible alternative keys for the same value, for example [ "primary-name", "breakdowns/primary-name", "name" ] in the members route (URL ending with /members). The key enumeration starts below message, or below message - items if the result is a list. This filters the values that are returned and preserves the ordering of the array ref given as argument. The output is an array ref of hash refs, each hash holding a key and its value. Values are flattened to strings. In effect only if spit_raw_data is false.
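
The idea of alternative keys can be illustrated offline. The sketch below is not the module's implementation: lookup_path and first_match are hypothetical helpers, and the item is mock data. It only shows how a list of slash-separated alternatives could be tried in order until one resolves.

```perl
use strict;
use warnings;

# Mock item (an assumption), loosely shaped like a /members record.
my $item = {
    'id'           => 5007,
    'primary-name' => 'University of Tartu Press',
    'location'     => 'W. Struve 1 Tartu 50091 Estonia',
};

# lookup_path is a hypothetical helper: it follows a slash-separated key
# path through nested hashes and returns undef if any step is missing.
sub lookup_path {
    my ( $node, $path ) = @_;
    for my $key ( split m{/}, $path ) {
        return unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

# first_match is also hypothetical: it returns the first alternative that
# resolves to a defined value, together with that value.
sub first_match {
    my ( $record, $alternatives ) = @_;
    for my $path (@$alternatives) {
        my $value = lookup_path( $record, $path );
        return ( $path, $value ) if defined $value;
    }
    return;
}

my @keys_to_keep = (
    [ 'breakdowns/id', 'id' ],
    ['location'],
    [ 'primary-name', 'breakdowns/primary-name', 'name' ],
);

for my $alternatives (@keys_to_keep) {
    my ( $path, $value ) = first_match( $item, $alternatives );
    print "$path : $value\n" if defined $path;
}
```

Because the outer array is walked in order, the printed fields follow the ordering given in keys_to_keep, which is the behaviour described above.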

$cr = REST::Client::CrossRef->new( ... json_path => [[$path1, $path1a, ...], [$path2], ... ], ...)

An array ref of array refs. Each inner array ref gives a JSONPath and the possible alternative paths for the same value. See also JSON::Path. The output, ordering, filtering and flattening are as above. In effect only if spit_raw_data is false. The path starts below the message key in the JSON data.

$cr = REST::Client::CrossRef->new( ... json_path_callback => {$path => \&some_function }, ...)

A hash ref that associates a JSON path with a function that will be run on the data returned by $jpath->values($json_data). The function must accept an array ref as its first argument and must return an array ref.
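
As a small illustration of the array-ref-in, array-ref-out contract, here is a sketch of a callback that could be registered for an author path. initials_callback is a hypothetical name, and the input is mock data that assumes the given and family keys used in the synopsis.

```perl
use strict;
use warnings;

# initials_callback is a hypothetical callback written for this example.
# It receives an array ref of author hashes and returns an array ref of
# "Family, G." strings.
sub initials_callback {
    my ($authors) = @_;
    my @lines;
    for my $author (@$authors) {
        next unless defined $author->{family};    # skip entries without a family name
        my $initials = join ' ',
            map { substr( $_, 0, 1 ) . '.' } split ' ', $author->{given} // '';
        push @lines,
            length($initials) ? "$author->{family}, $initials" : $author->{family};
    }
    return \@lines;
}

# Mock input (an assumption), using the given/family keys seen in the synopsis.
my $authors = [
    { given => 'Ingrid',   family => 'Wilke' },
    { given => 'Allan M.', family => 'MacLeod' },
    { family => 'Some Consortium' },
];

print "$_\n" for @{ initials_callback($authors) };
```

Such a function would be passed as json_path_callback => { '$.items[*].author[*]' => \&initials_callback }, in the same way as unfold_authors in the synopsis.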

$cr = REST::Client::CrossRef->new( ... version => "v1", ... )

Use a specific version of the API. See https://github.com/CrossRef/rest-api-doc#api-versioning

$cr->init_cache( @args )
$cr->init_cache( $hash_ref )

See HTTP::Cache::Transparent. The array of args is passed to the object constructor. The log file shows whether the data has been fetched from the cache and whether the server has been queried to detect any change.
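
Hard-coded cache paths such as the Windows one in the synopsis are not portable. A minimal sketch of building the arguments with the core File::Spec module follows; the directory name crossref-cache is an arbitrary choice made for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Build a cache directory under the system temp dir; 'crossref-cache' is an
# arbitrary name chosen for this example.
my $cache_dir = File::Spec->catdir( File::Spec->tmpdir, 'crossref-cache' );

my %cache_args = (
    BasePath => $cache_dir,
    NoUpdate => 60 * 60,    # seconds during which the server is not re-queried
);

print "cache directory: $cache_args{BasePath}\n";

# With a REST::Client::CrossRef object in $cr, the hash ref form shown in the
# synopsis would then be:
#   $cr->init_cache( \%cache_args );
```

Using catdir also avoids backslash escapes inside double-quoted strings, which are easy to get wrong.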

$cr->rows( $row_value )

Set the rows parameter, which determines how many items are returned in one page.

$cr->works_from_doi( $doi, $select )

Retrieve the metadata from the works route (URL ending with /works) using the article's DOI. Return undef if the DOI is not found. You may pass a select string with the format "field1,field2,..." to return only these fields.

Fields that may be used for selection are (October 2018): abstract, URL, member, posted, score, created, degree, update-policy, short-title, license, ISSN, container-title, issued, update-to, issue, prefix, approved, indexed, article-number, clinical-trial-number, accepted, author, group-title, DOI, is-referenced-by-count, updated-by, event, chair, standards-body, original-title, funder, translator, archive, published-print, alternative-id, subject, subtitle, published-online, publisher-location, content-domain, reference, title, link, type, publisher, volume, references-count, ISBN, issn-type, assertion, deposited, page, content-created, short-container-title, relation, editor.

Use keys_to_keep or json_path to define an ordering in the output. Use select to filter the fields to be returned from the server.

$cr->journal_from_doi( $doi )

A shortcut for works_from_doi( $doi, "container-title,page,issued,volume,issue")

$cr->article_from_doi( $doi )

A shortcut for works_from_doi( $doi, "title,container-title,page,issued,volume,issue,author,published-print,published-online")

$cr->article_from_funder( $funder_id, {name=>'smith'}, $select )

Retrieve the metadata from the works route for a given funder, searched with an author's name or ORCID. $select defaults to "title,container-title,page,issued,volume,issue,published-print,DOI". Use * to retrieve all fields.

$cr->get_types()

Retrieve all the metadata from the types route.

$cr->get_members()

Retrieve all the metadata (> 10,000 items) from the members route.

$cr->member_from_id( $member_id )

Retrieve a member from its ID.

$cr->get_journals()

Retrieve all the metadata (> 60,000 items) from the journals route.

$cr->get_licences()

Retrieve all the metadata (> 700 items) from the licenses route.

$cr->query_works( $fields_array_ref, $values_array_ref, $select_string )

See "Field Queries" in the CrossRef REST API documentation for the fields that can be searched. You may omit the "query." part in the field name. The corresponding values are passed in a second array, in the same order. Beware that searching with first and family name is treated as an OR, not an AND: query_works([qw(name name)], [qw(Tom Smith)], $select) will retrieve all the works where an author has Tom in the name field or where an author has Smith in the name field. See works_from_doi above for the fields that can be selected. Use keys_to_keep or json_path to define an ordering in the output. Use select to filter the fields to be returned from the server.
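
The two parallel arrays are easy to get out of step. The sketch below only illustrates how fields and values line up by position and how the optional "query." prefix could be normalised. pair_fields is a hypothetical helper and is not how the module builds its request.

```perl
use strict;
use warnings;

# pair_fields is a hypothetical helper, not part of the module: it zips the
# two parallel arrays and adds the "query." prefix when it was omitted.
sub pair_fields {
    my ( $fields, $values ) = @_;
    die "fields and values must have the same length\n"
        unless @$fields == @$values;
    my @pairs;
    for my $i ( 0 .. $#$fields ) {
        my $name = $fields->[$i] =~ /^query\./
            ? $fields->[$i]
            : "query.$fields->[$i]";
        push @pairs, [ $name, $values->[$i] ];
    }
    return \@pairs;
}

my $pairs = pair_fields( [qw(author query.title)], [qw(allan electron)] );
print "$_->[0] = $_->[1]\n" for @$pairs;
```

Checking the lengths before calling query_works catches a missing value early, before any request is sent.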

$cr->query_articles( $fields_array_ref, $values_array_ref )

A shortcut for $cr->query_works($fields_array_ref, $values_array_ref, "title,container-title,page,issued,volume,issue,author,published-print,published-online")

$cr->query_journals( $fields_array_ref, $values_array_ref )

A shortcut for $cr->query_works($fields_array_ref, $values_array_ref, "container-title,page,issued,volume,issue")

$cr->get_next()

Return the next set of data in the /works, /members, /journals, /funders and /licenses routes. Return undef after the last set.
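
The paging loop from the synopsis can be wrapped in a small function. In the sketch below, collect_all is a hypothetical helper, and Local::FakePager is a stand-in object that exists only so the example runs offline; with the real module, the REST::Client::CrossRef object would be passed instead.

```perl
use strict;
use warnings;

# Local::FakePager is a stand-in for a REST::Client::CrossRef object, used
# only so that this example runs offline; it serves pre-built pages.
package Local::FakePager {
    sub new {
        my ( $class, @pages ) = @_;
        return bless { pages => [@pages] }, $class;
    }
    sub get_next {
        my ($self) = @_;
        return shift @{ $self->{pages} };    # undef once the pages run out
    }
}

# collect_all is a hypothetical helper: starting from the first page, it keeps
# calling get_next() until undef is returned and gathers every row.
sub collect_all {
    my ( $pager, $first_page ) = @_;
    my @rows;
    my $page = $first_page;
    while ($page) {
        push @rows, @$page;
        $page = $pager->get_next();
    }
    return \@rows;
}

my $first = [ { id => 1 }, { id => 2 } ];
my $pager = Local::FakePager->new( [ { id => 3 } ], [ { id => 4 }, { id => 5 } ] );
my $rows  = collect_all( $pager, $first );
print scalar(@$rows), " rows collected\n";
```

For large routes such as /members or /journals, collecting everything in memory may be unwise; processing each page inside the loop, as the synopsis does, keeps memory use flat.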

$cr->agencies_from_dois( $dois_array_ref )

Retrieve the registration agency (CrossRef, mEdra ...) using an array ref of article DOIs.

$cr->funders_from_location( $a_location_name )

Retrieve the funders from a country. There is no way to obtain the list of country names in use. These locations have been successfully tested: United Kingdom, Germany, Japan, Morocco, Switzerland, France.

INSTALLATION

To install this module type the following:

    perl Makefile.PL
    make
    make test
    make install

On Windows use nmake or dmake instead of make.

DEPENDENCIES

The following modules are required in order to use this one:

     Moo => 2,
     JSON => 2.90,
     URI::Escape => 3.31,
     REST::Client => 273,
     Log::Any => 1.049,
     HTTP::Cache::Transparent => 1.4,
     Carp => 1.40,
     JSON::Path => 0.420

BUGS

See below.

SUPPORT

Any questions or problems can be posted to me (rappazf) on my gmail account.

The current state of the source can be extracted using Mercurial from http://sourceforge.net/projects/rest-client-crossref/

AUTHOR

    F. Rappaz
    CPAN ID: RAPPAZF 

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

Catmandu::Importer::CrossRef: Catmandu is a toolframe, *nix oriented.

Bib::CrossRef: Imports data from CrossRef using the CrossRef search, not the REST API, and converts the XML result into something simpler.