
NAME

REST::Client::CrossRef - Read data from CrossRef using its REST API

VERSION

Version 0.009

DESCRIPTION

This module uses the CrossRef REST API to read data from the CrossRef repository.

SYNOPSIS

use Log::Any::Adapter( 'File', './log.txt', 'log_level'=> 'info');
use Data::Dumper;
use REST::Client::CrossRef;

#the mail address is added in the request's header
#return the data without transformation

my $cr = REST::Client::CrossRef->new(
   mailto        => 'you@somewhere.com',
   spit_raw_data => 1,
);

#cache the data with HTTP::Cache::Transparent
$cr->init_cache(
 {   BasePath => "./cache",
     NoUpdate => 60 * 60,
     verbose  => 0
 });

my $data =  $cr->journal_from_doi('10.1088/0004-637X/722/2/971');

print Dumper($data), "\n";   #$data is a hash ref of the json data converted to perl

#unfold the data to something like
# field1/subfield1/subfield2 : value 
#add an undef value after each item's fields
#output only the fields given with keys_to_keep, with the same ordering

my $cr = REST::Client::CrossRef->new(
      mailto        => 'you@somewhere.com',
      add_end_flag  => 1,
      keys_to_keep => [
          ['author'], ['title'], ['container-title'],
          ['volume'],['issue'], ['page'],['issued/date-parts'], ['published-print/date-parts']
 ],);

 my $data = $cr->article_from_doi('10.1088/0004-637X/722/2/971');

 for my $row (@$data) {
     if (! $row) {
         print "\n";
         next;
      }
      while ( my ($f, $v) = each  %$row) {
         print "$f : $v \n";
     }
 }


 #display the item's fields in alphabetic order
 #add 'end of data' field after each item

 my $cr = REST::Client::CrossRef->new(
     mailto       => 'you@somewhere.com',
     add_end_flag => 1,
     sort_output => 1,
  );

 $cr->init_cache(
 {   BasePath => "C:\\Windows\\Temp\\perl",
     NoUpdate => 60 * 60,
     verbose  => 0
 });

 my @fields = (qw/author title/);
 my @values = (qw/allan electron/);

 #return 100 items by page

 $cr->rows(100);
 my $data = $cr->query_articles( \@fields, \@values );
 while () {
     last unless $data;

     for my $row (@$data) {
         if ( !$row ) {
             print "\n";
             next;
         }
         for my $field (keys %$row) {
             print $field, ": ", $row->{$field}, "\n";
         }
     }
     $data = $cr->get_next();
 }

 Example output:

 author : Wilke, Ingrid;
 MacLeod, Allan M.;
 Gillespie, William A.;
 Berden, Giel;
 Knippels, Guido M. H.;
 van der Meer, Alexander F. G.;
 container-title : Optics and Photonics News
 issue : 12
 issued/date-parts : 2002, 12, 1, 
 page : 16
 published-online/date-parts : 2002, 12, 1, 
 published-print/date-parts : 2002, 12, 1, 
 title : Detectors: Time-Domain Terahertz Science Improves Relativistic Electron-Beam Diagnostics
 volume : 13

 my $cr = REST::Client::CrossRef->new(
     mailto        => 'dokpe@unifr.ch',
     spit_raw_data => 0,
     add_end_flag  => 1,
     json_path     => [
         ['$.author[*]'],
         ['$.title'], 
         ['$.container-title'],
         ['$.volume'], ['$.issue'], ['$.page'], 
         ['$.issued..date-parts'],
         ['$.published-print..date-parts']
     ],
     json_path_callback => { '$.items[*].author[*]' => \&unfold_authors },
 );
 
 sub unfold_authors {
     my ($data_ar) = @_;
     my @res;
     for my $aut (@$data_ar) {
         my $line;
         if ( $aut->{affiliation} ) {
             for my $hr ( @{$aut->{affiliation}} ) {
                 my @aff = values %$hr;
                 $aff[0] =~ s/\r/ /g;
                 $line .= " " . $aff[0];
             }
         }
         my $fn = (defined $aut->{given}) ?( ", " . $aut->{given} . "; " ): "; "; 
         push @res,  $aut->{family} . $fn . ($line // "");
     }
     return \@res;
 }

 # $doi holds the DOI of the article to look up
 my $data = $cr->article_from_doi($doi);
 for my $row ( @{ $data || [] } ) {
     if ( !$row ) {
         print "\n";
         next;
     }
     while ( my ( $f, $v ) = each %$row ) {
         print "$f : $v \n";
     }
 }

 Example of output:
 $.author[*] : Pelloni, Michelle;  University of Basel, Department of Chemistry, Mattenstrasse 24a, BPR 1096, CH 4002 Basel, Switzerland
 Cote, Paul;  School of Chemistry and Biochemistry, University of Geneva, Quai Ernest Ansermet 30, CH-1211 Geneva, Switzerland
 ....
 Warding, Tom.;  University of Basel, Department of Chemistry, Mattenstrasse 24a, BPR 1096, CH 4002 Basel, Switzerland
 $.title : Chimeric Artifact for Artificial Metalloenzymes
 $.container-title : ACS Catalysis
 $.volume : 8
 $.issue : 2
 $.page : 14-18
 $.issued..date-parts : 2018, 1, 24
 $.published-print..date-parts : 2018, 2, 2
   
 my $cr = REST::Client::CrossRef->new(
     mailto       => 'you@somewhere.com',
     keys_to_keep => [["breakdowns/id", "id"], ["location"], [ "primary-name", "breakdowns/primary-name", "name" ]],
 );

 $cr->init_cache(
     {   BasePath => "C:\\Windows\\Temp\\perl",
         NoUpdate => 60 * 60,
         verbose  => 0
     });

 $cr->rows(100);

 my $rs_ar = $cr->get_members;

 while () {
     last unless $rs_ar;
     for my $row_hr (@$rs_ar) {
          for my $k (keys  %$row_hr) {
                print $k . " : " . $row_hr->{$k} . "\n";
          }
      } 
      $rs_ar = $cr->get_next();
  }

 Example of items in the output above

 id : 5007
 location : W. Struve 1 Tartu 50091 Estonia
 primary-name : University of Tartu Press

 id : 310
 location : 23 Millig Street Helensburgh Helensburgh Argyll G84 9LD United Kingdom
 primary-name : Westburn Publishers

 id : 183
 location : 9650 Rockville Pike Attn: Lynn Willis Bethesda MD 20814 United States
 primary-name : Society for Leukocyte Biology

$cr = REST::Client::CrossRef->new( ... mailto => 'your@email.here', ...)

The email address is sent in the header of the request. See https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service

$cr = REST::Client::CrossRef->new( ... sort_output =>1, ...)

Rows can be sorted by key name with sort_output => 1. Defaults to 0. In effect only if spit_raw_data is false.

$cr = REST::Client::CrossRef->new( ... spit_raw_data =>1, ...)

If true, return the data untransformed, as a hash ref of the JSON data converted to Perl. If false, return an array ref of hash refs, where each hash ref is a row of key => value that can be sorted with sort_output => 1. spit_raw_data defaults to 0.

$cr = REST::Client::CrossRef->new( ... add_end_flag =>1, ...)

Add an undef entry after each item's fields. Defaults to 1.
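
The undef entries can be used to regroup the flat list of rows into one record per item. The following is a minimal sketch, assuming each row is a hash ref of key => value as in the synopsis output. The group_items helper and the sample rows are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: regroup flat rows into one hash per item,
# using the undef end flags as item separators.
sub group_items {
    my ($rows) = @_;
    my ( @items, %current );
    for my $row (@$rows) {
        if ( !defined $row ) {
            # End flag reached: store the finished item, start a new one.
            push @items, +{%current} if %current;
            %current = ();
            next;
        }
        @current{ keys %$row } = values %$row;
    }
    push @items, +{%current} if %current;    # last item without a trailing flag
    return \@items;
}

# Made-up sample data shaped like the rows printed in the synopsis.
my $rows = [
    { title => 'First title' },  { volume => 13 }, undef,
    { title => 'Second title' }, { volume => 8 },  undef,
];

my $items = group_items($rows);
print scalar(@$items), " items\n";
print "$_->{title} (vol. $_->{volume})\n" for @$items;
```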

$cr = REST::Client::CrossRef->new( ... keys_to_keep => [[key1, key1a, ...], [key2], ... ], ...)

An array ref of array refs; each inner array ref gives a key name and the possible alternative keys for the same value, for example [ "primary-name", "breakdowns/primary-name", "name" ] in the members route (URL ending with /members). The key enumeration starts below message, or below message - items if the result is a list. This filters the values that are returned and preserves the ordering of the array ref given as argument. The output is an array ref of hash refs, each hash holding a key and its value. Values are flattened to strings. In effect only if spit_raw_data is false.
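
To illustrate what "alternative keys" means, the sketch below picks the first key path that yields a value from a nested hash. It only illustrates the idea under the assumption that "/" separates nesting levels; the value_at and first_alternative helpers and the sample records are hypothetical, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: follow an "a/b/c" key path into nested hashes.
sub value_at {
    my ( $data, $path ) = @_;
    for my $key ( split m{/}, $path ) {
        return unless ref($data) eq 'HASH' && exists $data->{$key};
        $data = $data->{$key};
    }
    return $data;
}

# Hypothetical helper: value of the first alternative path that is defined.
sub first_alternative {
    my ( $data, @paths ) = @_;
    for my $path (@paths) {
        my $value = value_at( $data, $path );
        return $value if defined $value;
    }
    return;
}

# Made-up records: the name is stored under a different key in each.
my @records = (
    { 'primary-name' => 'Press A' },
    { breakdowns     => { 'primary-name' => 'Press B' } },
    { name           => 'Press C' },
);

for my $rec (@records) {
    my $name = first_alternative( $rec,
        'primary-name', 'breakdowns/primary-name', 'name' );
    print "primary-name : $name\n";
}
```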

$cr = REST::Client::CrossRef->new( ... json_path => [[$path1, path1a, ...], [path2], ... ], ...)

An array ref of array refs; each inner array ref gives a JSONPath and the possible alternative paths for the same value. See also JSON::Path. The JSON path starts below message, or below message - items if the result is a list. The output, ordering, filtering and flattening are as for keys_to_keep. In effect only if spit_raw_data is false.

$cr = REST::Client::CrossRef->new( ... json_path_callback => {$path => \&some_function }, ... )

A hash ref that associates a JSON path with a function that will be run on the data returned by $jpath->values($json_data). The function must accept an array ref as first argument and must return an array ref.
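
As an illustration of that contract, the sketch below takes an array ref and returns an array ref. It assumes the callback receives author hashes with given and family keys, as in the unfold_authors example of the synopsis; the format_authors name and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical callback: takes an array ref of author hashes and
# returns an array ref of "Family, Given" strings.
sub format_authors {
    my ($authors) = @_;
    my @res;
    for my $aut (@$authors) {
        my $name = $aut->{family} // '';
        $name .= ", $aut->{given}" if defined $aut->{given};
        push @res, $name;
    }
    return \@res;
}

# Made-up sample input; the second author has no given name.
my $sample = [
    { family => 'MacLeod', given => 'Allan M.' },
    { family => 'Berden' },
];
print "$_\n" for @{ format_authors($sample) };
```

Such a function would be registered in the constructor as json_path_callback => { $path => \&format_authors }, where $path is the JSON path whose values it should process.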

$cr = REST::Client::CrossRef->new( ... json_path_safe => "0", ... )

Set this to 0 to turn off the "non-safe evaluation, died at..." message. Defaults to 1.

$cr = REST::Client::CrossRef->new( ... version => "v1", ... )

Use a specific version of the API. See https://github.com/CrossRef/rest-api-doc#api-versioning

$cr->init_cache( @args )

$cr->init_cache( $hash_ref )

See HTTP::Cache::Transparent. The arguments are passed through to HTTP::Cache::Transparent. The log file shows whether the data has been fetched from the cache and whether the server has been queried to detect any change.

$cr->rows( $row_value )

Set the rows parameter, which determines how many items are returned in one page.

$cr->works_from_doi( $doi, $filter, $select )

Retrieve the metadata from the works route (URL ending with /works) using the article's DOI. Returns undef if the DOI is not found.

You may pass a $filter hash ref {filter1 => value1, ...}; see the CrossRef REST API documentation for a list of filters. You may pass a $select string with the format "field1,field2,..." to return only these fields. Fields that may be used for selection are (October 2018): abstract, URL, member, posted, score, created, degree, update-policy, short-title, license, ISSN, container-title, issued, update-to, issue, prefix, approved, indexed, article-number, clinical-trial-number, accepted, author, group-title, DOI, is-referenced-by-count, updated-by, event, chair, standards-body, original-title, funder, translator, archive, published-print, alternative-id, subject, subtitle, published-online, publisher-location, content-domain, reference, title, link, type, publisher, volume, references-count, ISBN, issn-type, assertion, deposited, page, content-created, short-container-title, relation, editor.

Use keys_to_keep or json_path to define an ordering in the output. Use select to filter the fields to be returned from the server.

$cr->works_from_orcid( $orcid, $filter, $select )

Retrieve the metadata, or undef, from the works route using the author's ORCID. $filter and $select are as above.

$cr->journal_from_doi( $doi )

A shortcut for works_from_doi( $doi, undef, "container-title,page,issued,volume,issue")

$cr->article_from_doi( $doi )

A shortcut for works_from_doi( $doi, undef, "title,container-title,page,issued,volume,issue,author,published-print,published-online")

$cr->article_from_funder( $funder_id, {name=>'smith'}, $select )

Retrieve the metadata from the works route for a given funder, searched with an author's name or filtered by any valid filter name, for example {'has-orcid'=> 'true', 'has-affiliation'=>'true'}. $select defaults to "title,container-title,page,issued,volume,issue,published-print,DOI". Use * to retrieve all fields.

$cr->get_types()

Retrieve all the metadata from the types route.

$cr->get_members()

Retrieve all the metadata (more than 10,000 items) from the members route.

$cr->member_from_id( $member_id )

Retrieve a member from its ID.

$cr->get_journals()

Retrieve all the metadata (more than 60,000 items) from the journals route.

$cr->get_licences()

Retrieve all the metadata (more than 700 items) from the licenses route.

$cr->query_works( $fields_array_ref, $values_array_ref, $select_string )

See Field Queries in the CrossRef REST API documentation for the fields that can be searched. You may omit the "query." part in the field name. The corresponding values are passed in a second array, in the same order.

Beware that searching with first and family name is treated as an OR, not an AND: query_works([qw(name name)], [qw(Tom Smith)], $select) will retrieve all the works where an author has Tom in the name field, plus all the works where an author has Smith in the name field.

See works_from_doi above for the fields that can be selected. Use keys_to_keep or json_path to define an ordering in the output. Use select to filter the fields to be returned from the server.
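
Because repeated fields are combined with OR, a stricter AND match has to be applied on the client side. The following is a minimal sketch, assuming the author value is a flattened string as in the example output of the synopsis; the matches_all helper and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every word occurs in the string
# as a whole word, ignoring case.
sub matches_all {
    my ( $string, @words ) = @_;
    for my $word (@words) {
        return 0 unless $string =~ /\b\Q$word\E\b/i;
    }
    return 1;
}

# Made-up flattened author values, one per work.
my @authors = (
    'Smith, Tom; Jones, Ann;',
    'Smith, Anna;',
    'Baker, Tom;',
);

# Keep only the works whose author string contains both words.
my @both = grep { matches_all( $_, qw(Tom Smith) ) } @authors;
print "$_\n" for @both;
```

This check looks at the whole author string, so a work with one author named Smith and another named Tom would still match; a per-author test needs the unflattened data.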

$cr->query_articles( $fields_array_ref, $values_array_ref )

A shortcut for $cr->query_works($fields_array_ref, $values_array_ref, "title,container-title,page,issued,volume,issue,author,published-print,published-online")

$cr->query_journals( $fields_array_ref, $values_array_ref )

A shortcut for $cr->query_works($fields_array_ref, $values_array_ref, "container-title,page,issued,volume,issue")

$cr->get_next()

Return the next set of data in the /works, /members, /journals, /funders, /licenses routes. Returns undef after the last set.
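
The paging loop used in the synopsis can be wrapped in a small helper. The following is a minimal sketch in which collect_pages is a hypothetical name and a list of made-up pages stands in for the API, so that it runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: gather all pages into one list.
# $first is the first page (array ref or undef); $next is a code ref
# that returns the following page, or undef when there is none left.
sub collect_pages {
    my ( $first, $next ) = @_;
    my @all;
    my $page = $first;
    while ($page) {
        push @all, @$page;
        $page = $next->();
    }
    return \@all;
}

# Stand-in for the paged API: three made-up pages.
my @pages = ( [ 1, 2 ], [ 3, 4 ], [5] );
my $all = collect_pages( shift @pages, sub { shift @pages } );
print scalar(@$all), " rows\n";
```

With the module, the same helper could be called as collect_pages( $cr->get_members, sub { $cr->get_next } ). For large routes such as members or journals, processing each page as it arrives uses less memory than collecting everything first.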

$cr->agencies_from_dois( $dois_array_ref )

Retrieve the Registration agency (CrossRef, mEdra ...) using an array ref of article doi. See

$cr->funders_from_location( $a_location_name )

Retrieve the funders from a country. The problem is that there is no way to obtain the list of country names in use. These locations have been successfully tested: United Kingdom, Germany, Japan, Morocco, Switzerland, France.

INSTALLATION

To install this module, type the following:

 perl Makefile.PL
 make
 make test
 make install

On Windows, use nmake or dmake instead of make.

DEPENDENCIES

The following modules are required in order to use this one:

Moo => 2,
JSON => 2.90,
URI::Escape => 3.31,
REST::Client => 273,
Log::Any => 1.049,
HTTP::Cache::Transparent => 1.4,
Carp => 1.40,
JSON::Path => 0.420

BUGS

See below.

SUPPORT

Any questions or problems can be posted to me (rappazf) on my gmail account.

The current state of the source can be extracted using Mercurial from http://sourceforge.net/projects/rest-client-crossref/

AUTHOR

F. Rappaz
CPAN ID: RAPPAZF 

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

Catmandu::Importer::CrossRef - Catmandu is a *nix-oriented framework.

Bib::CrossRef - Imports data from CrossRef using the CrossRef search, not the REST API, and converts the XML result into something simpler.