NAME

WebService::ReutersConnect - Use the ReutersConnect Live News API

VERSION

Version 0.05

INSTALLATION

Debian based

This module depends only on packages distributed by Debian. If you're using a Debian-based system, run:

 $ sudo apt-get install perl-modules libtest-fatal-perl perl-base libdbd-sqlite3-perl libdbix-class-perl libdatetime-perl \
 libdatetime-format-iso8601-perl libdevel-repl-perl libfile-sharedir-perl libwww-perl liblog-log4perl-perl libmoose-perl \
 libterm-readkey-perl liburi-perl libxml-libxml-perl

 $ sudo cpan -i WebService::ReutersConnect ## or anything you like.

Other OSs

Use your favorite Perl package installation method.

 $ sudo cpan -i WebService::ReutersConnect ## Should do the job on *NIX systems

SYNOPSIS

This module allows access to Reuters Connect APIs as described here:

http://reutersconnect.com/

It is based on the REST APIs.

You WILL have to contact Reuters to get API credentials if you want to use this module; obtaining them is outside the scope of this distribution. However, some demo credentials are supplied by this module for your convenience.

By the way, those demo credentials change from time to time, so have a look at http://reutersconnect.com/docs/Demo_Login_Page if you get authentication errors.

For your convenience, this module will try scraping the demo credentials from this page if you don't feel like looking at it yourself :)

Shell

This module provides a 'reutersconnect' shell so you can interactively play with the API.

Example:

 $ reutersconnect
 2013/03/20 16:45:05 Will try to use the demo account. Use '/usr/local/bin/reutersconnect -u <username>' to login as a specific user
 2013/03/20 16:45:05 No username/password given. Trying to scrape the demo ones
 2013/03/20 16:45:07 Found 'demo.user/vYkLo4Lv' credentials
 2013/03/20 16:45:08 Granted access to 6 channels
 2013/03/20 16:45:08 Starting shell. ReutersConnect object is '$rc'
 demo.user@reutersconnect.com> map{ $_->alias().' '.$_->description()."\n" } $rc->channels()
 FES376 US Online Report Top News
 QTZ240 NVO
 STK567 Reuters World Service
 mkc191 Unique-Product-For-User-26440
 txb889 Unique-Product-For-Account-26439
 xHO143 Italy Picture Service

 demo.user@reutersconnect.com> [CTRL-D] to quit

See the rest of this module doc and WebService::ReutersConnect::Channel and WebService::ReutersConnect::Item for a detailed API description.

Perl

Example:

   use WebService::ReutersConnect qw/:demo/;

   my $reuters = WebService::ReutersConnect->new({ username => REUTERS_DEMOUSER,
                                                   password => REUTERS_DEMOPASSWORD });

   my @channels = $reuters->channels();
   my @items    = $reuters->items( $channels[0] );
   my $full_xml_doc = $reuters->fetch_item_xdoc({ item => $items[0] });

Additionally, a very basic demo page scraping mechanism is provided, so you can build an API object without any credentials at all if you feel lucky:

   my $reuters = WebService::ReutersConnect->new();
   my @channels = $reuters->channels();

EXAMPLES

Here are some usage examples to get you started quickly:

Fetch the latest news about Britain from all your channels

  my $res = $reuters->search({ q => 'headline:britain' ,
                               sort => 'date'
                             });
  say("Size: ".$res->size());
  say("Num Found: ".$res->num_found());
  say("Start: ".$res->start());
  foreach my $item ( @{ $res->items() } ){
    say($item->headline());
  }

Fetch the last 5 pictures across all your channels

  my @items = $reuters->search({ limit => 5 , media_types => [ 'P' ] });
  foreach my $item ( @items ){
    print "\n".$item->date_created().' : '.$item->headline()."\n\n";
    print " CLICK: ".$item->preview_url()."\n\n";
  }

Get the freshest version of the rich NewsML-G2 document about a news item:

  my $xdoc = $reuters->item({  guid => $item->guid() , channel => $item->channel_alias() });
  say $xdoc->asString(); ## That will help you :)

  my ($body_node) = $xdoc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
  say $body_node->toString(1); ## Print the whole html.

  ## You can also print only the content of the body:
  my @body_parts = $xdoc->get_html_body();
  say $_->toString(1) for @body_parts;

  ## Find the subjects:
  my @subjects = $xdoc->get_subjects();
  foreach my $subject ( @subjects ){
    say "This is about: ".$subject->name_main();
  }

AUTHENTICATION

If you supply a ReutersConnect username and password, this module will fetch an authentication token from the service and use it in all subsequent requests.

The basic usage involves giving a username and password, as demonstrated in the SYNOPSIS section.

You can access the authentication token with $this->authToken(), for diagnostics and external storage.

You can also build an instance of this module using an authentication token that you stored somewhere:

  my $reuters = WebService::ReutersConnect->new( { authToken => $authToken } );

Beware that ReutersConnect authentication tokens are only valid for 24 hours. It is advisable to renew the authentication token more often than that, for instance every 12 hours, to avoid any expiration issue.

This module does NOT contain any mechanism to renew authentication tokens at regular intervals. If you keep long-standing instances of this module, it's your responsibility to renew their tokens regularly enough.
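
As an illustration of the renewal advice above, here is a minimal sketch of deciding when a stored token should be renewed. The helper token_needs_renewal and the { token, fetched_at } storage layout are hypothetical; they are not part of this module, and the 12-hour threshold is just the suggestion made above.

```perl
use strict;
use warnings;

# Assumption: tokens are valid for 24 hours; renew once they are 12 hours old.
use constant RENEW_AFTER_SECONDS => 12 * 60 * 60;

# Hypothetical helper: $stored is a hash ref { token => ..., fetched_at => epoch }
# kept in whatever storage your application uses.
sub token_needs_renewal {
    my ($stored, $now) = @_;
    $now //= time();
    # No token stored at all: a new one is needed.
    return 1 unless $stored && defined $stored->{token};
    return ( $now - $stored->{fetched_at} ) >= RENEW_AFTER_SECONDS ? 1 : 0;
}

# A token fetched 13 hours ago is due for renewal.
my $stored = { token => 'abc123', fetched_at => time() - 13 * 60 * 60 };
print token_needs_renewal($stored) ? "renew\n" : "reuse\n";
```

When the helper returns true, you would typically build a new instance with your username and password and store its authToken() together with the current time; otherwise, pass the stored token to the constructor as authToken.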

However, for very simple cases, where there's no concurrent access to the token storage, or when you have only one longstanding instance, the options refresh_token and after_refresh_token can be useful.

DEMO AUTHENTICATION

Reuters provides a demo account so you can try out this API without holding an account with them. The demo credentials live on this page http://reutersconnect.com/docs/Demo_Login_Page

They do change every month, but this module provides a very basic method to scrape them if no username/password is given in the constructor. See SYNOPSIS section.

LOGGING & DEBUGGING

This module uses Log::Log4perl and is automatically initialized to the ERROR level. Feel free to initialize Log::Log4perl to your taste in your application.

Additionally, the debug option outputs very verbose information (HTTP traffic) at the INFO level.

ATTRIBUTES

Most attributes are read only and have a default value. Set them at construction time if necessary.

entry_point

Get/Set the ReutersConnect entry URL. Default should work.

login_entry_point

Get/Set the ReutersConnect login entry URL. Default should work.

username

ReutersConnect API username.

password

ReutersConnect API password.

authToken

ReutersConnect authentication token. If not set, this will try to get a new one using the username/password.

refresh_token

Option. When true, the module will attempt ONCE to fetch a fresh authentication token from ReutersConnect if the token held is invalid or expired.

Of course, turning that on only makes sense if you give the username and password at instantiation time.

If you want to be notified of the new token in your client code, you can register a callback:

after_refresh_token

This is a callback called after this module has fetched a new authentication token from ReutersConnect. It's normally used in combination with refresh_token.

Usage:

   my $reuters = WebService::ReutersConnect->new({ username => ...,
                                                   password => ....,
                                                   after_refresh_token => sub{
                                                      my ($new_token) = @_;
                                                      ## Store new token somewhere
                                                   }
                                                 });
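
To make the "attempt ONCE" behaviour concrete, here is a stand-alone sketch of a retry-once pattern with a notification callback. It is NOT this module's implementation: the helper with_one_refresh and its arguments are hypothetical, the service is simulated, and the sketch assumes that a failed request is signalled by die (it treats any exception as a token problem).

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: run $request->($token); if it
# dies, fetch ONE fresh token, notify the callback and retry a single time.
sub with_one_refresh {
    my (%args) = @_;
    my $token  = $args{token};
    my $result = eval { $args{request}->($token) };
    return $result unless $@;

    $token = $args{fetch_token}->();          # e.g. log in again
    $args{after_refresh}->($token) if $args{after_refresh};
    return $args{request}->($token);          # a second failure propagates
}

# Simulated service: only the token 'fresh' is accepted.
my $saved;
my $answer = with_one_refresh(
    token         => 'stale',
    request       => sub { $_[0] eq 'fresh' ? 'ok' : die "invalid token\n" },
    fetch_token   => sub { 'fresh' },
    after_refresh => sub { $saved = $_[0] },
);
print "$answer (stored: $saved)\n";
```

Retrying only once keeps a persistently failing login from looping forever, and the callback is the single place where the new token gets written to your storage.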

user_agent

A LWP::UserAgent. There's a default instance but feel free to replace it with your application one.

debug

Switches extra debugging on/off (especially HTTP requests and responses).

date_created

DateTime at which this instance was created.

METHODS

scrape_demo_credentials

Quick and very dirty method to scrape demo credentials from http://reutersconnect.com/docs/Demo_Login_Page. This is used automatically when no credentials at all are provided in the constructor. You shouldn't have to use that yourself. Returns 1 for success, 0 for failure.

Usage:

  unless( $this->scrape_demo_credentials() ){
     ## Woopsy
  }

channels

Alias for fetch_channels

items

Alias for fetch_items

packages

Alias for fetch_packages

search

Alias for fetch_search

olr

Alias for fetch_olr

item

Alias for fetch_item_xdoc.

fetch_channels

Fetch the WebService::ReutersConnect::Channel objects matching the given options (if any).

Usage:

  my @all_channels = $this->fetch_channels();
  my ( $channel ) = $this->fetch_channels({ channel => [ '56HD' ] });

  ## Filter on channel alias(s)
  my @specific_channels = $this->fetch_channels({ channel => [ '567', '7654' ,... ] });

  ## Filter on channel Category(s) ID(s)
  my @channels = $this->fetch_channels({ channelCategory => [ 'JDJD' , 'JDJD' ] });

fetch_items

Fetch WebService::ReutersConnect::Item news items from Reuters Connect. This is the core method. You MUST specify ONE channel (get the list using the fetch_channels method). You can pass either a channel object or a channel alias.

This method returns REAL TIME items.

Usage:

   my @items = $this->fetch_items($channel->alias, { %options  });

Options:

  media_types: An Array of media types to compose from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite)

  date_from: YYYY-MM-DD or DateTime object. Defaults to now - 24h. This is INCLUSIVE
  date_to:   Same format, but cannot be specified without date_from. Defaults to now. Note that this date is NOT INCLUSIVE

  limit: Number of items to fetch. Defaults to $this->default_limit()
  sort:  Sort by 'date' (newest first) or by 'score' (more relevant first).
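
Because date_from is inclusive and date_to is not, a range meant to cover whole days needs date_to set to the day after the last wanted day. The sketch below illustrates this with a hypothetical helper, inclusive_date_range, which is not part of this module; it assumes the YYYY-MM-DD string form of the options and works in UTC.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Hypothetical helper (not part of WebService::ReutersConnect): build the
# date_from/date_to pair covering $first_day .. $last_day, both inclusive.
sub inclusive_date_range {
    my ($first_day, $last_day) = @_;
    my ($y, $m, $d) = $last_day =~ /\A(\d{4})-(\d{2})-(\d{2})\z/
        or die "Expected YYYY-MM-DD, got '$last_day'";
    # date_to is exclusive, so push it one day past the last wanted day.
    my $next_day = timegm(0, 0, 0, $d, $m - 1, $y) + 24 * 60 * 60;
    return (
        date_from => $first_day,
        date_to   => strftime('%Y-%m-%d', gmtime($next_day)),
    );
}

my %range = inclusive_date_range('2013-03-18', '2013-03-20');
print "$range{date_from} -> $range{date_to}\n";
```

The resulting pair can then be merged into the options hash given to fetch_items, alongside limit, sort or media_types.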

fetch_search

Search for WebService::ReutersConnect::Item's in all Reuters news (from the channels you have access to).

Items found through this method can suffer from a slight delay compared to the live 'items' method.

Options:

 q: Free Text Style query string. See search method in http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf
    for an extended specification

 channels : An Array Ref of Channels (or channel aliases) to restrict the search to
 categories : An Array Ref of Categories (or category IDs) to restrict the search to
 media_types: An Array of media types to compose from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite)

 limit: Number of items to fetch. Defaults to $this->default_limit()
 sort:  Sort by 'date' (newest first) or by 'score' (more relevant first).

Usage:

  my @items = $this->fetch_search();

  ## Only videos
  my @items = $this->fetch_search({ media_types => [ 'V' ] });

  ## Only pictures or videos about Britney Spears
  my @items = $this->fetch_search({ q => 'britney spears' , media_types => [ 'P' , 'V'  ] });


  ## Additionally, if you want a WebService::ReutersConnect::ResultSet, call this method in scalar context:
  my $res = $this->fetch_search({ media_types => [ 'V' ] });
  print $res->num_found().' results in total';
  print $res->size().' results effectively returned (because of limit)';
  print $res->start().' offset in the total result space';
  my @items = @{$res->items()};

fetch_olr

Fetches OnLine Reports (SNI, NEPs, SNEPs, ..) from the channels you have access to. You can optionally filter by channel(s).

Options:

  channels: An array ref of channels to restrict the results to.

fetch_packages

Fetches the edited NEPs (News Event Packages) from a specific Reuters Channel. NEPs come as WebService::ReutersConnect::Item's with added 'main links' sub items and 'supplemental links' sub items. You can view them as editorially put together news items.

Options:

   use_snep: Use editor Super NEPs. Defaults to false (just returns the latest ones).

   limit: Fetch a limited number of NEPs, defaults to $this->default_limit()

Usage:

 my @items = $reuters->fetch_packages( $channel );
 my @items = $reuters->fetch_packages( $channel->alias() , { options .. } );

fetch_package

Fetches a richer version of some specific NEPs (News Event Package). Despite the name of this method, you can actually specify multiple NEPs:

Usage:

  my @nep_items = $this->fetch_package($channel->alias(), [ $item1->id() , $item2->id() ] );

fetch_item_xdoc

Fetches one WebService::ReutersConnect::XMLDocument from Reuters, given the Item or the item ID.

This document is a NewsML-G2 document as specified here: http://reutersconnect.com/files/NewsML-G2_Quick_Reference_Guide.pdf

You can view a NewsML-G2 document as a 'full view' of a simple WebService::ReutersConnect::Item (Simple News Item).

Implementing a full NewsML-G2 object from such a document is out of the scope of this module. HOWEVER, for your convenience and enjoyment, the returned object comes with an already instantiated XML::LibXML::XPathContext object on which you can query things of interest.

You are also strongly encouraged to read the 'item' method section of http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf.

Options:

  item: An item

   OR

  guid: GUID of an item
  channel: Combined with guid to get the freshest version of the news item.

   OR

  item_id: The specific version of the item, by item ID.

   ------

  company_markup: 0 or 1 (default 0). If set, marks up the content with company names. See Reuters documentation.

Usage:

  my $xml_doc = $this->fetch_item_xdoc({  guid => $item->guid() , channel => $item->channel()->alias() });
  my $xml_doc = $this->fetch_item_xdoc({ item_id => $item->id() });
  my $xml_doc  = $this->fetch_item_xdoc( { item => $item_object } );

  print $xml_doc->toString(); ## Print the whole document.
  print $xml_doc->xml_xpath->findvalue('//rcx:description'); ## The default namespace for xpath is 'rcx'
  print $xml_doc->xml_xpath->findvalue('//rcx:headline');
  my ($body_node) = $xml_doc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
  print $body_node->toString(); ## Print the whole html.

REUTERS_DEMOUSER

Returns the username for the demo account. This is exportable:

  use WebService::ReutersConnect qw/:demo/;
  print REUTERS_DEMOUSER;

REUTERS_DEMOPASSWORD

Returns the password for the demo account. This is exportable:

  use WebService::ReutersConnect qw/:demo/;
  print REUTERS_DEMOPASSWORD;

AUTHOR

Jerome Eteve, <jerome at eteve.net>

KNOWN ISSUES

This module is known to be correct, but not to be complete.

Some ReutersConnect method options and some object properties might not be implemented.

Also, it lacks the preference methods and the OpenCalais method of the ReutersConnect API (for now).

Please file a request for any feature you are missing in the issue tracking system. See BUGS section.

BUGS

Please report any bugs or feature requests to bug-webservice-reutersconnect at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WebService-ReutersConnect. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc WebService::ReutersConnect

ACKNOWLEDGEMENTS

Thanks to C. Gevrey from Reuters for his guidance and inspiration in writing this module.

LICENSE AND COPYRIGHT

Copyright 2012 Jerome Eteve.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.