WebService::ReutersConnect - Use the ReutersConnect Live News API
Version 0.05
This module depends only on packages distributed by Debian. If you're using a Debian-based system, run:
$ sudo apt-get install perl-modules libtest-fatal-perl perl-base libdbd-sqlite3-perl libdbix-class-perl libdatetime-perl \
    libdatetime-format-iso8601-perl libdevel-repl-perl libfile-sharedir-perl libwww-perl liblog-log4perl-perl libmoose-perl \
    libterm-readkey-perl liburi-perl libxml-libxml-perl
$ sudo cpan -i WebService::ReutersConnect ## or anything you like.
Use your favorite Perl package installation method.
$ sudo cpan -i WebService::ReutersConnect ## Should do the job on *NIX systems
This module allows access to Reuters Connect APIs as described here:
http://reutersconnect.com/
It is based on the REST APIs.
You WILL have to contact Reuters to get yourself some API credentials if you want to use this module. This is out of the scope of this distribution. However, some demo credentials are supplied by this module for your convenience.
By the way, those demo credentials change from time to time, so have a look at http://reutersconnect.com/docs/Demo_Login_Page if you get authentication errors.
For your convenience, this module will try scraping the demo credentials from this page if you don't feel like looking at it yourself :)
This module provides a 'reutersconnect' shell so you can interactively play with the API.
Example:
$ reutersconnect
2013/03/20 16:45:05 Will try to use the demo account. Use '/usr/local/bin/reutersconnect -u <username>' to login as a specific user
2013/03/20 16:45:05 No username/password given. Trying to scrape the demo ones
2013/03/20 16:45:07 Found 'demo.user/vYkLo4Lv' credentials
2013/03/20 16:45:08 Granted access to 6 channels
2013/03/20 16:45:08 Starting shell. ReutersConnect object is '$rc'
demo.user@reutersconnect.com> map{ $_->alias().' '.$_->description()."\n" } $rc->channels()
FES376 US Online Report Top News
QTZ240 NVO
STK567 Reuters World Service
mkc191 Unique-Product-For-User-26440
txb889 Unique-Product-For-Account-26439
xHO143 Italy Picture Service
demo.user@reutersconnect.com> [CTRL-D] to quit
See the rest of this module doc and WebService::ReutersConnect::Channel and WebService::ReutersConnect::Item for a detailed API description.
use WebService::ReutersConnect qw/:demo/;
my $reuters = WebService::ReutersConnect->new({ username => REUTERS_DEMOUSER,
                                                password => REUTERS_DEMOPASSWORD });
my @channels = $reuters->channels();
my @items = $reuters->items( $channels[0] );
my $full_xml_doc = $reuters->fetch_item_xdoc({ item => $items[0] });
Additionally, a very basic demo page scraping mechanism is provided, so you can build an API object without any credential at all if you feel lucky:
my $reuters = WebService::ReutersConnect->new();
my @channels = $reuters->channels();
Here are some usage examples to get you started quickly:
my $res = $reuters->search({ q => 'headline:britain' , sort => 'date' });
say("Size: ".$res->size());
say("Num Found: ".$res->num_found());
say("Start: ".$res->start());
foreach my $item ( @{ $res->items() } ){
    say($item->headline());
}
my @items = $reuters->search({ limit => 5 , media_types => [ 'P' ] });
foreach my $item ( @items ){
    print "\n".$item->date_created().' : '.$item->headline()."\n\n";
    print " CLICK: ".$item->preview_url()."\n\n";
}
my $xdoc = $reuters->item({ guid => $item->guid() , channel => $item->channel_alias() });
say $xdoc->toString(1); ## That will help you :)
my ($body_node) = $xdoc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
say $body_node->toString(1); ## Print the whole html.
## You can also print only the content of the body:
my @body_parts = $xdoc->get_html_body();
say $_->toString(1) for @body_parts;
## Find the subjects:
my @subjects = $xdoc->get_subjects();
foreach my $subject ( @subjects ){
    say "This is about: ".$subject->name_main();
}
If you supply a ReutersConnect username and password, this module will fetch an authentication token from the service and use it in all subsequent requests.
The basic usage involves giving a classic username and password, as demonstrated in the SYNOPSIS section.
You can access the authentication token through $this->authToken(), for diagnostics or external storage.
You can also build an instance of this module using an authentication token that you stored somewhere:
my $reuters = WebService::ReutersConnect->new( { authToken => $authToken } );
Beware that ReutersConnect authentication tokens are only valid for 24 hours. You are advised to renew the token more often to avoid any expiration issue, for instance every 12 hours.
This module does NOT contain any mechanism to renew authentication tokens at regular intervals. If you keep long-standing instances of this module, it is your responsibility to renew them often enough.
However, for very simple cases, where there's no concurrent access to the token storage, or when you have only one longstanding instance, the options refresh_token and after_refresh_token can be useful.
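Since renewal is left to you, here is a minimal sketch of one way to do it, assuming a single long-standing instance and no concurrent access. The helpers token_needs_renewal and renew_if_needed are hypothetical (not part of WebService::ReutersConnect): they rebuild the API object once it is older than 12 hours. Because an instance built from a username and password fetches a new token when none is set, rebuilding is enough to renew it. The creation time only approximates when the token was actually issued; the 12-hour margin absorbs that difference.

```perl
use strict;
use warnings;

# Renew well before the documented 24-hour token lifetime.
use constant RENEW_AFTER => 12 * 60 * 60;    # 12 hours, in seconds

# Hypothetical helper (not part of WebService::ReutersConnect): true when a
# token obtained at epoch $obtained_at is old enough to be renewed at $now.
sub token_needs_renewal {
    my ( $obtained_at, $now, $renew_after ) = @_;
    $now         //= time();
    $renew_after //= RENEW_AFTER;
    return ( $now - $obtained_at ) >= $renew_after ? 1 : 0;
}

# Hypothetical helper: returns the client kept in %$state, rebuilding it with
# $build (a code ref returning a new instance) when missing or too old.
sub renew_if_needed {
    my ( $state, $build, $now ) = @_;
    $now //= time();
    if ( !$state->{client}
        || token_needs_renewal( $state->{obtained_at}, $now ) )
    {
        $state->{client}      = $build->();
        $state->{obtained_at} = $now;
    }
    return $state->{client};
}
```

In an application, the builder could be sub { WebService::ReutersConnect->new({ username => $user, password => $pass }) }, and you would call renew_if_needed(\%state, $builder) before each batch of requests.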
Reuters provides a demo account so you can try out this API without holding an account with them. The demo credentials live on this page http://reutersconnect.com/docs/Demo_Login_Page
They do change every month, but this module provides a very basic method to scrape them if no username/password is given in the constructor. See SYNOPSIS section.
This module uses Log::Log4perl and is automatically initialized to the ERROR level. Feel free to initialize Log::Log4perl to your taste in your application.
Additionally, the debug option outputs very verbose logging (HTTP traffic) at the INFO level.
Most attributes are read only and have a default value. Set them at construction time if necessary.
Get/Set the ReutersConnect entry URL. Default should work.
Get/Set the ReutersConnect login entry URL. Default should work.
ReutersConnect API username.
ReutersConnect API password.
authToken
ReutersConnect authentication token. If not set, this module will try to get a new one using the username/password.
Option. When true, the module will attempt ONCE to fetch a fresh authentication token from ReutersConnect if the token it holds is invalid or expired.
Of course, turning this on only makes sense if you give the username and password at instantiation time.
If you want to be notified of the new token in your client code, you can register a callback:
This is a callback called after this module has fetched a new authentication token from ReutersConnect. It's normally used in combination with refresh_token.
Usage:
my $reuters = WebService::ReutersConnect->new({ username => ...,
                                                password => ....,
                                                on_refresh_token => sub{
                                                    my ($new_token) = @_;
                                                    ## Store new token somewhere
                                                }
                                              });
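As a sketch of what "store new token somewhere" could look like, the hypothetical helpers below persist the token to a plain file and read it back, so a later process can build an instance from authToken as described earlier. File storage and the helper names are assumptions, not part of the module; as noted above, this only suits setups without concurrent access to the token storage.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that receives the new token as its
# only argument and writes it to $path.
sub make_token_saver {
    my ($path) = @_;
    return sub {
        my ($new_token) = @_;
        # Write to a temporary file first, then rename it over $path, so a
        # reader on the same filesystem never sees a half-written token.
        my $tmp = "$path.tmp.$$";
        open( my $fh, '>', $tmp ) or die "Cannot write $tmp: $!";
        print {$fh} $new_token, "\n";
        close($fh) or die "Cannot close $tmp: $!";
        rename( $tmp, $path ) or die "Cannot rename $tmp to $path: $!";
        return;
    };
}

# Hypothetical helper: reads a stored token, or returns undef if none exists.
sub load_token {
    my ($path) = @_;
    open( my $fh, '<', $path ) or return;
    my $token = <$fh>;
    close($fh);
    return unless defined $token;
    chomp $token;
    return length $token ? $token : undef;
}
```

You would then pass on_refresh_token => make_token_saver('/path/to/token') to the constructor, and later restore with WebService::ReutersConnect->new({ authToken => load_token('/path/to/token') }). The path is only a placeholder.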
A LWP::UserAgent. A default instance is provided, but feel free to replace it with your application's own.
Switches extra debugging on or off (especially HTTP requests and responses).
The DateTime at which this instance was created.
Quick and very dirty method to scrape the demo credentials from http://reutersconnect.com/docs/Demo_Login_Page. This is used automatically when no credentials at all are provided in the constructor, so you shouldn't need to call it yourself. Returns 1 on success, 0 on failure.
unless( $this->scrape_demo_credentials() ){
    ## Woopsy
}
Alias for fetch_channels
Alias for fetch_items
Alias for fetch_packages
Alias for fetch_search
Alias for fetch_olr
Alias for fetch_item_xdoc.
Fetches WebService::ReutersConnect::Channel objects, optionally filtered according to the given options.
my @all_channels = $this->fetch_channels();
my ( $channel ) = $this->fetch_channels({ channel => [ '56HD' ] });
## Filter on channel alias(s)
my @specific_channels = $this->fetch_channels({ channel => [ '567', '7654' ,... ] });
## Filter on channel Category(s) ID(s)
my @channels = $this->fetch_channels({ channelCategory => [ 'JDJD' , 'JDJD' ] });
Fetches WebService::ReutersConnect::Item news items from Reuters Connect. This is the core method. You MUST specify ONE channel (get the list using the fetch_channels method). You can pass either a channel object or a channel alias.
This method returns REAL TIME items.
my @items = $this->fetch_items($channel->alias, { %options });
Options:
media_types: An array of media types composed from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite).
date_from: YYYY-MM-DD or DateTime object. Defaults to now - 24h. This date is INCLUSIVE.
date_to: Same format, but cannot be specified without date_from. Defaults to now. Note that this date is NOT INCLUSIVE.
limit: Number of items to fetch. Defaults to $this->default_limit().
sort: Sort by 'date' (newest first) or by 'score' (most relevant first).
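The constraints listed above can be checked on the client side before any request is sent. The sketch below is hypothetical and not part of the module: check_item_options mirrors the documented rules for media_types, date_to and sort, and single_day_range shows one consequence of date_from being inclusive and date_to exclusive: to cover a whole day, date_to is set to the following day (assuming the service reads a YYYY-MM-DD date as the start of that day).

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # exports ONE_DAY (86400 seconds)

my %VALID_MEDIA = map { $_ => 1 } qw(T P V G C);
my %VALID_SORT  = map { $_ => 1 } qw(date score);

# Hypothetical pre-flight check mirroring the documented fetch_items options.
# Returns a list of error messages; an empty list means the options look valid.
sub check_item_options {
    my ($opts) = @_;
    my @errors;
    for my $type ( @{ $opts->{media_types} || [] } ) {
        push @errors, "unknown media type '$type'" unless $VALID_MEDIA{$type};
    }
    push @errors, 'date_to cannot be given without date_from'
      if defined $opts->{date_to} && !defined $opts->{date_from};
    push @errors, "sort must be 'date' or 'score'"
      if defined $opts->{sort} && !$VALID_SORT{ $opts->{sort} };
    return @errors;
}

# date_from is inclusive and date_to is exclusive, so a range covering the
# whole day $ymd ends on the following day.
sub single_day_range {
    my ($ymd) = @_;
    my $day = Time::Piece->strptime( $ymd, '%Y-%m-%d' );
    return ( date_from => $day->ymd, date_to => ( $day + ONE_DAY )->ymd );
}
```

For example, $this->fetch_items($channel->alias, { single_day_range('2013-03-19') }) would ask for items dated 2013-03-19 only, under the assumption stated above.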
Search for WebService::ReutersConnect::Item's in all Reuters news (from the channels you have access to).
Items found through this method can suffer from a slight delay compared to the live 'items' method.
q: Free-text style query string. See the search method in http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf for an extended specification.
channels: An array ref of restriction channels (or channel aliases).
categories: An array ref of restriction categories (or category IDs).
media_types: An array of media types composed from the following options: T (text), P (pictures), V (video), G (graphics) and C (composite).
limit: Number of items to fetch. Defaults to $this->default_limit().
sort: Sort by 'date' (newest first) or by 'score' (most relevant first).
my @items = $this->fetch_search();
## Only videos
my @items = $this->fetch_search({ media_types => [ 'V' ] });
## Only pictures or videos about Britney Spears
my @items = $this->fetch_search({ q => 'britney spears' , media_types => [ 'P' , 'V' ] });
## Additionally, if you want a WebService::ReutersConnect::ResultSet, use the scalar version of this method:
my $res = $this->fetch_search({ media_types => [ 'V' ] });
print $res->num_found().' results in total';
print $res->size().' results effectively returned (because of limit)';
print $res->start().' offset in the total result space';
my @items = @{$res->items()};
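The "scalar version" mentioned above relies on Perl's context sensitivity (wantarray). A minimal sketch of the pattern, using a hypothetical My::ResultSet class and a search_like function rather than the module's real code, could look like this:

```perl
use strict;
use warnings;

# Hypothetical result-set class carrying the paging metadata.
package My::ResultSet;
sub new       { my ( $class, %args ) = @_; return bless { %args }, $class }
sub items     { $_[0]{items} }                # array ref of items
sub size      { scalar @{ $_[0]{items} } }    # number of items returned
sub num_found { $_[0]{num_found} }            # total matches on the server
sub start     { $_[0]{start} }                # offset of the first item

package main;

# List context gets the items; scalar context gets the result-set object.
sub search_like {
    my (%response) = @_;    # stands in for a parsed HTTP response
    my $res = My::ResultSet->new(%response);
    return wantarray ? @{ $res->items } : $res;
}
```

If fetch_search follows this pattern, beware of accidental list context: my ($res) = $this->fetch_search(...) would put the first item, not the result set, into $res.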
Fetches Online Reports (SNI, NEPs, SNEPs, ...) from the channels you have access to. You can optionally filter by channel(s).
channels: An array ref of channel restriction.
Fetches the edited NEPs (News Event Packages) from a specific Reuters channel. NEPs come as WebService::ReutersConnect::Item objects with added 'main links' sub-items and 'supplemental links' sub-items. You can view them as editorially assembled news items.
use_snep: Use editor Super NEPs. Defaults to false (just returns the latest ones).
limit: Fetch a limited number of NEPs. Defaults to $this->default_limit().
my @items = $reuters->fetch_packages( $channel );
my @items = $reuters->fetch_packages( $channel->alias() , { options .. } );
Fetches a richer version of some specific NEPs (News Event Package). Despite the name of this method, you can actually specify multiple NEPs:
my @nep_items = $this->fetch_package($channel->alias(), [ $item1->id() , $item2->id() ] );
Fetches one WebService::ReutersConnect::XMLDocument from Reuters, given the Item or the item ID.
This document is a NewsML-G2 document as specified here: http://reutersconnect.com/files/NewsML-G2_Quick_Reference_Guide.pdf
You can view a NewsML-G2 document as a 'full view' of a simple WebService::ReutersConnect::Item (Simple News Item).
Implementing a full NewsML-G2 object from such a document is out of the scope of this module. HOWEVER, for your convenience and enjoyment, the returned object comes with an already instantiated XML::LibXML::XPathContext object on which you can query things of interest.
You are also strongly encouraged to read the 'item' method section of http://reutersconnect.com/files/Reuters_Connect_Web_Services_Developer_Guide.pdf.
item: An item object, OR
guid: The GUID of an item, together with
channel: Combined with guid to get the freshest version of the news item, OR
item_id: The specific version of the item, by item ID.
------
company_markup: 0 or 1 (default 0). If set, the content will be marked up with company names. See the Reuters documentation.
my $xml_doc = $this->fetch_item_xdoc({ guid => $item->guid() , channel => $item->channel()->alias() });
my $xml_doc = $this->fetch_item_xdoc({ item_id => $item->id() });
my $xml_doc = $this->fetch_item_xdoc( { item => $item_object } );
print $xml_doc->toString(); ## Print the whole document.
print $xml_doc->xml_xpath->findvalue('//rcx:description'); ## The default namespace for xpath is 'rcx'
print $xml_doc->xml_xpath->findvalue('//rcx:headline');
my ($body_node) = $xml_doc->xml_xpath->findnodes('//x:html/x:body'); ## Find the HTML content (in case of article).
print $body_node->toString(); ## Print the whole html.
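The queries above use prefixes such as rcx and x because, in XPath 1.0, elements in a default namespace can only be matched through a registered prefix. The self-contained sketch below shows the underlying XML::LibXML calls on a tiny, heavily trimmed stand-in document. It assumes rcx maps to the NewsML-G2 namespace (http://iptc.org/std/nar/2006-10-01/) and x to XHTML; check the module source for the prefixes that xml_xpath actually registers.

```perl
use strict;
use warnings;
use XML::LibXML;

# Minimal stand-in for a NewsML-G2 item (assumed, heavily trimmed structure).
my $xml = <<'XML';
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta><headline>Example headline</headline></contentMeta>
  <contentSet>
    <inlineXML>
      <html xmlns="http://www.w3.org/1999/xhtml"><body><p>First</p><p>Second</p></body></html>
    </inlineXML>
  </contentSet>
</newsItem>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new($doc);

# Default-namespace elements need an explicit prefix in XPath 1.0.
$xc->registerNs( rcx => 'http://iptc.org/std/nar/2006-10-01/' );
$xc->registerNs( x   => 'http://www.w3.org/1999/xhtml' );

my $headline = $xc->findvalue('//rcx:headline');
my ($body)   = $xc->findnodes('//x:html/x:body');

# Relative query evaluated with the body node as context.
my @paras = map { $_->textContent } $xc->findnodes( './x:p', $body );
```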
Returns the username for the demo account. This is exportable:
use WebService::ReutersConnect qw/:demo/; print REUTERS_DEMOUSER;
Returns the password for the demo account. This is exportable:
use WebService::ReutersConnect qw/:demo/; print REUTERS_DEMOPASSWORD;
Jerome Eteve, <jerome at eteve.net>
This module is known to be correct, but not to be complete.
Some ReutersConnect method options and some object properties might not be implemented.
Also, it lacks the preference methods and the OpenCalais method of the ReutersConnect API (for now).
Please file a request for any missing feature in the issue tracking system. See the BUGS section.
Please report any bugs or feature requests to bug-webservice-reutersconnect at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WebService-ReutersConnect. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc WebService::ReutersConnect
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=WebService-ReutersConnect
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/WebService-ReutersConnect
CPAN Ratings
http://cpanratings.perl.org/d/WebService-ReutersConnect
Search CPAN
http://search.cpan.org/dist/WebService-ReutersConnect/
Thanks to C. Gevrey from Reuters for his guidance and inspiration in writing this module.
Copyright 2012 Jerome Eteve.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.