Data::Pipeline::Machine - easy-to-use machine building
    package Data::Pipeline::AdapterX::GoogleScholar;

    use Data::Pipeline::Machine;
    use Data::Pipeline qw( FetchPage Regex Rename StringReplace UrlBuilder );

    pipeline(
        FetchPage(
            cut_start => '<p class=g>',
            cut_end   => '</table>',
            split     => '<p class=g>',
            url       => UrlBuilder(
                base  => 'http://scholar.google.com/scholar',
                query => {
                    q       => Option( q => ( default => 'biology' ) ),
                    hl      => 'en',
                    lr      => '',
                    scoring => 'r',
                    as_ylo  => Option( year => ( default => '2007' ) ),
                    num     => 100,
                    safe    => 'off'
                }
            ),
        ),
        Rename(
            # Note: 'content' appears twice in copies; a Perl hash literal
            # keeps only the last pair for a repeated key.
            copies  => { content => 'description', content => 'title' },
            renames => { content => 'link' }
        ),
        Regex(
            rules => [
                title       => sub { s/^<span class="w">.*?<a.+?>(.+?)/$1/gs },
                title       => sub { s/(.+?)<\/a.+/$1/gs },
                title       => sub { s/…//gs },
                link        => sub { s{.+?http://(.+?)".+}{http://$1}gs },
                title       => sub { s/<.+?>//gs },
                title       => sub { s/ //gs },
                description => sub { s{.+?<span class="a">.+?- (.+?) -.+}{$1}gs },
                description => sub { s{<.+?>}{}gs }
            ]
        )
    ); # pipeline
    use Data::Pipeline qw( Pipeline GoogleScholar CSV );

    my $pipe = Pipeline(
        GoogleScholar,
        CSV( column_names => [qw(title link)] )
    );

    $pipe->from( q => 'physics' )->to( \*STDOUT );
This package makes it easy to construct collections of pipelines that together act as an action or an adapter.
Several constructors are exported automatically by the package.
Option constructs an object that supplies an optional argument to the transformation. A default value can be given in the options.
The value is taken from the argument named $name that is passed to from on the machine. In the synopsis, Option( q => ... ) in the machine definition takes its value from the q argument supplied when the machine is used in a pipeline and that pipeline is instantiated. Likewise, Option( year => ... ) falls back to its default value because no year argument is given.
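The lookup-with-default behaviour described above can be sketched in plain Perl. This is only an illustration and not the Data::Pipeline implementation: make_option and resolve_option are hypothetical helper names, and the hash layout is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an Option placeholder: it remembers the
# argument name and an optional default value.
sub make_option {
    my ( $name, %opts ) = @_;
    return { name => $name, default => $opts{default} };
}

# Resolve a placeholder against the arguments supplied at call time,
# falling back to the default when the argument is absent.
sub resolve_option {
    my ( $option, %args ) = @_;
    return exists $args{ $option->{name} }
        ? $args{ $option->{name} }
        : $option->{default};
}

# Same shape as the query in the synopsis: the query key and the
# argument name need not match (as_ylo is fed by 'year').
my %query = (
    q      => make_option( q    => ( default => 'biology' ) ),
    as_ylo => make_option( year => ( default => '2007' ) ),
);

# Mirrors from( q => 'physics' ): q is supplied, year is not.
my %call_args = ( q => 'physics' );
my %resolved =
    map { $_ => resolve_option( $query{$_}, %call_args ) } keys %query;

print "$_ = $resolved{$_}\n" for sort keys %resolved;
```

Here q resolves to the supplied 'physics', while as_ylo falls back to '2007' because no year argument was passed.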
The pipeline constructor defines a pipeline with an optional name.
If the name is not given, it is assumed to be 'finally'. Only one pipeline should be defined without an explicit name. The pipeline named 'finally' is the default pipeline to start with when constructing an unnamed pipeline using from or transform.
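The naming rule above, where an unnamed pipeline is stored and looked up as 'finally', might be modelled roughly as follows. This is a sketch under stated assumptions: the %pipelines registry, define_pipeline, and pipeline_for are hypothetical, and passing the name as a leading plain string is an assumed calling convention rather than the documented one.

```perl
use strict;
use warnings;

my %pipelines;    # hypothetical registry, keyed by pipeline name

# Register a list of steps. A leading plain string is taken as the name
# (an assumption for this sketch); otherwise the name is 'finally'.
sub define_pipeline {
    my @steps = @_;
    my $name = ( @steps && !ref $steps[0] ) ? shift @steps : 'finally';
    $pipelines{$name} = [@steps];
    return $name;
}

# Look up a pipeline by name, defaulting to 'finally' when none is given.
sub pipeline_for {
    my ($name) = @_;
    $name = 'finally' unless defined $name;
    return $pipelines{$name};
}

define_pipeline( fetch => sub { "fetched $_[0]" } );    # named pipeline
define_pipeline( sub { uc $_[0] } );                    # stored as 'finally'

# No name given, so the 'finally' pipeline is used.
my $default = pipeline_for();
my $out     = $default->[0]->('done');
print "$out\n";
```

Because the registry is keyed by name, defining a second unnamed pipeline would overwrite the first, which is consistent with the advice to define only one pipeline without an explicit name.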
Unlike the similarly named Pipeline constructor imported from Data::Pipeline, which defines a pipeline, this constructor calls another pipeline in the machine with arguments.
James Smith
Copyright (c) 2008 Texas A&M University.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.