NAME

WebSource - a general data wrapping tool particularly well suited for online data (but what data is not online in some way today? ;) )

DESCRIPTION

WebSource provides a general, normalized framework for accessing data made available via the Web. Subparts of the Web are accessed by defining a task, which is built by composing query-building, extraction, fetching and filtering subtasks.

SYNOPSIS

  use WebSource;

  my $source  = WebSource->new(wsd => $description);
  my @results = $source->query($query);
or
  my $result = $source->set_query($query);
  while ($result = $source->next_result()) {
    ...
  }

ABSTRACT

WebSource was originally a generic wrapper around a Web source. Given an XML description of a source, it allows one to query the source and retrieve its results. The format of the query and of the results remains source-dependent, however.

It is now configurable enough to perform complex tasks on the Web, such as fetching, extracting and filtering data. Each complex task is described by an XML task description file (a WebSource description). The task is decomposed into simple subtasks of different flavors.

Existing subtask flavors are:

- extract: input an XML::LibXML::Document, output an XML::LibXML::Node. Applies an XPath expression to the document and returns the set of matching nodes.
- fetch: input a URL (or an XML::LibXML::Node containing a URL), output an XML::LibXML::Document.
- format: input an XML::Document, output a string.
- filter: input anything, output anything (but not all of its input).
- external: uses an external Perl module as a task, which allows highly configurable tasks to be defined. Input and output depend on the external module.
- meta-tag: input anything, output anything (with updated meta-data).
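The way a task chains its subtasks can be pictured with the following minimal pure-Perl sketch. It does not use WebSource itself: the subtasks are hypothetical code references standing in for the fetch, extract, filter and format flavors, the "web" is an in-memory hash, and each subtask is assumed to turn one input item into zero or more output items that are handed to the next subtask.

```perl
use strict;
use warnings;

# Hypothetical in-memory "web" so the sketch runs without network access.
my %fake_web = (
    'http://example.org/a' => "alpha beta gamma",
);

# Hypothetical stand-ins for subtask flavors: each takes one item
# and returns a list of zero or more output items.
my @subtasks = (
    # fetch-like: URL -> document (here, a plain string)
    sub { my ($url) = @_; return exists $fake_web{$url} ? ($fake_web{$url}) : () },
    # extract-like: document -> several nodes (here, words)
    sub { my ($doc) = @_; return split /\s+/, $doc },
    # filter-like: passes some items through and drops the others
    sub { my ($word) = @_; return $word =~ /^[ab]/ ? ($word) : () },
    # format-like: node -> string
    sub { my ($word) = @_; return uc $word },
);

# Push the initial items through every subtask in order.
sub run_chain {
    my ($chain, @items) = @_;
    for my $task (@$chain) {
        @items = map { $task->($_) } @items;
    }
    return @items;
}

my @results = run_chain(\@subtasks, 'http://example.org/a');
print "@results\n";   # ALPHA BETA
```

Because every subtask returns a list, a single fetched document can fan out into many extracted nodes, and a filter can shrink the stream simply by returning an empty list.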

METHODS

$source = WebSource->new(wsd => $wsd);

Creates a new WebSource object working with the given WebSource description.

The following named parameters can be given:

wsd

Use a generic engine with the given source description file

max_results

Do not output more than max_results

$source->push($item);

Pass the initial data to the first subtask

$source->query($query);

Builds a query %hash from the given parameters and pushes it into the task

$source->set_max_results($count);

Set the maximum number of results to output to $count

$source->next_result();

Returns the next result of the task
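The interplay of set_max_results and next_result can be illustrated with a small closure-based iterator. This is a hypothetical sketch, not WebSource's implementation: make_result_iterator is an invented name, and pending results are assumed to already be available as a list.

```perl
use strict;
use warnings;

# Hypothetical sketch: wrap a list of pending results in an iterator
# that yields one result per call and stops after $max results.
sub make_result_iterator {
    my ($max, @pending) = @_;
    my $count = 0;
    return sub {
        return if defined $max && $count >= $max;   # cap reached
        return unless @pending;                     # nothing left
        $count++;
        return shift @pending;
    };
}

my $next_result = make_result_iterator(2, qw(r1 r2 r3));
my @seen;
while (defined(my $result = $next_result->())) {
    push @seen, $result;
}
print "@seen\n";   # r1 r2
```

Testing with defined rather than plain truth avoids stopping early on a legitimate result such as 0 or an empty string.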

$source->parameters;

Returns a hash of the initial task's parameters

$source->option_spec;

Returns the specification of the options, translated into the format expected by Getopt::Mixed

$source->set_option($opt,$val)

Sets source specific option $opt to value $val

$source->apply_imports

Handles nodes of type <ws:import href="" /> by inserting nodes from the wsd file referenced by href (the imported document) into the current wsd document (the target document). A node is inserted from the imported document into the target document only if no node with the same name exists in the target document.
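The "insert only if absent" rule of apply_imports can be sketched in pure Perl. The data model here is an assumption for illustration only: a document is an array reference of nodes, each node a hash reference with a name key, rather than the XML::LibXML trees WebSource actually works on, and merge_imported is a hypothetical helper.

```perl
use strict;
use warnings;

# Simplified model (an assumption, not WebSource's data structures):
# a document is an array ref of nodes, each node a hash ref with a name.
sub merge_imported {
    my ($target, $imported) = @_;
    my %present = map { ($_->{name} => 1) } @$target;
    for my $node (@$imported) {
        # Skip nodes whose name already exists in the target document.
        next if $present{ $node->{name} };
        push @$target, $node;
        $present{ $node->{name} } = 1;
    }
    return $target;
}

my @target   = ({ name => 'fetch',   src => 'target' });
my @imported = ({ name => 'fetch',   src => 'imported' },
                { name => 'extract', src => 'imported' });
merge_imported(\@target, \@imported);
print join(', ', map { $_->{name} . ':' . $_->{src} } @target), "\n";
# fetch:target, extract:imported
```

Definitions already present in the target document therefore take precedence over imported ones.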

$source->apply_options

Handles nodes of type <ws:attribute name="aname" value="oname" /> by adding to the parent node an attribute named aname whose value is the value of the option named oname. The ws:attribute node is then removed.
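The substitution performed by apply_options can be illustrated on a toy tree. This is a hypothetical pure-Perl sketch: the tree model ({ tag, attrs, children } hashes), the helper name apply_attribute_nodes, and the node and option names used below are all invented for illustration; WebSource itself operates on its XML description.

```perl
use strict;
use warnings;

# Simplified tree model (an assumption): each node is
# { tag => ..., attrs => {...}, children => [...] }.
sub apply_attribute_nodes {
    my ($node, $options) = @_;
    my @kept;
    for my $child (@{ $node->{children} }) {
        if ($child->{tag} eq 'ws:attribute') {
            # name="aname" value="oname": set the parent's aname
            # attribute to the value of option oname.
            my ($aname, $oname) = @{ $child->{attrs} }{qw(name value)};
            $node->{attrs}{$aname} = $options->{$oname};
            # The ws:attribute node itself is not kept.
        } else {
            apply_attribute_nodes($child, $options);
            push @kept, $child;
        }
    }
    $node->{children} = \@kept;
    return $node;
}

my $tree = {
    tag => 'node', attrs => {},
    children => [
        { tag => 'ws:attribute', attrs => { name => 'timeout', value => 'wait' }, children => [] },
    ],
};
apply_attribute_nodes($tree, { wait => 30 });
print "timeout=$tree->{attrs}{timeout}, children=", scalar @{ $tree->{children} }, "\n";
# timeout=30, children=0
```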

SEE ALSO

ws-query, WebSource::Extract, WebSource::Fetch, WebSource::Filter, etc.
