NAME

WWW::Search::Scraper::Request - Canonical form for Scraper requests

SYNOPSIS

    use WWW::Search::Scraper::Request;
    #
    $request = new WWW::Search::Scraper::Request( [$nativeQuery] [, {'fieldName' => $fieldValue, . . . }] );
    $request->$fieldName($fieldValue);
    .
    .
    .
    
    # or based on and attached to a particular Scraper engine.
    my $scraper = new WWW::Search::Scraper('engineName');
    $request = new WWW::Search::Scraper::Request( $scraper [, $nativeQuery]   [, {'fieldName' => $fieldValue, . . . }] );
    $request->$fieldName($fieldValue);
    .
    .
    .

DESCRIPTION

See ScraperPOD for a description of how the Request class fits into Scraper.

setQuery

Set the canonical "query" field value. The Request class converts that to the appropriate native query string and field according to the associated Scraper engine. You may also set the query string in the new() method.

Custom Request sub-classes may also define other canonical fields. For instance, the Jobs Request sub-class defines the canonical fields 'skills', 'locations' and 'payrate'. When you set these values, the Request sub-class translates each into the appropriate values and field names for whatever Scraper engine you are using.

Callback Functions

postSelect

postSelect() is a callback function that may be called by the Scraper module to help it decide if the response it has received will actually qualify against this request. postSelect() should return true if the response matches the request, false if not.

The parameters postSelect() will receive are

$request

A reference to itself, of course.

$scraper

A reference to the Scraper module under which all of this is happening. You probably won't need this, but there it is.

$response

The Scraper::Response object that is the actual response. This is probably (or should be) an extension to a sub-class appropriate to your Scraper::Request sub-class.

$alreadyDone

The Scraper module will tell you which fields, by name, it has already handled (or will handle) on its own. This parameter may be a string holding a field name, or a reference to an array of field names.

Scraper::Request contains a method to help you test against $alreadyDone. The method

    $request->alreadyDone('fieldName', $alreadyDone)

will return true if the field 'fieldName' is in $alreadyDone.
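The check described above can be sketched in plain Perl. This is only an illustration, not the module's own code: already_done() and post_select() are hypothetical stand-ins written for this example, and plain hashes take the place of real request and response objects. The 'location' key on the response is likewise an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the alreadyDone() check described above:
# $already_done may be undef, a single field name, or an array reference.
sub already_done {
    my ($field_name, $already_done) = @_;
    return 0 unless defined $already_done;
    my @done = ref($already_done) eq 'ARRAY' ? @$already_done : ($already_done);
    return (grep { $_ eq $field_name } @done) ? 1 : 0;
}

# A postSelect-style callback: skip fields the engine has already handled
# and filter on the rest. Plain hashes stand in for request and response.
sub post_select {
    my ($request, $scraper, $response, $already_done) = @_;
    unless (already_done('locations', $already_done)) {
        return 0 unless $response->{location} eq $request->{locations};
    }
    return 1;
}

my $request  = { locations => 'CA-San Jose' };
my $response = { location  => 'NY-Albany' };

# The engine handled only 'skills', so the callback filters on location.
print post_select($request, undef, $response, 'skills') ? "keep\n" : "drop\n";        # prints "drop"

# The engine handled 'locations' itself, so the callback does not re-check it.
print post_select($request, undef, $response, ['locations']) ? "keep\n" : "drop\n";   # prints "keep"
```

Accepting both a string and an array reference in one helper keeps the callback body free of ref() checks.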

TRANSLATIONS

The Scraper modules that do table-driven field translations (from canonical requests to native requests) include files in their package that hold the translation tables in Storable format. The names of these files are <ScraperModuleName>.<requestType>.<canonicalFieldName>. For example, Brainpower.pm owns a translation table for the 'locations' field of the canonical Request::Job module; it is named Brainpower.Job.locations.

The Scraper module locates the translation file, when required, by searching the directories in @INC until the file is found (the same search path Perl uses to locate Perl modules).
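A search of this kind might look like the following minimal sketch. The find_in_inc() helper is hypothetical, and the example assumes the file sits directly under an @INC directory; the real distribution may place it in a sub-directory alongside the owning module.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the first path under @INC where
# $relative_name exists as a plain file, or nothing if it is not found.
sub find_in_inc {
    my ($relative_name) = @_;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-reference hooks that may live in @INC
        my $candidate = File::Spec->catfile($dir, $relative_name);
        return $candidate if -f $candidate;
    }
    return;
}

# Demonstration with a throwaway directory appended to @INC.
my $dir  = tempdir(CLEANUP => 1);
my $name = 'Brainpower.Job.locations';
open my $fh, '>', File::Spec->catfile($dir, $name)
    or die "cannot create file: $!";
close $fh;
push @INC, $dir;

my $found = find_in_inc($name);
print defined $found ? "found: $found\n" : "not found\n";
```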

set<fieldName>Translation()

The set<fieldName>Translation() methods can be used to help maintain these translation files. For instance, setLocationsTranslation('canonical', 'native') will establish a translation from 'canonical' to 'native' for the 'locations' request field.

    setLocationsTranslation('CA-San Jose', 5);       # CA-San Jose => '5'
    setLocationsTranslation('CA-San Jose', [5,6]);   # CA-San Jose => '5' + '6'
    

If you have used this method to update your translations, a later upgrade of WWW::Search::Scraper will probably overwrite your translation file(s). Back up your translation files before upgrading WWW::Search::Scraper!
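To illustrate what maintaining and backing up such a file involves, here is a sketch using the core Storable and File::Copy modules. It assumes the table is a simple hash of canonical value to native value(s); the actual layout used by WWW::Search::Scraper is not documented here, and set_translation() is a hypothetical helper, not the module's method.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: record canonical => native in a Storable hash file.
# $native may be a scalar or an array reference, as in the examples above.
sub set_translation {
    my ($file, $canonical, $native) = @_;
    my $table = -e $file ? retrieve($file) : {};
    $table->{$canonical} = $native;
    store($table, $file);
    return $table;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'Brainpower.Job.locations');

set_translation($file, 'CA-San Jose', 5);
set_translation($file, 'CA-San Jose', [5, 6]);   # replaces the earlier entry

# Keep a backup copy before upgrading the distribution.
copy($file, "$file.bak") or die "backup failed: $!";

my $table = retrieve("$file.bak");
print join('+', @{ $table->{'CA-San Jose'} }), "\n";   # prints "5+6"
```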

AUTHOR

WWW::Search::Scraper::Request is written and maintained by Glenn Wood, http://search.cpan.org/search?mode=author&query=GLENNWOOD.

DESCRIPTION

Scraper automatically generates a "Request" class for each scraper engine. It does this by parsing the "scraperFrame" to identify all the field names fetched by the scraper. It defines a get/set method for each of these fields, each named the same as the field name found in the "scraperFrame".
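The general technique of generating a get/set method per field name can be sketched as follows. This is not Scraper's own generator: the class name My::Request and the make_accessors() method are hypothetical, and the field names are listed by hand rather than parsed from a scraperFrame.

```perl
use strict;
use warnings;

package My::Request;    # hypothetical class name, for illustration only

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

# Install one combined get/set method per field name.
sub make_accessors {
    my ($class, @field_names) = @_;
    for my $name (@field_names) {
        no strict 'refs';    # needed to assign into the symbol table by name
        *{"${class}::$name"} = sub {
            my $self = shift;
            $self->{$name} = shift if @_;
            return $self->{$name};
        };
    }
    return;
}

package main;

# In Scraper these names would come from parsing the scraperFrame;
# here they are simply listed by hand.
My::Request->make_accessors(qw(title postDate location));

my $request = My::Request->new(title => 'Perl developer');
$request->location('CA-San Jose');                        # set
print $request->title, ' / ', $request->location, "\n";   # get
```

Each generated method closes over its own $name, so a single loop can produce any number of accessors.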

Optionally, you may write your own Request class and declare that as the Request class for your queries. This is useful for defining a common Request class to a set of scraper engines (all auction sites, for instance). See WWW::Search::Scraper::Request::Auction for an example of this type of Request class.

METHODS

$fieldName

As mentioned, Request will automatically define get/set methods for each of the fields in the "scraperFrame". For instance, for a field named "postDate", you can get the field value with

    $response->postDate();

You may also set the value of the postDate, but that would be kind of silly, wouldn't it?

GetFieldNames

A reference to a hash is returned listing all the field names in this response. The keys of the hash are the field names, while the values are 1, 2, or 3. A value of 1 means the value comes from the result page; 2 means the value comes from the detail page; 3 means the value is in both pages.
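Because 3 is the combination of 1 and 2, the codes can be read as bit flags. The sketch below takes that reading, which is an interpretation rather than something the documentation states; the constant names, the field_sources() helper and the sample hash are all made up for illustration.

```perl
use strict;
use warnings;

use constant {
    FROM_RESULT => 1,   # value appears on the result page
    FROM_DETAIL => 2,   # value appears on the detail page
};

# Describe where a field comes from, given its GetFieldNames-style code.
sub field_sources {
    my ($code) = @_;
    my @sources;
    push @sources, 'result' if $code & FROM_RESULT;
    push @sources, 'detail' if $code & FROM_DETAIL;
    return @sources;
}

# Hand-written stand-in for the hash reference GetFieldNames() returns.
my $field_names = { title => 1, description => 2, postDate => 3 };

for my $name (sort keys %$field_names) {
    printf "%-12s %s\n", $name, join('+', field_sources($field_names->{$name}));
}
```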

SkipDetailPage

A well-constructed Request class (such as the ones Scraper auto-generates) implements a lazy-access method for each field that comes from the detail page, so the detail page is fetched only when you ask for one of those field values. The SkipDetailPage() method controls whether the detail page is fetched at all. Set it to 1 and the detail page is never fetched (detail-dependent fields return undef). Set it to 2 to read the detail page on demand. Set it to 3 to read the detail page only for fields that appear solely on the detail page; for fields that appear on both pages, the results-page value is returned without fetching the detail page.

SkipDetailPage defaults to 2.
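The three settings can be summarized as a small decision function. This is a sketch of the rules as described above, not the module's code. It assumes that, in mode 2, reading a field that appears on both pages does trigger a detail-page fetch, which the text implies but does not state outright. The needs_detail_fetch() name and the source codes (1 = result page, 2 = detail page, 3 = both, as in GetFieldNames) are used here for illustration only.

```perl
use strict;
use warnings;

# $mode is the SkipDetailPage setting (1, 2 or 3); $source is the field's
# origin code (1 = result page, 2 = detail page, 3 = both pages).
# Returns true when reading the field would trigger a detail-page fetch.
sub needs_detail_fetch {
    my ($mode, $source) = @_;
    return 0 if $source == 1;     # result-page fields never need the detail page
    return 0 if $mode == 1;       # mode 1: the detail page is never fetched
    return 1 if $source == 2;     # detail-only field in mode 2 or 3
    return $mode == 2 ? 1 : 0;    # field on both pages: only mode 2 fetches (assumed)
}

for my $mode (1 .. 3) {
    for my $source (1 .. 3) {
        printf "mode %d, source %d: %s\n", $mode, $source,
            needs_detail_fetch($mode, $source) ? 'fetch' : 'no fetch';
    }
}
```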

ScrapeDetailPage

Forces the detail page to be read and scraped right now.

GetFieldValues

Returns all field values of the response in a hash table (by reference). Like GetFieldNames(), the keys are the field names, but in this case the values are the field values of each field.

GetFieldTitles

Returns a reference to a hash table containing titles for each field (which might be different from the field names).

AUTHOR

WWW::Search::Scraper::Request::Scraper is written and maintained by Glenn Wood, http://search.cpan.org/search?mode=author&query=GLENNWOOD.

COPYRIGHT

Copyright (c) 2001 Glenn Wood. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.