NAME

WWW::Search::HotBot - class for searching HotBot

SYNOPSIS

  use WWW::Search;
  my $oSearch = WWW::Search->new('HotBot');
  my $sQuery = WWW::Search::escape_query("+sushi restaurant +Columbus Ohio");
  $oSearch->native_query($sQuery);
  while (my $oResult = $oSearch->next_result())
    { print $oResult->url, "\n"; }

DESCRIPTION

This class is a HotBot specialization of WWW::Search. It handles making and interpreting searches on HotBot (http://www.hotbot.com).

This class exports no public interface; all interaction should be done through WWW::Search objects.

The default behavior is for HotBot to look for "any of" the query terms. If you want "all of", call native_query like this:

  $oSearch->native_query(WWW::Search::escape_query('Dorothy Toto Oz'), {'SM' => 'MC'});

If you want to send HotBot a boolean phrase, call native_query like this:

  $oSearch->native_query(WWW::Search::escape_query('Oz AND Dorothy AND toto NOT Australia'), {'SM' => 'B'});

See below for other query-handling options.

OPTIONS

The following search options can be activated by sending a hash as the second argument to native_query().

Format / Treatment of Query Terms

The default is logical OR of all the query terms.

{'SM' => 'MC'}

"Must Contain": logical AND of all the query terms.

{'SM' => 'SC'}

"Should Contain": logical OR of all the query terms. This is the default.

{'SM' => 'B'}

"Boolean": the entire query is treated as a boolean expression with AND, OR, NOT, and parentheses.

{'SM' => 'name'}

The entire query is treated as a person's name.

{'SM' => 'phrase'}

The entire query is treated as a phrase.

{'SM' => 'title'}

The query is applied to the page title. (I assume the logical OR of the query terms will be applied to the page title.)

{'SM' => 'url'}

The query is assumed to be a URL, and the results will be pages that link to the query URL.

Restricting Search to a Date Range

The default is no date restrictions.

{'date' => 'within', 'DV' => 90}

Only return pages updated within 90 days of today. (Substitute any integer in place of 90.)

{'date' => 'range', 'DR' => 'After', 'DY' => 97, 'DM' => 12, 'DD' => 25}

Only return pages updated after Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)

{'date' => 'range', 'DR' => 'Before', 'DY' => 97, 'DM' => 12, 'DD' => 25}

Only return pages updated before Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
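
The date options above can be built programmatically. The following is a minimal sketch, not part of WWW::Search::HotBot: the helper names hotbot_date_range and hotbot_date_within are hypothetical, and reducing a four-digit year to two digits is an assumption based only on the 'DY' => 97 example above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the option hash for a "Before"/"After" restriction.
sub hotbot_date_range {
    my ($sDirection, $iYear, $iMonth, $iDay) = @_;
    die "direction must be 'After' or 'Before'\n"
        unless $sDirection eq 'After' || $sDirection eq 'Before';
    return {
        'date' => 'range',
        'DR'   => $sDirection,
        'DY'   => $iYear % 100,   # ASSUMPTION: HotBot expects a two-digit year
        'DM'   => $iMonth,
        'DD'   => $iDay,
    };
}

# Hypothetical helper: only pages updated within the last $iDays days.
sub hotbot_date_within {
    my ($iDays) = @_;
    return { 'date' => 'within', 'DV' => $iDays };
}

my $rhOptions = hotbot_date_range('After', 1997, 12, 25);
print join(', ', map { "$_=$rhOptions->{$_}" } sort keys %$rhOptions), "\n";

# The hash reference is then passed as the second argument, e.g.:
#   $oSearch->native_query($sQuery, $rhOptions);
```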

Restricting Search to a Geographic Area

The default is no restriction to geographic area.

{'RD' => 'AN'}

Return pages from anywhere. This is the default.

{'RD' => 'DM', 'Domain' => 'microsoft.com, .cz'}

Restrict search to pages located in the listed domains. (Substitute any list of domain substrings.)

{'RD' => 'RG', 'RG' => '.com'}

Restrict search to North American commercial web sites.

{'RD' => 'RG', 'RG' => '.edu'}

Restrict search to North American educational web sites.

{'RD' => 'RG', 'RG' => '.gov'}

Restrict search to United States Government web sites.

{'RD' => 'RG', 'RG' => '.mil'}

Restrict search to United States military web sites.

{'RD' => 'RG', 'RG' => '.net'}

Restrict search to North American '.net' web sites.

{'RD' => 'RG', 'RG' => '.org'}

Restrict search to North American organizational web sites.

{'RD' => 'RG', 'RG' => 'NA'}

"North America": Restrict search to all of the above types of web sites.

{'RD' => 'RG', 'RG' => 'AF'}

Restrict search to web sites in Africa.

{'RD' => 'RG', 'RG' => 'AS'}

Restrict search to web sites in India and Asia.

{'RD' => 'RG', 'RG' => 'CA'}

Restrict search to web sites in Central America.

{'RD' => 'RG', 'RG' => 'DU'}

Restrict search to web sites in Oceania.

{'RD' => 'RG', 'RG' => 'EU'}

Restrict search to web sites in Europe.

{'RD' => 'RG', 'RG' => 'ME'}

Restrict search to web sites in the Middle East.

{'RD' => 'RG', 'RG' => 'SE'}

Restrict search to web sites in Southeast Asia.
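
As a rough illustration of the geographic options, the sketch below assembles the 'RD' hashes from a list of domain substrings or a region code. The helper names are hypothetical, and joining the domains with a comma and a space simply mirrors the 'microsoft.com, .cz' example above.

```perl
use strict;
use warnings;

# Hypothetical helper: restrict the search to the listed domain substrings.
sub hotbot_domain_options {
    my @asDomains = @_;
    # With no domains given, fall back to "anywhere" (the documented default).
    return { 'RD' => 'AN' } unless @asDomains;
    return { 'RD' => 'DM', 'Domain' => join(', ', @asDomains) };
}

# Hypothetical helper: restrict the search to one of the region codes above.
sub hotbot_region_options {
    my ($sCode) = @_;
    return { 'RD' => 'RG', 'RG' => $sCode };
}

my $rhDomains = hotbot_domain_options('microsoft.com', '.cz');
print "$rhDomains->{'Domain'}\n";

# e.g. $oSearch->native_query($sQuery, hotbot_region_options('EU'));
```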

Requesting Certain Multimedia Data Types

The default is not specifically requesting any multimedia types (presumably, this will NOT restrict the search to NON-multimedia pages).

{'FAC' => 1}

Return pages which contain Adobe Acrobat PDF data.

{'FAX' => 1}

Return pages which contain ActiveX.

{'FJA' => 1}

Return pages which contain Java.

{'FJS' => 1}

Return pages which contain JavaScript.

{'FRA' => 1}

Return pages which contain audio.

{'FSU' => 1, 'FS' => '.txt, .doc'}

Return pages which have one of the listed extensions. (Substitute any list of DOS-like file extensions.)

{'FSW' => 1}

Return pages which contain ShockWave.

{'FVI' => 1}

Return pages which contain images.

{'FVR' => 1}

Return pages which contain VRML.

{'FVS' => 1}

Return pages which contain VB Script.

{'FVV' => 1}

Return pages which contain video.

Requesting Pages at Certain Depths on Website

The default is pages at any level on their website.

{'PS'=>'A'}

Return pages at any level on their website. This is the default.

{'PS' => 'D', 'D' => 3 }

Return pages within 3 links of the "top" page of their website. (Substitute any integer in place of 3.)

{'PS' => 'F'}

Only return pages that are the "top" page of their website.
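
Since every option is just a key in the hash passed to native_query(), options from different groups can presumably be merged into one hash. This is an assumption, because the documentation above only shows each group on its own. A minimal sketch:

```perl
use strict;
use warnings;

# One option set from each group documented above.
my %hAllTerms = ('SM'   => 'MC');                    # logical AND of the terms
my %hRecent   = ('date' => 'within', 'DV' => 30);    # updated in the last 30 days
my %hEurope   = ('RD'   => 'RG',     'RG' => 'EU');  # European web sites
my %hTopPages = ('PS'   => 'F');                     # "top" pages only

# ASSUMPTION: HotBot accepts options from several groups in a single query.
my %hOptions = (%hAllTerms, %hRecent, %hEurope, %hTopPages);

for my $sKey (sort keys %hOptions) {
    print "$sKey => $hOptions{$sKey}\n";
}

# e.g. $oSearch->native_query($sQuery, \%hOptions);
```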

SEE ALSO

To make new back-ends, see WWW::Search.

HOW DOES IT WORK?

native_setup_search is called (from WWW::Search::setup_search) before we do anything. It initializes our private variables (which all begin with underscore) and sets up a URL to the first results page in {_next_url}.

native_retrieve_some is called (from WWW::Search::retrieve_some) whenever more hits are needed. It calls WWW::Search::http_request to fetch the page specified by {_next_url}. It then parses this page, appending any search hits it finds to {cache}. If it finds a "next" button in the text, it sets {_next_url} to point to the page for the next set of results, otherwise it sets it to undef to indicate we're done.
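
The control flow described above can be sketched without any network access. This is not the module's actual code: the %hPages table stands in for HotBot's result pages, and retrieve_some_sketch is a hypothetical stand-in for the fetch-and-parse step. Only the {_next_url} and {cache} fields come from the description above.

```perl
use strict;
use warnings;

# Fake result pages: each lists its hits and, if there is one, the next page.
my %hPages = (
    'page1' => { 'hits' => ['http://a.example/', 'http://b.example/'], 'next' => 'page2' },
    'page2' => { 'hits' => ['http://c.example/'],                      'next' => undef   },
);

# Roughly the state native_setup_search would leave behind.
my $self = { '_next_url' => 'page1', 'cache' => [] };

# Hypothetical stand-in for native_retrieve_some.
sub retrieve_some_sketch {
    my ($self) = @_;
    return 0 unless defined $self->{'_next_url'};       # nothing left to fetch
    my $rhPage = $hPages{ $self->{'_next_url'} };        # replaces http_request + parsing
    push @{ $self->{'cache'} }, @{ $rhPage->{'hits'} };  # append hits to the cache
    $self->{'_next_url'} = $rhPage->{'next'};            # undef signals "we're done"
    return scalar @{ $rhPage->{'hits'} };
}

# Keep fetching until no "next" page remains.
retrieve_some_sketch($self) while defined $self->{'_next_url'};

print scalar(@{ $self->{'cache'} }), " hits cached\n";
```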

CAVEATS

When HotBot reports a "Mirror" URL, WWW::Search::HotBot ignores it.

BUGS

Please tell the author if you find any!

TESTING

This module adheres to the WWW::Search test suite mechanism.

  Test cases (results as of 1998-08-27):
  '+mrfglbqnx +NoSuchWord'       ---   no URLs
  '"Christie Abbott"'            ---    9 URLs on one page
  'LSAM'                         ---  184 URLs on two pages

AUTHOR

As of 1998-02-02, WWW::Search::HotBot is maintained by Martin Thurn (MartinThurn@iname.com).

WWW::Search::HotBot was originally written by Wm. L. Scheding, based on WWW::Search::AltaVista.

LEGALESE

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

VERSION HISTORY

If it's not listed here, then it wasn't a meaningful or released revision.

1.25 1998-09-11

HotBot changed their output format ever so slightly. Documentation added for all possible HotBot query options!

1.23

Better documentation for boolean queries. (Thanks to Jason Titus jason_titus@odsnet.com)

1.22

HotBot changed their output format.

1.21

HotBot changed their output format.

1.20

\n changed to \012 for MacPerl compatibility

1.17

HotBot changed their search script location and output format on 1998-05-21. Also, as many as 6 fields of each SearchResult are now filled in.

1.13

Fixed the maximum_to_retrieve off-by-one problem. Updated test cases.

1.12

HotBot does not do truncation. Therefore, if the query contains truncation characters (i.e. '*' at end of words), they are simply deleted before the query is sent to HotBot.

1.11 1998-02-05

Fixed and revamped by Martin Thurn.
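
The truncation handling noted under 1.12 could look something like the sketch below. The function name and the regular expression are illustrative assumptions, not the module's actual code. It removes any '*' that ends a word and leaves other asterisks alone.

```perl
use strict;
use warnings;

# Hypothetical helper: delete truncation characters ('*' at the end of a word).
sub strip_truncation_sketch {
    my ($sQuery) = @_;
    # Remove one or more '*' followed by whitespace or the end of the string.
    $sQuery =~ s/\*+(?=\s|\z)//g;
    return $sQuery;
}

print strip_truncation_sketch('restaur* sushi*'), "\n";
```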