NAME

WWW::Search::HotBot - backend for searching www.hotbot.com

SYNOPSIS

  use WWW::Search;
  my $oSearch = WWW::Search->new('HotBot');
  my $sQuery = WWW::Search::escape_query("+sushi restaurant +Columbus Ohio");
  $oSearch->native_query($sQuery);
  while (my $oResult = $oSearch->next_result())
    { print $oResult->url, "\n"; }

DESCRIPTION

This class is a HotBot specialization of WWW::Search. It handles making and interpreting HotBot searches at http://www.hotbot.com.

This class exports no public interface; all interaction should be done through WWW::Search objects.

The default behavior is for HotBot to look for "any of" the query terms:

  $oSearch->native_query(WWW::Search::escape_query('Dorothy Oz'));

If you want "all of", call native_query like this:

  $oSearch->native_query(WWW::Search::escape_query('Dorothy Oz'), {'SM' => 'MC'});

If you want to send HotBot a boolean phrase, call native_query like this:

  $oSearch->native_query(WWW::Search::escape_query('Oz AND Dorothy NOT Australia'), {'SM' => 'B'});

See below for other query-handling options.

OPTIONS

The following search options can be activated by sending a hash as the second argument to native_query().

Format / Treatment of Query Terms

The default is logical OR of all the query terms.

{'SM' => 'MC'}

"Must Contain": logical AND of all the query terms.

{'SM' => 'SC'}

"Should Contain": logical OR of all the query terms. This is the default.

{'SM' => 'B'}

"Boolean": the entire query is treated as a boolean expression with AND, OR, NOT, and parentheses.

{'SM' => 'name'}

The entire query is treated as a person's name.

{'SM' => 'phrase'}

The entire query is treated as a phrase.

{'SM' => 'title'}

The query is applied to the page title. (Presumably, the logical OR of the query terms is applied to the page title.)

{'SM' => 'url'}

The query is assumed to be a URL, and the results will be pages that link to the query URL.

Restricting Search to a Date Range

The default is no date restrictions.

{'date' => 'within', 'DV' => 90}

Only return pages updated within 90 days of today. (Substitute any integer in place of 90.)

{'date' => 'range', 'DR' => 'newer', 'DY' => 97, 'DM' => 12, 'DD' => 25}

Only return pages updated after Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)

{'date' => 'range', 'DR' => 'older', 'DY' => 97, 'DM' => 12, 'DD' => 25}

Only return pages updated before Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
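
The date-range options above can be generated from a calendar date. The following is a minimal sketch, not part of WWW::Search::HotBot: the helper name date_range_options is hypothetical, and the two-digit year is an assumption based on the 97 in the examples above (how HotBot treats years after 1999 is not documented here).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Search::HotBot): build the
# date-range options for pages newer or older than a given date.
sub date_range_options {
    my ($sDirection, $iYear, $iMonth, $iDay) = @_;
    die "direction must be 'newer' or 'older'\n"
        unless $sDirection eq 'newer' || $sDirection eq 'older';
    return (
        'date' => 'range',
        'DR'   => $sDirection,
        'DY'   => $iYear % 100,   # assumption: two-digit year, as in the 97 example
        'DM'   => $iMonth,
        'DD'   => $iDay,
    );
}

# Pages updated after Christmas 1997.
my %hDateOptions = date_range_options('newer', 1997, 12, 25);
print join(', ', map { "$_=$hDateOptions{$_}" } sort keys %hDateOptions), "\n";
```

A reference to the resulting hash (\%hDateOptions) can then be passed as the second argument to native_query().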

Restricting Search to a Geographic Area

The default is no restriction to geographic area.

{'RD' => 'AN'}

Return pages from anywhere. This is the default.

{'RD' => 'DM', 'Domain' => 'microsoft.com, .cz'}

Restrict search to pages located in the listed domains. (Substitute any list of domain substrings.)

{'RD' => 'RG', 'RG' => '.com'}

Restrict search to North American commercial web sites.

{'RD' => 'RG', 'RG' => '.edu'}

Restrict search to North American educational web sites.

{'RD' => 'RG', 'RG' => '.gov'}

Restrict search to United States government web sites.

{'RD' => 'RG', 'RG' => '.mil'}

Restrict search to United States military web sites.

{'RD' => 'RG', 'RG' => '.net'}

Restrict search to North American '.net' web sites.

{'RD' => 'RG', 'RG' => '.org'}

Restrict search to North American organizational web sites.

{'RD' => 'RG', 'RG' => 'NA'}

"North America": Restrict search to all of the above types of web sites.

{'RD' => 'RG', 'RG' => 'AF'}

Restrict search to web sites in Africa.

{'RD' => 'RG', 'RG' => 'AS'}

Restrict search to web sites in India and Asia.

{'RD' => 'RG', 'RG' => 'CA'}

Restrict search to web sites in Central America.

{'RD' => 'RG', 'RG' => 'DU'}

Restrict search to web sites in Oceania.

{'RD' => 'RG', 'RG' => 'EU'}

Restrict search to web sites in Europe.

{'RD' => 'RG', 'RG' => 'ME'}

Restrict search to web sites in the Middle East.

{'RD' => 'RG', 'RG' => 'SE'}

Restrict search to web sites in Southeast Asia.
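
Because the region codes are short and easy to mistype, it can help to build the geographic options through small helpers. This is only a sketch under stated assumptions: domain_options and region_options are hypothetical names, and the list of valid codes is copied from the options documented above rather than from HotBot itself.

```perl
use strict;
use warnings;

# Region codes as documented in this section.
my %hKnownRegion = map { $_ => 1 }
    qw(.com .edu .gov .mil .net .org NA AF AS CA DU EU ME SE);

# Hypothetical helper: restrict the search to a list of domain substrings.
# With no arguments it returns the documented default (anywhere).
sub domain_options {
    my @asDomains = @_;
    return ('RD' => 'AN') unless @asDomains;
    return ('RD' => 'DM', 'Domain' => join(', ', @asDomains));
}

# Hypothetical helper: restrict the search to one documented region code.
sub region_options {
    my ($sCode) = @_;
    die "unknown region code '$sCode'\n" unless $hKnownRegion{$sCode};
    return ('RD' => 'RG', 'RG' => $sCode);
}

my %hByDomain = domain_options('microsoft.com', '.cz');
my %hByRegion = region_options('EU');
print "Domain: $hByDomain{'Domain'}\n";
print "Region: $hByRegion{'RG'}\n";
```

Rejecting unknown codes locally avoids sending HotBot a query whose restriction it would presumably ignore or misread.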

Requesting Certain Multimedia Data Types

By default, no multimedia types are specifically requested. (Presumably, this does NOT restrict the search to non-multimedia pages.)

{'FAC' => 1}

Return pages which contain Adobe Acrobat PDF data.

{'FAX' => 1}

Return pages which contain ActiveX.

{'FJA' => 1}

Return pages which contain Java.

{'FJS' => 1}

Return pages which contain JavaScript.

{'FRA' => 1}

Return pages which contain audio.

{'FSU' => 1, 'FS' => '.txt, .doc'}

Return pages which have one of the listed extensions. (Substitute any list of DOS-like file extensions.)

{'FSW' => 1}

Return pages which contain ShockWave.

{'FVI' => 1}

Return pages which contain images.

{'FVR' => 1}

Return pages which contain VRML.

{'FVS' => 1}

Return pages which contain VB Script.

{'FVV' => 1}

Return pages which contain video.
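
The three-letter flags above are hard to remember, so one option is to map readable names onto them. The following sketch assumes nothing beyond the flags listed in this section; the helper media_options and the readable names (acrobat, audio, and so on) are hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Readable names (hypothetical) mapped to the flags documented above.
my %hFlagFor = (
    acrobat    => 'FAC',
    activex    => 'FAX',
    java       => 'FJA',
    javascript => 'FJS',
    audio      => 'FRA',
    shockwave  => 'FSW',
    image      => 'FVI',
    vrml       => 'FVR',
    vbscript   => 'FVS',
    video      => 'FVV',
);

# Hypothetical helper: turn a list of readable names into option flags.
sub media_options {
    my @asTypes = @_;
    my %hOptions;
    for my $sType (@asTypes) {
        my $sFlag = $hFlagFor{lc $sType}
            or die "unknown media type '$sType'\n";
        $hOptions{$sFlag} = 1;
    }
    return %hOptions;
}

my %hMedia = media_options('audio', 'video');
print join(', ', sort keys %hMedia), "\n";
```

Whether HotBot treats several flags as "any of" or "all of" is not stated in this documentation, so the combined result should be checked against real output.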

Requesting Pages at Certain Depths on Website

The default is pages at any level on their website.

{'PS' => 'A'}

Return pages at any level on their website. This is the default.

{'PS' => 'D', 'D' => 3}

Return pages within 3 links of "top" page of their website. (Substitute any integer in place of 3.)

{'PS' => 'F'}

Only return pages that are the "top" page of their website.
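
Since every option is just a key in the hash passed to native_query(), options from different groups can be merged into one hash. The sketch below shows the merge only; that HotBot honors all of these groups together in a single query is an assumption, not something this documentation confirms.

```perl
use strict;
use warnings;

# One hash per option group, using values documented above.
my %hMode  = ('SM' => 'MC');                    # all query terms must appear
my %hDate  = ('date' => 'within', 'DV' => 30);  # updated within 30 days
my %hPlace = ('RD' => 'RG', 'RG' => 'EU');      # web sites in Europe
my %hDepth = ('PS' => 'D', 'D' => 2);           # within 2 links of the top page

# Flatten the groups into one hash; if two groups set the same key,
# the group listed later wins.
my %hOptions = (%hMode, %hDate, %hPlace, %hDepth);

print "$_ => $hOptions{$_}\n" for sort keys %hOptions;

# With a WWW::Search object, the merged hash would be passed by reference:
#   $oSearch->native_query($sQuery, \%hOptions);
```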

SEE ALSO

To make new back-ends, see WWW::Search.

CAVEATS

When www.hotbot.com reports a "Mirror" URL, WWW::Search::HotBot ignores it. Therefore, the number of URLs returned by WWW::Search::HotBot might not agree with the value returned by approximate_result_count.
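
To see the size of this discrepancy for a given query, the results can be counted and compared with the estimate. The sketch below uses only the next_result and approximate_result_count methods mentioned in this document; the helper count_vs_estimate and the My::FakeSearch class are hypothetical, the latter being a stand-in so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: drain a search object and report how many results
# were actually returned alongside the engine's own estimate.
sub count_vs_estimate {
    my ($oSearch) = @_;
    my $iReturned = 0;
    while (my $oResult = $oSearch->next_result()) {
        $iReturned++;
    }
    return ($iReturned, $oSearch->approximate_result_count());
}

{
    # Stand-in for a WWW::Search object, for demonstration only.
    package My::FakeSearch;
    sub new {
        my ($sClass, %hArgs) = @_;
        return bless {%hArgs}, $sClass;
    }
    sub next_result              { my $self = shift; return shift @{ $self->{results} } }
    sub approximate_result_count { my $self = shift; return $self->{estimate} }
}

# The fake engine claims 3 hits but yields only 2 (one "Mirror" skipped).
my $oFake = My::FakeSearch->new(
    results  => ['http://a.example/', 'http://b.example/'],
    estimate => 3,
);
my ($iReturned, $iEstimate) = count_vs_estimate($oFake);
print "returned $iReturned of approximately $iEstimate\n";
```

With a real search, the object from WWW::Search->new('HotBot') would be passed in place of $oFake after native_query() has been called.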

BUGS

Please tell the author if you find any!

TESTING

This module adheres to the WWW::Search test suite mechanism. See $TEST_CASES below.

AUTHOR

As of 1998-02-02, WWW::Search::HotBot is maintained by Martin Thurn (MartinThurn@iname.com).

WWW::Search::HotBot was originally written by Wm. L. Scheding, based on WWW::Search::AltaVista.

LEGALESE

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

VERSION HISTORY

If it's not listed here, then it wasn't a meaningful or released revision.

2.05, 1999-10-05

Now uses hash_to_cgi_string(). New test cases.

2.03, 1999-09-28

BUGFIX: was missing the "Next page" link sometimes.

2.02, 1999-08-17

Now is able to parse "URL-only" format (i.e. {'DE' => 0}) and "brief description" format (i.e. {'DE' => 1}) if the user so desires.

1.34, 1999-07-01

New test cases.

1.32, 1999-06-20

Now unescapes the URLs before returning them.

1.31, 1999-06-11

www.hotbot.com changed their output format ever so slightly. (Thanks to Jim <jsmyser@bigfoot.com> for pointing it out.)

1.30, 1999-04-12

BUGFIX: results for domain-limited search were not parsed. (Thanks to Christopher York <yorkc@ccwf.cc.utexas.edu> for pointing it out.)

1.29, 1999-02-22

www.hotbot.com changed their output format. (Thanks to Tim Chklovski <timc@mit.edu> for pointing it out.)

1.27, 1998-11-06

HotBot changed their output format(?). HotBot.pm now uses hotbot.com's text-only search results format. Minor documentation changes.

1.25, 1998-09-11

HotBot changed their output format ever so slightly. Documentation added for all known HotBot query options!

1.23

Better documentation for boolean queries. (Thanks to Jason Titus <jason_titus@odsnet.com>.)

1.22

www.hotbot.com changed their output format.

1.21

www.hotbot.com changed their output format.

1.17

www.hotbot.com changed their search script location and output format on 1998-05-21. Also, as many as 6 fields of each SearchResult are now filled in.

1.13

Fixed the maximum_to_retrieve off-by-one problem. Updated test cases.

1.12

www.hotbot.com does not do truncation. Therefore, if the query contains truncation characters (i.e. '*' at end of words), they are simply deleted before the query is sent to www.hotbot.com.

1.11, 1998-02-05

Fixed and revamped by Martin Thurn.