WWW::Search::HotBot - class for searching HotBot
use WWW::Search;
my $oSearch = new WWW::Search('HotBot');
my $sQuery = WWW::Search::escape_query("+sushi restaurant +Columbus Ohio");
$oSearch->native_query($sQuery);
while (my $oResult = $oSearch->next_result()) {
  print $oResult->url, "\n";
}
This class is a HotBot specialization of WWW::Search. It handles making and interpreting HotBot searches at http://www.hotbot.com.
This class exports no public interface; all interaction should be done through WWW::Search objects.
The default behavior is for HotBot to look for "any of" the query terms. If you want "all of", call native_query like this:
$oSearch->native_query(escape_query('Dorothy Toto Oz'), {'SM' => 'MC'});
If you want to send HotBot a boolean phrase, call native_query like this:
$oSearch->native_query(escape_query('Oz AND Dorothy AND toto NOT Australia'), {'SM' => 'B'});
See below for other query-handling options.
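As a minimal sketch of the two calls above, the choice of search mode can be wrapped in a small helper that builds the options hash. The helper name search_mode_options and the mode labels 'all' and 'boolean' are hypothetical and not part of WWW::Search::HotBot; only the SM values 'MC' and 'B' come from the text above, and the sketch assumes that an empty options hash leaves the default "any of" behavior in place.

```perl
use strict;
use warnings;

# Only the two SM values shown above are used; nothing else is assumed.
my %SEARCH_MODE = (
    all     => 'MC',   # "Must Contain": all of the query terms
    boolean => 'B',    # the query is treated as a boolean expression
);

# Hypothetical helper: returns the hash reference to pass as the second
# argument of native_query. Unknown or missing modes yield an empty hash.
sub search_mode_options {
    my ($mode) = @_;
    return {} unless defined $mode && exists $SEARCH_MODE{$mode};
    return { SM => $SEARCH_MODE{$mode} };
}

# Intended use (requires WWW::Search and a HotBot search object):
#   $oSearch->native_query(WWW::Search::escape_query('Dorothy Toto Oz'),
#                          search_mode_options('all'));
print join(',', %{ search_mode_options('all') }), "\n";
```

Keeping the mapping in one table means a caller never has to remember the raw SM codes.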
The following search options can be activated by sending a hash as the second argument to native_query().
The default is logical OR of all the query terms.
"Must Contain": logical AND of all the query terms.
"Should Contain": logical OR of all the query terms. This is the default.
"Boolean": the entire query is treated as a boolean expression with AND, OR, NOT, and parentheses.
The entire query is treated as a person's name.
The entire query is treated as a phrase.
The query is applied to the page title. (I assume the logical OR of the query terms will be applied to the page title.)
The query is assumed to be a URL, and the results will be pages that link to the query URL.
The default is no date restrictions.
Only return pages updated within 90 days of today. (Substitute any integer in place of 90.)
Only return pages updated after Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
Only return pages updated before Christmas 1997. (Substitute any year, month, and day for 97, 12, 25.)
The default is no restriction to geographic area.
Return pages from anywhere. This is the default.
Restrict search to pages located in the listed domains. (Substitute any list of domain substrings.)
Restrict search to North American commercial web sites.
Restrict search to North American educational web sites.
Restrict search to United States Government web sites.
Restrict search to United States military web sites.
Restrict search to North American '.net' web sites.
Restrict search to North American organizational web sites.
"North America": Restrict search to all of the above types of web sites.
Restrict search to web sites in Africa.
Restrict search to web sites in India and Asia.
Restrict search to web sites in Central America.
Restrict search to web sites in Oceania.
Restrict search to web sites in Europe.
Restrict search to web sites in the Middle East.
Restrict search to web sites in Southeast Asia.
The default is not specifically requesting any multimedia types (presumably, this will NOT restrict the search to NON-multimedia pages).
Return pages which contain Adobe Acrobat PDF data.
Return pages which contain ActiveX.
Return pages which contain Java.
Return pages which contain JavaScript.
Return pages which contain audio.
Return pages which have one of the listed extensions. (Substitute any list of DOS-like file extensions.)
Return pages which contain ShockWave.
Return pages which contain images.
Return pages which contain VRML.
Return pages which contain VB Script.
Return pages which contain video.
The default is pages at any level on their website.
Return pages at any level on their website. This is the default.
Return pages within 3 links of the "top" page of their website. (Substitute any integer in place of 3.)
Only return pages that are the "top" page of their website.
To make new back-ends, see WWW::Search.
native_setup_search is called (from WWW::Search::setup_search) before we do anything. It initializes our private variables (which all begin with underscore) and sets up a URL to the first results page in {_next_url}.
native_retrieve_some is called (from WWW::Search::retrieve_some) whenever more hits are needed. It calls WWW::Search::http_request to fetch the page specified by {_next_url}. It then parses this page, appending any search hits it finds to {cache}. If it finds a "next" button in the text, it sets {_next_url} to point to the page for the next set of results; otherwise it sets it to undef to indicate we're done.
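The control flow described above can be illustrated with a small, network-free sketch. This is not the module's actual code: the %fake_pages table stands in for what WWW::Search::http_request and the page parser would produce, and retrieve_some_sketch is a hypothetical name. Only the field names {_next_url} and {cache} are taken from the description.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HotBot result pages: each "URL" maps to the
# hits found on that page and the target of its "next" button
# (undef on the last page).
my %fake_pages = (
    'page1' => { hits => [ 'http://a.example/', 'http://b.example/' ], next => 'page2' },
    'page2' => { hits => [ 'http://c.example/' ],                      next => undef   },
);

# Plays the role of the search object after native_setup_search has run.
my $self = { _next_url => 'page1', cache => [] };

# One call handles one results page, in the manner of native_retrieve_some.
sub retrieve_some_sketch {
    my ($self) = @_;
    return 0 unless defined $self->{_next_url};     # nothing left to fetch
    my $page = $fake_pages{ $self->{_next_url} };   # stands in for http_request + parsing
    push @{ $self->{cache} }, @{ $page->{hits} };   # append the hits to the cache
    $self->{_next_url} = $page->{next};             # undef signals that we are done
    return scalar @{ $page->{hits} };
}

# Keep fetching while a next page is known.
retrieve_some_sketch($self) while defined $self->{_next_url};

print "$_\n" for @{ $self->{cache} };
```

Storing the next page's address in the object, rather than looping inside one call, lets the caller stop early once it has enough hits.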
When HotBot reports a "Mirror" URL, WWW::Search::HotBot ignores it.
Please tell the author if you find any bugs!
This module adheres to the WWW::Search test suite mechanism.
Test cases (results as of 1998-08-27):
'+mrfglbqnx +NoSuchWord' --- no URLs
'"Christie Abbott"' --- 9 URLs on one page
'LSAM' --- 184 URLs on two pages
As of 1998-02-02, WWW::Search::HotBot is maintained by Martin Thurn (MartinThurn@iname.com).
WWW::Search::HotBot was originally written by Wm. L. Scheding, based on WWW::Search::AltaVista.
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
If it's not listed here, then it was not a meaningful or released revision.
HotBot changed their output format ever so slightly. Documentation added for all possible HotBot query options!
Better documentation for boolean queries. (Thanks to Jason Titus jason_titus@odsnet.com)
HotBot changed their output format.
\n changed to \012 for MacPerl compatibility
HotBot changed their search script location and output format on 1998-05-21. Also, as many as 6 fields of each SearchResult are now filled in.
Fixed the maximum_to_retrieve off-by-one problem. Updated test cases.
HotBot does not do truncation. Therefore, if the query contains truncation characters (i.e. '*' at end of words), they are simply deleted before the query is sent to HotBot.
Fixed and revamped by Martin Thurn.
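The truncation handling mentioned in the history above (a '*' at the end of a word is deleted before the query is sent) can be sketched as follows. The helper name strip_truncation and the regular expression are assumptions for illustration, not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Search::HotBot: removes any '*'
# that ends a whitespace-separated word in the query.
sub strip_truncation {
    my ($query) = @_;
    $query =~ s/\*+(?=\s|\z)//g;   # '*' followed by whitespace or end of string
    return $query;
}

print strip_truncation('restaur* sushi*'), "\n";   # prints "restaur sushi"
```

A '*' in the middle of a word is left alone, since only word-final truncation characters are described.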