
NAME

Yahoo::Search - Perl interface to Yahoo! Search's public API.

The following search spaces are supported:

Doc

Common web search for documents (html, pdf, doc, ...)

Image

Image search (jpeg, png, gif, ...)

Video

Video file search (avi, mpeg, realmedia, ...)

News

News article search

Local

Yahoo! Local area (ZIP-code-based Yellow-Page like search)

Spell

A pseudo-search to fetch a "did you mean?" spelling suggestion for a search term.

Related

A pseudo-search to fetch "also try" related-searches for a search term.

(Note: what this Perl API calls "Doc" Search is what Yahoo! calls "Web" Search. But gee, aren't all web searches "Web" search, including Image/News/Video/etc?)

Yahoo!'s raw API, which this package uses, is described at:

  http://developer.yahoo.net/

DOCS

The full documentation for this suite of classes is spread among these packages:

   Yahoo::Search
   Yahoo::Search::Request
   Yahoo::Search::Response
   Yahoo::Search::Result

However, you need use only Yahoo::Search, which brings in the others as needed.

SYNOPSIS

Yahoo::Search provides a rich and full-featured set of classes for accessing the various features of Yahoo! Search, and also offers a variety of shortcuts that allow simple access, such as the following Doc search:

 use Yahoo::Search;
 my @Results = Yahoo::Search->Results(Doc => "Britney latest marriage",
                                      AppId => "YahooDemo",
                                      # The following args are optional.
                                      # (Values shown are package defaults).
                                      Mode         => 'all',
                                      Count        => 10,
                                      Start        => 0,
                                      Type         => 'all',
                                      AllowAdult   => 0,
                                      AllowSimilar => 0,
                                      Language     => undef,
                                     );
 warn $@ if $@; # report any errors

 for my $Result (@Results)
 {
     printf "Result: #%d\n",  $Result->I + 1;
     printf "Url: %s\n",      $Result->Url;
     printf "ClickUrl: %s\n", $Result->ClickUrl;
     printf "Summary: %s\n",  $Result->Summary;
     printf "Title: %s\n",    $Result->Title;
     printf "In Cache: %s\n", $Result->CacheUrl;
     print "\n";
 }

The first argument to Results indicates which search space is to be queried (in this case, Doc). The second argument is the search term or phrase (described in detail in the next section). Subsequent arguments are optional key/value pairs (described in detail in the section after that) -- the ones shown in the example are those allowed for a Doc query, with the values shown being the defaults.

Results returns a list of Yahoo::Search::Result objects, one per item (in the case of a Doc search, an item is a web page, pdf document, doc document, etc.). The methods available to a Result object are dependent upon the search space of the original query -- see Yahoo::Search::Result documentation for the complete list.

Search term / phrase

Within a search phrase ("Britney latest marriage" in the example above), words that you wish to be included even if they would otherwise be eliminated as "too common" should be preceded with a "+". Words that you wish to exclude should be preceded with a "-". Words can be separated with "OR" (the default for the any Mode, described below), and can be wrapped in double quotes to identify an exact phrase (the default with the phrase Mode, also described below).

There are also a number of "Search Meta Words", as described at http://help.yahoo.com/help/us/ysearch/basics/basics-04.html and http://help.yahoo.com/help/us/ysearch/tips/tips-03.html , which can stand alone or be combined with Doc searches (and, to some extent, some of the others -- YMMV):

site:

allows one to find all documents within a particular domain and all its subdomains. Example: site:yahoo.com

hostname:

allows one to find all documents from a particular host only. Example: hostname:autos.yahoo.com

link:

allows one to find documents that link to a particular url. Example: link:http://autos.yahoo.com/

url:

allows one to find a specific document in Yahoo!'s index. Example: url:http://edit.autos.yahoo.com/repair/tree/0.html

inurl:

allows one to find a specific keyword as part of indexed urls. Example: inurl:bulgarian

intitle:

allows one to find a specific keyword as part of the indexed titles. Example: intitle:Bulgarian

As an example combining a number of different search styles, consider

    my @Results = Yahoo::Search->Results(Doc => 'site:TheSmokingGun.com "Michael Jackson" -arrest',
                                         AppId => "YahooDemo");

This returns data about pages at TheSmokingGun.com about Michael Jackson that don't contain the word "arrest" (yes, there are actually a few such pages).
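If you build such phrases from user input, a tiny helper keeps the operators straight. The following is only a sketch: build_query is a hypothetical helper, not part of Yahoo::Search, and it makes no attempt to escape embedded double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Yahoo::Search): joins search-phrase pieces
# using the operators described above. No quoting/escaping is attempted.
sub build_query {
    my (%opt) = @_;
    my @parts;
    push @parts, @{ $opt{words} || [] };                           # plain words, meta words
    push @parts, map { sprintf '"%s"', $_ } @{ $opt{phrases} || [] };  # exact phrases
    push @parts, map { "+$_" } @{ $opt{require} || [] };           # force inclusion
    push @parts, map { "-$_" } @{ $opt{exclude} || [] };           # exclude
    return join ' ', @parts;
}

# Rebuilds the TheSmokingGun.com example phrase shown above.
my $q = build_query(
    words   => ['site:TheSmokingGun.com'],
    phrases => ['Michael Jackson'],
    exclude => ['arrest'],
);
print "$q\n";
```

The resulting string is what you would pass as the value of the Doc argument.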

Query arguments

As mentioned above, the arguments allowed in a Query call depend upon the search space of the query. Here is a table of the possible arguments, showing which apply to queries of which search space:

                      Doc   Image  Video  News   Local  Spell  Related
                     -----  -----  -----  -----  -----  -----  -------
  AppId               [X]    [X]    [X]    [X]    [X]    [X]     [X]
  Mode                [X]    [X]    [X]    [X]    [X]     .       .
  Start               [X]    [X]    [X]    [X]    [X]     .       .
  Count               [X]    [X]    [X]    [X]    [X]     .       .

  AllowSimilar        [X]     .      .      .      .      .       .
  AllowAdult          [X]    [X]    [X]     .      .      .       .
  Type                [X]    [X]    [X]     .      .      .       .
  Sort                 .      .      .     [X]     .      .       .
  Language            [X]     .      .     [X]     .      .       .

  Street               .      .      .      .     [X]     .       .
  City                 .      .      .      .     [X]     .       .
  State                .      .      .      .     [X]     .       .
  PostalCode           .      .      .      .     [X]     .       .
  Location             .      .      .      .     [X]     .       .
  Radius               .      .      .      .     [X]     .       .

  AutoContinue        [X]    [X]    [X]    [X]    [X]    [X]     [X]
  Debug               [X]    [X]    [X]    [X]    [X]    [X]     [X]
  PreRequestCallback  [X]    [X]    [X]    [X]    [X]    [X]     [X]

Here are details of each:

AppId

An 8-40 character string which identifies the application making use of the Yahoo! Search API. (Think of it along the lines of an HTTP User-Agent string.)

The characters allowed are space, plus A-Za-z0-9_()[]*+-=,.:@\
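If you want to catch a malformed AppId before sending anything to Yahoo!, a client-side check of these two rules might look like the sketch below. is_valid_appid is a hypothetical helper, not part of Yahoo::Search, and Yahoo! itself remains the final judge of what it accepts.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Yahoo::Search): returns 1 if $appid is
# 8-40 characters drawn from space, A-Za-z0-9 and _()[]*+-=,.:@\ , else 0.
sub is_valid_appid {
    my ($appid) = @_;
    return 0 unless defined $appid;
    return $appid =~ m/\A[ A-Za-z0-9_()\[\]*+\-=,.:\@\\]{8,40}\z/ ? 1 : 0;
}

for my $id ('YahooDemo', 'Bobs Fish Mart', 'short', 'no#hash!') {
    printf "%-16s %s\n", $id, is_valid_appid($id) ? 'ok' : 'rejected';
}
```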

This argument is required of all searches (sorry). You can make up whatever AppId you'd like, but you are encouraged to register it via the link on

  http://developer.yahoo.net/

especially if you are creating something that will be widely distributed.

As mentioned below in Defaults and Default Overrides, it's particularly convenient to get the AppId out of the way by putting it on the use line, e.g.

   use Yahoo::Search AppId => 'just testing';

It then applies to all queries unless explicitly overridden.

Mode

Must be one of: all (the default), any, or phrase. Indicates how multiple words in the search term are used: search for documents with all words, documents with any words, or documents that contain the search term as an exact phrase.
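To make the difference between the three modes concrete, here is a purely local sketch of what each one asks for. matches_mode is hypothetical and is not how Yahoo! actually matches documents (that happens on Yahoo!'s servers); it just applies the all/any/phrase rules to a piece of text.

```perl
use strict;
use warnings;

# Hypothetical, purely local illustration of the three Mode values. It is
# not how Yahoo! matches documents; it only shows what each mode asks for.
sub matches_mode {
    my ($mode, $term, $text) = @_;
    my $lc_text = lc $text;
    if ($mode eq 'phrase') {                           # whole term, words in order
        return index($lc_text, lc $term) >= 0 ? 1 : 0;
    }
    my %present = map { ($_ => 1) } split /\W+/, $lc_text;
    my @words   = split ' ', lc $term;
    my $hits    = grep { $present{$_} } @words;         # number of words found
    return $hits == @words ? 1 : 0 if $mode eq 'all';
    return $hits > 0       ? 1 : 0 if $mode eq 'any';
    die "unknown Mode '$mode'\n";
}
```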

Start

Indicates the ordinal of the first result to be returned, e.g. the "30" of "showing results 30-40" (except that Start is zero-based, not one-based). The default is zero, meaning that results begin with the first (top-ranked) item.
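Because Start is zero-based, fetching "page N" of a listing is simple arithmetic. Here is a minimal sketch: page_args is a hypothetical helper whose key/value output could be passed along as extra arguments to Results or Query.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a 1-based page number to a zero-based Start,
# returned together with Count as key/value pairs.
sub page_args {
    my ($page, $count) = @_;
    my $start = ($page - 1) * $count;
    return (Start => $start, Count => $count);
}

my %args = page_args(3, 10);
printf "Start=%d Count=%d (results %d-%d)\n",
    $args{Start}, $args{Count}, $args{Start} + 1, $args{Start} + $args{Count};
```

With a Count of 10, page 3 gives Start=20, which a user would see as results 21-30.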

Count

Indicates how many items should be returned. The default is 10. The maximum allowed depends on the search space being queried: 20 for Local searches, and 50 for others which support the Count argument.

Note that

  Yahoo::Search::MaxCount($SearchSpace)

and

  $SearchEngine->MaxCount($SearchSpace)

return the maximum count allowed for the given $SearchSpace.
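The limits above amount to a small per-space table. The sketch below mirrors it only to show the idea. %MAX_COUNT, max_count and capped_count are hypothetical. In real code, ask MaxCount rather than hard-coding these numbers.

```perl
use strict;
use warnings;

# Hypothetical table mirroring the limits stated above; in real code, ask
# Yahoo::Search's MaxCount instead of hard-coding these numbers.
my %MAX_COUNT = (Doc => 50, Image => 50, Video => 50, News => 50, Local => 20);

# Returns the cap for a search space, or undef if Count does not apply
# (Spell, Related).
sub max_count { return $MAX_COUNT{ $_[0] } }

# Caps a requested Count at the per-space maximum.
sub capped_count {
    my ($space, $count) = @_;
    my $max = max_count($space);
    die "Count does not apply to $space searches\n" unless defined $max;
    return $count > $max ? $max : $count;
}
```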

AllowSimilar

If this boolean is true (the default is false), similar results which would otherwise not be returned are included in the result set.

AllowAdult

If this boolean is false (the default), results considered to be "adult" (i.e. porn) are not included in the result set. Set to true to allow unfiltered results.

Standard precautions apply about how the "is adult?" determination is not perfect.

Type

This argument can be used to restrict the results to only a specific file type. The default value, all, allows any type (associated with the search space) to be returned. Otherwise, the values allowed depend on the search space:

 Search space    Allowed Type values
 ============    ========================================================
 Doc             all  html msword pdf ppt rss txt xls
 Image           all  bmp gif jpeg png
 Video           all  avi flash mpeg msmedia quicktime realmedia
 News            N/A
 Local           N/A
 Spell           N/A
 Related         N/A
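A client-side check against this table might look like the following sketch. type_ok is hypothetical, and its lists are simply copied from the table above (keyed by search-space name) rather than read from Yahoo::Search.

```perl
use strict;
use warnings;

# Hypothetical copy of the Type table above (not read from Yahoo::Search).
my %ALLOWED_TYPES = (
    Doc   => [qw(all html msword pdf ppt rss txt xls)],
    Image => [qw(all bmp gif jpeg png)],
    Video => [qw(all avi flash mpeg msmedia quicktime realmedia)],
);

# Returns 1 if $type is allowed for $space, 0 otherwise (including spaces
# for which Type does not apply at all).
sub type_ok {
    my ($space, $type) = @_;
    my $allowed = $ALLOWED_TYPES{$space} or return 0;
    my $found   = grep { $_ eq $type } @$allowed;
    return $found ? 1 : 0;
}
```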

Sort

For News searches, the sort may be rank (the default) or date.

Language

If provided, restricts the results to documents in the given language. The value is a language code such as en (English), ja (Japanese), etc. (mostly ISO 639-1 codes). These are the codes supported:

 code  language
 ----  ---------
  sq   Albanian
  ar   Arabic
  bg   Bulgarian
  ca   Catalan
  szh  Chinese (simplified)
  tzh  Chinese (traditional)
  hr   Croatian
  cs   Czech
  da   Danish
  nl   Dutch
  en   English
  et   Estonian
  fi   Finnish
  fr   French
  de   German
  el   Greek
  he   Hebrew
  hu   Hungarian
  is   Icelandic
  it   Italian
  ja   Japanese
  ko   Korean
  lv   Latvian
  lt   Lithuanian
  no   Norwegian
  fa   Persian
  pl   Polish
  pt   Portuguese
  ro   Romanian
  ru   Russian
  sk   Slovak
  sl   Slovenian
  es   Spanish
  sv   Swedish
  th   Thai
  tr   Turkish

In addition, the code "default" is the same as the lack of a language specifier, and seems to mean a mix of major world languages, skewed toward English.

Street
City
State
PostalCode
Location

These items are for a Local query, and specify the epicenter of the search. The epicenter must be provided in one of a variety of ways: via the free-text Location, via Street + PostalCode, via Street + City + State, via PostalCode alone, or via City + State alone.

Street is the street address, e.g. "701 First Ave". PostalCode is a US 5-digit or 9-digit ZIP code (e.g. "94089" or "94089-1234").

If Location is provided, it supersedes the others. It should be a string along the lines of "701 First Ave, Sunnyvale CA, 94089". The following forms are recognized:

  city state
  city state zip
  zip
  street, city state
  street, city state zip
  street, zip

Searches that include a street address (either in the Location, or if Location is empty, in Street) provide for a more detailed epicenter specification.
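Here is a pre-flight check of the epicenter rules above. It is a sketch only. valid_zip and has_epicenter are hypothetical helpers that neither come from Yahoo::Search nor contact Yahoo!. Note that Street on its own is not enough. It needs a PostalCode or a City + State to go with it.

```perl
use strict;
use warnings;

# Hypothetical pre-flight checks mirroring the rules above; neither function
# is part of Yahoo::Search and neither contacts Yahoo!.

# US ZIP code: 5 digits, optionally followed by -NNNN.
sub valid_zip {
    my ($zip) = @_;
    return 0 unless defined $zip;
    return $zip =~ /\A\d{5}(?:-\d{4})?\z/ ? 1 : 0;
}

# True if %args name an epicenter in one of the accepted ways: Location;
# PostalCode (alone or with Street); or City + State (alone or with Street).
sub has_epicenter {
    my (%args) = @_;
    my $given = sub { defined $args{ $_[0] } && length $args{ $_[0] } };
    return 1 if $given->('Location');
    return 1 if $given->('PostalCode');
    return 1 if $given->('City') && $given->('State');
    return 0;
}
```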

Radius

For Local searches, indicates how wide an area around the epicenter to search. The value is the radius of the search area, in miles. The default radius depends on the search location (urban areas tend to have a smaller default radius).

Debug

Debug is a string (defaults to an empty string). If the substring "url" is found anywhere in the string, the url of the Yahoo! request is printed to stderr. If it contains "xml", the raw xml received is printed to stderr. If it contains "hash", the raw Perl hash, as converted from the XML, is Data::Dump'd to stderr.

Thus, to print all debugging, you'd set Debug to a value such as "url xml hash".
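The substring rule can be sketched as follows. debug_channels is a hypothetical helper, not the module's own code. Because this is a substring test, a value such as "urls" would also turn on the url channel.

```perl
use strict;
use warnings;

# Hypothetical sketch of the substring test described above: a channel is
# on if its keyword appears anywhere in the Debug string.
sub debug_channels {
    my ($debug) = @_;
    $debug = '' unless defined $debug;
    my %on;
    for my $channel (qw(url xml hash)) {
        $on{$channel} = index($debug, $channel) >= 0 ? 1 : 0;
    }
    return \%on;
}

my $on = debug_channels('url xml hash');
print join(' ', map { "$_=$on->{$_}" } sort keys %$on), "\n";
```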

AutoContinue

A boolean (default off). If true, turns on the potentially dangerous auto-continuation, as described in the docs for NextResult in Yahoo::Search::Response.

Class Hierarchy Details

The Y! Search API class system supports the following objects (all loaded as needed via Yahoo::Search):

  Yahoo::Search
  Yahoo::Search::Request
  Yahoo::Search::Response
  Yahoo::Search::Result

Here is a summary of them:

Yahoo::Search

A "search engine" object which can hold user-specified default values for search-query arguments. Often not used explicitly.

Yahoo::Search::Request

An object which holds the information needed to make one search-query request. Often not used explicitly.

Yahoo::Search::Response

An object which holds the results of a query (including a bunch of Result objects).

Yahoo::Search::Result

An object representing one query result (one image, web page, etc., as appropriate to the original search space).

"The Long Way", and Common Practice

The explicit way to perform a query and access the results is to first create a "Search Engine" object:

  my $SearchEngine = Yahoo::Search->new();

Optionally, you can provide new with key/value pairs as described in the Query arguments section above. Those values will then be available as default values during subsequent request creation. (More on this later.)

You then use the search-engine object to create a request:

  my $Request = $SearchEngine->Request(Doc => "Britney");

You then actually make the request, getting a response:

  my $Response = $Request->Fetch();

You can then access the set of Result objects in a number of ways, either all at once

  my @Results = $Response->Results();

or iteratively:

  while (my $Result = $Response->NextResult) {
      printf "%s\n", $Result->Url;   # ...process each $Result here...
  }

In Practice....

In practice, one often does not need to go through all these steps explicitly. The only reason to create a search-engine object, for example, is to hold default overrides (to be made available to subsequent requests made via the search-engine object). For example:

   use Yahoo::Search;
   my $SearchEngine = Yahoo::Search->new(AppId      => "Bobs Fish Mart",
                                         Count      => 25,
                                         AllowAdult => 1,
                                         PostalCode => 95014);

Now, calls to the various query functions (Query, Results) via this $SearchEngine will use these defaults (Image searches, for example, will be with AllowAdult set to true, and Local searches will be centered at ZIP code 95014.) All will return up to 25 results.

In this example:

   my @Results = $SearchEngine->Results(Image => "Britney",
                                        Count => 20);

The query is made with AppId as 'Bobs Fish Mart' and AllowAdult true (both via $SearchEngine), but Count is 20 because explicit args override the default in $SearchEngine. The PostalCode arg does not apply to an Image search, so the default provided from $SearchEngine is not needed for this particular query.

Defaults on the 'use' line

You can also provide the same defaults on the use line. The following example has the same result as the previous one:

   use Yahoo::Search AppId      => 'Bobs Fish Mart',
                     Count      => 25,
                     AllowAdult => 1,
                     PostalCode => 95014;

   my @Results = Yahoo::Search->Results(Image => "Britney",
                                        Count => 20);

Functions and Methods

Here, finally, are the functions and methods provided by Yahoo::Search. In all cases, "...args..." are any of the key/value pairs listed in the Query arguments section of this document (e.g. "Count => 20")

$SearchEngine = Yahoo::Search->new(...args...)

Creates a search-engine object (a container for defaults). On error, sets $@ and returns nothing.

$Request = $SearchEngine->Request($space => $query, ...args...)
$Request = Yahoo::Search->Request($space => $query, ...args...)

Creates a Request object representing a search of the named search space (Doc, Image, etc.) of the given query string.

On error, sets $@ and returns nothing.

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

$Response = $SearchEngine->Query($space => $query, ...args...)
$Response = Yahoo::Search->Query($space => $query, ...args...)

Creates an implicit Request object, and fetches it, returning the resulting Response.

On error, sets $@ and returns nothing.

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

@Results = $SearchEngine->Results($space => $query, ...args...)
@Results = Yahoo::Search->Results($space => $query, ...args...)

Creates an implicit Request object, then Response object, in the end returning a list of Result objects.

On error, sets $@ and returns nothing.

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

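@links = $SearchEngine->Links($space => $query, ...args...)
@links = Yahoo::Search->Links($space => $query, ...args...)

A super shortcut which goes directly from the query args to a list of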

  <a href=...>...</a>

links. Essentially,

    map { $_->Link } Yahoo::Search->Results($space => $query, ...args...);

or, more explicitly:

    map { $_->Link } Yahoo::Search->new()->Request($space => $query, ...args...)->Fetch->Results(@_);

See Link in the documentation for Yahoo::Search::Result.

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

A super shortcut for the Spell and Related search spaces: returns the list of spelling suggestions or related-search suggestions, respectively.

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

@html = $SearchEngine->HtmlResults($space => $query, ...args...)
@html = Yahoo::Search->HtmlResults($space => $query, ...args...)

Like Links, but returns a list of html strings (one representing each result). See as_html in the documentation for Yahoo::Search::Result.

A simple result display might look like

   print join "<p>", Yahoo::Search->HtmlResults(....);

or, perhaps

   if (my @HTML = Yahoo::Search->HtmlResults(....))
   {
      print "<ul>";
      for my $html (@HTML) {
         print "<li>", $html;
      }
      print "</ul>";
   }

As an example, here's a complete CGI which shows results from an image-search, where the search term is in the 's' query string:

   #!/usr/local/bin/perl -w
   use strict;
   use CGI;
   use Yahoo::Search AppId => 'my-search-app';

   my $cgi = CGI->new;
   print $cgi->header();

   if (my $term = $cgi->param('s')) {
       print join "<p>", Yahoo::Search->HtmlResults(Image => $term);
   }

The results, however, do look better with some style-sheet attention, such as:

  <style>
    .yResult { display: block; border: #CCF 3px solid ; padding:10px }
    .yLink   { }
    .yTitle  { display:none }
    .yImg    { border: solid 1px }
    .yUrl    { display:none }
    .yMeta   { font-size: 80% }
    .ySrcUrl { }
    .ySum    { font-family: arial; font-size: 90% }
  </style>

Note: all arguments are in key/value pairs, but the required $space/$query pair must appear first.

$max = $SearchEngine->MaxCount($space)
$max = Yahoo::Search->MaxCount($space)

Returns the maximum allowed Count query-argument for the given search space.

$SearchEngine->Default($key [ => $val ]);

If a new value is given, update $SearchEngine's value for the named $key.

In either case, the value previously in effect for $key is returned: the $SearchEngine's own value if it had one, otherwise the global value in effect.

As always, the key is from among those mentioned in the Query arguments section above.
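The get/set-and-return-old behavior can be modeled in a few lines. This is a sketch of the semantics only: engine_default and %GLOBAL are hypothetical stand-ins, not the module's source, with a plain hash ref playing the part of the $SearchEngine.

```perl
use strict;
use warnings;

# Hypothetical model of the Default method described above (not the
# module's source). Engine-level overrides shadow the global defaults.
my %GLOBAL = (Count => 10, AllowAdult => 0);

sub engine_default {
    my ($engine, $key, @new) = @_;     # $engine: hash ref of overrides
    my $old = exists $engine->{$key} ? $engine->{$key} : $GLOBAL{$key};
    $engine->{$key} = $new[0] if @new; # update only when a value is given
    return $old;                       # value in effect before the call
}

my %engine;
my $was = engine_default(\%engine, Count => 25);  # 10: the global value was in effect
my $now = engine_default(\%engine, 'Count');      # 25: the engine's own value
print "$was $now\n";
```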


Yahoo::Search->Default($key [ => $val ]);

Update or, if no new value is given, check the global default value for the named argument. The key is from among those mentioned in the Query arguments section above, as well as AutoCarp (discussed below).

Defaults and Default Overrides

All key/value pairs mentioned in the Query arguments section may appear on the use line, in the call to the new constructor, or in requests that create a query explicitly or implicitly (Request, Query, Results, Links, or HtmlResults).

Each argument's value takes the first of the following which applies (listed in order of precedence):

4)

The actual arguments to a function which creates (explicitly or implicitly) a request.

3)

Search-engine default overrides, set when the Yahoo::Search new constructor is used to create a search-engine object, or when that object's Default method is called.

2)

Global default overrides, set on the use line or via

 Yahoo::Search->Default()

1)

Defaults hard-coded into these packages (e.g. Count defaults to 10).
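One way to picture this precedence is as a merge of four hashes, lowest level first, so that each later level overrides the one before it. The following sketch is hypothetical (%HARD_CODED and effective_args are illustrative names, not Yahoo::Search internals). It reproduces the 'use'-line example from earlier, where Count ends up as 20.

```perl
use strict;
use warnings;

# Hypothetical model of the precedence list above: hard-coded defaults (1),
# global overrides (2), engine overrides (3), call arguments (4). In a Perl
# hash list assignment later keys win, so levels are merged lowest first.
my %HARD_CODED = (Count => 10, Start => 0, Mode => 'all', AllowAdult => 0);

sub effective_args {
    my ($global, $engine, $call) = @_;    # hash refs; any may be undef
    my %merged = (%HARD_CODED, %{ $global || {} }, %{ $engine || {} }, %{ $call || {} });
    return \%merged;
}

# The 'use'-line example from earlier: globals plus an explicit Count.
my $args = effective_args(
    { AppId => 'Bobs Fish Mart', Count => 25, AllowAdult => 1, PostalCode => 95014 },
    undef,
    { Count => 20 },
);
print "$_ => $args->{$_}\n" for sort keys %$args;
```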

It's particularly convenient to put the AppId on the use line, e.g.

   use Yahoo::Search AppId => 'just testing';

AutoCarp

By default, detected errors that would be classified as programming errors (e.g. use of incorrect args) are automatically printed to stderr, in addition to being returned via $@. This can be turned off via

  use Yahoo::Search AutoCarp => 0;

or

  Yahoo::Search->Default(AutoCarp => 0);

The default of true is somewhat obnoxious, but hopefully helps create better programs by forcing the programmer to actively think about error checking (if even long enough to turn off error reporting).

Copyright

Copyright (C) 2005 Yahoo! Inc.

Author

Jeffrey Friedl (jfriedl@yahoo.com)

$Id: Search.pm 2 2005-01-28 04:27:46Z jfriedl $