NAME

URI::ParseSearchString::More - Extract search strings from more referrers.

VERSION

Version 0.10

SYNOPSIS

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;
  my $search_terms = $more->se_term( $search_engine_referring_url );

DESCRIPTION

This module is a subclass of URI::ParseSearchString, so you can call any method on this object that you would call on a URI::ParseSearchString object. It works a little harder than its superclass to get you results. If its own parsing fails, it returns whatever URI::ParseSearchString would have returned anyway, so it should function well as a drop-in replacement.

WWW::Mechanize is used to extract search strings from some URLs which contain session info rather than search params. Optionally, WWW::Mechanize::Cached can be used to cache your lookups. There is additional parsing and also a guess() method which will return good results in many cases of doubt.

USAGE

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;
  my $search_terms = $more->se_term( $url );

URI::ParseSearchString

parse_search_string( $url )

At this point, this is the only "extended" URI::ParseSearchString method. This method performs the following bit of logic:

1) If the URL supplied looks to be a search query with session info rather than search data in the URL, it will attempt to access the URL and extract the search terms from the page returned.

2) If this returns no results, the URL will be processed by parse_more().

3) If there are still no results, the results of URI::ParseSearchString::se_term will be returned.
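
The three steps above amount to a "first non-empty result wins" chain. The following is a minimal sketch of that control flow only, not the module's actual implementation. first_result() and the three anonymous stages are hypothetical stand-ins for the session lookup, parse_more() and the parent class's se_term().

```perl
use strict;
use warnings;

# Run each stage in order and return the first non-empty result.
# Each stage is a code ref that takes a URL and returns the search
# terms as a string, or undef when it finds nothing.
sub first_result {
    my ( $url, @stages ) = @_;
    for my $stage (@stages) {
        my $terms = $stage->($url);
        return $terms if defined $terms && length $terms;
    }
    return;
}

# Hypothetical stages, for illustration only.
my @stages = (
    sub { return },                                     # 1) session lookup finds nothing
    sub { $_[0] =~ /[?&]q=([^&]+)/ ? $1 : undef },      # 2) stand-in for parse_more()
    sub { 'fallback' },                                 # 3) stand-in for the parent se_term()
);

my $terms = first_result( 'http://example.com/search?q=perl', @stages );
print "$terms\n";    # prints "perl"
```

Because the stages are tried in order, the cheaper or more specific parsers can be placed ahead of the generic fallback.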

WWW::Mechanize::Cached can be used to speed up your movement through large log files which may contain multiple similar URLs:

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;
  $more->set_cached( 1 );
  my $search_terms = $more->se_term( $url );

One interesting thing to note is that maps.google.* URLs have two important params: "q" and "near". The same can be said for local.google.* URLs. The results would be incomplete without the value of "near" in the search terms for these searches, so expect the following results:

  my $url = "http://local.google.ca/local?sc=1&hl=en&near=Stratford%20ON&btnG=Google%20Search&q=home%20health";
  my $terms = $more->parse_search_string( $url );

  # $terms is now "home health Stratford ON"
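
To make the "q" plus "near" behaviour concrete, here is a rough sketch that reproduces the result above using core Perl only. It is an illustration of the idea, not the code this module runs. The helper names decode_component(), query_params() and local_search_terms() are made up for this example.

```perl
use strict;
use warnings;

# Decode one percent-encoded query component ("+" also means space).
sub decode_component {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

# Return the query parameters of a URL as a hash.
# Later duplicates of a key overwrite earlier ones.
sub query_params {
    my ($url) = @_;
    my ($query) = $url =~ /\?([^#]*)/;
    return () unless defined $query;
    my %param;
    for my $pair ( split /[&;]/, $query ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $param{ decode_component($key) } = decode_component( $value // '' );
    }
    return %param;
}

# Join "q" and "near", in that order, skipping whichever is missing.
sub local_search_terms {
    my ($url) = @_;
    my %param = query_params($url);
    return join ' ', grep { defined($_) && length($_) } @param{qw( q near )};
}

my $url = 'http://local.google.ca/local?sc=1&hl=en&near=Stratford%20ON&btnG=Google%20Search&q=home%20health';
print local_search_terms($url), "\n";    # prints "home health Stratford ON"
```

Note that "q" is placed first regardless of where the parameters appear in the URL, which matches the ordering of the expected result above.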

Engines with session info currently supported:

  aol.com
  http://as.starware.com/dp/search
  http://as.weatherstudio.com/dp/search

se_term( $url )

A convenience method which calls parse_search_string.

URI::ParseSearchString::More

blame

Returns the name of the module that came up with the results on the last string parsed by parse_search_string(). Possible results:

  URI::ParseSearchString
  URI::ParseSearchString::More
  

set_cached( 0|1 )

Turn caching off and on. As of version 0.08 caching is OFF by default. See KNOWN ISSUES below for more info on this.

get_cached

Returns 1 if caching is currently on, 0 if it is not.

get_mech

This gives you direct access to the Mechanize object. If caching is enabled, a WWW::Mechanize::Cached object will be returned. If caching is disabled, a WWW::Mechanize object will be returned.

If you know what you're doing, play around with it. Caveat emptor.

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;

  my $mech = $more->get_mech();
  $mech->agent("My Agent Name");

  my $search_terms = $more->se_term( $search_engine_referring_url );

parse_more( $url )

Handles the bulk of More's parsing. This is automatically called (if needed) when you pass a search string to se_term(). However, you may also call it directly. Just keep in mind that this method will NOT try to get results from URI::ParseSearchString if it comes up empty.

guess( $url )

For the most part, parsing is done with specific search engines (i.e. the ones that we already know about) in mind. However, in a lot of cases, a good guess is all that you need. For example, a URI which contains a query string with the parameter "q" or "query" is generally the product of a search. If se_term() or parse_more() has come up empty, guess() may just provide you with a valid search term. Then again, it may not. Caveat emptor.
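
The kind of heuristic described here can be sketched in a few lines of core Perl. This is an assumption about the general approach, not the module's real guess() code: guess_terms() is a hypothetical name, and the candidate list contains only the two parameter names mentioned above.

```perl
use strict;
use warnings;

# Parameter names to try, in order of preference. Only "q" and "query"
# are taken from the text above; the real module may check others.
my @CANDIDATES = qw( q query );

# Return the decoded value of the first candidate parameter that is
# present and non-empty, or undef if none is found.
sub guess_terms {
    my ($url) = @_;
    my ($query) = $url =~ /\?([^#]*)/;
    return unless defined $query;
    for my $name (@CANDIDATES) {
        for my $pair ( split /[&;]/, $query ) {
            my ( $key, $value ) = split /=/, $pair, 2;
            next unless defined $key   && $key eq $name;
            next unless defined $value && length $value;
            $value =~ tr/+/ /;                             # "+" means space
            $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # percent-decoding
            return $value;
        }
    }
    return;
}

my $guess = guess_terms('http://example.com/find?lang=en&query=perl+modules');
print "$guess\n";    # prints "perl modules"
```

A URL such as http://example.com/page?id=7 yields undef here, which is the "then again, it may not" case.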

TO DO

Here is a list of some of the engines currently not covered by URI::ParseSearchString that may be added to this module:

  images.google.*
  www.adelphia.net/google/
  http://answers.yahoo.com/question/index;_ylt=Al7fJtDUTm2S69bM0VvjPDIjzKIX?qid=20061214165004AADtB1I

NOTES

Despite its low version number, this module actually works. It is, however, still very young and the interface is subject to some change.

KNOWN ISSUES

On some systems, this module dies with the following message when caching is enabled:

Can't store CODE items at blib/lib/Storable.pm (autosplit into blib/lib/auto/Storable/_freeze.al) line 339

For this reason, caching is disabled by default as of version 0.08. If caching does not fail on your system, I encourage you to enable it. It seems to me that this error is not caused by any problem with this module, but I haven't spent much time looking into it, as I can't replicate it on my development machine. Leaving caching enabled by default would cause a lot of failing tests, and switching it off only for tests would mean a lot of passing tests but failing real-world use.

See the documentation in t/005_parse_more.t for information on how to run the parsing tests with caching enabled.

BUGS

Please use the RT interface to report bugs:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=URI-ParseSearchString-More

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc URI::ParseSearchString::More

You can also look for information at:

AUTHOR

    Olaf Alders
    CPAN ID: OALDERS
    WunderCounter.com
    olaf@wundersolutions.com
    http://www.wundercounter.com

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.