NAME

URI::ParseSearchString::More - Extract search strings from more referrers.

VERSION

Version 0.02

SYNOPSIS

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;
  my $search_terms = $more->se_term( $search_engine_referring_url );

DESCRIPTION

This module is a subclass of URI::ParseSearchString, so you can call any methods on this object that you would call on a URI::ParseSearchString object. URI::ParseSearchString is extended in the following way:

WWW::Mechanize::Cached (a caching subclass of WWW::Mechanize) is used to fetch the page behind URLs which contain session info rather than search params, and the search string is extracted from the page returned. Currently this means AOL queries. Support for other engines can be added as needed.

USAGE

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;
  my $search_terms = $more->se_term( $search_engine_referring_url );

URI::ParseSearchString

se_term

At this point, this is the only "extended" URI::ParseSearchString method. If the URL supplied looks to be a search query with session info rather than search data in the URL, this method will attempt a WWW::Mechanize::Cached lookup of the URL and will try to extract the search terms from the page returned. In all other cases the results of URI::ParseSearchString::se_term will be returned.

WWW::Mechanize::Cached caches the pages it fetches, which speeds up processing of large log files that may contain many similar URLs.

Engines currently supported:

  http://aolsearch.aol.com/aol/search
  http://as.starware.com/dp/search
  http://as.weatherstudio.com/dp/search
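
If you want to know ahead of time whether a referrer is likely to trigger a network lookup, you could compare its host and path with the list above. What follows is only a minimal sketch, assuming the URI module is installed. The helper name needs_lookup is hypothetical and not part of this module's interface, and the module's own matching may well differ.

```perl
use strict;
use warnings;
use URI;

# Host/path pairs copied from the list of supported engines above.
my %session_engine = map { $_ => 1 } qw(
    aolsearch.aol.com/aol/search
    as.starware.com/dp/search
    as.weatherstudio.com/dp/search
);

# Hypothetical helper (not part of the module): returns 1 if the URL's
# host and path match one of the engines listed above, 0 otherwise.
sub needs_lookup {
    my ($url) = @_;
    my $uri = URI->new($url);
    return 0 unless $uri->can('host');    # relative or non-server URLs
    my $host = lc( $uri->host // '' );
    return $session_engine{ $host . $uri->path } ? 1 : 0;
}

# The query string here is made up for illustration.
print needs_lookup('http://aolsearch.aol.com/aol/search?invocationType=topsearchbox')
    ? "lookup\n"
    : "no lookup\n";
```

A check like this could be used to count, or skip, the log entries which would cause a page fetch.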

URI::ParseSearchString::More

get_mech

This gives you direct access to the WWW::Mechanize::Cached object. If you know what you're doing, play around with it. Caveat emptor.

  use URI::ParseSearchString::More;
  my $more = URI::ParseSearchString::More->new;

  my $mech = $more->get_mech();
  $mech->agent("My Agent Name");

  my $search_terms = $more->se_term( $search_engine_referring_url );
  

TO DO

Sometimes a good guess is all you need. This module should make a (hopefully) intelligent guess when URI::ParseSearchString comes up empty and there's no session info to be had.
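
One way such a guess might work is to try a handful of common query parameter names in turn. The sketch below is an illustration only and is not how this module behaves today. The list of parameter names and the helper name guess_terms are assumptions, as is the example URL.

```perl
use strict;
use warnings;
use URI;

# Assumed list of parameter names to try, in order of preference.
my @candidates = qw( q query p search terms );

# Hypothetical helper: returns the first non-empty candidate parameter,
# or nothing if no candidate is present.
sub guess_terms {
    my ($url) = @_;
    my $uri = URI->new($url);
    return unless $uri->can('query_form');
    my %param = $uri->query_form;    # decodes %XX escapes and "+"
    for my $name (@candidates) {
        my $value = $param{$name};
        return $value if defined $value && length $value;
    }
    return;    # nothing that looks like a search string
}

my $guess = guess_terms('http://search.example.com/results?lang=en&query=perl+modules');
print defined $guess ? "$guess\n" : "no guess\n";
```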

Here is a list of some of the engines currently not covered by URI::ParseSearchString that may be added to this module:

  about.com
  search.msn.ca (as well as other permutations of search.msn)
  books.google.*
  images.google.*
  maps.google.*
  local.google.*
  search.hk.yahoo.com
  clusty.com
  www.excite.co.uk
  search.dmoz.org
  aolsearcht2.search.aol.com
  www.att.net
  www.overture.com
  www.adelphia.net/google/
  www.googlesyndicatedsearch.com

One interesting thing to note is that maps.google.* URLs have 2 important params: "q" and "near". The same can be said for local.google.* URLs. The results for these searches would likely be incomplete without including the value of "near" in the search terms.
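
Combining the two params could look something like the following. This is a rough sketch, assuming the URI module is installed. The helper name local_search_terms and the example URL are made up for illustration, and joining the values with a single space is just one possible choice.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: joins the values of "q" and "near" (when present)
# into a single search string.
sub local_search_terms {
    my ($url) = @_;
    my %param = URI->new($url)->query_form;    # decodes %XX escapes and "+"
    my @terms = grep { defined($_) && length($_) } @param{qw( q near )};
    return unless @terms;
    return join ' ', @terms;
}

# Example URL made up for illustration.
my $terms = local_search_terms('http://maps.google.com/maps?q=pizza&near=Toronto');
print "$terms\n";
```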

NOTES

Despite its low version number, this module actually works. It is, however, still very young and the interface is subject to some change.

BUGS

Please use the RT interface to report bugs:

http://rt.cpan.org/NoAuth/Bugs.html?Dist=URI-ParseSearchString-More

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc URI::ParseSearchString::More

AUTHOR

    Olaf Alders
    CPAN ID: OALDERS
    WunderCounter.com
    olaf@wundersolutions.com
    http://www.wundercounter.com

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.