WWW::Robot - configurable web traversal engine (for web robots & agents)
This module implements a configurable web traversal engine for a *robot* or other web agent. Given an initial web page (*URL*), the Robot gets the contents of that page and extracts all the links on the page, adding them to a list of URLs to visit. ...KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 UTC
WWW::RobotRules - database of robots.txt-derived permissions
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html> Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
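A minimal sketch of the module's documented new/parse/allowed interface; the robots.txt content, bot name, and URLs below are made up for illustration.

```perl
use WWW::RobotRules;

# Create a rules database for our (hypothetical) user agent name
my $rules = WWW::RobotRules->new('MyBot/1.0');

# Feed it a fetched /robots.txt (here an inline sample, not a real fetch)
$rules->parse('http://example.com/robots.txt', <<'EOF');
User-agent: *
Disallow: /private/
EOF

# Check URLs against the stored permissions
print $rules->allowed('http://example.com/index.html')
    ? "allowed\n" : "denied\n";   # allowed
print $rules->allowed('http://example.com/private/x.html')
    ? "allowed\n" : "denied\n";   # denied
```

In real use the robots.txt text would come from an HTTP GET (e.g. via LWP), and LWP::RobotUA wires this class in automatically.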
WWW::SimpleRobot - a simple web robot for recursively following links on web pages.
A simple Perl module for doing robot stuff. For a more elaborate interface, see WWW::Robot. This version uses LWP::Simple to grab pages and HTML::LinkExtor to extract the links from them. Only href attributes of anchor tags are extracted. Extracted ...AWRIGLEY/WWW-SimpleRobot-0.07 - 28 Jun 2001 14:50:08 UTC
WWW::RobotRules::DBIC - Persistent RobotRules which use DBIC.
WWW::RobotRules::DBIC is a subclass of WWW::RobotRules which uses DBIx::Class to store robots.txt information in any RDBMS....IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 UTC
WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html> It also parses rules that contain wildcards ('*') and Allow directives, as Google does. Webmasters can use the /robo...YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 UTC
WWW::RobotRules::Parser - Just Parse robots.txt
WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take your user agent name into consideration when parsing...DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 UTC
WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules
This is a subclass of WWW::RobotRules that uses Cache::Memcached to implement persistent caching of robots.txt and host visit information....SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 UTC
WWW::Mixi - Perl extension for scraping the MIXI social networking service.
WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides a login method, get and put methods, and some parsing methods for users who create mixi spiders. Using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple for accessing mixi. WWW:...TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 UTC
WWW::RobotRules::AnyDBM_File - Persistent RobotRules
This is a subclass of *WWW::RobotRules* that uses the AnyDBM_File package to implement persistent disk caching of robots.txt and host visit information. The constructor (the new() method) takes an extra argument specifying the name of the DBM file to ...GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
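A minimal sketch of the persistent variant, using the extra constructor argument the entry describes; the file name 'robot_rules.db' and the sample robots.txt are made up for illustration.

```perl
use WWW::RobotRules::AnyDBM_File;

# Second argument names the DBM file used for the disk cache
# ('robot_rules.db' is a hypothetical name)
my $rules = WWW::RobotRules::AnyDBM_File->new('MyBot/1.0', 'robot_rules.db');

$rules->parse('http://example.com/robots.txt', <<'EOF');
User-agent: *
Disallow: /cgi-bin/
EOF

# The parsed rules now persist on disk across runs of the robot
print $rules->allowed('http://example.com/cgi-bin/test')
    ? "allowed\n" : "denied\n";   # denied
```

Apart from the extra constructor argument, the interface is the same as the in-memory WWW::RobotRules, so the two are drop-in replacements for each other.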
WWW::OhNoRobotCom::Search - search comic transcriptions on http://ohnorobot.com
The module provides an interface for performing searches on the <http://www.ohnorobot.com> comic transcription website....ZOFFIX/WWW-OhNoRobotCom-Search-0.003 - 18 Dec 2013 22:59:56 UTC
WWW::RobotRules::Parser::MultiValue - Parse robots.txt
"WWW::RobotRules::Parser::MultiValue" is a parser for "robots.txt". Parsed rules for the specified user agent is stored as a Hash::MultiValue, where the key is a lower case rule name. "Request-rate" rule is handled specially. It is normalized to "Cra...TARAO/WWW-RobotRules-Parser-MultiValue-0.02 - 12 Mar 2015 05:46:38 UTC
WWW::Scraper - framework for scraping results from search engines.
NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine and scraping the data from the resultant multi-page responses and the associated detail p...GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 UTC
POE::Component::WWW::OhNoRobotCom::Search - non-blocking POE based wrapper around WWW::OhNoRobotCom::Search module
The module is a non-blocking wrapper around WWW::OhNoRobotCom::Search which provides an interface to <http://www.ohnorobot.com/> search...ZOFFIX/POE-Component-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 00:50:53 UTC
WWW::Link - maintain information about the state of links
WWW::Link is a Perl class which accepts and maintains information about links, for example, URLs which are referenced from a WWW page. The link class will be acted on by such programs as link checkers to give it information and by ...MIKEDLR/WWW-Link-0.036 - 09 Feb 2002 18:10:43 UTC
WWW::Search - Virtual base class for WWW searches
This class is the parent for all access methods supported by the "WWW::Search" library. This library implements a Perl API to web-based search engines. See README for a list of search engines currently supported, and for a lot of interesting high-lev...MTHURN/WWW-Search-2.519 - 03 Apr 2020 15:23:14 UTC