Search results for "module:WWW::Robot"
WWW::Robot - configurable web traversal engine (for web robots & agents)
This module implements a configurable web traversal engine, for a *robot* or other web agent. Given an initial web page (*URL*), the Robot will get the contents of that page and extract all links on the page, adding them to a list of URLs to visit. ...
KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 UTC
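A minimal traversal sketch based on the synopsis in the module's POD; the hook names and callback argument lists below are assumptions to verify against the installed version:

    use WWW::Robot;

    # NAME, VERSION and EMAIL identify the robot and are required attributes.
    my $robot = WWW::Robot->new(
        NAME    => 'MyBot',
        VERSION => '0.01',
        EMAIL   => 'me@example.com',
    );

    # Decide which discovered URLs the robot should follow (assumed hook name).
    $robot->addHook('follow-url-test', sub {
        my ($robot, $hook, $url) = @_;
        return $url =~ m{^http://example\.org/};
    });

    # Process the contents of each fetched page (assumed hook name).
    $robot->addHook('invoke-on-contents', sub {
        my ($robot, $hook, $url, $response, $structure) = @_;
        print "Fetched $url\n";
    });

    $robot->run('http://example.org/');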
WWW::RobotRules - database of robots.txt-derived permissions
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...
GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
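The check-before-fetch pattern, adapted from the module's own synopsis (example.org is a placeholder host):

    use WWW::RobotRules;
    use LWP::Simple qw(get);

    # The argument is your robot's User-Agent name, matched against robots.txt.
    my $rules = WWW::RobotRules->new('MyBot/1.0');

    # Fetch and parse a site's robots.txt once.
    my $url = 'http://example.org/robots.txt';
    my $robots_txt = get($url);
    $rules->parse($url, $robots_txt) if defined $robots_txt;

    # Later, before each request:
    if ($rules->allowed('http://example.org/some/page.html')) {
        # safe to fetch
    }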
WWW::SimpleRobot - a simple web robot for recursively following links on web pages.
A simple Perl module for doing robot stuff. For a more elaborate interface, see WWW::Robot. This version uses LWP::Simple to grab pages, and HTML::LinkExtor to extract the links from them. Only href attributes of anchor tags are extracted. Extracted ...
AWRIGLEY/WWW-SimpleRobot-0.07 - 28 Jun 2001 14:50:08 UTC
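The description names the two building blocks, so here is a minimal sketch of that technique itself (fetch with LWP::Simple, extract only anchor href attributes with HTML::LinkExtor), not of WWW::SimpleRobot's own constructor, whose options are documented in its POD:

    use LWP::Simple qw(get);
    use HTML::LinkExtor;
    use URI;

    my $base = 'http://example.org/';
    my $html = get($base) // die "failed to fetch $base";

    # Collect only the href attributes of anchor tags, resolved against the base.
    my @links;
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, URI->new_abs($attr{href}, $base)
            if $tag eq 'a' && defined $attr{href};
    });
    $extor->parse($html);

    print "$_\n" for @links;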
WWW::RobotRules::DBIC - Persistent RobotRules which uses DBIC.
WWW::RobotRules::DBIC is a subclass of WWW::RobotRules, which uses DBIx::Class to store robots.txt info in any RDBMS....
IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 UTC
WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules
This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain wildcards '*' and Allow directives, as Google does. Webmasters can use the /robo...
YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 UTC
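Since the module is a fork of WWW::RobotRules, the sketch below assumes the same new/parse/allowed interface; the robots.txt content is invented to show a wildcard and an Allow directive:

    use WWW::RobotRules::Extended;

    my $rules = WWW::RobotRules::Extended->new('MyBot/1.0');

    # Google-style wildcard and Allow rules (<<~ needs Perl 5.26+).
    my $robots_txt = <<~'TXT';
        User-agent: *
        Disallow: /private/
        Disallow: /*.cgi$
        Allow: /private/public-page.html
        TXT

    $rules->parse('http://example.org/robots.txt', $robots_txt);
    print $rules->allowed('http://example.org/private/public-page.html')
        ? "allowed\n" : "denied\n";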
WWW::RobotRules::Parser - Just Parse robots.txt
WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take into consideration your user agent name when parsing...
DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 UTC
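A sketch of agent-independent parsing; parse() taking the robots.txt URI plus its contents follows the module's synopsis, while the shape of the returned structure (user-agent name mapped to its disallowed paths) is an assumption to check against the POD:

    use WWW::RobotRules::Parser;

    my $parser = WWW::RobotRules::Parser->new;

    my $text = "User-agent: *\nDisallow: /cgi-bin/\n"
             . "User-agent: FooBot\nDisallow: /\n";

    # Returns rules for every user agent, since none is singled out.
    my $rules = $parser->parse('http://example.org/robots.txt', $text);

    # Assumed result shape; verify against the module's documentation.
    for my $agent (sort keys %$rules) {
        print "$agent: @{ $rules->{$agent} }\n";
    }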
WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules
This is a subclass of WWW::RobotRules that uses Cache::Memcached to implement persistent caching of robots.txt and host visit information....
SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 UTC
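As a WWW::RobotRules subclass it inherits parse() and allowed(); the constructor arguments sketched below (agent name plus a memcached server address) are an assumption, so check the module's POD for the real signature:

    use WWW::RobotRules::Memcache;

    # Constructor arguments are illustrative only; see the POD.
    my $rules = WWW::RobotRules::Memcache->new('MyBot/1.0', 'localhost:11211');

    # Inherited WWW::RobotRules interface; state now persists in memcached.
    my $robots_txt = "User-agent: *\nDisallow: /private/\n";
    $rules->parse('http://example.org/robots.txt', $robots_txt);
    print "ok\n" if $rules->allowed('http://example.org/index.html');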
WWW::Mixi - Perl extension for scraping the MIXI social networking service.
WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides a login method, get and put methods, and some parsing methods for users who create mixi spiders. Using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple directly for accessing Mixi. WWW:...
TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 UTC
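A login sketch following the module's synopsis as commonly quoted; the two-argument constructor and login() call are assumptions to verify, and the credentials are placeholders:

    use WWW::Mixi;

    my $mixi = WWW::Mixi->new('mail@example.com', 'password');
    $mixi->login;

    # WWW::Mixi subclasses LWP::RobotUA, so plain LWP-style requests work too.
    my $res = $mixi->get('http://mixi.jp/home.pl');
    print $res->status_line, "\n";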
WWW::RobotRules::AnyDBM_File - Persistent RobotRules
This is a subclass of *WWW::RobotRules* that uses the AnyDBM_File package to implement persistent disk caching of robots.txt and host visit information. The constructor (the new() method) takes an extra argument specifying the name of the DBM file to ...
GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
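The extra constructor argument mentioned above is the DBM file name; pairing the rules object with LWP::RobotUA follows the libwww-perl documentation:

    use WWW::RobotRules::AnyDBM_File;
    use LWP::RobotUA;

    # Second argument names the DBM file holding the persistent cache.
    my $rules = WWW::RobotRules::AnyDBM_File->new('MyBot/1.0', 'robot-cache');

    # A robot user agent that consults (and updates) the shared rules.
    my $ua = LWP::RobotUA->new('MyBot/1.0', 'me@example.com', $rules);
    my $res = $ua->get('http://example.org/');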
lib/WWW/RobotRules/DBIC/Schema/DateTime.pm
IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 UTC
lib/WWW/RobotRules/DBIC/Schema/UserAgent.pm
IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 UTC
WWW::OhNoRobotCom::Search - search comic transcriptions on http://ohnorobot.com
The module provides an interface to perform searches on the <http://www.ohnorobot.com> comic transcriptions website....
ZOFFIX/WWW-OhNoRobotCom-Search-0.003 - 18 Dec 2013 22:59:56 UTC
WWW::RobotRules::Parser::MultiValue - Parse robots.txt
"WWW::RobotRules::Parser::MultiValue" is a parser for "robots.txt". Parsed rules for the specified user agent is stored as a Hash::MultiValue, where the key is a lower case rule name. "Request-rate" rule is handled specially. It is normalized to "Cra...
TARAO/WWW-RobotRules-Parser-MultiValue-0.02 - 12 Mar 2015 05:46:38 UTC
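Rather than guess this parser's own method names, the sketch below shows the behavior of the Hash::MultiValue container the description says it returns (lower-cased rule names, possibly repeated per key):

    use Hash::MultiValue;

    # The kind of structure the parser is described as producing.
    my $rules = Hash::MultiValue->new(
        'disallow'    => '/private/',
        'disallow'    => '/tmp/',
        'crawl-delay' => '10',
    );

    my @paths = $rules->get_all('disallow');   # ('/private/', '/tmp/')
    my $delay = $rules->get('crawl-delay');    # '10'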
WWW::Scraper - framework for scraping results from search engines.
NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine and scraping the data from the resultant multi-page responses and the associated detail p...
GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 UTC
POE::Component::WWW::OhNoRobotCom::Search - non-blocking POE based wrapper around WWW::OhNoRobotCom::Search module
The module is a non-blocking wrapper around WWW::OhNoRobotCom::Search which provides an interface to <http://www.ohnorobot.com/> search...
ZOFFIX/POE-Component-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 00:50:53 UTC
WWW::Link - maintain information about the state of links
WWW::Link is a Perl class which accepts and maintains information about links. For example, this would include URLs which are referenced from a WWW page. The link class will be acted on by such programs as link checkers to give it information and by ...
MIKEDLR/WWW-Link-0.036 - 09 Feb 2002 18:10:43 UTC
WWW::Search - Virtual base class for WWW searches
This class is the parent for all access methods supported by the "WWW::Search" library. This library implements a Perl API to web-based search engines. See README for a list of search engines currently supported, and for a lot of interesting high-lev...
MTHURN/WWW-Search-2.519 - 03 Apr 2020 15:23:14 UTC
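The standard WWW::Search loop from its POD synopsis; the back-end name is illustrative, and README lists the engines actually supported:

    use WWW::Search;

    my $search = WWW::Search->new('AltaVista');
    $search->native_query(WWW::Search::escape_query('perl robots'));

    # Iterate over results as they are fetched, page by page.
    while (my $result = $search->next_result()) {
        print $result->url, "\n";
    }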