WWW::Robot - configurable web traversal engine (for web robots & agents)

This module implements a configurable web traversal engine for a *robot* or other web agent. Given an initial web page (*URL*), the Robot will fetch the contents of that page and extract all links on the page, adding them to a list of URLs to visit. ...

KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 GMT
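
WWW::Robot drives the traversal through user-supplied hooks rather than a fixed callback list. The sketch below only illustrates that hook style: the NAME/VERSION/EMAIL attributes and the addHook/run calls follow the module's documented interface, but the exact hook names and hook argument lists should be checked against the POD, and the robot identity and start URL are hypothetical.

    # Sketch of WWW::Robot's hook-based interface; verify hook names and
    # argument lists against the module's documentation before relying on them.
    use WWW::Robot;

    my $robot = WWW::Robot->new(
        NAME    => 'ExampleBot',              # hypothetical robot identity
        VERSION => '0.01',
        EMAIL   => 'webmaster@example.org',
    );

    # Decide whether a discovered URL should be followed.
    $robot->addHook('follow-url-test', sub {
        my ($robot, $hook, $url) = @_;
        return $url->scheme eq 'http';        # assumes $url is a URI-style object
    });

    # Do something with the contents of each page the robot fetches.
    $robot->addHook('invoke-on-contents', sub {
        my ($robot, $hook, $url, $response) = @_;
        print "Fetched $url (", length($response->content), " bytes)\n";
    });

    $robot->run('http://www.example.org/');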

WWW::RobotRules - database of robots.txt-derived permissions

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...

GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 GMT
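
The WWW::RobotRules flow is a short fetch/parse/check cycle using the module's new, parse, and allowed methods; the host and URLs in this sketch are hypothetical.

    # Fetch a site's /robots.txt, parse it, then check URLs before requesting them.
    use WWW::RobotRules;
    use LWP::Simple qw(get);

    my $rules = WWW::RobotRules->new('ExampleBot/1.0');   # your robot's User-Agent name

    my $robots_url = 'http://www.example.org/robots.txt';
    my $robots_txt = get($robots_url);
    $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

    # Only request pages that the parsed rules allow for this agent.
    my $page = 'http://www.example.org/some/page.html';
    if ($rules->allowed($page)) {
        my $content = get($page);
        # ... process $content ...
    }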

WWW::SimpleRobot - a simple web robot for recursively following links on web pages.

A simple Perl module for basic robot tasks. For a more elaborate interface, see WWW::Robot. This version uses LWP::Simple to grab pages and HTML::LinkExtor to extract the links from them. Only href attributes of anchor tags are extracted. Extracted ...

AWRIGLEY/WWW-SimpleRobot-0.07 - 28 Jun 2001 14:50:08 GMT
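
The entry names the two building blocks, LWP::Simple for fetching and HTML::LinkExtor for link extraction; a hand-rolled version of the same loop (not WWW::SimpleRobot's own API) looks roughly like this, with a hypothetical start URL and no depth limit or robots.txt handling.

    # Fetch a page, collect the href attributes of its anchor tags, and queue
    # any unseen URLs; HTML::LinkExtor resolves relative links against the base.
    use LWP::Simple qw(get);
    use HTML::LinkExtor;

    my @queue = ('http://www.example.org/');
    my %seen;

    while (my $url = shift @queue) {
        next if $seen{$url}++;
        my $html = get($url) or next;

        my $extractor = HTML::LinkExtor->new(undef, $url);
        $extractor->parse($html);

        for my $link ($extractor->links) {
            my ($tag, %attrs) = @$link;
            next unless $tag eq 'a' && $attrs{href};    # anchor hrefs only
            push @queue, $attrs{href} unless $seen{ $attrs{href} };
        }
    }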

WWW::RobotRules::DBIC - Persistent RobotRules which uses DBIC.

WWW::RobotRules::DBIC is a subclass of WWW::RobotRules which uses DBIx::Class to store robots.txt information in any RDBMS....

IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 GMT

WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules.

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain wildcards '*' and Allow directives, as Google does. Webmasters can use the /robo...

YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 GMT
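
Because the module is described as a fork of WWW::RobotRules, the sketch below assumes it keeps the parent's new/parse/allowed interface; that assumption, the host, and the paths are all hypothetical and should be checked against the module's own documentation. The rules show the Google-style syntax the fork targets: a '*' wildcard in a Disallow path and an Allow directive carving out an exception.

    # Assumed to mirror WWW::RobotRules' interface (the module is a fork of it).
    use WWW::RobotRules::Extended;

    my $rules = WWW::RobotRules::Extended->new('ExampleBot/1.0');

    my $robots_txt = join "\n",
        'User-agent: *',
        'Disallow: /private/*',            # wildcard pattern (Google extension)
        'Allow: /private/welcome.html',    # Allow directive (Google extension)
        '';

    $rules->parse('http://www.example.org/robots.txt', $robots_txt);

    my $ok = $rules->allowed('http://www.example.org/private/welcome.html');
    print $ok ? "allowed\n" : "disallowed\n";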

WWW::RobotRules::Parser - Just Parse robots.txt

WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take into consideration your user agent name when parsing...

DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 GMT

WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules

This is a subclass of WWW::RobotRules that uses Cache::Memcached to implement persistent caching of robots.txt and host-visit information....

SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 GMT

WWW::Mixi - Perl extension for scraping the MIXI social networking service.

WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides a login method, get and put methods, and some parsing methods for users who build mixi spiders. Using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple directly for accessing Mixi. WWW:...

TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 GMT
  • WWW::Mixi - an LWP::UserAgent module for accessing Mixi

WWW::OhNoRobotCom::Search - search comic transcriptions on http://ohnorobot.com

The module provides an interface for performing searches on the <http://www.ohnorobot.com> comic transcriptions website....

ZOFFIX/WWW-OhNoRobotCom-Search-0.003 - 18 Dec 2013 22:59:56 GMT

WWW::RobotRules::Parser::MultiValue - Parse robots.txt

"WWW::RobotRules::Parser::MultiValue" is a parser for "robots.txt". Parsed rules for the specified user agent is stored as a Hash::MultiValue, where the key is a lower case rule name. "Request-rate" rule is handled specially. It is normalized to "Cra...

TARAO/WWW-RobotRules-Parser-MultiValue-0.02 - 12 Mar 2015 05:46:38 GMT

WWW::Scraper - framework for scraping results from search engines.

NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine and scraping the data from the resultant multi-page responses and the associated detail p...

GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 GMT

POE::Component::WWW::OhNoRobotCom::Search - non-blocking POE based wrapper around WWW::OhNoRobotCom::Search module

The module is a non-blocking wrapper around WWW::OhNoRobotCom::Search, which provides an interface to the <http://www.ohnorobot.com/> search...

ZOFFIX/POE-Component-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 00:50:53 GMT

LWP - The World-Wide Web library for Perl

The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW cl...

ETHER/libwww-perl-6.15 - 05 Dec 2015 06:01:09 GMT
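
For robot work, the most relevant part of the libwww-perl collection is LWP::RobotUA, the LWP::UserAgent subclass that fetches and obeys each site's /robots.txt (via WWW::RobotRules) and paces its requests. A minimal fetch looks like this; the agent name, contact address, and URL are hypothetical.

    # Polite GET through LWP::RobotUA: robots.txt is consulted automatically
    # and requests to the same site are rate-limited.
    use LWP::RobotUA;

    my $ua = LWP::RobotUA->new('ExampleBot/0.1', 'webmaster@example.org');
    $ua->delay(1);    # minimum delay between requests to a site, in minutes

    my $response = $ua->get('http://www.example.org/');
    if ($response->is_success) {
        print $response->decoded_content;
    }
    else {
        warn 'Request failed: ', $response->status_line, "\n";
    }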

w3mir - all purpose HTTP-copying and mirroring tool

You may specify many options and one HTTP URL on the w3mir command line. A single HTTP URL *must* be specified either on the command line or in a URL directive in a configuration file. If the URL refers to a directory, it *must* end with a "/", otherw...

JANL/w3mir-1.0.10 - 04 Feb 2001 21:27:19 GMT

Net::Rovio - A Perl module for Rovio manipulation

Use Net::Rovio to control your Rovio robot from Perl. Uses basic Rovio API commands. The Rovio <http://www.wowwee.com/en/products/tech/telepresence/rovio/rovio> is a Wi-Fi enabled mobile webcam that lets you view and interact with its environment thr...

TYRODEN/Net-Rovio-1.5 - 13 May 2010 03:33:32 GMT

robots
MIKEDLR/Link_Controller-0.037 - 09 Feb 2002 18:12:34 GMT

WWW::Link - maintain information about the state of links

WWW::Link is a Perl class which accepts and maintains information about links. For example, this would include URLs which are referenced from a WWW page. The link class will be acted on by such programs as link checkers to give it information and by ...

MIKEDLR/WWW-Link-0.036 - 09 Feb 2002 18:10:43 GMT

CGI::Info - Information about the CGI environment

NHORNE/CGI-Info-0.61 - 02 Dec 2016 02:49:09 GMT

OAuth::Consumer - LWP based user agent with OAuth for consumer application

As OAuth::Consumer is a high-level library, this documentation does not describe the OAuth protocol in detail. You may find documentation on this protocol on these websites: <http://markdown.io/https://raw.github.com/Dynalon/Rainy/master/docs/OAUTH....

MATHIAS/OAuth-Consumer-0.03 - 12 Mar 2013 22:43:45 GMT

sitemapper.pl - script for generating site maps

sitemapper.pl generates site maps for a given site. It traverses a site from the root URL given as the -site option and generates an HTML page consisting of a bulleted list which reflects the structure of the site. The structure reflects the distance...

AWRIGLEY/sitemapper-1.019 - 09 Jun 2000 15:23:35 GMT
  • WWW::Sitemap - functions for generating a site map for a given site URL.