WWW::Robot - configurable web traversal engine (for web robots & agents)

This module implements a configurable web traversal engine, for a *robot* or other web agent. Given an initial web page (*URL*), the Robot will get the contents of that page, and extract all links on the page, adding them to a list of URLs to visit. ...

KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 GMT

WWW::RobotRules - database of robots.txt-derived permissions

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion" at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...

GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 GMT
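The typical WWW::RobotRules flow is: construct the object with your robot's user-agent name, feed it the content of a fetched robots.txt, then test URLs before visiting them. A minimal sketch (the bot name, host, and rules below are made-up examples, not anything from the distribution):

```perl
use WWW::RobotRules;

# Construct with the robot's user-agent name; User-agent lines in
# robots.txt are matched against it. 'MyBot/1.0' is a placeholder.
my $rules = WWW::RobotRules->new('MyBot/1.0');

# Example robots.txt content, as it might be fetched from a server.
my $robots_txt = <<'EOT';
User-agent: *
Disallow: /private/
EOT

# parse() takes the URL the robots.txt was retrieved from plus its content;
# the URL tells the object which host the rules apply to.
$rules->parse('http://example.com/robots.txt', $robots_txt);

# allowed() checks a URL against the stored rules.
print $rules->allowed('http://example.com/index.html') ? "fetch it\n" : "skip it\n";
```

One object can hold rules for several hosts at once; call `parse` once per robots.txt and `allowed` routes each URL to the right rule set.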

WWW::SimpleRobot - a simple web robot for recursively following links on web pages.

A simple Perl module for doing robot stuff. For a more elaborate interface, see WWW::Robot. This version uses LWP::Simple to grab pages, and HTML::LinkExtor to extract the links from them. Only href attributes of anchor tags are extracted. Extracted ...

AWRIGLEY/WWW-SimpleRobot-0.07 - 28 Jun 2001 14:50:08 GMT

WWW::RobotRules::DBIC - Persistent RobotRules which use DBIC.

WWW::RobotRules::DBIC is a subclass of WWW::RobotRules that uses DBIx::Class to store robots.txt information in any RDBMS....

IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 GMT

WWW::RobotRules::Parser - Just Parse robots.txt

WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take into consideration your user agent name when parsing...

DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 GMT

WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion" at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain '*' wildcards and Allow directives, as Google does. Webmasters can use the /robo...

YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 GMT

WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules

This is a subclass of WWW::RobotRules that uses Cache::Memcached to implement persistent caching of robots.txt and host visit information....

SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 GMT

WWW::Mixi - Perl extension for scraping the MIXI social networking service.

WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides login, get, and put methods, as well as parsing methods, for users who want to build a mixi spider. I think using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple for accessing Mixi. WWW:...

TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 GMT
  • WWW::Mixi - an LWP::UserAgent module for accessing Mixi
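WWW::Mixi builds on LWP::RobotUA, libwww-perl's "polite robot" user agent, which identifies its operator and rate-limits requests per host. A standalone sketch of LWP::RobotUA itself (the bot name and contact address are placeholders):

```perl
use LWP::RobotUA;

# LWP::RobotUA wants a user-agent name and a contact address, so site
# operators can reach whoever runs the robot. Both values are placeholders.
my $ua = LWP::RobotUA->new('MySpider/0.1', 'webmaster@example.com');

# delay() is measured in minutes; wait at least one second between
# requests to the same host.
$ua->delay(1/60);

# From here, $ua->get($url) behaves like a normal LWP::UserAgent request,
# except that robots.txt is fetched and honoured automatically.
print $ua->agent, "\n";
```

Because it subclasses LWP::UserAgent, anything written against the plain UA interface (as the description suggests for Mixi scraping) works unchanged, just more politely.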

WWW::OhNoRobotCom::Search - search comic transcriptions on http://ohnorobot.com

The module provides an interface for performing searches on the <http://www.ohnorobot.com> comic transcriptions website....

ZOFFIX/WWW-OhNoRobotCom-Search-0.003 - 18 Dec 2013 22:59:56 GMT

WWW::Scraper - framework for scraping results from search engines.

NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine, and scraping the data from the resultant multi-page responses, and the associated detail p...

GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 GMT

WWW::RobotRules::Parser::MultiValue - Parse robots.txt

"WWW::RobotRules::Parser::MultiValue" is a parser for "robots.txt". Parsed rules for the specified user agent are stored as a Hash::MultiValue, where the key is a lower-case rule name. The "Request-rate" rule is handled specially. It is normalized to "Cra...

TARAO/WWW-RobotRules-Parser-MultiValue-0.02 - 12 Mar 2015 05:46:38 GMT

POE::Component::WWW::OhNoRobotCom::Search - non-blocking POE based wrapper around WWW::OhNoRobotCom::Search module

The module is a non-blocking wrapper around WWW::OhNoRobotCom::Search, which provides an interface to <http://www.ohnorobot.com/> search...

ZOFFIX/POE-Component-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 00:50:53 GMT

LWP - The World-Wide Web library for Perl

The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW cl...

ETHER/libwww-perl-6.13 - 14 Feb 2015 18:45:12 GMT
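The core of the libwww-perl API is LWP::UserAgent plus the HTTP::Request/HTTP::Response pair. A minimal sketch that builds a client and a request without performing any network I/O (the agent name, timeout, and URL are placeholder values):

```perl
use LWP::UserAgent;
use HTTP::Request;

# Create the client; agent and timeout are placeholder settings.
my $ua = LWP::UserAgent->new(
    agent   => 'MyClient/1.0',
    timeout => 10,
);

# Build a GET request for a placeholder URL. Nothing is fetched yet.
my $req = HTTP::Request->new(GET => 'http://www.example.com/');

# $ua->request($req) - or the shortcut $ua->get($url) - would perform
# the fetch and return an HTTP::Response object, whose is_success and
# decoded_content methods give you the outcome and the body.
print $req->method, ' ', $req->uri, "\n";
```

Most of the modules listed on this page (LWP::RobotUA, WWW::Mixi, the scrapers) are built on top of exactly this request/response cycle.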

POE::Component::IRC::Plugin::WWW::OhNoRobotCom::Search - search http://ohnorobot.com/ website from IRC

This module is a POE::Component::IRC plugin which uses POE::Component::IRC::Plugin for its base. It provides an interface for searching the <http://ohnorobot.com/> website from IRC. It accepts input from public channel events, "/notice" messages as well as "/ms...

ZOFFIX/POE-Component-IRC-Plugin-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 01:01:11 GMT

w3mir - all purpose HTTP-copying and mirroring tool

You may specify many options and one HTTP-URL on the w3mir command line. A single HTTP URL *must* be specified either on the command line or in a URL directive in a configuration file. If the URL refers to a directory it *must* end with a "/", otherw...

JANL/w3mir-1.0.10 - 04 Feb 2001 21:27:19 GMT

WWW::Link - maintain information about the state of links

WWW::Link is a Perl class which accepts and maintains information about links. For example, this would include URLs which are referenced from a WWW page. The link class will be acted on by such programs as link checkers to give it information and by ...

MIKEDLR/WWW-Link-0.036 - 09 Feb 2002 18:10:43 GMT

CGI::Info - Information about the CGI environment

NHORNE/CGI-Info-0.55 - 11 Jun 2015 21:34:20 GMT

test-link - test links and update the link database

This program tests links and stores what it finds in the Link database. It needs a link database and a schedule database...

MIKEDLR/Link_Controller-0.037 - 09 Feb 2002 18:12:34 GMT

WWW::Mailman - Interact with Mailman's web interface from a Perl program

"WWW::Mailman" is a module to control Mailman (as a subscriber, moderator or administrator) without the need of a web browser. The module handles authentication transparently and can take advantage of stored cookies to speed it up. It is meant as a b...

BOOK/WWW-Mailman-1.06 - 06 May 2013 23:17:43 GMT

WWW::RoboCop - Police your URLs!

BETA BETA BETA! "WWW::RoboCop" is a dead-simple, somewhat opinionated robot. Given a starting page, this module will crawl only URLs that have been whitelisted by the "is_url_whitelisted" callback. It then creates a report of all visited pages, keye...

OALDERS/WWW-RoboCop-0.000004 - 16 Apr 2015 22:18:24 GMT