46 results (1.022 seconds)
WWW::Robot - configurable web traversal engine (for web robots & agents)

This module implements a configurable web traversal engine, for a *robot* or other web agent. Given an initial web page (*URL*), the Robot will get the contents of that page, and extract all links on the page, adding them to a list of URLs to visit. ...

KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 GMT

WWW::RobotRules - database of robots.txt-derived permissions

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...

GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 GMT

WWW::SimpleRobot - a simple web robot for recursively following links on web pages.

A simple perl module for doing robot stuff. For a more elaborate interface, see WWW::Robot. This version uses LWP::Simple to grab pages, and HTML::LinkExtor to extract the links from them. Only href attributes of anchor tags are extracted. Extracted ...

AWRIGLEY/WWW-SimpleRobot-0.07 - 28 Jun 2001 14:50:08 GMT

WWW::RobotRules::DBIC - Persistent RobotRules which use DBIC.

WWW::RobotRules::DBIC is a subclass of WWW::RobotRules, which uses DBIx::Class to store robots.txt info in any RDBMS. SYNOPSIS use WWW::RobotRules::DBIC; use LWP::RobotUA; my $rules = WWW::RobotRules::DBIC->new('dbi:mysql:robot_rules', 'root', '', \%o...

IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 GMT

WWW::RobotRules::Parser - Just Parse robots.txt

WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take into consideration your user agent name when parsing...

DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 GMT

WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain wildcards '*' and allow directives, as Google does. Webmasters can use the /robo...

YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 GMT

WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules

This is a subclass of WWW::RobotRules that uses Cache::Memcache to implement persistent caching of robots.txt and host visit information. FUNCTIONS new(server [, server ..]) When creating this object you must pass at least one memcache server. AUTHOR...

SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 GMT

WWW::Mixi - Perl extension for scraping the MIXI social networking service.

WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides a login method, get and put methods, and some parsing methods for users who create Mixi spiders. I think using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple for accessing Mixi. WWW:...

TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 GMT
  • WWW::Mixi - LWP::UserAgent module for accessing Mixi

WWW::OhNoRobotCom::Search - search comic transcriptions on http://ohnorobot.com

The module provides an interface to perform searches on the <http://www.ohnorobot.com> comic transcriptions website. CONSTRUCTOR new my $site = WWW::OhNoRobotCom::Search->new; my $site = WWW::OhNoRobotCom::Search->new( timeout => 10, ); my $site = WWW::OhNo...

ZOFFIX/WWW-OhNoRobotCom-Search-0.003 - 18 Dec 2013 22:59:56 GMT

WWW::Scraper - framework for scraping results from search engines.

NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine, and scraping the data from the resultant multi-page responses, and the associated detail p...

GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 GMT

POE::Component::WWW::OhNoRobotCom::Search - non-blocking POE based wrapper around WWW::OhNoRobotCom::Search module

The module is a non-blocking wrapper around WWW::OhNoRobotCom::Search which provides an interface to <http://www.ohnorobot.com/> search CONSTRUCTOR spawn my $poco = POE::Component::WWW::OhNoRobotCom::Search->spawn; POE::Component::WWW::OhNoRobotCom::Sea...

ZOFFIX/POE-Component-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 00:50:53 GMT

LWP - The World-Wide Web library for Perl

The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW cl...

MSCHILLI/libwww-perl-6.07 - 02 Jul 2014 05:10:47 GMT

w3mir - all purpose HTTP-copying and mirroring tool

You may specify many options and one HTTP-URL on the w3mir command line. A single HTTP URL *must* be specified either on the command line or in a URL directive in a configuration file. If the URL refers to a directory it *must* end with a "/", otherw...

JANL/w3mir-1.0.10 - 04 Feb 2001 21:27:19 GMT

POE::Component::IRC::Plugin::WWW::OhNoRobotCom::Search - search http://ohnorobot.com/ website from IRC

This module is a POE::Component::IRC plugin which uses POE::Component::IRC::Plugin for its base. It provides an interface to search the <http://ohnorobot.com/> website from IRC. It accepts input from public channel events, "/notice" messages as well as "/ms...

ZOFFIX/POE-Component-IRC-Plugin-WWW-OhNoRobotCom-Search-0.002 - 17 Dec 2013 01:01:11 GMT

WWW::Link - maintain information about the state of links

WWW::Link is a perl class which accepts and maintains information about links. For example, this would include urls which are referenced from a WWW page. The link class will be acted on by such programs as link checkers to give it information and by ...

MIKEDLR/WWW-Link-0.036 - 09 Feb 2002 18:10:43 GMT

CGI::Info - Information about the CGI environment
NHORNE/CGI-Info-0.46 - 11 Nov 2013 14:21:35 GMT

test-link - test links and update the link database

This program tests links and stores the information about what it found into the Link database. Needs:- * link database * schedule database CONFIGURATION Configuration is done using the WWW::Link_Controller::ReadConf (3) module. You may want to expli...

MIKEDLR/Link_Controller-0.037 - 09 Feb 2002 18:12:34 GMT

WWW::Mailman - Interact with Mailman's web interface from a Perl program

"WWW::Mailman" is a module to control Mailman (as a subscriber, moderator or administrator) without the need of a web browser. The module handles authentication transparently and can take advantage of stored cookies to speed it up. It is meant as a b...

BOOK/WWW-Mailman-1.06 - 06 May 2013 23:17:43 GMT

WWW::Sitemap - functions for generating a site map for a given site URL.

The "WWW::Sitemap" module creates a sitemap for a site, by traversing the site using the WWW::Robot module. The sitemap object has methods to access a list of all the urls in the site, and a list of all the links for each of these urls. It is also po...

AWRIGLEY/sitemapper-1.019 - 09 Jun 2000 15:23:35 GMT

WWW::Spyder - a simple non-persistent web crawler.
ASHLEY/WWW-Spyder-0.24 - 27 Feb 2008 00:33:57 GMT
