WWW::RobotRules - database of robots.txt-derived permissions River stage four • 4 direct dependents • 6014 total dependents

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion" at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...

GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
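A minimal sketch of the typical WWW::RobotRules flow, assuming you fetch /robots.txt yourself (here with LWP::Simple); the agent string and URLs are placeholders:

    use WWW::RobotRules;
    use LWP::Simple qw(get);

    # Rules are matched against this robot name.
    my $rules = WWW::RobotRules->new('MyBot/1.0');

    # Fetch and parse a site's robots.txt (URL is a placeholder).
    my $robots_url = 'http://example.com/robots.txt';
    my $robots_txt = get($robots_url);
    $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

    # Ask whether a specific URL may be visited.
    my $url = 'http://example.com/private/page.html';
    print $rules->allowed($url) ? "allowed\n" : "disallowed\n";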

WWW::RobotRules::DBIC - Persistent RobotRules which uses DBIC. River stage zero No dependents

WWW::RobotRules::DBIC is a subclass of WWW::RobotRules, which uses DBIx::Class to store robots.txt info in any RDBMS....

IKEBE/WWW-RobotRules-DBIC-0.01 - 18 Oct 2006 13:58:41 UTC

WWW::RobotRules::Parser - Just Parse robots.txt River stage zero No dependents

WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take into consideration your user agent name when parsing...

DMAKI/WWW-RobotRules-Parser-0.04001 - 01 Dec 2007 13:33:54 UTC
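As a rough sketch of the contrast with WWW::RobotRules: the parse method here is assumed (following the module's synopsis as best remembered) to return the raw rules for every user agent rather than rules filtered for yours, so the return structure below should be checked against the module's own documentation:

    use WWW::RobotRules::Parser;

    my $robots_txt = "User-agent: *\nDisallow: /private\n";
    my $parser     = WWW::RobotRules::Parser->new;

    # Assumed return shape: hashref of user-agent name => arrayref of
    # Disallow path prefixes, for every agent mentioned in the file.
    my $rules = $parser->parse('http://example.com/robots.txt', $robots_txt);

    for my $agent (keys %$rules) {
        print "$agent disallows: @{ $rules->{$agent} }\n";
    }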

WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules River stage zero No dependents

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion" at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain the wildcard '*' and Allow directives, as Google does. Webmasters can use the /robo...

YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 UTC

WWW::RobotRules::Memcache - Use memcached in conjunction with WWW::RobotRules River stage zero No dependents

This is a subclass of WWW::RobotRules that uses Cache::Memcached to implement persistent caching of robots.txt and host-visit information....

SOCK/WWW-RobotRules-Memcache-0.1 - 08 Sep 2006 02:02:39 UTC
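Since this is a subclass, the parse/allowed interface is inherited from WWW::RobotRules; only construction differs. In the sketch below the constructor arguments (agent name plus a memcached server address) are an assumption, so check the module's synopsis for the real signature:

    use WWW::RobotRules::Memcache;

    # Constructor arguments are assumed: agent name plus memcached server.
    my $rules = WWW::RobotRules::Memcache->new('MyBot/1.0', '127.0.0.1:11211');

    # Inherited WWW::RobotRules interface; the parsed rules and host-visit
    # data persist in memcached between runs.
    my $robots_txt = "User-agent: *\nDisallow: /cgi-bin\n";
    $rules->parse('http://example.com/robots.txt', $robots_txt);
    print "ok\n" if $rules->allowed('http://example.com/index.html');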

WWW::RobotRules::Parser::MultiValue - Parse robots.txt River stage zero No dependents

"WWW::RobotRules::Parser::MultiValue" is a parser for "robots.txt". Parsed rules for the specified user agent is stored as a Hash::MultiValue, where the key is a lower case rule name. "Request-rate" rule is handled specially. It is normalized to "Cra...

TARAO/WWW-RobotRules-Parser-MultiValue-0.02 - 12 Mar 2015 05:46:38 UTC
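To make the Hash::MultiValue shape concrete, here is a small illustration of the kind of structure described above; the keys and values are invented and the parser itself is not called:

    use Hash::MultiValue;

    # A robots.txt record can repeat a rule (e.g. several Disallow lines),
    # which is why a multi-value hash is a natural fit.
    my $rules = Hash::MultiValue->new(
        'disallow'    => '/private',
        'disallow'    => '/tmp',
        'crawl-delay' => 10,
    );

    my @paths = $rules->get_all('disallow');   # ('/private', '/tmp')
    my $delay = $rules->get('crawl-delay');    # 10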

WWW::Mixi - Perl extension for scraping the MIXI social networking service. River stage zero No dependents

WWW::Mixi uses LWP::RobotUA to scrape mixi.jp. It provides a login method, get and put methods, and some parsing methods for users who create a mixi spider. I think using WWW::Mixi is better than using LWP::UserAgent or LWP::Simple for accessing Mixi. WWW:...

TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 UTC

LWP - The World-Wide Web library for Perl River stage four • 2111 direct dependents • 6012 total dependents

The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW cl...

OALDERS/libwww-perl-6.54 - 06 May 2021 17:55:38 UTC
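Within libwww-perl, robots.txt handling is usually consumed through LWP::RobotUA, which wraps LWP::UserAgent and consults WWW::RobotRules before each request. A minimal sketch with placeholder agent name, contact address, and URL:

    use LWP::RobotUA;

    # Both the agent name and the contact address are required.
    my $ua = LWP::RobotUA->new('MyBot/1.0', 'webmaster@example.com');
    $ua->delay(1);    # wait at least one minute between requests per host

    my $response = $ua->get('http://example.com/');
    if ($response->is_success) {
        print $response->decoded_content;
    }
    else {
        # URLs forbidden by robots.txt come back as error responses
        # instead of being sent to the server.
        print "Blocked or failed: ", $response->status_line, "\n";
    }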

Saraltest - Perl extension for blah blah blah River stage zero No dependents

Stub documentation for Saraltest, created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited....

MARSAB/Saraltest - 26 Aug 2011 10:57:44 UTC

WWW::Spyder - a simple non-persistent web crawler. River stage zero No dependents

ASHLEY/WWW-Spyder-0.24 - 27 Feb 2008 00:33:57 UTC

ElephantAgent - the agent that never forgets River stage zero No dependents

This is the robot agent that never forgets. One of the major advantages of the original MOMspider link checker was that it didn't need to keep checking robots.txt files every time it was started. This agent does the same by using a disk cache of host...

MIKEDLR/Link_Controller-0.037 - 09 Feb 2002 18:12:34 UTC

LWP::Parallel - Extension for LWP to allow parallel HTTP and FTP access River stage one • 2 direct dependents • 3 total dependents

ParallelUserAgent is an extension to the existing libwww module. It allows you to take a list of URLs (it currently supports HTTP, FTP, and FILE URLs; HTTPS might work, too) and connect to all of them _in parallel_, then wait for the res...

MSOUTH/ParallelUserAgent-2.62 - 29 May 2016 18:55:38 UTC
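A rough sketch of the register/wait pattern from the ParallelUserAgent documentation, as best remembered; the URLs are placeholders and the entry accessors should be checked against the module's synopsis:

    use LWP::Parallel::UserAgent;
    use HTTP::Request;

    my $pua = LWP::Parallel::UserAgent->new;
    $pua->max_hosts(5);    # how many different hosts to query at once
    $pua->timeout(10);     # per-connection timeout in seconds

    # Queue the requests; nothing is sent until wait() is called.
    for my $url ('http://example.com/', 'http://example.org/') {
        if (my $error = $pua->register(HTTP::Request->new(GET => $url))) {
            warn "could not register $url\n";
        }
    }

    # Connect to everything in parallel and collect the responses.
    my $entries = $pua->wait;
    for my $entry (values %$entries) {
        my $res = $entry->response;
        print $res->request->url, " => ", $res->code, "\n";
    }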

Bundle::FinalTest - Perl extension for blah blah blah River stage zero No dependents

Stub documentation for Bundle::FinalTest, created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited. Blah blah blah....

MARSAB/Bundle-FinalTest - 02 Aug 2011 05:52:05 UTC

Bundle::FinalTest2 - Perl extension for blah blah blah River stage zero No dependents

Stub documentation for Bundle::FinalTest2, created by h2xs. It looks like the author of the extension was negligent enough to leave the stub unedited....

MARSAB/Bundle-FinalTest2 - 03 Aug 2011 05:10:02 UTC

Task::BeLike::PHIPS - My favourite and frequently used modules River stage zero No dependents

PHIPS/Task-BeLike-PHIPS-0.1.2 - 16 Jan 2014 15:42:35 UTC

Gungho::Component::RobotRules.ja - handles robots.txt processing River stage one • 1 direct dependent • 1 total dependent

Gungho::Component::RobotRules is a component that handles robots.txt, something every crawler must implement. By using this component, robots.txt is applied appropriately to every request, so you avoid crawling pages without permission. Once Gungho::Component::RobotRules is loaded, then for every HTTP request Gungho is asked to fetch, it first consults the RobotRules storage for the currently...

DMAKI/Gungho-0.09008 - 28 Jul 2008 10:37:52 UTC