LWP::RobotUA - a class for well-behaved Web robots • River stage four • 2109 direct dependents • 6006 total dependents

This class implements a user agent that is suitable for robot applications. Robots should be nice to the servers they visit. They should consult the /robots.txt file to ensure that they are welcomed and they should not make requests too frequently. B...

OALDERS/libwww-perl-6.54 - 06 May 2021 17:55:38 UTC
  • lwptut - An LWP Tutorial
  • LWP - The World-Wide Web library for Perl
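
The second politeness rule above, not requesting too frequently from the same server, comes down to tracking the last request time per host. The following is a minimal Python sketch that assumes a fixed per-host minimum delay. PoliteScheduler and its parameters are hypothetical illustrations, not LWP::RobotUA's API; in Perl you would configure the RobotUA object itself. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Hypothetical helper: enforce a minimum delay between requests to one host."""

    def __init__(self, delay_seconds=60.0, clock=time.monotonic, sleep=time.sleep):
        self.delay_seconds = delay_seconds  # assumed value, chosen for the example
        self.clock = clock
        self.sleep = sleep
        self.last_visit = {}  # host (netloc) -> time of the previous request

    def wait_time(self, url):
        """Seconds to wait before `url` may be requested (0.0 if none)."""
        last = self.last_visit.get(urlsplit(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay_seconds - self.clock())

    def before_request(self, url):
        """Sleep as needed, then record this request against the URL's host."""
        pause = self.wait_time(url)
        if pause > 0:
            self.sleep(pause)
        self.last_visit[urlsplit(url).netloc] = self.clock()
```

Because the key is the netloc, example.com and example.com:8080 count as different servers. A real robot might normalize hosts first.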

WWW::Scraper - framework for scraping results from search engines • River stage one • 1 direct dependent • 1 total dependent

NOTE: You can find a full description of the Scraper framework in WWW::Scraper::ScraperPOD.pm. "Scraper" is a framework for issuing queries to a search engine, and scraping the data from the resultant multi-page responses, and the associated detail p...

GLENNWOOD/Scraper-3.05 - 02 Aug 2003 07:47:12 UTC

LWP::Parallel::RobotUA - A class for Parallel Web Robots • River stage one • 2 direct dependents • 3 total dependents

This class implements a user agent that is suitable for robot applications. Robots should be nice to the servers they visit. They should consult the /robots.txt file to ensure that they are welcomed and they should not make requests too frequently. B...

MSOUTH/ParallelUserAgent-2.62 - 29 May 2016 18:55:38 UTC

WWW::Mixi - an LWP::UserAgent module for accessing Mixi • River stage zero • No dependents

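This is a module for accessing Mixi. It is a subclass of LWP::RobotUA and can be used in the same way as LWP::UserAgent and LWP::RobotUA. WWW::Mixi has three advantages over LWP::UserAgent. First, with WWW::Mixi all login-related work can be done with the login method alone. The login method enables cookies if they are disabled, then takes the email address and password received when the object was created and login...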

TSUKAMOTO/WWW-Mixi-0.50 - 01 Aug 2007 06:02:56 UTC

HTTP::Any - a common interface for HTTP clients (LWP, AnyEvent::HTTP, Curl) • River stage zero • No dependents

IMPORT: I recommend placing your use of HTTP::Any in a separate module, which is then used from any point in your project. Why not make a simple one-line connection? Because this gives better flexibility and the option to replace the modules used. For example...

KNI/HTTP-Any-0.13 - 10 Jun 2021 10:22:57 UTC

Sex - Perl teaches the birds and the bees • River stage zero • No dependents

Heterogeneous recombination of Perl packages. Given two (or more, I'm a liberal guy) packages, Sex.pm will recombine their symbols at random into the new module, thus providing a cross-section of their functions and global variables. It...

MSCHWERN/Sex-0.69 - 01 Apr 2000 21:32:53 UTC

requester - Request additional information from sites with bad reporting practices • River stage zero • No dependents

During your daily operation of the "Mail::Abuse" system, you will find sites that have very poor reporting practices and tend not to send relevant information along with their complaints. One notable example is Hotmail, which sends a generic note tha...

LUISMUNOZ/Mail-Abuse-1.026 - 22 Jun 2007 20:49:09 UTC
  • maps-scan - Checks listing status on mail-abuse.org's RBLs
  • maps-gather - Gather evidence associated with a MAPS complaint

WWW::Robot - configurable web traversal engine (for web robots & agents) • River stage one • 2 direct dependents • 2 total dependents

This module implements a configurable web traversal engine, for a *robot* or other web agent. Given an initial web page (*URL*), the Robot will get the contents of that page, and extract all links on the page, adding them to a list of URLs to visit. ...

KVENTIN/WWW-Robot-0.026 - 07 Aug 2009 13:21:26 UTC
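
The traversal WWW::Robot describes works like this: fetch a page, extract its links, and add them to the list of URLs to visit. That loop is a graph search. The following is a minimal Python sketch that assumes breadth-first order and a page limit. Fetching and link extraction are hidden behind a hypothetical get_links callback, stubbed here with a dictionary. This is not WWW::Robot's API, and the sketch does not necessarily match its visiting order.

```python
from collections import deque


def traverse(start_url, get_links, max_pages=100):
    """Breadth-first traversal: visit a page, queue its unseen links, repeat.

    `get_links(url)` is a hypothetical callback that fetches `url` and
    returns the URLs it links to.
    """
    to_visit = deque([start_url])  # URLs waiting to be fetched, in FIFO order
    seen = {start_url}             # everything ever queued, to avoid repeats
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited


# Stub site: each key links to the URLs in its list.
SITE = {"/": ["/a", "/b"], "/a": ["/", "/c"], "/b": ["/c"], "/c": []}
order = traverse("/", lambda u: SITE.get(u, []))
```

The seen set is what stops cycles such as "/" and "/a" linking to each other. A well-behaved robot would also check robots.txt and throttle requests inside get_links.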

WWW::Spyder - a simple non-persistent web crawler • River stage zero • No dependents

ASHLEY/WWW-Spyder-0.24 - 27 Feb 2008 00:33:57 UTC

WWW::Search - Virtual base class for WWW searches • River stage two • 31 direct dependents • 33 total dependents

This class is the parent for all access methods supported by the "WWW::Search" library. This library implements a Perl API to web-based search engines. See README for a list of search engines currently supported, and for a lot of interesting high-lev...

MTHURN/WWW-Search-2.519 - 03 Apr 2020 15:23:14 UTC

WWW::GoKGS - KGS Go Server (http://www.gokgs.com/) Scraper • River stage zero • No dependents

This module is a KGS Go Server ("http://www.gokgs.com/") scraper. KGS allows users to play the board game Go, a.k.a. baduk (Korean) or weiqi (Chinese). Although the web server provides resources generated dynamically, such as Game Archives, t...

ANAZAWA/WWW-GoKGS-0.21 - 21 Aug 2014 02:27:48 UTC

WWW::RobotRules - database of robots.txt-derived permissions • River stage four • 4 direct dependents • 6008 total dependents

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the /robots.txt file to forbid conforming robots from accessing parts of their web site. The pars...

GAAS/WWW-RobotRules-6.02 - 18 Feb 2012 13:09:13 UTC
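
WWW::RobotRules itself is Perl. To illustrate what a robots.txt permissions database does, the sketch below uses Python's standard urllib.robotparser module instead; this is a swapped-in library, not the Perl module's API. The robots.txt content, host, and agent names are made up for the example. Rules are parsed from a string, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: BadBot
Disallow: /
"""

# Parse from a string instead of fetching /robots.txt over the network.
rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """True if `agent` may fetch `url` under the parsed rules."""
    return rules.can_fetch(agent, url)
```

urllib.robotparser matches rule paths by prefix and uses the first matching rule in the record for the agent. The catch-all "*" record applies only when no named record matches. It does not treat "*" inside a path as a wildcard.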

RDF::Scutter - Perl extension for harvesting distributed RDF resources • River stage zero • No dependents

As the name implies, this is an RDF Scutter. A scutter is a web robot that follows "seeAlso"-links, retrieves the content it finds at those URLs, and adds the RDF statements it finds there to its own store of RDF statements. This module is an alpha r...

KJETILK/RDF-Scutter-0.1 - 01 Nov 2005 01:00:47 UTC

WWW::TinySong - Get free music links from tinysong.com • River stage zero • No dependents

tinysong.com is a web app that can be queried for a song and returns a tiny URL, allowing you to listen to the song for free online and share it with friends. WWW::TinySong is a Perl interface to this service, allowing you to programmatically search ...

MIOREL/WWW-TinySong-1.01 - 26 Jun 2009 15:52:50 UTC

ElephantAgent - the agent that never forgets • River stage zero • No dependents

This is the robot agent that never forgets. One of the major advantages of the original MOMspider link checker was that it didn't need to keep checking robots.txt files every time it was started. This agent does the same by using a disk cache of host...

MIKEDLR/Link_Controller-0.037 - 09 Feb 2002 18:12:34 UTC

Dezi::Aggregator::Spider - web aggregator • River stage one • 2 direct dependents • 8 total dependents

Dezi::Aggregator::Spider is a web crawler similar to the spider.pl script in the Swish-e 2.4 distribution. Internally, Dezi::Aggregator::Spider uses LWP::RobotUA to do the hard work. See Dezi::Aggregator::Spider::UA....

KARMAN/Dezi-App-0.016 - 27 Apr 2018 14:12:33 UTC

Search::Circa::Indexer - provides functions to administer Circa, a WWW search engine running on MySQL • River stage zero • No dependents

This is Circa::Indexer, a module that provides functions to administer Circa, a WWW search engine running on MySQL. Circa is for your Web site, or for a list of sites. It indexes like AltaVista does. It can read, add and parse all URLs found in a ...

ALIAN/Search-Circa-1.18 - 02 Jan 2003 12:35:27 UTC

Bundle::Ensembl - Bundle for installing Ensembl Perl Modules (Built from dependencies of ENSEMBL_45 VERSION) • River stage zero • No dependents

A Bundle of Modules related to Ensembl Genome Browser Installation (Ensembl V45). If there are any modules that need to be installed, please email me at gene@gpse.org...

ASHGENE/Bundle-Ensembl-0.03 - 03 Jul 2007 11:54:04 UTC

Bundle::Urchin - Urchin RSS Aggregator Perl Dependencies • River stage zero • No dependents

These are Perl dependencies for the Urchin RSS aggregator software. <http://urchin.sourceforge.net/> After installing you may get a report that there were some problems installing certain modules. Before reporting them make sure that they haven't ins...

ESUMMERS/Bundle-Urchin-0.1 - 14 Oct 2004 21:08:01 UTC

WWW::RobotRules::Extended - database of robots.txt-derived permissions. This is a fork of WWW::RobotRules • River stage zero • No dependents

This module parses /robots.txt files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. It also parses rules that contain wildcards ('*') and Allow directives, as Google does. Webmasters can use the /robo...

YSIMONX/WWW-RobotRules-Extended-0.02 - 14 Jan 2012 10:23:47 UTC
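
WWW::RobotRules::Extended says it handles '*' wildcards and Allow directives the way Google does. Its exact precedence rules are not shown here, so the following Python sketch rests on an assumption: Google-style matching. Under that model, '*' matches any run of characters and a trailing '$' anchors the end of the path. The longest matching pattern wins, and Allow wins a tie. The functions pattern_to_regex and is_allowed, and the sample rules, are hypothetical. They are not the module's API or necessarily its algorithm.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path. Everything else matches literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Decide access for `path` from (directive, pattern) pairs.

    Assumed precedence: the longest matching pattern (as written) wins;
    on a tie 'allow' beats 'disallow'; no matching rule means allowed.
    """
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern, e.g. a bare "Disallow:", restricts nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical rules for one user-agent record.
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public*"),
    ("disallow", "/*.pdf$"),
]
```

Comparing (length, is_allow) tuples encodes both precedence rules at once: length decides first, and True > False breaks ties in favour of Allow. A real parser would compile each pattern once when the file is read rather than on every call.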