NAME
WWW::SimpleRobot - a simple web robot for recursively following links on web pages.
SYNOPSIS
    use WWW::SimpleRobot;
    my $robot = WWW::SimpleRobot->new(
        DEPTH                => 1,
        TRAVERSAL            => 'depth',
        VISIT_CALLBACK       => sub {
            my ( $url, $depth, $html, $links ) = @_;
            print STDERR "Visiting $url\n";
            print STDERR "Depth = $depth\n";
            print STDERR "HTML = $html\n";
            print STDERR "Links = @$links\n";
        },
        BROKEN_LINK_CALLBACK => sub {
            my ( $url, $linked_from, $depth ) = @_;
            print STDERR "$url looks like a broken link on $linked_from\n";
            print STDERR "Depth = $depth\n";
        },
    );
    $robot->traverse;
    my @urls  = @{ $robot->urls };
    my @pages = @{ $robot->pages };
    for my $page ( @pages )
    {
        my $url               = $page->{url};
        my $depth             = $page->{depth};
        my $modification_time = $page->{modification_time};
    }
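Note that the example above does not show where the robot starts crawling, or which links it is allowed to follow. Below is a minimal sketch of a fuller invocation, assuming the constructor also accepts a URLS option (a list of start pages - an assumption here) alongside the FOLLOW_REGEX option described under DESCRIPTION:

    # A minimal sketch only; URLS is an assumed constructor option,
    # FOLLOW_REGEX is described in the DESCRIPTION below.
    use WWW::SimpleRobot;

    my $robot = WWW::SimpleRobot->new(
        URLS           => [ 'http://www.example.com/' ],   # start page(s) - assumed option
        FOLLOW_REGEX   => '^http://www\.example\.com/',    # only follow on-site links
        DEPTH          => 2,
        TRAVERSAL      => 'depth',
        VISIT_CALLBACK => sub {
            my ( $url, $depth, $html, $links ) = @_;
            print "Visited $url at depth $depth\n";
        },
    );
    $robot->traverse;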
DESCRIPTION
A simple perl module for doing robot stuff. For a more elaborate interface,
see WWW::Robot. This version uses LWP::Simple to grab pages, and
HTML::LinkExtor to extract the links from them. Only href attributes of
anchor tags are extracted. Extracted links are checked against the
FOLLOW_REGEX regex to see if they should be followed. A HEAD request is
made to these links, to check that they are 'text/html' type pages.
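The following is a rough sketch of that approach - not the module's actual source - showing how LWP::Simple and HTML::LinkExtor can be combined to fetch a page, pull the href attributes of anchor tags, and HEAD-check candidate links against a regex. The URL and regex are hypothetical.

    # Sketch of the approach outlined above, not WWW::SimpleRobot's own code.
    use LWP::Simple qw( get head );
    use HTML::LinkExtor;
    use URI;

    my $url  = 'http://www.example.com/';             # hypothetical start page
    my $html = get( $url ) or die "can't get $url";

    # Collect href attributes of anchor tags only
    my @links;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ( $tag, %attr ) = @_;
            push @links, URI->new_abs( $attr{href}, $url )
                if $tag eq 'a' and $attr{href};
        }
    );
    $extor->parse( $html );

    # Follow only links matching a regex, and only 'text/html' pages
    my $follow_regex = qr{^http://www\.example\.com/};
    for my $link ( @links )
    {
        next unless $link =~ $follow_regex;
        my ( $content_type ) = head( $link ) or next;   # HEAD request
        next unless $content_type =~ m{^text/html};
        print "would follow $link\n";
    }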
BUGS
This robot doesn't respect the Robot Exclusion Protocol (naughty
robot!), and doesn't do any exception handling if it can't get pages - it
just ignores them and goes on to the next page!
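If you do want the exclusion protocol respected, one caller-side workaround (a sketch only, not part of this module) is to filter URLs through WWW::RobotRules before handing them to the robot:

    # Sketch of a caller-side robots.txt check using WWW::RobotRules;
    # the agent name and URL are hypothetical.
    use LWP::Simple qw( get );
    use WWW::RobotRules;
    use URI;

    my $rules = WWW::RobotRules->new( 'MySimpleRobot/0.1' );

    my $url        = 'http://www.example.com/some/page.html';
    my $robots_url = URI->new( $url );
    $robots_url->path( '/robots.txt' );

    if ( my $robots_txt = get( $robots_url ) )
    {
        $rules->parse( $robots_url, $robots_txt );
    }
    print "$url is disallowed by robots.txt\n"
        unless $rules->allowed( $url );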
AUTHOR
Ave Wrigley <Ave.Wrigley@itn.co.uk>
COPYRIGHT
Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.