NAME

WWW::CheckSite - OO interface to an iterator that checks a website

SYNOPSIS

    use WWW::CheckSite;

    my $wcs = WWW::CheckSite->new(
        uri    => 'http://www.test-smoke.org/',
        prefix => 'tsorg',
        save   => 1,
    );

    $wcs->validate;

    $wcs->write_report;

Or using saved data (skip the real validation):

    my $wcs = WWW::CheckSite->load(
        uri    => 'http://www.test-smoke.org/',
        prefix => 'tsorg',
    );

    $wcs->write_report;

DESCRIPTION

This module implements a spider that checks the pages on a website. For each page, the links and images on that page are checked for availability. After that, the page is validated by W3.ORG.

When the spider is done, a report can be written in HTML.

WARNING: Although the spider respects /robots.txt on the target site, the validator does not! Use this tool only on your own sites.

METHODS

WWW::CheckSite->new( %args )

Initialize a new instance. Options supported:

  • uri => the base uri to check [mandatory]

  • prefix => the name of the project [mandatory]

  • dir => target directory (curdir())

  • save => true/false (false)

  • strictrules => true/false (false)

  • validate => by_none/by_uri/by_upload (by_none)

  • ua_class => override the user agent class

  • ua_args => hashref with extra options passed to the user agent class

  • tt => boolean to allow the use of Template Toolkit

  • v => $verbosity, where $verbosity may be

    0

    Be quiet (default).

    1

    Report basic information for every visited page (e.g. number of links and images) and total time for checking the site.

    2

    Additional reporting of page validation details.
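
As an illustration of the options above, the following is a minimal sketch of a helper that checks the two mandatory options and the documented values for validate before anything is passed to new(). The name build_checksite_args() is hypothetical and not part of WWW::CheckSite; the module may well perform its own checks.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical helper (not part of WWW::CheckSite): sanity-check the
# documented options before handing them to the constructor.
sub build_checksite_args {
    my %args = @_;

    # uri and prefix are documented as mandatory.
    for my $required (qw( uri prefix )) {
        croak "Missing mandatory option '$required'"
            unless defined $args{$required} && length $args{$required};
    }

    # Fill in the documented default, then accept only the documented modes.
    $args{validate} = 'by_none' unless defined $args{validate};
    my %known_mode = map { $_ => 1 } qw( by_none by_uri by_upload );
    croak "Unknown validate mode '$args{validate}'"
        unless $known_mode{ $args{validate} };

    return %args;
}

my %args = build_checksite_args(
    uri      => 'http://www.test-smoke.org/',
    prefix   => 'tsorg',
    save     => 1,
    validate => 'by_uri',
    v        => 1,
);

# With the module installed, the checked options go straight to new():
#   my $wcs = WWW::CheckSite->new( %args );
```

The helper croaks early on a misspelled validate mode or a missing prefix, which can be easier to spot than a failure halfway through a long site check.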

WWW::CheckSite->load( %args )

Initialize the object from a previously saved datafile. Supported options:

  • dir => target/source directory

  • prefix => the prefix used for this dataset [mandatory]

  • tt => boolean to allow the use of Template Toolkit

$wcs->validate

The validate() method collects all the data.

$wcs->dump_links( $noskipped )

Return a list with all URLs encountered during site-traversal.
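
The returned list can be post-processed like any other Perl list. The sketch below counts the URLs per host. The helper name links_per_host() is hypothetical, and the example assumes that dump_links() returns plain URL strings and that a true $noskipped leaves out skipped URLs; neither detail is spelled out above.

```perl
use strict;
use warnings;

# Hypothetical helper: count the URLs returned by dump_links() per host.
# $wcs is any object offering dump_links(), for example a WWW::CheckSite
# instance after validate() or load().
sub links_per_host {
    my ( $wcs, $noskipped ) = @_;

    my %count;
    for my $url ( $wcs->dump_links( $noskipped ) ) {
        # Crude host extraction; anything without a scheme://host part
        # (mailto: links, for instance) is counted under '(other)'.
        my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i;
        $count{ defined $host ? lc $host : '(other)' }++;
    }
    return \%count;
}
```

A caller could then print the result with something like:

    my $per_host = links_per_host( $wcs, 1 );
    printf "%-40s %d\n", $_, $per_host->{$_} for sort keys %$per_host;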

$wcs->write_report

Generate the reports.

$wcs->write_ht_report()

Load, fill the HTML::Template template and write the reports.

$wcs->write_tt_report()

Load, fill the Template Toolkit template and write the reports.
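
Because the tt option only allows the use of Template Toolkit, a caller may want to set it depending on whether the Template module is installed at all. The following is a sketch under that assumption; tt_available() is a hypothetical helper, and whether WWW::CheckSite performs a similar fallback on its own is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 when Template Toolkit can be loaded,
# 0 otherwise. The eval keeps a missing module from being fatal.
sub tt_available {
    return eval { require Template; 1 } ? 1 : 0;
}

my %report_args = (
    prefix => 'tsorg',
    tt     => tt_available(),
);

# With the module installed, the saved data could then be loaded with:
#   my $wcs = WWW::CheckSite->load( %report_args );
#   $wcs->write_report;
```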

$wcs->_die;

Do a Carp::croak().

NO METHODS

create_report()

Load and fill the HTML::Template.

create_report_data()

Return a hash with all the data needed to fill both the HTML::Template and the Template Toolkit templates.

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-www-checksite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright MMV Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
