NAME

checksite - Check the contents of a website

SYNOPSIS

    $ checksite [options] -p <name> uri

OPTIONS

    --prefix|-p <name>   The prefix (dir) of this check [mandatory]
    --dir|-d <dir>       The target directory

    --[no]save           Save validation results
    --load               Load the validation results

    --novalidate         Skip the W3 validation
    --by_xmllint         Validate by using the xmllint program
    --by_uri             Validate by sending the uri to W3
    --by_upload          Validate by uploading the contents to W3

    --disallow <path>    Add Disallow: rules to robots.txt (multiple)

    --nostrictrules      Do not impose /robots.txt on the validator

    --lang|-l <lang>     Set language(s) for Accept-Language: header

    --ua_class <Module>  Set a new UserAgent class (child of WWW::Mechanize)

    -v                   Increase verbosity (multiple)
    --help|-h            Show this help message

See WWW::CheckSite::Manual for more information.
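
As an illustration of how the options above combine, the following sketch assembles two command lines and prints them instead of running them. The prefix mysite, the directory /tmp/checksite, the paths /private and /cgi-bin, the language en, and the URI http://www.example.com/ are placeholders chosen for this example, not values taken from this manual.

```shell
#!/bin/sh
# Dry run: build checksite command lines and print them rather than run them.
# All values (prefix, directory, paths, language, URI) are placeholders.

uri='http://www.example.com/'

# Validate with xmllint, add two Disallow: rules,
# and raise verbosity twice.
cmd1="checksite -p mysite -d /tmp/checksite --by_xmllint --disallow /private --disallow /cgi-bin -v -v $uri"

# Skip the W3 validation entirely and ask for English pages.
cmd2="checksite --prefix mysite --novalidate --lang en $uri"

printf '%s\n' "$cmd1" "$cmd2"
```

Removing the quoting and the printf turns either line into a real invocation; the long and short forms (--prefix and -p) are interchangeable.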

DESCRIPTION

This program spiders the specified URI and checks the availability of the links, images and stylesheets on each page. Pages and stylesheets are also validated with the validators available at http://validator.w3.org and http://jigsaw.w3.org.

When all pages have been checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, together with those errors.

Metrics for a spidered page

Each page fetched by the spider will have these metrics:

  • status, status_tx

    The HTTP status code and a textual explanation of that code.

  • title

    The contents of the <title></title> tag.

  • ct

    The MIME type returned by the HTTP-server for the document.

  • valid

    The HTML-validation result.

  • links

    A list of <a href=>, <area href=> and <frame src=> URIs found on the page, each with its HTTP status code. Each HTML tag is also checked for link text or an ALT/TITLE attribute.

  • link_cnt, links_ok

    The number of links found and the number of links that are ok.

  • images

    A list of <img src=> and <input type=image> URIs found on the page, each with its HTTP status code and MIME type. Each HTML tag is also checked for the existence of the ALT attribute.

  • image_cnt, images_ok

    The number of images found and the number of images that are ok.

  • styles

    A list of <link rel=stylesheet type=text/css> URIs found on the page, each with its HTTP status code, MIME type and CSS-validation result.

  • style_cnt, styles_ok

    The number of stylesheets found and the number of stylesheets that are ok.
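
One way to read the count metrics above is to compare each "found" count with its "ok" count. The sketch below does that in a small shell function. The function name page_has_errors, its argument order, and the rule that any status other than 200 counts as an error are assumptions made for this example; they are not taken from checksite itself.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a page's metrics indicate a problem.
# Arguments: status link_cnt links_ok image_cnt images_ok style_cnt styles_ok
# This is an assumed reading of the metrics, not checksite's actual logic.
page_has_errors() {
    [ "$1" -ne 200 ] && return 0      # HTTP status other than 200
    [ "$3" -lt "$2" ] && return 0     # some links are not ok
    [ "$5" -lt "$4" ] && return 0     # some images are not ok
    [ "$7" -lt "$6" ] && return 0     # some stylesheets are not ok
    return 1                          # nothing wrong by these counts
}

# Example with made-up numbers: 12 links found, 11 of them ok.
if page_has_errors 200 12 11 4 4 1 1; then
    echo "page has errors"
else
    echo "page is clean"
fi
```

The valid metric is left out on purpose, because this manual does not say how the validation result is represented.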

SEE ALSO

WWW::CheckSite::Manual

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright MMV Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.