NAME

checksite - Check the contents of a website

SYNOPSIS

    $ checksite [options] -p <name> uri

OPTIONS

Results
  --prefix|-p <name>        The prefix (dir) of this check [mandatory]
  --dir|-d <dir>            The target directory
Persistence
  --[no]save                Save validation results
  --load                    Load the validation results
(X)HTML validation
  --nohtml                  Skip (X)HTML validation
  --html_validator <uri>    Base uri for the W3C (X)HTML validator
  --html_upload             Validate (X)HTML by uploading
  --html_uri                Validate (X)HTML by sending the uri
  --xmllint                 Validate by using the xmllint program
CSS validation
  --nocss                   Skip CSS validation
  --css_validator <uri>     Base uri for the W3C CSS validator
  --css_upload              Validate CSS by uploading
  --css_uri                 Validate CSS by sending the uri
Exclusion
  --disallow <path>         Add Disallow: rules to robots.txt (multiple)

  --nostrictrules           Do not impose /robots.txt on the validator
                            for "local" URLs
General
  --lang|-l <lang>          Set language(s) for Accept-Language: header

  --ua_class <Module>       Set a new UserAgent class
                            (child of WWW::Mechanize)

  -v                        Increase verbosity (multiple)
  --help|-h                 This message

See WWW::CheckSite::Manual for more information.
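
As an illustration, a typical invocation could be assembled as sketched below. The prefix mysite, the target directory and the URL are hypothetical values; only options listed above are used. The sketch prints the command line instead of running it, so it can be tried without a site to check.

```shell
# Hypothetical values: adjust prefix, target directory and URL to your setup.
prefix="mysite"
target_dir="/tmp/checksite-reports"
uri="http://localhost/"

# Build the argument list from documented options:
# mandatory prefix, target directory, skip both validators, one level of verbosity.
set -- checksite --prefix "$prefix" --dir "$target_dir" --nohtml --nocss -v "$uri"

# Dry run: show the command line. Replace the two lines below with "$@" to execute it.
cmdline="$*"
printf '%s\n' "$cmdline"
```

Skipping both validators keeps the run to link, image and stylesheet availability checks, which is a reasonable first pass before pointing --html_validator or --css_validator at a validator you host yourself.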

DESCRIPTION

This program spiders the specified URL and checks the availability of the links, images and stylesheets on each page.

INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the validators available at http://validator.w3.org and http://jigsaw.w3.org. These validators do not allow robots! The W3C HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C CSS validator takes more work, but I have managed to get it running as well with Jigsaw.

When all pages are checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, along with those errors.

Metrics for a spidered page

Each page fetched by the spider will have these metrics:

  • status, status_tx

    The HTTP-returncode and a verbal explanation of that code.

  • title

    The contents of the <title></title> tag.

  • ct

    The MIME type returned by the HTTP-server for the document.

  • links

    A list of <a href=>, <area href=> and <frame src=> URIs found on the page, with the HTTP-returncode. Each HTML tag is also checked for link text or an ALT/TITLE attribute.

  • link_cnt, links_ok

    The number of links found and the number of links that are ok.

  • images

    A list of <img src=> and <input type=image> URIs found on the page, with the HTTP-returncode and MIME type. Each HTML tag is also checked for the existence of the ALT attribute.

  • image_cnt, images_ok

    The number of images found and the number of images that are ok.

  • styles

    A list of <link rel=stylesheet type=text/css> URIs found on the page, with the HTTP-returncode, MIME type and CSS-validation result.

  • style_cnt, styles_ok

    The number of stylesheets found and the number of stylesheets that are ok.

  • valid

    The HTML-validation result.

FILES

checksite supports Config::Auto. This means that each of the following directories is searched for checksiteconfig, checksite.config, checksiterc and .checksiterc:

current directory
bin directory (where the script is installed)
$HOME
/etc/
/usr/local/etc/
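
To see which configuration files exist on a given machine, the search can be approximated with the sketch below. It is not Config::Auto's implementation: the function name find_checksite_config and its bin_dir argument are hypothetical, and the sketch only walks the directories and file names listed above in the order shown. Which file takes precedence when several exist is decided by Config::Auto itself.

```shell
# Sketch only: print every existing candidate config file.
# $1 (bin_dir) is a hypothetical stand-in for the directory where
# the checksite script is installed; it must be supplied by the caller.
find_checksite_config() {
    bin_dir="$1"
    for dir in . "$bin_dir" "$HOME" /etc /usr/local/etc; do
        for name in checksiteconfig checksite.config checksiterc .checksiterc; do
            # Report the candidate only if it is a regular file.
            if [ -f "$dir/$name" ]; then
                printf '%s\n' "$dir/$name"
            fi
        done
    done
}
```

For example, find_checksite_config /usr/local/bin lists the candidates for a script installed in /usr/local/bin; no output means none of the listed files exists.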

SEE ALSO

WWW::CheckSite::Manual

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.