checksite - Check the contents of a website


    $ checksite [options] -p <name> uri


  --prefix|-p <name>        The prefix (dir) of this check [mandatory]
  --dir|-d <dir>            The target directory
  --[no]save                Save validation results
  --load                    Load the validation results
(X)HTML validation
  --nohtml                  Skip (X)HTML validation
  --html_validator <uri>    Base uri for the W3C (X)HTML validator
  --html_upload             Validate (X)HTML by uploading
  --html_uri                Validate (X)HTML by sending the uri
  --xmllint                 Validate by using the xmllint program
CSS validation
  --nocss                   Skip CSS validation
  --css_validator <uri>     Base uri for the W3C CSS validator
  --css_upload              Validate CSS by uploading
  --css_uri                 Validate CSS by sending the uri
  --disallow <path>         Add Disallow: rules to robots.txt (multiple)

  --nostrictrules           Do not impose /robots.txt on the validator
                            for "local" URLs
  --lang|-l <lang>          Set language(s) for Accept-Language: header

  --ua_class <Module>       Set a new UserAgent class (child of
                            WWW::Mechanize; see the sketch below)

  -v                        Increase verbosity (multiple)
  --help|-h                 This message
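
A minimal sketch of such a UserAgent class (the package name and agent string are made up; only the WWW::Mechanize parent is given above):

    package My::CheckSiteUA;
    use strict;
    use warnings;
    use parent 'WWW::Mechanize';

    # Identify the spider with a custom agent string; all other
    # behaviour is inherited from WWW::Mechanize.
    sub new {
        my ($class, %args) = @_;
        $args{agent} ||= 'MyCheckSite/1.0';
        return $class->SUPER::new(%args);
    }

    1;

It could then be loaded with --ua_class My::CheckSiteUA.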

See WWW::CheckSite::Manual for more information.


This program will spider the specified url and check the availability of the links, images and stylesheets on each page.
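
For example, the following invocation (with a made-up prefix and uri) checks a site with extra verbosity:

    $ checksite -v -p mysite http://www.example.com/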

INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the public W3C validators, because those validators do not allow robots! The W3C HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C CSS validator takes more work, but I have managed to get that to work as well with Jigsaw.
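
Running against your own validators could then look like this; both base uris below are assumptions for a local installation:

    $ checksite -p mysite \
        --html_validator http://localhost/w3c-validator \
        --css_validator http://localhost:8080/css-validator \
        http://www.example.com/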

When all pages have been checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, together with those errors.

Metrics for a spidered page

Each page fetched by the spider will have these metrics (a sketch of a complete record follows the list):

  • status, status_tx

    The HTTP return code and a verbal explanation of that code.

  • title

    The contents of the <title></title> tag.

  • ct

    The MIME type returned by the HTTP-server for the document.

  • links

    A list of the <a href=>, <area href=> and <frame src=> URIs found on the page, with their HTTP return codes. Each HTML tag is also checked for link text or an ALT/TITLE attribute.

  • link_cnt, links_ok

    The number of links found and the number of links that are ok.

  • images

    A list of the <img src=> and <input type=image> URIs found on the page, with their HTTP return codes and MIME types. Each HTML tag is also checked for the existence of the ALT attribute.

  • image_cnt, images_ok

    The number of images found and the number of images that are ok.

  • styles

    A list of the <link rel=stylesheet type=text/css> URIs found on the page, with their HTTP return codes, MIME types and CSS validation results.

  • style_cnt, styles_ok

    The number of stylesheets found and the number of stylesheets that are ok.

  • valid

    The (X)HTML validation result.
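
Put together, the record for one page might look like this Perl sketch (hypothetical values; the shape of the entries inside links, images and styles is an assumption, only the key names are documented above):

    # Hypothetical example of the metrics kept for a single page.
    my %page = (
        status    => 200,              # HTTP return code
        status_tx => 'OK',             # verbal explanation of that code
        title     => 'Example page',   # contents of <title></title>
        ct        => 'text/html',      # MIME type from the HTTP-server
        links     => [ { uri => 'http://www.example.com/about', status => 200 } ],
        link_cnt  => 1,
        links_ok  => 1,
        images    => [ { uri => 'http://www.example.com/logo.png',
                         status => 200, ct => 'image/png' } ],
        image_cnt => 1,
        images_ok => 1,
        styles    => [ { uri => 'http://www.example.com/site.css',
                         status => 200, ct => 'text/css', valid => 1 } ],
        style_cnt => 1,
        styles_ok => 1,
        valid     => 1,                # (X)HTML validation result
    );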


checksite supports Config::Auto. This means that each of the following directories is searched for a file named checksiteconfig, checksite.config, checksiterc or .checksiterc:

  • current directory
  • bin directory (where the script is installed)
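
A hypothetical .checksiterc could look like this; it assumes the config keys simply mirror the long option names, which is not guaranteed:

    prefix = mysite
    lang   = en, nl
    nocss  = 1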



Written by Abe Timmerman.


Please report any bugs or feature requests by mail or through the web interface. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.