Olivier Thereaux


W3C::LogValidator - The W3C Log Validator - Quality-focused Web Server log processing engine

Checks quality/validity of most popular content on a Web server


W3C::LogValidator is the main module for the W3C Log Validator, which combines a Web server log analysis and statistics tool with a Web content quality checker.

W3C::LogValidator can batch-process a number of documents through a number of quality-focused checks, such as HTML or CSS validation, or checking for broken links. It can take a number of different inputs, ranging from a simple list of URIs to log files from various Web servers. Since it orders the results by the number of times a document appears in the file or logs, it is, in practice, a useful way to spot the most popular documents that need work.

The Perl script logprocess.pl, bundled in the W3C::LogValidator distribution, is a simple way to use the features of W3C::LogValidator. Developers can also use W3C::LogValidator as a Perl module to build applications.

The homepage for the Log Validator is at: http://www.w3.org/QA/Tools/LogValidator/


The simplest way to use the Log Validator is to edit the sample configuration file (samples/logprocess.conf) and run the bundled logprocess.pl script with this configuration file, like so:

    logprocess.pl -f /path/to/logprocess.conf

The basic task of the W3C::LogValidator module is to parse a configuration file, whose name is passed as an argument to the constructor, and to process the relevant logs:

    use W3C::LogValidator;
    my $logprocessor = W3C::LogValidator->new("sample.conf");

Alternatively, it will use a default configuration and try to process Web server logs in "well known locations":

    my $logprocessor = W3C::LogValidator->new;



$processor = W3C::LogValidator->new

Constructs a new W3C::LogValidator processor. You may pass a configuration file name, as well as a reference to a hash of attribute-value pairs, as parameters to the constructor.

e.g. for mail output:

  my %conf = (
    "UseOutputModule" => "W3C::LogValidator::Output::Mail",
    "ServerAdmin" => 'webmaster@example.com',
    "verbose" => "3"
  );
  my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

Or e.g. for HTML output:

  my %conf = (
    "UseOutputModule" => "W3C::LogValidator::Output::HTML",
    "OutputTo" => 'path/to/file.html',
    "verbose" => "0"
  );
  my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

If given the path to a configuration file, new() will call the W3C::LogValidator::Config module to get its configuration variables. Otherwise, a default set of values is used.

Main processing method

$processor->process

Do-it-all method: reads the configuration file (if any), parses the log files, runs them through the processing modules, and sends the result to the output module.

$processor->find_remote_addr

Given a log record and the type of the log (common log format, flat list of URIs, etc.), extracts the remote host or IP address.

Modules methods


Creates a configuration hash for a specific module, adding module-specific configuration variables, overriding if necessary


Runs the data parsed from the log files through the various processing (validation) modules specified by UseValidationModule in the configuration.

Log parsing and URI methods


Loops through and parses all log files specified in the configuration


Extracts URIs and number of hits from a given log file, and feeds it to the processor's URI/Hits table


Given a log record and the type of the log (common log format, flat list of URIs, etc), extracts the URI
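As an illustration of this kind of extraction, here is a minimal sketch, not the module's own code. The function name extract_uri and the log type labels 'common' and 'plain' are hypothetical, and only those two log types are handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of W3C::LogValidator): pull the requested
# URI out of a single log record. The type labels are assumptions.
sub extract_uri {
    my ($record, $log_type) = @_;

    if ($log_type eq 'plain') {
        # Flat list of URIs: the record is the URI itself, minus whitespace.
        $record =~ s/^\s+|\s+$//g;
        return $record;
    }

    # Common log format:
    #   host ident user [date] "METHOD uri PROTOCOL" status bytes
    # The URI is the second token of the quoted request line.
    if ($record =~ /"[A-Z]+\s+(\S+)(?:\s+[^"]*)?"/) {
        return $1;
    }

    return;    # record could not be parsed
}

my $line = '192.0.2.1 - - [10/Oct/2003:13:55:36 -0700] '
         . '"GET /QA/Tools/ HTTP/1.0" 200 2326';
print extract_uri($line, 'common'), "\n";    # prints /QA/Tools/
```

A real log parser would also need to cope with other server formats and with malformed request lines, which this sketch simply skips.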


Given a URI, removes "directory index" suffixes such as index.html, so that http://foobar/ and http://foobar/index.html are counted as one resource
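The idea can be sketched as follows. This is an illustrative example rather than the module's implementation: the function name strip_directory_index and the list of index file names are assumptions, and a real setup would presumably take the list from the configuration.

```perl
use strict;
use warnings;

# Assumed list of "directory index" file names (hypothetical values).
my @index_names = ('index.html', 'index.htm', 'Overview.html');

# Hypothetical helper: map a URI ending in an index file name to the
# URI of its directory, so both forms count as the same resource.
sub strip_directory_index {
    my ($uri) = @_;
    for my $name (@index_names) {
        # Strip the name only when it is the whole last path segment.
        return $uri if $uri =~ s{/\Q$name\E\z}{/};
    }
    return $uri;
}

print strip_directory_index('http://foobar/index.html'), "\n";   # prints http://foobar/
print strip_directory_index('http://foobar/'), "\n";             # prints http://foobar/
```

Anchoring the match on the preceding slash keeps names such as myindex.html untouched.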


Add a URI to the processor's URI/Hits table


Returns the list of URIs in the processor's table, sorted by popularity (hits)
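A URI/Hits table and its popularity ordering can be pictured with a plain hash, as in the sketch below. The names %hits, add_uri and sorted_uris are chosen for illustration only and are not claimed to match the module's internals; the alphabetical tie-break is likewise an assumption.

```perl
use strict;
use warnings;

my %hits;    # URI => number of times it was seen

# Record one more hit for a URI and return its new count.
sub add_uri {
    my ($uri) = @_;
    return ++$hits{$uri};
}

# URIs with the most hits come first; ties are broken alphabetically
# so that the order is reproducible.
sub sorted_uris {
    return sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits;
}

add_uri($_) for qw(/a.html /b.html /a.html /c.html /a.html /b.html);
print join(' ', sorted_uris()), "\n";    # prints /a.html /b.html /c.html
```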


Tests whether a given URI contains a CGI query string
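One simple way to perform such a test is sketched here, under the assumption that any "?" appearing before a fragment identifier starts a query string. The function name has_query_string is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the URI carries a query string,
# 0 otherwise. A "?" inside the fragment part does not count.
sub has_query_string {
    my ($uri) = @_;
    return $uri =~ /^[^#]*\?/ ? 1 : 0;
}

print has_query_string('http://example.org/search?q=css'), "\n";   # prints 1
print has_query_string('http://example.org/page.html'), "\n";      # prints 0
```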


Returns the number of hits for a given URI. This is essentially a "public" accessor for $hits{$uri}.


Public bug-tracking interface at http://www.w3.org/Bugs/Public/


Olivier Thereaux <ot@w3.org> for The World Wide Web Consortium


Up-to-date information on the Log Validator at:


Articles and Tutorials

Several articles have been written within the W3C Quality Assurance Interest Group on the topic of improving the quality of Web sites, notably by using a step-by-step approach and relying upon the Log Validator to help find the areas to fix in priority.

My Web site is standard! And yours?

Available at http://www.w3.org/QA/2002/04/Web-Quality

Web Standards Switch

or how to improve your Web site easily.

Available in several languages at: http://www.w3.org/QA/2003/03/web-kit

Making your website valid: a step by step guide.

Available at http://www.w3.org/QA/2002/09/Step-by-step