NAME

W3C::LogValidator - The W3C Log Validator - Quality-focused Web Server log processing engine

Checks the quality/validity of the most popular content on a Web server

DESCRIPTION

W3C::LogValidator is the main module for the W3C Log Validator, which combines a Web server log analysis and statistics tool with a Web content quality checker.

W3C::LogValidator can batch-process a number of documents through several quality-focused checks, such as HTML or CSS validation or checking for broken links. It accepts a variety of inputs, ranging from a simple list of URIs to log files from various Web servers. Because it orders the results by the number of times each document appears in the file or logs, it is, in practice, a useful way to spot the most popular documents that need work.

The Perl script logprocess.pl, bundled in the W3C::LogValidator distribution, is a simple way to use the features of W3C::LogValidator. Developers can also use W3C::LogValidator as a Perl module to build their own applications.

The homepage for the Log Validator is at: http://www.w3.org/QA/Tools/LogValidator/

SYNOPSIS

The simplest way to use it is to edit the sample configuration file (samples/logprocess.conf) and run the bundled logprocess.pl script with this configuration file, like so:

    logprocess.pl -f /path/to/logprocess.conf

The basic task of the W3C::LogValidator module is to parse a configuration file, whose path is passed as an argument to the constructor, and to process the relevant logs:

    use W3C::LogValidator;
    my $logprocessor = W3C::LogValidator->new("sample.conf");
    $logprocessor->process;

Alternatively, if no configuration file is given, it uses a default configuration and tries to process Web server logs in "well-known locations":

    my $logprocessor = W3C::LogValidator->new;
    $logprocessor->process;

API

Constructor

$processor = W3C::LogValidator->new

Constructs a new W3C::LogValidator processor. You may pass a configuration file name and, optionally, a reference to a hash of attribute-value pairs as parameters to the constructor.

e.g. for mail output:

  my %conf = (
    "UseOutputModule" => "W3C::LogValidator::Output::Mail",
    "ServerAdmin" => 'webmaster@example.com',
    "verbose" => "3"
  );
  my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

Or, for HTML output:

  my %conf = (
    "UseOutputModule" => "W3C::LogValidator::Output::HTML",
    "OutputTo" => 'path/to/file.html',
    "verbose" => "0"
  );
  my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

If given the path to a configuration file, new() will call the W3C::LogValidator::Config module to get its configuration variables. Otherwise, a default set of values is used.

Main processing method

$processor->process

Do-it-all method: reads the configuration file (if any), parses the log files, runs them through the processing modules, and sends the results to the output module.

$processor->find_remote_addr

Given a log record and the type of the log (common log format, flat list of URIs, etc.), extracts the remote host or IP address.
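To illustrate what find_remote_addr has to do, here is a minimal sketch, assuming Apache-style Common Log Format records, where the remote host is the first whitespace-separated field. The function name remote_addr_from_record and the log type names 'common' and 'plain' are hypothetical and are not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of W3C::LogValidator): returns the remote
# host or IP of a log record, or undef when the log type carries none.
sub remote_addr_from_record {
    my ($record, $type) = @_;
    return unless $type eq 'common';    # a flat list of URIs has no host field
    # In Common Log Format the remote host is the first field on the line.
    my ($host) = $record =~ /^(\S+)\s/;
    return $host;
}

my $line = '192.0.2.7 - - [10/Oct/2003:13:55:36 -0700] '
         . '"GET /QA/ HTTP/1.0" 200 2326';
print "Remote host: ", remote_addr_from_record($line, 'common'), "\n";
```

The host field may hold a hostname rather than an IP address if the server resolves names when logging; the sketch treats it as an opaque string either way.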

Modules methods

$processor->config_module

Creates a configuration hash for a specific module, adding module-specific configuration variables and overriding the general ones where necessary.
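The override behaviour described for config_module can be sketched with a plain hash merge, in which later key/value pairs win. This is only an illustration under assumptions: module_config, the module name Example::Checker and the key MaxDocuments are hypothetical; only verbose appears in the examples above.

```perl
use strict;
use warnings;

# Hypothetical sketch: build one module's configuration from the general
# settings, letting that module's own settings override them.
sub module_config {
    my ($general, $per_module, $module) = @_;
    my $specific = $per_module->{$module} || {};
    # In a list of key/value pairs the later pair wins, so module-specific
    # values replace general ones; the general hash itself is not modified.
    my %conf = (%$general, %$specific);
    return \%conf;
}

my %general    = (verbose => 1, MaxDocuments => 10);
my %per_module = ('Example::Checker' => { MaxDocuments => 25 });
my $conf = module_config(\%general, \%per_module, 'Example::Checker');
print "verbose=$conf->{verbose} MaxDocuments=$conf->{MaxDocuments}\n";
```

Copying into a fresh hash keeps each module's settings independent, so one module's overrides cannot leak into another's.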

$processor->use_modules

Runs the data parsed from the log files through the processing (validation) modules specified by UseValidationModule in the configuration.
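A rough sketch of the dispatch step behind use_modules follows, assuming UseValidationModule holds a list of class names and that each class offers a constructor and a method that checks a list of URIs. The class Example::Check::LongURI, its check_uris method and the MaxLength key are all hypothetical stand-ins, not the interface of the real validation modules. A real loader would also need to require each module before calling it; the stand-in here is defined inline, so that step is skipped.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a validation module; names are made up.
package Example::Check::LongURI;

sub new {
    my ($class, %conf) = @_;
    return bless { max => $conf{MaxLength} || 50 }, $class;
}

# Flag the URIs this "check" considers problematic (here: overly long ones).
sub check_uris {
    my ($self, @uris) = @_;
    return grep { length($_) > $self->{max} } @uris;
}

package main;

# Run the URIs through every module named under UseValidationModule,
# collecting each module's flagged URIs under that module's name.
sub run_modules {
    my ($conf, @uris) = @_;
    my %results;
    for my $module (@{ $conf->{UseValidationModule} }) {
        my $checker = $module->new(%$conf);    # class name held in a string
        $results{$module} = [ $checker->check_uris(@uris) ];
    }
    return \%results;
}

my %conf = (
    UseValidationModule => ['Example::Check::LongURI'],
    MaxLength           => 20,
);
my $results = run_modules(\%conf,
    'http://example.org/', 'http://example.org/a/very/long/path.html');
print "$_ flagged ", scalar @{ $results->{$_} }, " URI(s)\n" for sort keys %$results;
```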

Log parsing and URI methods

$processor->read_logfiles

Loops through and parses all log files specified in the configuration

$processor->read_logfile('path/to.file')

Extracts URIs and their numbers of hits from a given log file, and feeds them to the processor's URI/Hits table.
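read_logfile deals with several log formats. The sketch below covers only the simplest one, a flat list of URIs with one per line, and shows how hits can be tallied in a hash. count_hits is a hypothetical helper, not part of the module; an in-memory filehandle stands in for a real log file.

```perl
use strict;
use warnings;

# Hypothetical sketch: read a flat list of URIs (one per line) from a
# filehandle and count how many times each URI appears.
sub count_hits {
    my ($fh) = @_;
    my %hits;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        next unless length $line;   # skip blank lines
        $hits{$line}++;
    }
    return \%hits;
}

my $list = "http://example.org/\nhttp://example.org/a.html\n\nhttp://example.org/\n";
open my $fh, '<', \$list or die "cannot open in-memory list: $!";
my $hits = count_hits($fh);
close $fh;
print "$_: $hits->{$_}\n" for sort keys %$hits;
```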

$processor->find_uri

Given a log record and the type of the log (common log format, flat list of URIs, etc.), extracts the URI.
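As a sketch of the per-type extraction that find_uri performs, the helper below handles two assumed log types: a flat list, where the whole line is the URI, and Common Log Format, where the URI sits inside the quoted request field ("METHOD URI PROTOCOL", with the protocol possibly absent). The function name and the type names 'plain' and 'common' are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: extract the URI from a log record of a given type.
sub uri_from_record {
    my ($record, $type) = @_;
    if ($type eq 'plain') {
        # In a flat list, the whole (trimmed) line is the URI.
        (my $uri = $record) =~ s/^\s+|\s+$//g;
        return $uri;
    }
    if ($type eq 'common') {
        # Quoted request field: "METHOD URI PROTOCOL"; protocol optional.
        return $1 if $record =~ /"[A-Z]+ (\S+)(?: [^"]*)?"/;
    }
    return;    # unknown log type or unparsable record
}

my $clf = '192.0.2.7 - - [10/Oct/2003:13:55:36 -0700] "GET /QA/ HTTP/1.0" 200 2326';
print "URI: ", uri_from_record($clf, 'common'), "\n";
```

Common Log Format records carry only the request path, so turning that path into a full URI would also require the site's base URI; this sketch leaves that step out.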

$processor->remove_duplicates

Given a URI, removes "directory index" suffixes such as index.html, so that http://foobar/ and http://foobar/index.html are counted as one resource.
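The normalisation behind remove_duplicates can be sketched as stripping a known index file name when it is the last path segment. The list of index names below is an assumption; a given server's directory-index configuration may differ, and strip_directory_index is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed list of directory-index file names (server configurations vary).
my @index_names = qw(index.html index.htm index.php default.htm);

# Hypothetical sketch: strip a trailing directory-index name so that
# http://foobar/ and http://foobar/index.html map to the same key.
sub strip_directory_index {
    my ($uri) = @_;
    for my $name (@index_names) {
        # Strip only when the name is the whole last path segment.
        return $1 if $uri =~ m{^(.*/)\Q$name\E$};
    }
    return $uri;
}

for my $uri (qw(http://foobar/ http://foobar/index.html)) {
    print "$uri -> ", strip_directory_index($uri), "\n";
}
```

Anchoring on the preceding "/" keeps names such as myindex.html from being mistaken for index pages.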

$processor->add_uri

Adds a URI to the processor's URI/Hits table.

$processor->sorted_uris

Returns the list of URIs in the processor's table, sorted by popularity (hits)
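Ordering by popularity comes down to a numeric sort on the hit counts. In this minimal sketch, sort_by_hits is a hypothetical helper; breaking ties alphabetically is an assumption made here to keep the output deterministic, not documented behaviour of sorted_uris.

```perl
use strict;
use warnings;

# Hypothetical sketch: URIs by descending hit count, ties alphabetical.
sub sort_by_hits {
    my ($hits) = @_;
    return sort { $hits->{$b} <=> $hits->{$a} or $a cmp $b } keys %$hits;
}

my %hits = ('/a' => 3, '/b' => 10, '/c' => 3);
print join(' ', sort_by_hits(\%hits)), "\n";    # prints "/b /a /c"
```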

$processor->no_cgi

Tests whether a given URI contains a CGI query string
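A simple version of the test described for no_cgi looks for a "?" that starts a query string. The sketch below, with the hypothetical name has_query_string, drops any "#fragment" first, since a "?" inside a fragment is not a query. It does not try to recognise CGI scripts by path, and the real method's return convention is not documented here.

```perl
use strict;
use warnings;

# Hypothetical sketch: true if the URI carries a query string.
sub has_query_string {
    my ($uri) = @_;
    (my $without_fragment = $uri) =~ s/#.*//s;    # ignore any fragment
    return index($without_fragment, '?') >= 0;
}

for my $uri ('http://example.org/search?q=log', 'http://example.org/page.html') {
    print "$uri: ", (has_query_string($uri) ? 'query string' : 'no query string'), "\n";
}
```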

$processor->hit

Returns the number of hits for a given URI; essentially a "public" accessor for $hits{$uri}.
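The URI/Hits table used by add_uri and hit can be pictured as a hash wrapped in an object. The class below is a hypothetical illustration; its name and internals are made up and do not describe how W3C::LogValidator stores its table.

```perl
use strict;
use warnings;

# Hypothetical URI/Hits table illustrating add_uri and hit.
package Example::HitTable;

sub new {
    my ($class) = @_;
    return bless { hits => {} }, $class;
}

# Record one more hit for $uri, creating the entry if needed.
sub add_uri {
    my ($self, $uri) = @_;
    $self->{hits}{$uri}++;
    return $self;    # allows chained calls
}

# Number of hits recorded for $uri; 0 if it was never seen.
sub hit {
    my ($self, $uri) = @_;
    return $self->{hits}{$uri} || 0;
}

package main;

my $table = Example::HitTable->new;
$table->add_uri($_) for qw(/a /b /a);
print "Hits for /a: ", $table->hit('/a'), "\n";
```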

BUGS

Public bug-tracking interface at http://www.w3.org/Bugs/Public/

AUTHOR

Olivier Thereaux <ot@w3.org> for The World Wide Web Consortium

SEE ALSO

Up-to-date information on the Log Validator at:

 http://www.w3.org/QA/Tools/LogValidator/

Articles and Tutorials

Several articles have been written within the W3C Quality Assurance Interest Group on the topic of improving the quality of Web sites, notably by taking a step-by-step approach and relying on the Log Validator to identify which areas to fix first.

My Web site is standard! And yours?

Available at http://www.w3.org/QA/2002/04/Web-Quality

Web Standards Switch

or how to improve your Web site easily.

Available in several languages at: http://www.w3.org/QA/2003/03/web-kit

Making your website valid: a step by step guide.

Available at http://www.w3.org/QA/2002/09/Step-by-step