=head1 NAME

Logfile - Perl extension for generating reports from logfiles


  use Logfile::Cern;

  $l = new Logfile::Cern  File  => 'cache.log.gz', 
                          Group => [Domain,File,Hour];
  $l->report(Group => File,   Sort => Records);
  $l->report(Group => Domain, Sort => Bytes);
  $l->report(Group => Hour, List => [Bytes, Records]);

  use Logfile::Wftp;



The Logfile extension will help you to generate various reports from
different server logfiles. In general there is no restriction as to what
information you extract from the logfiles.

=head2 Reading the files

The package can be customized by subclassing C<Logfile>.

A subclass should provide a function C<next> which reads the next
record from the file handle C<$self-E<gt>{Fh}> and returns an object of
type C<Logfile::Record>. In addition a function C<norm> may be
specified to normalize the various record fields.

Here is a shortened version of the C<Logfile::Cern> class:

  package Logfile::Cern;
  @ISA = qw ( Logfile::Base ) ;

  sub next {
      my $self = shift;
      my $fh = $self->{Fh};

      *S = $fh;
      my ($line,$host,$user,$pass,$rest,$date,$req,$code,$bytes);

      ($host,$user,$pass,$rest) = split ' ', $line, 4;
      ($rest =~ s!\[([^\]]+)\]\s*!!) && ($date = $1);
      ($rest =~ s!\"([^\"]+)\"\s*!!) && ($req = (split ' ', $1)[1]);
      ($code, $bytes) = split ' ', $rest;
      Logfile::Record->new(Host  => $host,
                           Date  => $date,
                           File  => $req,
                           Bytes => $bytes);

As stated above, in general you are free to choose the fields you
enter in the record. But:

=over 5

=item B<Date>

should be a valid date string. For conversion to the seconds elapsed
since the start of epoch the modules F<GetDate> and F<Date::DateParse>
are tried. If both cannot be C<use>ed, a crude build-in module is

The record constructor replaces B<Date> by the date in C<yymmdd>
form to make it sortable. Also the field B<Hour> is padded in.

=item B<Host>

Setting B<Host> will also set field B<Domain> by the verbose name of
the country given by the domain suffix of the fully qualified
domain name (hostname.domain). C<foo.bar.PG> will be mapped to C<Papua
New>. Host names containing no dot will be assigned to the domain
B<Local>. IP numbers will be assigned to the domain
B<Unresolved>. Mapping of short to long domain names is done in the
B<Net::Country> extension which might be useful in other contexts:

  use Net::Country;
  $germany = Net::Country::Name('de');

=item B<Records>

is always set to 1 in the C<Record> constructor. So this field gives
the number of successful returns from the C<next> function.


Here is the shortened optional C<norm> method:

  sub norm {
      my ($self, $key, $val) = @_;

      if ($key eq File) {
          $val =~ s/\?.*//;                             # remove query
          $val =~ s!%([\da-f][\da-f])!chr(hex($1))!eig; # decode escapes

The constructor reads in a logfile and builds one or more indices.

  $l = new Logfile::Cern  File => 'cache.log.gz', 
                          Group => [Host,Domain,File,Hour,Date];

There is little space but some time overhead in generating additional
indexes. If the B<File> parameter is not given, B<STDIN> is used. The
Group parameter may be a field name or a reference to a list of field
names. Only the field names given as constructor argument can be used
for report generation.

=head2 Report Generation

The Index to use for a report must be given as the B<Group>
parameter. Output is sorted by the index field unless a B<Sort>
parameter is given. Also the output can be truncated by a B<Top>
argument or B<Limit>.

The report generator lists the fields B<Bytes> and B<Records> for a
given index. The option B<List> may be a single field name or a
reference to an array of field names. It specifies which field should
be listed in addition to the B<Group> field. B<List> defaults to

  $l->report(Group => Domain, List => [Bytes, Records])

Output is sorted by the B<Group> field unless overwritten by a B<Sort>
option. Default sorting order is increasing for B<Date> and B<Hour>
fields and decreasing for all other Fields. The order can be reversed
using the B<Reverse> option.

This code

  $l->report(Group => File, Sort => Records, Top => 10);


  File                          Records 
  /htbin/SFgate               30 31.58% 
  /freeWAIS-sf/*              22 23.16% 
  /SFgate/SFgate               8  8.42% 
  /SFgate/SFgate-small         7  7.37% 
  /icons/*                     4  4.21% 
  /~goevert                    3  3.16% 
  /journals/SIGMOD             3  3.16% 
  /SFgate/ciw                  2  2.11% 
  /search                      1  1.05% 
  /reports/96/                 1  1.05% 

Here are other examples. Also take a look at the F<t/*> files.

  $l->report(Group => Domain, Sort => Bytes);

  Domain                  Records 
  Germany               12 12.63% 
  Unresolved             8  8.42% 
  Israel                34 35.79% 
  Denmark                4  4.21% 
  Canada                 3  3.16% 
  Network                6  6.32% 
  US Commercial         14 14.74% 
  US Educational         8  8.42% 
  Hong Kong              2  2.11% 
  Sweden                 2  2.11% 
  Non-Profit             1  1.05% 
  Local                  1  1.05% 
  $l->report(Group => Hour, List => [Bytes, Records]);

  Hour            Bytes          Records 
  07      245093 17.66%        34 35.79% 
  08      438280 31.59%        19 20.00% 
  09      156730 11.30%        11 11.58% 
  10      255451 18.41%        16 16.84% 
  11      274521 19.79%        10 10.53% 
  12       17396  1.25%         5  5.26% 

=head2 Report options

=over 5

=item B<Group> C<=E<gt>> I<field>

Mandatory. I<field> must be one of the fields passed to the constructor.

=item B<List> C<=E<gt>> I<field>

=item B<List> C<=E<gt>> [I<field>, I<field>]

List the subtotals for I<field>s. Defaults to B<Records>.

=item B<Sort> C<=E<gt>> I<field>.

Sort output by I<field>. By default, B<Date> and B<Hour> are sorted in increasing order, whereas all
  other fields are sorted in decreasing order.

=item B<Reverse> C<=E<gt> 1> 

Reverse sorting order.

=item B<Top> C<=E<gt>> I<number>

Print only the first I<number> subtotals.

=item B<Limit> C<=E<gt>> I<number>

Print only the subtotals with B<Sort> field greater than I<number>
(less than number if sorted in increasing order).


Currently reports are simply printed to STDOUT.

=head1 AUTHOR

Ulrich Pfeifer E<lt>F<pfeifer@wait.de>E<gt>

=head1 NEWS

Fixed strict refs bug for perl 5.005.

Fixed bug in fallback to included date parsing reported by James

Fixed y2k bug as suggested by Fred Korz.  I chose the two digit
version to be a backward compatible as possible.  The C<20> will be
obvious a few years from now ;-) Output columns should now be
separated by whitespace in any case.

=head1 SEE ALSO