NAME

Log::Statistics - near-real-time statistics from log files

VERSION

version 0.051

SYNOPSIS

    use Log::Statistics;

    my $log = Log::Statistics->new();

    # field 3 in the log contains the duration.  registering a
    # duration field causes duration information to be added to all
    # summary data.
    $log->register_field( "duration", 2 );

    # field 1 in the log contains transaction name.  add this field to
    # the list of fields for which a summary report will be generated
    $log->add_field( 0, "transaction" );

    # field 2 in the log contains the log status entry (e.g. 404).
    # don't generate a report on this field, but add it to the list of
    # defined fields.
    $log->register_field( "status", 1 );

    # collect data about transaction and status grouped together.
    # this will result in a break-down of all transactions by status.
    # note this is different than all statuses by transaction.
    $log->add_group( [ "transaction", "status" ] );

    # add a regular expression to capture the year, month, day, hour,
    # and minute from the time field.
    my $time_regexp = '^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})';
    $log->add_time_regexp( $time_regexp );

    # track overall response times per minute.  time is in field 6 in
    # the log
    $log->add_field( 5, "time" );

    # parse data in the log file
    $log->parse_text( $log_entries );

DESCRIPTION

Log::Statistics is a module for collecting summary data from a log file. For examples of what can be done with Log::Statistics, see the code and documentation in scripts/logstatsd. logstatsd contains a prototype implementation of several features which will eventually be migrated into this module.

The basic usage is to begin by creating a new Log::Statistics object. Next, register each field name that you want to collect data about, indicating which column that data is in. Then add the fields or groups of fields for which you wish to collect statistics. Finally, use parse_text to add multiple entries or parse_line to add a single entry.

This module is alpha quality code, and is still under development. A number of the features currently implemented in logstatsd will eventually find their way back here.

SUBROUTINES/METHODS

Log::Statistics->new()

Create a new Log::Statistics object.

$log->register_field( $name, $column )

Define a field in the log, and indicate the column in which the field exists. Once a field has been registered, it can be used again later with add_group or add_field without having to re-specify the column number.

Registering a field does not automatically include the field in the report, except for the duration field. When a duration field has been defined, all data collected will contain information about durations.

$log->add_field( $column, $name, [ $threshold1, $threshold2, ... ] )

Collect summary data about the specified field. The column can be undef if the field has previously been registered using register_field().

For each field added to the report, summary data will be collected for each unique entry in the field. For example, if a transaction field is added, then summary data will be collected about each unique transaction found in the log (e.g. the number of hits and the total response time).

Thresholds will only be honored if a duration field has been defined in the log (see THRESHOLDS below).
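The per-field summary described above can be pictured with a short, self-contained sketch. It does not use Log::Statistics and is not the module's implementation; the sample log lines, the column layout and the summarize_column helper are all hypothetical, chosen only to show what "summary data for each unique entry" means.

```perl
use strict;
use warnings;

# Hypothetical sample log: transaction,status,duration (comma-delimited).
my @lines = (
    "login,200,120",
    "search,200,340",
    "login,500,900",
);

# Summarize one column: hit count and total duration per unique value.
# (Illustrative helper, not part of Log::Statistics.)
sub summarize_column {
    my ( $column, $duration_column, @entries ) = @_;
    my %summary;
    for my $entry (@entries) {
        my @fields = split /,/, $entry;
        my $key    = $fields[$column];
        $summary{$key}{count}++;
        $summary{$key}{duration} += $fields[$duration_column];
    }
    return \%summary;
}

# Column 0 holds the transaction name, column 2 the duration.
my $by_transaction = summarize_column( 0, 2, @lines );

for my $name ( sort keys %$by_transaction ) {
    printf "%s count=%d duration=%d\n",
        $name,
        $by_transaction->{$name}{count},
        $by_transaction->{$name}{duration};
}
```

Each unique transaction name ends up with its own count and accumulated duration, which is the shape of data the XML examples later in this document report.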

$log->add_group( [ $field1, $field2, ... ], [ $threshold1, $threshold2, ... ] )

Collect summary data about two or more fields grouped together. The fields must have previously been defined using either add_field or register_field.

For each group added to the report, summary data will be collected for each unique combination of entries in the fields. For example, if a group is defined with "transaction" and "status", then summary information will be collected about each transaction broken down by the transaction status.

Note that a group for "transaction","status" is slightly different from "status","transaction". The former builds a data structure for each transaction that contains a hash with the summary data for each status. The latter builds a data structure for each status that contains a hash with the summary data for each transaction. Dumping the two data structures to XML using XML::Simple will result in different output. For more readable output, it is generally recommended that you list first the field which has the fewest possible unique values.

Thresholds will only be honored if a duration field has been defined in the log (see THRESHOLDS below).
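The effect of field order in a group can be sketched with plain nested hashes. This is an illustration only, not the module's internal code: the sample lines, the %column_of map and the summarize_group helper are hypothetical, and the helper handles exactly two fields for brevity.

```perl
use strict;
use warnings;

# Hypothetical sample log: transaction,status,duration.
my @lines = ( "mytrans1,GOOD,10", "mytrans2,GOOD,20", "mytrans1,BAD,500" );

# Column positions for the sample above (an assumption of this sketch).
my %column_of = ( transaction => 0, status => 1, duration => 2 );

# Build outer-field -> inner-field -> { count, duration }.
sub summarize_group {
    my ( $group, @entries ) = @_;
    my %summary;
    for my $entry (@entries) {
        my @fields = split /,/, $entry;
        my ( $outer, $inner ) = map { $fields[ $column_of{$_} ] } @$group;
        $summary{$outer}{$inner}{count}++;
        $summary{$outer}{$inner}{duration} += $fields[ $column_of{duration} ];
    }
    return \%summary;
}

# Same data, two different shapes depending on the field order.
my $by_transaction = summarize_group( [ "transaction", "status" ], @lines );
my $by_status      = summarize_group( [ "status", "transaction" ], @lines );

print join( ",", sort keys %$by_transaction ), "\n";    # transaction names
print join( ",", sort keys %$by_status ), "\n";         # status values
```

Both structures hold the same numbers, but the top-level keys differ, which is why the dumped output differs between the two orderings.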

$log->add_time_regexp( $regexp )

Define a regular expression which can be used to parse the time field. The regular expression should capture time to the resolution at which data should be collected. If you are parsing a log with many days of data, you may want to generate a report summarized by day. On the other hand, if your log contains many transactions over a short time period, you might want to break down the summary by activity per second.
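As a rough illustration of how the captured portion of the time field sets the reporting resolution, the sketch below buckets the same timestamps per minute and per day. It uses only core Perl; the timestamp format, the two patterns and the count_by_bucket helper are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical time field values.
my @times = (
    "2006-01-05 00:01:12",
    "2006-01-05 00:01:48",
    "2006-01-05 00:02:03",
);

# The capture group decides the bucket: up to the minute, or just the date.
my $per_minute = qr/^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})/;
my $per_day    = qr/^(\d{4}-\d{2}-\d{2})/;

# Count entries per captured bucket (illustrative helper).
sub count_by_bucket {
    my ( $regexp, @values ) = @_;
    my %count;
    for my $value (@values) {
        next unless $value =~ $regexp;
        $count{$1}++;
    }
    return \%count;
}

my $minutes = count_by_bucket( $per_minute, @times );
my $days    = count_by_bucket( $per_day,    @times );

print "$_ => $minutes->{$_}\n" for sort keys %$minutes;
print "$_ => $days->{$_}\n"    for sort keys %$days;
```

A narrower capture (the date only) collapses all three entries into one bucket, while the per-minute capture keeps two.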

$log->add_line_regexp( $regexp )

Define a regular expression which can be used to parse the entire log entry and divide it up into a series of fields. This only needs to be defined if the entries are not single-line, comma-delimited records.
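A line regular expression for a log that is not comma-delimited might look like the sketch below. The log format is invented for the example, and the sketch assumes that each capture group corresponds to one column in order, which this document does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical log entry: bracketed time, transaction, status, duration.
my $line = "[2006-01-05 00:01:12] mytrans1 GOOD 129ms";

# One capture group per field (assumed mapping: capture N -> column N-1).
my $line_regexp = qr/^\[([^\]]+)\]\s+(\S+)\s+(\S+)\s+(\d+)ms$/;

# In list context a successful match returns the captured substrings.
my @fields = $line =~ $line_regexp;

print scalar(@fields), " fields: ", join( " | ", @fields ), "\n";
```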

$log->parse_text( $text )

Generate summary data about the log entries contained in $text.

If no fields or groups have been defined, only overall total data will be collected.

$log->parse_line( $line )

Similar to parse_text, except that only a single log entry is passed.

$log->add_filter_regexp( $regexp )

Add a regular expression filter. Any log entries that do not match the specified regular expression will not be processed.

$log->save_data( $file )

Save the data collected to the specified file. Data will be stored in the YAML format.

$log->read_data( $file )

Load previously collected data from the specified file. The file is expected to have been written by save_data.

$log->get_utime_from_string( $string )

Given a plain-text date string from a log, convert it to Unix time. Memoized to reduce the overhead of using Date::Manip.
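The benefit of memoizing the conversion can be sketched with the core Memoize module. To stay self-contained, this example swaps Date::Manip for Time::Local and accepts only one fixed timestamp format; the to_utime function and the $calls counter are hypothetical names used for illustration, not the module's code.

```perl
use strict;
use warnings;
use Memoize;
use Time::Local qw(timegm);

my $calls = 0;    # counts how often the real conversion runs

# Convert "YYYY-MM-DD HH:MM" (treated as UTC) to Unix time.
sub to_utime {
    my ($string) = @_;
    $calls++;
    my ( $year, $month, $day, $hour, $minute ) =
        $string =~ /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2})$/
        or return;
    return timegm( 0, $minute, $hour, $day, $month - 1, $year );
}

# Cache results so repeated strings are converted only once.
memoize('to_utime');

my $first  = to_utime("2006-01-05 00:01");
my $second = to_utime("2006-01-05 00:01");    # served from the cache

print "utime=$first conversions=$calls\n";
```

Log files tend to repeat the same timestamp many times at minute resolution, so caching by input string avoids most of the parsing work.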

$log->get_xml()

Get XML report for log entries that have been processed.

$log->set_debug_nullvalues()

When this flag is set, any log entry containing a null value in a tracked field will be printed to stderr.

Example XML Output

Here are some examples of the XML generated by Log::Statistics:

    # time field and duration field defined

    <?xml version="1.0" standalone="yes"?>
    <log-statistics>
      <fields name="time">
        <time name="2006/01/05 00:01" count="7" duration="1039" />
        <time name="2006/01/05 00:02" count="1" duration="129" />
        <time name="2006/01/05 00:03" count="7" duration="991" />
        <time name="2006/01/05 00:04" count="11" duration="1457" />
        <time name="2006/01/05 00:05" count="9" duration="2507" />
        <time name="2006/01/05 00:06" count="7" duration="1059" />
        <time name="2006/01/05 00:07" count="7" duration="1100" />
      </fields>
    </log-statistics>

    # group of status:transaction

    <?xml version="1.0" standalone="yes"?>
    <log-statistics>
      <xrefs name="status-transaction">
        <status name="BAD">
          <transaction name="mytrans1" count="3" duration="9589" />
        </status>
        <status name="GOOD">
          <transaction name="mytrans1" count="200" duration="880" />
          <transaction name="mytrans2" count="122" duration="187" />
        </status>
      </xrefs>
    </log-statistics>

THRESHOLDS

Thresholds allow monitoring the number of long response times. For example, a given transaction might be expected to complete within 5 seconds. In addition to measuring the average response time of the transaction, you may also wish to measure how many transactions are not completed within 5 seconds. You may define any number of thresholds, so you could measure those that you consider to be fast (under 3 seconds), good (under 5 seconds), slow (over 10 seconds), and very slow (over 20 seconds).

NOTE: If a duration field was not defined, then response time threshold statistics cannot be calculated.
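One way to picture threshold statistics is to count, for each threshold, how many durations exceed it. The sketch below is an assumption about the general idea rather than a description of how Log::Statistics stores its results; the sample durations (in seconds) and the count_over_thresholds helper are hypothetical.

```perl
use strict;
use warnings;

# Thresholds from the example above, in seconds.
my @thresholds = ( 3, 5, 10, 20 );

# Hypothetical response times, in seconds.
my @durations = ( 1.2, 4.0, 6.5, 12.0, 25.0 );

# For each threshold, count the durations strictly greater than it.
sub count_over_thresholds {
    my ( $limits, @values ) = @_;
    my %over = map { $_ => 0 } @$limits;
    for my $value (@values) {
        for my $limit (@$limits) {
            $over{$limit}++ if $value > $limit;
        }
    }
    return \%over;
}

my $over = count_over_thresholds( \@thresholds, @durations );

printf "over %2d seconds: %d\n", $_, $over->{$_} for @thresholds;
```

Subtracting adjacent counts gives the number of entries in each band, for example the entries that took more than 5 but no more than 10 seconds.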

DEPENDENCIES

YAML - back end storage for log summary data

Date::Manip - for converting log times to unix time.

SEE ALSO

http://www.geekfarm.org/wu/muse/LogStatistics.html

http://en.wikipedia.org/wiki/Pivot_table

http://en.wikipedia.org/wiki/Crosstab

BUGS AND LIMITATIONS

Specifying a duplicate field or group definition will cause all values for the duplicated group(s) to be counted twice.

Please report problems to VVu@geekfarm.org

Patches are welcome.

AUTHOR

VVu@geekfarm.org

LICENCE AND COPYRIGHT

Copyright (c) 2005, VVu@geekfarm.org All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

- Neither the name of geekfarm.org nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.