The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

MySQL::Log::ParseFilter - Parse and filter MySQL slow, general and binary logs

SYNOPSIS

    use MySQL::Log::ParseFilter;

    # Parse all unique queries from logs given on command line

    %params = (logs    => \@ARGV,
               queries => \%queries);

    parse_slow_logs(%params);
    parse_general_logs(%params);
    parse_binary_logs(%params);

    calc_final_values(%params);

DESCRIPTION

MySQL::Log::ParseFilter is a Perl module for parsing and filtering MySQL slow, general and binary logs. MySQL::Log::ParseFilter also parses and filters user-defined logs: logs with variable headers and SQL statement meta-properties.

Each MySQL log is formatted differently and poses many problems to accurate parsing. From the range of MySQL server versions to the vast extent of SQL syntax, parsing a MySQL log file is rarely a trivial task if done well.

MySQL::Log::ParseFilter handles all the heavy log chopping, hacking and filtering, allowing a script to simply extract the data that it wants.

Functions

The following functions are exported by default:

set_meta_filter($filter)

Set meta-property filter to $filter. $filter is a scalar containing a single string of meta-property filter conditions. See "Meta-Property".

Returns nothing.

set_statement_filter($filter)

Set SQL statement filter to $filter. $filter is a scalar containing a single string of allowed or disallowed SQL statement types. See "SQL Statement".

Returns nothing.

set_grep($pattern)

Set the grep pattern against which SQL statements must match. (This could be called the "grep filter.") $pattern is a scalar containing a single Perl regex pattern without m// or similar. For example: "^SELECT foo FROM (?:this|that)".

Returns nothing.

parse_binary_logs(%params)

Parse output of mysqlbinlog.

%params may contain the following key/values:

logs => ref to array having log file names to parse (REQUIRED)

queries => ref to hash in which to save unique queries (REQUIRED)

all_queries => ref to array in which to save all queries (optional)

NOTE: MySQL binary logs are, as the name suggests, binary--they are not text files. It is necessary to first "decode" a binary log with the MySQL-provided program mysqlbinlog. The log files given to parse_binary_logs() must be the text output from mysqlbinlog ran first on the binary log files (without the --short-form option).

Returns nothing.

parse_general_logs(%params)

Parse MySQL general logs.

%params may contain the following key/values:

logs => ref to array having log file names to parse (REQUIRED)

queries => ref to hash in which to save unique queries (REQUIRED)

all_queries => ref to array in which to save all queries (optional)

users => ref to hash in which to save unique users (optional)

Returns nothing.

parse_slow_logs(%params)

Parse MySQL slow and microslow logs.

%params may contain the following key/values:

logs => ref to array having log file names to parse (REQUIRED)

queries => ref to hash in which to save unique queries (REQUIRED)

all_queries => ref to array in which to save all queries (optional)

users => ref to hash in which to save unique users (optional)

microslow => 0 = regular slow log, 1 = microslow log (optional)

0 is default for microslow.

Returns nothing.

set_udl_format($format_file)

Set the user-defined log format defined in $format_file. $format_file is a scalar containing a single file name. See "USER-DEFINED LOGS".

This function should be called before calling parse_udl_logs().

Returns nothing.

parse_udl_logs(%params)

Parse user-defined logs.

%params may contain the following key/values:

logs => ref to array having log file names to parse (REQUIRED)

queries => ref to hash in which to save unique queries (REQUIRED)

all_queries => ref to array in which to save all queries (optional)

users => ref to hash in which to save unique users (optional)

Returns nothing.

See "USER-DEFINED LOGS".

calc_final_values(%params, $grand_totals)

Calculate final meta-property and grand total values after parsing logs. This function should only be called after calling one of the parse_ functions.

This function calculates: number of unique queries, c_sum_p, averages, grand total sums, percent true for true/false (or yes/no) meta-properties, per-meta-property percentages of grand total sums, per-user percentages of all users, and number of unique users.

%params is the same hashed passed earlier to one of the parse_ functions.

$grand_totals is a reference to a hash in which to save the grand total sums. Pass 0 or undef if you do not want grand total sums.

Returns total number of queries.

apply_final_meta_filters(%params, $total_queries)

Apply final meta-propertry filters for min, max, average, percent and sum meta-property values. This function should be called after calling calc_final_values().

%params is the same hashed passed earlier to calc_final_values() or to one of the parse_ functions if you did not calculate final values for some reason.

$total_queries is a reference to a scalar having the total number of queries. Usually, this is obtained from the return value of calc_final_values(). $total_queries will be adjusted to account for queries which were removed by a filter. Pass 0 or undef if you do not want an adjusted total queries.

Returns total number of queries removed.

META-PROPERTIES

Every SQL statement has many meta-properties. These are values about the SQL statement such as its execution time, how many rows it examines, the MySQL connection ID it is associated with, etc. It is by these values that a MySQL log is filtered (and usually sorted).

Every type of MySQL log provides different meta-properties. The list of all meta-properties is very long so only the basics are given here. For the full list visit http://hackmysql.com/mysqlsla_filters.

But first, it is important to understand the naming scheme that MySQL::Log::ParseFilter uses for meta-properties.

Naming Scheme

Meta-properties are either numeric or string. For strings, the naming scheme does not change: db is always just db. For numeric values, however, several additional meta-properties are created and identified by consistent extensions to the base meta-property name.

Take for example t from slow logs. In addition to this base meta-property, MySQL::Log::ParseFilter also creates: t_min, t_max, t_avg, t_sum and t_sum_p (unless set_save_meta_values() was disabled; see "OPTIONS").

These additional meta-properties are identified by their extensions: _min, _max, _avg, _sum, _sum_p. These extensions are consistent and form the naming scheme for most numeric meta-properties. (cid is a notable exception.) They tell you as well as MySQL::Log::ParseFilter what the additional meta-property value represents: the minimum, maximum, average and sum value of their base meta-property.

_sum_p means percentage that the base meta-property constitutes of the grand total sum for all those base meta-properties (if grand total sum were calculated when calling calc_final_values()).

There is another extension for true/false (yes/no) meta-properties: _t and _t_p. Currently, this type of meta-property is only found in microslow logs.

This naming scheme is very important when working with user-defined logs because it allows you to know in advance the names the of additional meta-properties that MySQL::Log::ParseFilter will create from the given bases meta-properties.

Slow Logs

c_sum : Total number of times SQL statement appears in log

host : Host name of MySQL connection

ip : IP address of MySQL connection

l : Time spent acquiring lock

l_sum : Total time spent acquiring lock

re : Number of rows examined

rs : Number of rows sent

t : Execution time

t_sum : Total execution time

user : User of MySQL connection

General Logs

c_sum : Total number of times SQL statement appears in log

cid : Connection ID of MySQL connection

host : Host name of MySQL connection

user : User of MySQL connection

Binary Logs

c_sum : Total number of times SQL statement appears in log

cid : Connection ID of MySQL connection

ext : Execution time

err : Error code (if any) caused by SQL statement

sid : Server ID of MySQL server

SETTING FILTERS

All filters are inclusive: every condition for every filter must pass for the statement to be saved.

Meta-Property

The format of $filter when calling set_meta_filter($filter) is [CONDITION],[CONDITION]... where each [CONDITION] is [meta][op][val].

[meta] is a meta-property name (listed above or from the full list at http://hackmysql.com/mysqlsla_filters). [op] is either >, <, or =. And [val] is the value against which [meta] from the log must pass according to [op]. [val] is numeric or string according to [meta]. For string values, only = is valid for [op].

SQL Statement

The format of $filter when calling set_statement_filter($filter) is [+-][TYPE],[TYPE]....

[+-] is give only once at the first start of the filter string. + means that the filter is positive: allow only the given [TYPE]s. - means that the filter is negative: remove the given [TYPE]s. [TYPE] is a SQL statement type: SELECT, UPDATE, INSERT, DO, SET, CREATE, DROP, ALTER, etc.

EXAMPLES

Parsing General Logs

    # Parse general logs given on command line extracting only SELECT queries
    # using database foo and calculate grand total sums

    my %queries;
    my %grand_totals;

    my %params = (
        logs    => \@ARGV,
        queries => \%queries,
    );

    set_meta_filter("db=foo");
    set_statement_filter("+SELECT");

    parse_general_logs(%params);

    calc_final_values(%params, \%grand_totals);

Parsing Slow Logs

    # Parse slow logs given on command line removing SET statements and
    # extracting only queries which took longer than 5 seconds to execute

    my %queries;

    my %params = (
        logs    => \@ARGV,
        queries => \%queries,
    );

    set_meta_filter("t>5");
    set_statement_filter("-SET");

    parse_slow_logs(%params);

    calc_final_values(%params, 0);

Parsing Binary Logs

    # Parse output files from mysqlbinlog given on command line extracting
    # only INSERT and UPDATE queries which account for more than 75% of all
    # INSERT and UPDATE queries extracted

    my %queries;
    my %grand_totals;

    my %params = (
        logs    => \@ARGV,
        queries => \%queries,
    );

    set_meta_filter("c_sum_p>75");
    set_statement_filter("+INSERT,UPDATE");

    parse_binary_logs(%params);

    calc_final_values(%params, \%grand_totals);

    apply_final_meta_filters(%params, 0);

Complete Mini-Script (dump_type)

    #!/usr/bin/perl -w

    use strict;
    use MySQL::Log::ParseFilter;

    my %queries;

    if(@ARGV != 2) {
       print "dump_type dumps a unique sample of all statements of TYPE from general LOG.\n";
       print "Usage: dump_type TYPE LOG\n";
       exit;
    }

    set_statement_filter("+$ARGV[0]");

    parse_general_logs( (logs => [ $ARGV[1] ], queries => \%queries) );

    foreach(keys %queries) { print "$queries{$_}->{sample}\n"; }

    exit;

OPTIONS

MySQL::Log::ParseFilter has six functions to set special options which can be imported with the :options tag (use MySQL::Log::ParseFilter qw(:DEFAULT :options)).

set_save_meta_values($val)

Save extra meta-property values. Default 1 (enabled). Can be set to 0 (disabled) which will result in only the following meta-properties being saved: sample, db, cid.

Any meta-property value not check in apply_final_meta_filters() can still be used (t, l, host, cid, etc.)

set_save_all_values($val)

Save "all values": arrays of every single value for certain meta-properties (meta_all). Default 0 (disabled). Can be set to 1 (enabled). This does not affect user-defined logs which has a seperate mechanism for saving all values (type nf).

At present, enabling this option causes the following all values to be saved: for microslow (msl) patched slow log with InnoDB values: iorops_all, iorbytes_all, iorwait_all, reclwait_all, qwait_all, pages_all; for binary logs: ext_all.

set_IN_abstraction($val)

Abstract IN () clauses further by grouping in groups of $val. Default 0 (disabled).

This is an experimental option. Normally, all IN clauses are condensed from IN (N, N, N) to IN (N3). This option furthers this abstraction by grouping the condensed IN clauses in groups of $val where $val is the "dividing line."

Example: with $val=10 IN (N3) becomes IN (N0-9). Therefore, any IN clause with 0 to 9 values will be condensed and then further abstracted to IN (N0-9). Likewise, any IN clauses with 10 to 19 values will be condensed and further abstracted to IN (10-19).

set_VALUES_abstraction($val)

Abstract VALUES () clauses further by removing the number of condensed value sets. Default 0 (disabled). Can be set to 1 (enabled).

This is an experimental option. Normally, all VALUES clauses are condensed from VALUES (NULL, 'foo'), (NULL, 'bar') to VALUES (NULL, 'S')2. This option furthers this abstractiong by removing that number of condensed value sets: 2.

Example: two queries INSERT INTO table VALUES ('S') and INSERT INTO table VALUES ('S'), ('S'), ('S') are first condensed to INSERT ... VALUES ('S')1 and INSERT ... VALUES ('S')3 then further abstracted to one single query: INSERT INTO table VALUES ('S').

set_atomic_statements($val)

Treat multi-statement groups atomically when filtering. Default 0 (disabled). Can be set to 1 (enabled).

This is an experimental option. Normally, each statement in a multi-statement group is filtered individually: only those which fail a filter are removed and those which pass are kept. With this option enabled, if any one statement in a group fails, the entire group of statements is removed.

This option does not apply to general logs because general logs never group statements.

set_db_inheritance($val)

Allow queries to inherit the last database specified in the log. Default 0 (disabled). Can be set to 1 (enabled).

Normally, the log must explicitly specify the database for each statement. Or, in the case of general logs, the current database is tracked by other means. Sometimes, however, logs only specify the database explicitly once. If this option is enabled, all statements following an explicit database specification inherit that database.

USER-DEFINED LOGS

MySQL::Log::ParseFilter can parse user-defined logs which have variable headers and meta-property values. Accomplishing this is not a trivial task. Therefore the subject is not covered here but at http://hackmysql.com/udl.

HACKS

The following four functions can be imported with the :hacks tag.

get_meta_filter()

Returns hash ref to internal meta filter hash which is structured: meta => [ op, value ].

get_statement_filter()

Returns hash ref to internal statement filter hash which is structured: type => 0. Also has pos_neg => 1 (positive) or 0 (negative).

passes_meta_filter($meta, $val, $type)

$meta is a meta-property name. $val is the log value. $type is 'n' (numeric) or 's' (string). Returns 1 on pass, 0 on fail.

passes_statement_filter($type)

$type is a SQL statement type (SELECT, CREATE, DROP, etc.), case insensitive. Returns 1 on pass, 0 on fail.

DEBUGGING

Calling MySQL::Log::ParseFilter::set_debug(1) will enable debugging and cause MySQL::Log::ParseFilter to print a flood of debugging information to STDOUT. This may be necessary if you feel a function is not working correctly because, although they do not return errors, they print debugging messages.

BUGS

There are no known bugs. Please contact me if you find one. Expect that I will ask for at least a portion of your log because that makes finding and fixing the bug easier.

AUTHOR

Daniel Nichter <perl@hackmysql.com>

http://hackmysql.com/

SEE ALSO

http://hackmysql.com/mlp

MySQL::Log::ParseFilter home page

http://hackmysql.com/udl

Document describing how to make user-defined logs

http://hackmysql.com/mysqlsla

mysqlsla uses every part of MySQL::Log::ParseFilter to analyze, sort and report data from MySQL logs. To study MySQL::Log::ParseFilter in all its glory, study mysqlsla. In fact, MySQL::Log::ParseFilter was born from mysqlsla.

http://hackmysql.com/microsecond_slow_logs

Document summarizing microsecond resolution support for MySQL slow logs

http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html

Official MySQL slow query log documentation.

http://dev.mysql.com/doc/refman/5.0/en/query-log.html

Official MySQL general query log documentation.

http://dev.mysql.com/doc/refman/5.0/en/binary-log.html

Official MySQL binary query log documentation.

http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html

Official MySQL documentation for mysqlbinlog

VERSION

v1.00

COPYRIGHT AND LICENSE

Copyright 2008 Daniel Nichter

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.