NAME

loggrep - quickly find relevant lines in a log searching by date

VERSION

version 0.002

SYNOPSIS

      loggrep --start <date> --end <date> [ --include <pattern> ]+ [ --exclude <pattern> ]+ <file>

DESCRIPTION

loggrep allows one to search for lines in a file that match particular patterns. In this it is like grep and ack and many other utilities. The functionality it adds is the ability to narrow the search window to those lines that fall within temporal limits. It finds these limits quickly using a variant of binary search, which makes it possible to search very large log files efficiently. This requires, of course, that the lines in the file (usually) have timestamps which (usually) are in sequence and parsable in a common way.

loggrep searches for an initial temporal limit by estimating the line offset of the line sought, based on the marginal timestamps of the search region and the assumption that lines are added at a roughly constant rate. This candidate line is found; then the nearest line bearing a timestamp is sought. The time on that line is compared to the target time, and the process is repeated within the new search region until either the target time is found or the search region cannot be narrowed further.
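The search described above can be sketched roughly as follows. This is a Python illustration, not loggrep's own Perl code: it works on an in-memory list of lines rather than seeking through a file, and the names first_at_or_after, nearest_stamped, and leading_epoch are hypothetical. It assumes that the timestamps which are present appear in ascending order.

```python
import re


def leading_epoch(line):
    """Hypothetical timestamp reader: a line starts with epoch seconds or has no time."""
    m = re.match(r"(\d+) ", line)
    return float(m.group(1)) if m else None


def nearest_stamped(lines, guess, lo, hi, stamp):
    """Scan outward from guess for a timestamped line strictly between lo and hi."""
    for step in range(hi - lo):
        for i in (guess - step, guess + step):
            if lo < i < hi and stamp(lines[i]) is not None:
                return i
    return None


def first_at_or_after(lines, target, stamp):
    """Index of the first timestamped line with time >= target, else len(lines)."""
    n = len(lines)
    # The outermost timestamped lines give the marginal timestamps.
    lo = next((i for i in range(n) if stamp(lines[i]) is not None), None)
    if lo is None:
        return n
    hi = next(i for i in range(n - 1, -1, -1) if stamp(lines[i]) is not None)
    if stamp(lines[lo]) >= target:
        return lo
    if stamp(lines[hi]) < target:
        return n

    # Invariant: stamp(lines[lo]) < target <= stamp(lines[hi]).
    while hi - lo > 1:
        t_lo, t_hi = stamp(lines[lo]), stamp(lines[hi])
        # Assume lines arrive at a constant rate and interpolate a candidate offset.
        fraction = (target - t_lo) / (t_hi - t_lo)
        guess = min(max(lo + int(fraction * (hi - lo)), lo + 1), hi - 1)
        mid = nearest_stamped(lines, guess, lo, hi, stamp)
        if mid is None:
            break  # no timestamped line left between lo and hi
        if stamp(lines[mid]) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

Called as first_at_or_after(lines, target, leading_epoch), the function returns the index at which printing would begin. The interpolated guess pays off when lines really are added at a steady rate; with very uneven logs a plain midpoint guess would give a firmer bound on the number of steps.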

OPTIONS

Run loggrep with the --help option to see the option default values, if any.

Log

-l file, --log=file

The log file to search may be provided either as the final argument or as the value of a --log option.

Temporal Limits

All temporal limits are parsed by Date::Parse. Date::Parse will turn a temporal expression into a Unix timestamp. You can test whether it understands your temporal expressions like so:

  $ perl -MDate::Parse -E 'say str2time shift' "21/dec/93 17:05"
  756511500

Date::Parse does not actually need to get the timestamps right so long as it puts them in the right sequence.

If Date::Parse fails you, see the --time option.

-s time, --start=time

The initial temporal limit.

-e time, --end=time

The final temporal limit.

-m time, --moment=time

--moment sets the initial and final temporal limits to the same time. This is useful for extracting a single log line with a known timestamp, perhaps with its context (see --context).

-d pattern, --date=pattern

The pattern used to identify timestamps in log lines.
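As a rough illustration of how a timestamp pattern and a time parser work together, here is a Python sketch. It is not loggrep's Perl code, and DATE_PATTERN, DATE_FORMAT, and line_time are hypothetical names chosen for a log whose lines begin with "YYYY-MM-DD HH:MM:SS". The pattern picks the timestamp text out of a line, and the parser turns that text into a number. Treating the times as UTC is an assumption, and since only the ordering of the numbers matters, a wrong zone would not disturb the search.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern and format for lines like "2014-03-01 12:30:05 message".
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def line_time(line):
    """Return a sortable number for the line's timestamp, or None if it has none."""
    m = DATE_PATTERN.search(line)
    if m is None:
        return None  # e.g. a continuation line such as a stack trace
    parsed = datetime.strptime(m.group(0), DATE_FORMAT)
    # Assumption: interpret the time as UTC; any fixed zone preserves the order.
    return parsed.replace(tzinfo=timezone.utc).timestamp()
```

A line with no match yields None, which corresponds to the lines without timestamps that the search has to step over.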

Search Patterns

All patterns are Perl regular expressions of the idiom understood by the Perl executing loggrep. If no patterns are provided, all lines within the temporal limits are printed. If both including and excluding patterns match a line, the latter take precedence and the line is not printed. Multiple search patterns, or none, may be provided.

-n pattern, --include=pattern

Print the lines matching the given pattern.

-N string, --include-quoted=string

Print lines containing the given substring.

-v pattern, --exclude=pattern

Exclude lines matching the given pattern.

-V string, --exclude-quoted=string

Exclude lines containing the given substring.

-i, --case-insensitive

Pattern and substring matching is case-insensitive. Note that one may turn on case-insensitivity for a single pattern like so:

  -n "(?i:match me)"

Likewise, when -i is in effect, one may turn it off for a single pattern:

  -i -n "(?-i:match me)"

This technique will not work for substring matching.
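The way these options combine can be sketched as a single predicate. This is a Python illustration with hypothetical names (build_filter, keep), not loggrep's implementation. It assumes that a line needs to match only one of the including patterns when several are given, which the text above does not spell out. Quoted substrings are handled by escaping them before compiling.

```python
import re


def build_filter(include=(), exclude=(), include_quoted=(), exclude_quoted=(),
                 ignore_case=False):
    """Return a function that says whether a line should be printed."""
    flags = re.IGNORECASE if ignore_case else 0
    # Quoted strings are literal substrings, so escape any regex metacharacters.
    inc = [re.compile(p, flags) for p in include]
    inc += [re.compile(re.escape(s), flags) for s in include_quoted]
    exc = [re.compile(p, flags) for p in exclude]
    exc += [re.compile(re.escape(s), flags) for s in exclude_quoted]

    def keep(line):
        # Excluding patterns take precedence over including ones.
        if any(rx.search(line) for rx in exc):
            return False
        # With no including patterns, every remaining line is printed.
        # Assumption: one matching including pattern is enough.
        return not inc or any(rx.search(line) for rx in inc)

    return keep
```

For example, build_filter(include=["error"], exclude_quoted=["[debug]"]) keeps "error: disk full" and drops "error [debug] retrying". Python's regular expressions also accept the inline (?i:...) and (?-i:...) groups shown above, so the per-pattern switches behave the same way in this sketch.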

Debugging

-w, --warn

Warn upon finding a log line with no timestamp.

--die

Throw an error upon finding a log line with no timestamp.

Context

These options facilitate understanding matches by grouping them or providing log context.

-b, --blank

Print a blank line between non-sequential matches. This is shorthand for --sep=''.

--sep=string, --separator=string

Print the given separator between non-sequential matches.

-C num, --context=num

Print up to the given number of non-matching lines before and after a match. This is equivalent to --before=num --after=num.

-B num, --before=num

Print up to the given number of non-matching lines before a match.

-A num, --after=num

Print up to the given number of non-matching lines after a match.
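The interaction of context lines and separators can be sketched as follows. This is a Python illustration with the hypothetical name with_context; it makes two passes over an in-memory list, which is an assumption made for brevity and says nothing about how loggrep itself buffers lines. It also assumes that "non-sequential" means the printed lines are not adjacent in the file.

```python
def with_context(lines, is_match, before=0, after=0, separator=None):
    """Return matching lines plus context, with a separator between separate groups."""
    keep = set()
    for i, line in enumerate(lines):
        if is_match(line):
            # "Up to" the given number: the window is clipped at the file's edges.
            first = max(0, i - before)
            last = min(len(lines), i + after + 1)
            keep.update(range(first, last))

    out = []
    previous = None
    for i in sorted(keep):
        # Emit the separator when this line does not directly follow the last one.
        if separator is not None and previous is not None and i != previous + 1:
            out.append(separator)
        out.append(lines[i])
        previous = i
    return out
```

Passing separator="" mimics the blank line described for --blank, while leaving it as None prints the groups back to back.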

Overrides

These options allow one to provide alternative functionality when printing lines or parsing times. The code defined by these options is evaluated in its own package to prevent your accidentally changing the behavior of basic loggrep functionality. It provides no protection against deliberate perversity, of course, but if you can already run Perl code from the command line, why go to the trouble of doing perverse things inside loggrep?

By default, all code is executed with strict mode and warnings off, and a "use vX" line is injected, where X represents the major and minor version numbers of the Perl running loggrep itself. This facilitates using modern Perl features.

-t code, --time=code

Code to be used to convert a timestamp expression to a Unix timestamp. This code will see the pattern matched as the sole value in @_, and whatever it returns will be interpreted as the timestamp.

-E code, --exec=code

Code to be used to convert a matched line into something to print. The values in @_ for this code will be the raw line, the line number, and whether it was a match. The last parameter allows one to distinguish contextual lines from match lines. Whatever this code returns will be printed.

-M module, --module=module

Additional modules to be imported into the package in which user-provided code is evaluated. This option may be repeated.

Miscellaneous

-h, -?, --help

Print basic usage information and exit.

--version

Print loggrep's version number.

ACKNOWLEDGEMENTS

Thanks go to Green River for letting me spend some time on this when I needed to create a utility to search a large log file quickly.

AUTHOR

David F. Houghton <dfhoughton@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by David F. Houghton.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.