NAME

process_logs - read, manipulate and report on various log files

USAGE

 process_logs [options] -c configuration_file.yml

OPTIONS

 -c --config_file file                  Specifies the configuration file
 -a --reprocess_all                     Reprocess all files
 --reprocess_from date                  Reprocess everything after [date]
 -v --verbose                           Increase debugging output (can be repeated)
 --min_start_date date                  Force all start dates to be at least [date]
 --max_end_date date                    Force all end dates to be no more than [date]
 --priority_bias METHOD                 Choose priority adjustment from: 'random', 'date', 'depth'
 --target_date DATE                     For priority bias date & depth, aim for [date]
 --ignore_code_dependencies, --no_code  Ignore dependencies on code 
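
For example, to reprocess everything after a given date with extra debugging output (the file names here are illustrative):

 process_logs -v -v --reprocess_from 2009-06-01 -c jobs.yml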

DESCRIPTION

Process logs using the Log::Parallel system.

process_logs is the driver script for processing data logs through a series of jobs specified in a configuration file.

Each job consists of a set of steps to process input files and create an output file (possibly bucketized). This is very much like a map-reduce framework. The steps are:

1. Parse

The first step is to parse the input files. The input files can come from multiple places/steps and be in multiple formats. They must all be sorted on the same fields so that they can be joined together in an ordered stream.
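
A shared sort order is what makes the join possible: two already-sorted streams can be combined with a simple merge. The sketch below is illustrative only; read_next, emit, and the ts field are hypothetical stand-ins, not part of the Log::Parallel API.

 # Merge two streams that are both sorted on the 'ts' field.
 # read_next() and emit() are hypothetical helpers for this sketch.
 my $a = read_next($stream_a);
 my $b = read_next($stream_b);
 while (defined $a or defined $b) {
     if (defined $a and (!defined $b or $a->{ts} le $b->{ts})) {
         emit($a);
         $a = read_next($stream_a);
     }
     else {
         emit($b);
         $b = read_next($stream_b);
     }
 }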

2. Filter

As items are read in, the filter code is executed. Items are dropped unless the filter code returns a true value.
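
For instance, a filter that keeps only server errors might look like the following. The item layout and calling convention are assumptions for illustration; the real filter code is supplied through the configuration file.

 # Hypothetical filter: return true to keep the item, false to drop it.
 my $filter = sub {
     my ($item) = @_;
     return $item->{status} =~ /^5\d\d$/;   # keep only 5xx responses
 };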

3. Group

The items that make it past the filter can optionally be grouped together so that they're passed to the next stage as an array of items.
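
The usual pattern is to batch consecutive items that share a key in the sorted stream; a minimal sketch (the user_id field and the process_group helper are hypothetical):

 # Collect consecutive items with the same user_id into one group.
 # process_group() is a hypothetical hand-off to the next stage.
 my (@group, $current);
 for my $item (@items) {
     if (@group and $item->{user_id} ne $current) {
         process_group(\@group);   # finished group goes downstream
         @group = ();
     }
     $current = $item->{user_id};
     push @group, $item;
 }
 process_group(\@group) if @group;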

4. Transform

The transform step consumes items and generates items. It consumes them one at a time (or one group at a time), but it can produce zero or many items for each one it consumes. It can take events and squish them together into a session; it can take a session and break it apart into events; or it can take sessions and produce a single aggregated result once it has consumed all the input.
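
As a sketch of the one-in, many-out shape (the field names and the return-a-list convention are assumptions, not the exact Log::Parallel interface):

 # Hypothetical transformer: explode one session item into its events.
 my $transform = sub {
     my ($session) = @_;
     return map {
         { user_id => $session->{user_id}, event => $_ }
     } @{ $session->{events} || [] };   # zero or more output items
 };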

5. Bucketize

As new resultant items are generated, they can be bucketized into many buckets and split across a cluster.
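
One common way to bucketize is to hash a key field and take it modulo the bucket count, so the same key always lands in the same bucket; a sketch (the user_id field and the bucket count are assumptions):

 use Digest::MD5 qw(md5_hex);

 my $n_buckets = 16;
 sub bucket_for {
     my ($item) = @_;
     # Same key => same bucket; distinct keys spread across the cluster.
     return hex(substr(md5_hex($item->{user_id}), 0, 8)) % $n_buckets;
 }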

6. Write

The resultant items are written in the format specified. Since the next step may run things through unix sort, the output format may need to be squished onto one line.
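
One-line encodings matter because unix sort treats each line as a record. A sketch of a writer that keeps each item on a single line (JSON is an illustrative choice here; the real writers are described in Log::Parallel::Writers):

 use JSON;

 my $codec = JSON->new->canonical;   # compact output, no embedded newlines
 sub write_item {
     my ($fh, $item) = @_;
     print {$fh} $codec->encode($item), "\n";   # one item per line
 }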

7. Sort

The output files get sorted according to fields defined in the resultant items.
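
In-process, that is equivalent to an ordinary Perl sort on the key fields (the start_time field is a hypothetical example):

 # Sort resultant items on a named key field.
 my @sorted = sort { $a->{start_time} cmp $b->{start_time} } @items;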

8. Post-Sort Transform

If the writer had to encode the output for unix sort, it gets a chance to decode it after sorting so that it ends up in its desired format.
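
Continuing the one-line JSON sketch from the write step, the post-sort pass just decodes each line and re-emits the item in its final format (illustrative only; $sorted_fh is a hypothetical handle):

 use JSON;

 my $codec = JSON->new;
 while (my $line = <$sorted_fh>) {
     my $item = $codec->decode($line);
     # ... write $item back out in its desired final format
 }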

CONFIGURATION FILE

The configuration file is in YAML format and is preprocessed with Config::YAMLMacros which provides some macro directives (include and define).

It is post-processed with Config::Checker, which allows for some flexibility (sloppiness) on the part of configuration writers. Single items will be automatically turned into lists when needed.

The configuration file has several sections. The main section is the one that defines the jobs that process_logs runs.

The exact details of each section are described in Log::Parallel::ConfigCheck.
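
To give a feel for the overall shape, here is a skeletal example; every key below is invented for illustration, and the real schema is the one documented in Log::Parallel::ConfigCheck:

 # All keys here are hypothetical; see Log::Parallel::ConfigCheck
 # for the actual configuration schema.
 jobs:
   - name: sessionize
     source: raw_web_logs
     buckets: 16
     destination: sessions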

SEE ALSO

The Parser API is defined in Log::Parallel::Parsers. The Writers API is defined in Log::Parallel::Writers. Descriptions of the steps can be found in Log::Parallel::ConfigCheck.

LICENSE

This package may be used and redistributed under the terms of either the Artistic 2.0 or LGPL 2.1 license.