NAME

file-to-elasticsearch.pl - A simple utility to tail a file and index each line as a document in ElasticSearch

VERSION

version 0.010

SYNOPSIS

To see available options, run:

    file-to-elasticsearch.pl --help

Create a config file and run the utility:

    file-to-elasticsearch.pl --config config.yaml --log4perl logging.conf --debug

This will run a single-threaded POE session that tails the log files you've requested, performing the requested transformations and sending the resulting documents to the ElasticSearch cluster and index you've specified.

CONFIGURATION

ElasticSearch Settings

The elasticsearch section of the config controls the settings passed to the POE::Component::ElasticSearch::Indexer.

    ---
    elasticsearch:
      servers: [ "localhost:9200" ]
      flush_interval: 30
      flush_size: 1_000
      index: logstash-%Y.%m.%d
      type: log

The settings available are:

servers

An array of servers used to send bulk data to ElasticSearch. The default is just localhost on port 9200.

flush_interval

Every flush_interval seconds, the queued documents are sent to the cluster's Bulk API.

flush_size

Once this many documents are queued, a flush to the Bulk API is forced regardless of the time since the last flush.

index

A strftime compatible string to use as the DefaultIndex parameter if a file doesn't pass one along.
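
For illustration, the pattern expands the same way POSIX strftime does. This sketch is not part of the utility; it just shows what an index pattern like the one above resolves to:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Expand an index pattern like the one above against the current UTC time
my $pattern = 'logstash-%Y.%m.%d';
my $index   = strftime($pattern, gmtime);
print "$index\n";    # e.g. "logstash-2018.06.15"
```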

type

Mostly useless as Elastic is abandoning "types", but this will be set as the DefaultType for documents being indexed.

Tail Section

The tail section contains the list of files to tail and the rules used to index them.

    ---
    tail:
      - file: '/var/log/osquery/result.log'
        index: "osquery-result-%Y.%m.%d"
        decode: json
        extract:
          - by: split
            from: name
            when: '^pack'
            into: 'pack'
            split_on: '/'
            split_parts: [ null, "name", "report" ]
        mutate:
          prune: true
          remove: [ "calendarTime", "epoch", "counter", "_raw" ]
          rename:
            unixTime: _epoch

Each element is a hash containing the following information.

file

Required: The path to the file on the filesystem.

decode

This may be a single element, or an array, containing one or more of the implemented decoders.
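
For example, a config might use a single decoder for one file and a chain of decoders, applied in order, for another (a hypothetical fragment following the tail section format above):

```yaml
---
tail:
  - file: '/var/log/osquery/result.log'
    decode: json
  - file: '/var/log/messages'
    decode: [ syslog, json ]
```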

json

Decode the discovered JSON in the document to a hash reference. This finds the first occurrence of a { in the string and assumes everything from there to the end of the string is JSON.

Decoding is done by JSON::MaybeXS.
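
A rough sketch of that behavior (this uses core JSON::PP rather than JSON::MaybeXS so the example is self-contained, and the log line is made up):

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# A log line with a non-JSON prefix; everything from the first '{'
# onward is treated as the JSON document.
my $line = 'Jun 15 12:00:01 host osqueryd: {"name":"pack/sys/info","unixTime":1529064001}';
my ($json) = $line =~ /(\{.*)\z/s;    # grab from the first '{' to the end
my $doc    = decode_json($json);
print $doc->{name}, "\n";             # pack/sys/info
```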

syslog

Parses each line as a standard UNIX syslog message. Parsing is provided via Parse::Syslog::Line, which isn't a hard requirement of this package but will be loaded if available.

index

A strftime compatible string to use as the index for documents created from this file. If not specified, the default from the elasticsearch section will be used, and failing that, the default as specified in POE::Component::ElasticSearch::Indexer.

type

The type to use for documents sourced from this file.

extract

Extraction of fields from the decoded document by one of the supported methods. Each extraction rule is a hash; the available keys are shown in the example above.

by

The method used to perform the extraction. In the example above, split splits the value of the from field on the split_on string and assigns the resulting pieces to the names listed in split_parts (a null entry discards that piece); the when key limits the rule to values matching that regular expression, and into names the key under which the results are stored.
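
As a sketch of the split rule from the earlier example (the field value, the "pack." key naming, and the exact semantics are assumptions; the real implementation may differ):

```perl
use strict;
use warnings;

# Assumed semantics of the split extraction: when the "name" field
# matches /^pack/, split it on '/' and map the pieces to the names in
# split_parts, skipping null entries; store results under "pack".
my %doc   = ( name => 'pack/osquery-monitoring/perf' );   # hypothetical document
my @parts = split m{/}, $doc{name};
my @names = ( undef, 'name', 'report' );   # split_parts: [ null, "name", "report" ]

if ( $doc{name} =~ /^pack/ ) {             # when: '^pack'
    for my $i ( 0 .. $#parts ) {
        next unless defined $names[$i];
        $doc{"pack.$names[$i]"} = $parts[$i];   # into: 'pack'
    }
}

printf "%s=%s\n", $_, $doc{$_} for sort keys %doc;
```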

AUTHOR

Brad Lhotsky <brad@divisionbyzero.net>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018 by Brad Lhotsky.

This is free software, licensed under:

  The (three-clause) BSD License