NAME

BenchmarkAnything::Storage::Backend::SQL - Autonomous SQL backend to store benchmarks

SYNOPSIS

    require YAML::Syck;
    require BenchmarkAnything::Storage::Backend::SQL;
    my $or_bench = BenchmarkAnything::Storage::Backend::SQL->new({
        dbh    => $or_dbh,
        debug  => 0,
        config => YAML::Syck::LoadFile('~/conf/tapper_benchmark.conf'),
    });

    my $b_success = $or_bench->add_single_benchmark({
        NAME => 'testbenchmark',
        UNIT => 'example unit',
        testplanid => 813,
        DATA => [
            {
                VALUE          => 123.45,
                testrun_id     => 123,
                machine        => 'mx1.small',
                benchmark_date => '2013-09-25 12:12:00',
            },{
                VALUE          => 122.88,
                testrun_id     => 123,
                machine        => 'mx1.large',
                benchmark_date => '2013-09-23 13:02:14',
            },
            ...
        ],
    },{
        force => 1,
    });

    my $b_success = $or_bench->add_multi_benchmark([
        {
            NAME           => 'testbenchmark',
            UNIT           => 'example unit',
            VALUE          => 123.45,
            testrun_id     => 123,
            machine        => 'mx1.small',
            benchmark_date => '2013-09-25 12:12:00',
        },{
            NAME           => 'testbenchmark',
            UNIT           => 'example unit',
            VALUE          => 122.88,
            testrun_id     => 123,
            machine        => 'mx1.large',
            benchmark_date => '2013-09-23 13:02:14',
        },
        ...
    ],{
        force => 1,
    });

    my $or_benchmark_points = $or_bench->search({
        select      => [
            'testrun_id',
            'machine',
        ],
        where       => [
            ['!=', 'machine', 'mx1.small'     ],
            ['=' , 'NAME'   , 'testbenchmark' ],
        ],
        order_by    => [
            'machine',
            ['testrun_id','ASC',{ numeric => 1 }]
        ],
        limit       => 2,
        offset      => 1,
    });

    while ( my $hr_data_point = $or_benchmark_points->fetchrow_hashref() ) {
        ...
    }

    my $b_success = $or_bench->subsume({
        subsume_type        => 'month',
        exclude_additionals => [qw/ benchmark_date /],
        date_from           => '2013-01-01 00:00:00',
        date_to             => '2014-01-01 00:00:00',
    });

DESCRIPTION

BenchmarkAnything::Storage::Backend::SQL is a module for adding benchmark points in a standardised way to the database. A search function with complex filters is also provided.

Class Methods

new

Create a new BenchmarkAnything::Storage::Backend::SQL object.

    my $or_bench = BenchmarkAnything::Storage::Backend::SQL->new({
        dbh    => $or_dbh,
        debug  => 0,
        verbose=> 0,
        config => YAML::Syck::LoadFile('~/conf/tapper_benchmark.conf'),
        searchengine => ... # optional, see below at "Elasticsearch support"
    });
dbh

A DBI database handle.

config [optional]

The BenchmarkAnything::Storage::Backend::SQL configuration, e.g. loaded from a configuration file as shown in the SYNOPSIS. See Configuration for details.

debug [optional]

Setting debug to a true value results in additional debugging information being written to STDOUT. The default is 0.

verbose [optional]

Setting verbose to a true value produces more log and status output. The default is 0.

add_single_benchmark

Add one or more data points for a single benchmark to the database.

    my $b_success = $or_bench->add_single_benchmark({
        NAME => 'testbenchmark',
        UNIT => 'example unit',
        data => [
            {
                VALUE          => 123.45,
            },{
                VALUE          => 122.88,
                testrun_id     => 123,
                machine        => 'mx1.large',
                benchmark_date => '2013-09-23 13:02:14',
            },{
                VALUE          => 122.88,
                testrun_id     => 123,
            },
            ...
        ],
    },{
        force => 1
    });
1st Parameter Hash => NAME

The name of the benchmark for grouping benchmark data points.

1st Parameter Hash => data

This parameter contains the benchmark data points as an array of hashes. The element VALUE is the only required element in these hashes; it holds the benchmark data point value.

1st Parameter Hash => UNIT [optional]

Containing a unit for benchmark data point values.

2nd Parameter Hash => force [optional]

Ignore forgivable errors while writing.

add_multi_benchmark

Add one or more data points for multiple benchmarks to the database.

    my $b_success = $or_bench->add_multi_benchmark([
        {
            NAME           => 'testbenchmark 1',
            UNIT           => undef,
            VALUE          => 123.45,
        },{
            NAME           => 'testbenchmark 2',
            VALUE          => 122.88,
            testrun_id     => 123,
            machine        => 'mx1.large',
            benchmark_date => '2013-09-23 13:02:14',
        },{
            NAME           => 'testbenchmark 1',
            UNIT           => 'example unit',
            VALUE          => 122.88,
            testrun_id     => 123,
        },
        ...
    ],{
        force => 1
    });
1st Parameter Array of Hashes => NAME

The name of the benchmark for grouping benchmark data points.

1st Parameter Hash => VALUE

The value is the benchmark data point value.

1st Parameter Hash => UNIT [optional]

Containing a unit for benchmark data point values.

1st Parameter Hash => all others

All other elements in the hashes are additional values added to this data point.

2nd Parameter Hash => force [optional]

Ignore forgivable errors while writing.

search

Search for benchmark data points in the database. The function returns a DBI statement handle.

    my $or_benchmark_points = $or_bench->search({
        select      => [
            'testrun_id',
            'machine',
        ],
        where       => [
            ['!=', 'machine', 'mx1.small'     ],
            ['=' , 'NAME'   , 'testbenchmark' ],
        ],
        where_sql   => q#
            AND NOT(
                   ${testrun_id} = 123
                OR ${VALUE}      = '144'
            )
        #,
        limit       => 2,
        offset      => 1,
        order_by    => [
            'machine',
            ['testrun_id','ASC']
        ],
    });
select [optional]

An Array of Strings or Array References containing additional selected columns. The default selected columns are:

    NAME     - name of the benchmark
    UNIT     - benchmark unit [optional]
    VALUE    - value of the benchmark data point
    VALUE_ID - unique benchmark data point identifier
    CREATED  - benchmark data point created date in format YYYY-MM-DD HH:II:SS

Add the additional data "testrun_id" and "machine" as columns to the selection.

    ...
        select      => [
            'testrun_id',
            'machine',
        ],
    ...

Do the same as above.

    ...
        select      => [
            ['','testrun_id'],
            ['','machine'],
        ],
    ...

Get the maximum "testrun_id" of all selected data points. All other selected columns without an explicit aggregation use the default_aggregation from the BenchmarkAnything::Storage::Backend::SQL configuration. Possible aggregation types are:

    - min = minimum
    - max = maximum
    - avg = average
    - gem = geometric mean
    - sum = sum (total)
    - cnt = count
    - cnd = distinct value count

    ...
        select      => [
            ['max','testrun_id'],
            'machine',
        ],
    ...

An aggregation is also possible for the default columns.

    ...
        select      => [
            ['max','testrun_id'],
            ['avg','VALUE'],
        ],
    ...
where [optional]

An Array of Array References containing restrictions for benchmark data points.

    ...
        where       => [
            ['!=', 'machine', 'mx1.small'     ],
            ['=' , 'NAME'   , 'testbenchmark' ],
        ],
    ...

1. Parameter in Sub-Array = restriction operator

    =           - equal
    !=          - not equal
    <           - less than
    >           - greater than
    <=          - less than or equal
    >=          - greater than or equal
    like        - SQL LIKE
    not_like    - SQL NOT LIKE
    is_empty    - empty string or undef or null

2. Parameter in Sub-Array = restricted column

A restriction is possible for additional values and the default columns.

3 - n. Parameters in Sub-Array = value for restriction

In general there is just a single value. For '=' and '!=' multiple values can be given; in SQL this is implemented with IN and NOT IN.
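
For example, a restriction that matches either of two machines (translated to SQL IN):

    ...
        where       => [
            ['=', 'machine', 'mx1.small', 'mx1.large'],
        ],
    ...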

where_sql [optional]

A String containing an additional where clause. Please use this feature only if the "where" parameter is not sufficient for the required restriction.

order_by [optional]

An Array of Strings or an Array of Array References determining the order of returned benchmark data points.

Array of Strings: column to sort with default order direction "ASC" (ascending)

Array of Array References:

    1. Element: column to sort
    2. Element: order direction with possible values "ASC" (ascending) and "DESC" (descending)
    3. Element: hash of additional options. Possible key: numeric - set a true value for a numeric sort

    ...
        order_by    => [
            'machine',
            ['benchmark_date','DESC'],
            ['testrun_id','ASC',{numeric => 1}]
        ],
    ...
limit [optional]

An integer value which determines the number of returned benchmark data points.

offset [optional]

An integer value which determines the number of omitted (skipped) benchmark data points.
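
For example, a sketch of fetching the second page of 100 data points each:

    ...
        limit       => 100,
        offset      => 100,
    ...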

search_array

Returns all benchmark data points as an Array of Hashes.

    my $or_benchmark_points = $or_bench->search_array({
        select      => [
            'testrun_id',
            'machine',
        ],
        where       => [
            ['!=', 'machine', 'mx1.small'     ],
            ['=' , 'NAME'   , 'testbenchmark' ],
        ],
        limit       => 2,
        offset      => 1,
        order_by    => [
            'machine',
            ['testrun_id','ASC']
        ],
    });

search_hash

Returns all benchmark data points as a Hash of Hashes. Compared to search and search_array, this function additionally requires the parameter keys. keys is an Array of Strings which determines the columns used as the keys for the nested hashes. Every "key" creates a new nested hash.

    my $or_benchmark_points = $or_bench->search_hash({
        keys        => [
            'testrun_id',
            'machine',
            'VALUE_ID',
        ],
        select      => [
            'testrun_id',
            'machine',
        ],
        where       => [
            ['!=',       'machine',      'mx1.small'     ],
            ['=',        'NAME'   ,      'testbenchmark' ],
            ['like',     'some_key',     'some%value'    ],
            ['not_like', 'another_key',  'another%value' ],
            ['is_empty', 'parameter1',   1 ], # check parameter1 is empty     - Elasticsearch backend only
            ['is_empty', 'parameter2',   0 ], # check parameter2 is not empty - Elasticsearch backend only
        ],
        limit       => 2,
        offset      => 1,
        order_by    => [
            'machine',
            ['testrun_id','ASC']
        ],
    });

get_stats

Returns a hash with information about the storage, such as how many data points, metrics, and additional keys are stored.

 my $stats = $or_bench->get_stats();

get_single_benchmark_point

Get a single data point from the database including all essential fields (NAME, VALUE, UNIT, VALUE_ID, CREATED) and all additional fields.

 my $point = $or_bench->get_single_benchmark_point($value_id);

get_full_benchmark_points

Get $count data points from the database including all essential fields (NAME, VALUE, UNIT, VALUE_ID, CREATED) and all additional fields, beginning with $value_id.

 my $point = $or_bench->get_full_benchmark_points($value_id, $count);

list_benchmark_names

Get a list of all benchmark NAMEs, optionally matching a given pattern (SQL LIKE syntax, i.e. using % as placeholder).

 $benchmarkanythingdata = $or_bench->list_benchmark_names($pattern);

list_additional_keys

Get a list of all additional key names, optionally matching a given pattern (SQL LIKE syntax, i.e. using % as placeholder).

 $benchmarkanythingdata = $or_bench->list_additional_keys($pattern);

enqueue_multi_benchmark

As a low-latency alternative to directly calling "add_multi_benchmark" there is a queuing functionality.

The enqueue_multi_benchmark function simply writes the serialized (and compressed) raw incoming data structure into a single row and returns. The complementary function is process_queued_multi_benchmark, which later takes these values over using the real add_multi_benchmark internally.
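
A minimal usage sketch, assuming enqueue_multi_benchmark accepts the same array reference of data points as add_multi_benchmark:

    $or_bench->enqueue_multi_benchmark([
        {
            NAME  => 'testbenchmark',
            VALUE => 123.45,
        },
        # further data points ...
    ]);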

process_queued_multi_benchmark

This is part 2 of the low-latency queuing alternative to directly calling "add_multi_benchmark".

It transactionally marks a single raw entry as being processed and then takes over its values by calling add_multi_benchmark. It preserves the order of entries by inserting each chunk sequentially, so that the resulting IDs do not confuse the careful reader. After the bundle is taken over it is marked as processed.

This function only handles one single raw entry. It is expected to be called from multiple co-operating worker tasks or multiple times from a wrapper.

Currently the original raw values are not deleted immediately, just for safety reasons, until the transactional code is death-proof (and certified by Stuntman Mike). There is a dedicated function gc for that cleanup.

The function returns the ID of the processed raw entry.
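
A sketch of a simple worker loop; the assumption that a false value is returned once no raw entry is left is not guaranteed by this documentation:

    # one queued raw entry is processed per call
    while ( my $raw_id = $or_bench->process_queued_multi_benchmark() ) {
        print "processed queued bundle $raw_id\n";
    }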

gc

This calls garbage collection, in particular it deletes raw entries that were created by enqueue_multi_benchmark and have already been processed by process_queued_multi_benchmark.

It is separated from that processing just for safety reasons, until the transactional code in there is waterproof.

The gc function can clean up more stuff in the future.
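
Typically this is called periodically, e.g. from a cron-like wrapper (a sketch, assuming no arguments are required):

    $or_bench->gc();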

subsume

This is a maintenance function for reducing the number of data points in the database. Calling this function reduces the rows in the benchmark values table by building an average value for all benchmark data points grouped by specific columns. By default all subsumed rows are added to backup tables so that the original state can be rebuilt. It is highly recommended to do this periodically for better search performance.

    my $b_success = $or_bench->subsume({
        subsume_type        => 'month',
        exclude_additionals => [qw/ benchmark_date /],
        date_from           => '2013-01-01 00:00:00',
        date_to             => '2014-01-01 00:00:00',
        backup              => 0,
    });
subsume_type

Benchmark data points are subsumed in groups defined by the following elements:

 - bench_id
 - additional data values ( Example: testrun_id, machine )
 - specific date range ( subsume_type ).
   The possible subsume types are stored in the
   subsume_type_table ( see BenchmarkAnything::Storage::Backend::SQL Configuration ). By default there
   are the following types: "second", "minute", "hour", "day", "week", "month",
   "year".
date_from

Beginning of the subsume period.

date_to

End of the subsume period.

exclude_additionals

Array Reference of additional values that should be excluded from grouping.

backup

By default all subsumed rows will be inserted into backup tables. If this isn't desired, a false value must be passed.

init_search_engine( $force )

Initializes the configured search engine (Elasticsearch). If the index already exists it does nothing, unless you set $force to a true value, which deletes and re-creates the index. This is necessary, for example, to apply new type mappings.

After a successful (re-)init you need to run sync_search_engine.

During (re-)init and sync you should disable querying by setting

  searchengine.elasticsearch.enable_query: 0

sync_search_engine( $force, $start, $count)

Sync $count (default 10000) entries from the relational backend into the search engine (Elasticsearch) for indexing, beginning at $start (default 1). Already existing entries in Elasticsearch are skipped unless $force is set to a true value.
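
A sketch of a forced re-initialization followed by a full sync, using the parameters described above (assuming both are called as methods on the backend object):

    $or_bench->init_search_engine(1);           # $force: delete and re-create the index
    $or_bench->sync_search_engine(1, 1, 10000); # $force, $start, $count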

Configuration

The following elements are required in configuration:

default_aggregation

Default aggregation used for non-aggregated columns if an aggregation is applied to any other column.

tables

Containing the names of the tables used by BenchmarkAnything::Storage::Backend::SQL.

    tables => {
        unit_table                       => 'bench_units',
        benchmark_table                  => 'benchs',
        benchmark_value_table            => 'bench_values',
        subsume_type_table               => 'bench_subsume_types',
        benchmark_backup_value_table     => 'bench_backup_values',
        additional_type_table            => 'bench_additional_types',
        additional_value_table           => 'bench_additional_values',
        additional_relation_table        => 'bench_additional_relations',
        additional_type_relation_table   => 'bench_additional_type_relations',
        backup_additional_relation_table => 'bench_backup_additional_relations',
    }
select_cache [optional]

If set to a true value, the module caches some select results.
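
A minimal configuration sketch combining these elements; the concrete values (e.g. the "avg" default aggregation) are illustrative only:

    {
        default_aggregation => 'avg',   # one of the aggregation types, e.g. average
        tables              => { ... }, # table names as shown above
        select_cache        => 1,
    }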

default_columns

Returns the hash describing those columns that are by default part of each single data point (NAME, UNIT, VALUE, VALUE_ID, CREATED, each with its internal column name). These default columns might go away in the future, but for now some systems need this internal information.

benchmark_operators

Returns the list of operators supported by the query language. This is provided for frontend systems that support creating queries automatically.
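
A usage sketch for these two helpers, assuming both are called as methods on the backend object and return a hash respectively a list as described above:

    my %h_default_columns = $or_bench->default_columns();
    my @a_operators       = $or_bench->benchmark_operators();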

json_true

Auxiliary function for Elasticsearch JSON data.

json_false

Auxiliary function for Elasticsearch JSON data.

Elasticsearch support

Config

You can pass through a config entry for an external search engine (currently only Elasticsearch) to the constructor:

    my $or_bench = BenchmarkAnything::Storage::Backend::SQL->new({
        dbh    => $or_dbh,
        searchengine => {
          elasticsearch => {
            #
            # which index/type to use
            index => "myapp",
            type  => "benchmarkanything",
            #
            # queries use the searchengine
            enable_query => 1,
            #
            # should each single added value be stored immediately
            # (maybe there is another bulk sync mechanism)
            index_single_added_values_immediately => 1,
            #
            # which nodes to use
            nodes => [ 'localhost:9200' ],
            #
            # (OPTIONAL)
            # Your additional application specific mappings, used when
            # index is created.
            #
            # WARNING: You are allowed to overwrite the pre-defined
            # defaults for the built-in fields (NAME, VALUE, VALUE_ID,
            # UNIT, CREATED) as it is a RIGHT_PRECEDENT hash merge with
            # your additional_mappings on the right side. So if you
            # touch the internal fields you better know what you are
            # doing.
            additional_mappings => {
                # type as defined above in elasticsearch.type
                benchmarkanything => {
                    # static key <properties>
                    properties => {
                        # field
                        tapper_report => {
                            type => 'long',
                        },
                        tapper_testrun => {
                            type => 'long',
                        },
                        tapper_testplan => {
                            type => 'long',
                        },
                    },
                },
            },
          },
        },
    });

With such a config and an already set-up Elasticsearch you can use the lib as usual; behind the scenes it transparently handles indexing and querying the data with Elasticsearch. The relational SQL storage is still used as the primary storage.

Index

When index_single_added_values_immediately is set, every single added entry is fetched right after insert (to get all transformations and added metadata) and sent to Elasticsearch for indexing.

Please note that this immediate indexing adds an overhead to insert time. You could as well switch off this setting and take care of indexing the data at another time. Then again, the ::Frontend::HTTP, for instance, already takes care of bulk-adding new data asynchronously, so the overhead should be hidden there; just switch on the feature and don't worry too much.

When enable_query is set, the BenchmarkAnything queries are transformed into corresponding Elasticsearch queries, sent to Elasticsearch, and the result is taken directly from its answers.

AUTHOR

Roberto Schaefer <schaefr@amazon.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018 by Amazon.com, Inc. or its affiliates.

This is free software, licensed under:

  The (two-clause) FreeBSD License