NAME

App::ElasticSearch::Utilities::Query - Object representing ES Queries

VERSION

version 7.9

ATTRIBUTES

fields_meta

A hash reference with the field data from App::ElasticSearch::Utilities::es_index_fields.

query_stash

Hash reference containing replaceable query elements. See stash.

scroll_id

The scroll id for the last executed query. You shouldn't mess with this directly. It's best to use the execute() and scroll_results() methods.

must

The must section of a bool query as an array reference. See add_bool. Can be set using set_must and is a valid init_arg.

must_not

The must_not section of a bool query as an array reference. See add_bool. Can be set using set_must_not and is a valid init_arg.

should

The should section of a bool query as an array reference. See add_bool. Can be set using set_should and is a valid init_arg.

minimum_should_match

A string defining the minimum number of should conditions to qualify a match. See https://www.elastic.co/guide/en/elasticsearch/reference/7.3/query-dsl-minimum-should-match.html

filter

The filter section of a bool query as an array reference. See add_bool. Can be set using set_filter and is a valid init_arg.
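
To picture how the must, must_not, should, minimum_should_match, and filter attributes fit together, here is a hand-built sketch of the Elasticsearch bool query they describe. It uses only core Perl and JSON::PP rather than this module. The field names and values are illustrative assumptions, and the hash this module actually emits may differ in detail.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-built illustration of a bool query; all field names are hypothetical.
my %bool = (
    must     => [ { term => { http_method => 'GET' } } ],
    must_not => [ { term => { http_status => 400 } } ],
    should   => [
        { term => { client_ip => '10.10.10.1' } },
        { term => { client_ip => '10.10.10.2' } },
    ],
    # At least one of the two should clauses has to match
    minimum_should_match => '1',
    filter => [ { range => { bytes => { gt => 0 } } } ],
);

# The bool clauses sit under the "query" key of a search request body
my $body = { query => { bool => \%bool } };

print JSON::PP->new->canonical->pretty->encode($body);
```

Each section is an array reference of conditions, which is why the attributes above are documented as array references.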

nested

The nested query. Setting this short-circuits the rest of the query because of restrictions on nested queries.

nested_path

The path being nested. Only used in nested queries.

from

Integer representing the offset the query should start returning documents from. The default is undefined, which falls back on the Elasticsearch default of 0, or from the beginning. Can be set with set_from. Cannot be an init_arg.

size

The number of documents to return in the query. The default size is 50. Can be set with set_size. Cannot be an init_arg.

fields

An array reference containing the names of the fields to retrieve with the query. The default is undefined, which falls back on the Elasticsearch default of empty, or no fields retrieved. The _source is still retrieved. Can be set with set_fields. Cannot be an init_arg.

sort

An array reference of sorting keys/directions. The default is undefined, which falls back on the Elasticsearch default of score:desc. Can be set with set_sort. Cannot be an init_arg.

aggregations

A hash reference of aggregations to perform. The default is undefined, which means do not perform any aggregations. Can be set with set_aggregations, which is aliased as set_aggs. Cannot be an init_arg. Aliased as aggs.

scroll

An ElasticSearch time constant. The default is undefined, which means scroll will not be set on a query. Can be set with set_scroll. Cannot be an init_arg. See also: set_scan_scroll.

timeout

An ElasticSearch time constant. The default is undefined, which means it will default to the connection timeout. Can be set with set_timeout. Cannot be an init_arg.

terminate_after

The number of documents after which to cancel the search. This generally shouldn't be used except for large queries where you are protecting against OOM errors. The size attribute is more accurate because its truncation occurs after the reduce operation, whereas terminate_after is applied during the map phase of the query. Can be set with set_terminateafter. Cannot be an init_arg.

track_total_hits

Whether the query should attempt to calculate the total number of hits it would match. Defaults to true.

track_scores

Set to true to score every hit in the search results, set to false to not report scores. Defaults to unset, i.e., use the ElasticSearch default.

rest_total_hits_as_int

In ElasticSearch 7.0, the total hits element became a hash reference with more details. Since most of the tooling relies on the old behavior, this defaults to true.

search_type

Choose an execution path for the query. This is null by default, but you can set it to a valid `search_type` setting, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-search-type

METHODS

as_search( [ 'index1', 'index2' ] )

Returns a list of parameters to pass directly to es_request().

execute( [ $index1, $index2 ] )

Uses `es_request()` to return the result and stores any relevant scroll data.

scroll_results()

If a scroll has been set, this will construct and run the requisite scroll search; otherwise it returns undef.

uri_params()

Retrieves the URI parameters for the query as a hash reference. Undefined parameters will not be represented in the hash.

request_body()

Builds and returns a hash reference representing the request body for the Elasticsearch query. Undefined elements will not be represented in the hash.
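
The rule that undefined elements are left out, which applies to both uri_params() and request_body(), can be pictured with a small core-Perl sketch. The prune_undef helper and the parameter values below are hypothetical and are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference holding only the defined values.
sub prune_undef {
    my (%params) = @_;
    my %kept = map { $_ => $params{$_} } grep { defined $params{$_} } keys %params;
    return \%kept;
}

my $params = prune_undef(
    size    => 50,
    from    => undef,    # never set, so it is dropped
    timeout => '10s',
    scroll  => undef,    # never set, so it is dropped
);

# prints: size=50, timeout=10s
print join( ', ', map { "$_=$params->{$_}" } sort keys %$params ), "\n";
```

Dropping the undefined keys, rather than sending them as nulls, lets Elasticsearch apply its own defaults for anything that was never set.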

query()

Builds and returns a hash reference representing the bool query section of the request body. This function is called by the request_body function but is useful and distinct enough to expose as its own method. Undefined elements of the query will not be represented in the hash it returns.

add_aggregations( name => { ... } )

Takes one or more key-value pairs. The key is the name of the aggregation. The value is the hash reference representation of the aggregation itself. It will silently replace a previously named aggregation with the most recent call.

Calling this function overrides the size element to 0 and disables scroll.

Aliased as add_aggs.

wrap_aggregations( name => { ... } )

Use this to wrap an aggregation in another aggregation. For example:

    $q->add_aggregations(ip => { terms => { field => 'src_ip' } });

Creates:

    {
        "aggs": {
            "ip": {
                "terms": {
                    "field": "src_ip"
                }
            }
        }
    }

This would give you the top IPs for the whole query set. To wrap that aggregation to get top IPs per hour, you could:

    $q->wrap_aggregations( hourly => { date_histogram => { field => 'timestamp', interval => '1h' } } );

Which translates the query into:

    {
        "aggs": {
            "hourly": {
                "date_histogram": {
                    "field": "timestamp",
                    "interval": "1h"
                },
                "aggs": {
                    "ip": {
                        "terms": {
                            "field": "src_ip"
                        }
                    }
                }
            }
        }
    }
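
The wrapping step can be sketched in plain Perl as nesting the existing aggregations under the new parent's "aggs" key. The wrap_aggs helper below is a hypothetical illustration of that transformation using only core modules; it is not this module's implementation.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: nest the existing aggregations under a new parent.
sub wrap_aggs {
    my ( $aggs, $name, $definition ) = @_;
    my %wrapped = ( $name => { %{$definition}, aggs => $aggs } );
    return \%wrapped;
}

# Start with the "ip" terms aggregation from the example above
my $aggs = { ip => { terms => { field => 'src_ip' } } };

# Wrap it in an hourly date_histogram
$aggs = wrap_aggs(
    $aggs,
    hourly => { date_histogram => { field => 'timestamp', interval => '1h' } },
);

print JSON::PP->new->canonical->pretty->encode( { aggs => $aggs } );
```

After the call, "hourly" is the only top-level aggregation and "ip" is evaluated once per hourly bucket.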

aggregations_by( [asc | desc] => aggregation_string )

Applies a sort to all aggregations at the current level based on the aggregation string.

Aggregation strings are parsed with the App::ElasticSearch::Utilities::Aggregations expand_aggregate_string() function.

Examples:

    $q->aggregations_by( desc => [ qw( sum:bytes ) ] );
    $q->aggregations_by( desc => [ qw( sum:bytes cardinality:user_agent ) ] );

set_scan_scroll($ctxt_life)

This function emulates the old scan scroll feature in early versions of Elasticsearch. It takes an optional ElasticSearch time constant, but defaults to '1m'. It is the same as calling:

    $self->set_sort( [qw(_doc)] );
    $self->set_scroll( $ctxt_life );

set_match_all()

This method clears all filters and query elements and sets the must to match_all. It will not reset other parameters like size, sort, and aggregations.

add_bool( section => conditions .. )

Appends a search condition to a section in the query body. Valid query body points are: must, must_not, should, and filter.

    $q->add_bool( must => { term => { http_status => 200 } } );

    # or

    $q->add_bool(
        must => [
            { term => { http_method => 'GET' } },
            { term => { client_ip   => '10.10.10.1' } },
        ],
        must_not => { term => { http_status => 400 } },
    );
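
The append behaviour, where a section accepts either a single condition or an array reference of conditions, can be modelled in core Perl as follows. add_bool_sketch is a hypothetical stand-in, and dying on an unknown section is this sketch's choice, not documented behaviour of the module.

```perl
use strict;
use warnings;

my %VALID = map { $_ => 1 } qw(must must_not should filter);

# Hypothetical stand-in for add_bool(): append one condition (hash reference)
# or several (array reference) to each named section.
sub add_bool_sketch {
    my ( $query, %sections ) = @_;
    for my $section ( sort keys %sections ) {
        die "invalid section: $section\n" unless $VALID{$section};
        my $conditions = $sections{$section};
        push @{ $query->{$section} },
            ref($conditions) eq 'ARRAY' ? @{$conditions} : $conditions;
    }
    return $query;
}

my $query = {};
add_bool_sketch( $query, must => { term => { http_status => 200 } } );
add_bool_sketch(
    $query,
    must => [
        { term => { http_method => 'GET' } },
        { term => { client_ip   => '10.10.10.1' } },
    ],
    must_not => { term => { http_status => 400 } },
);

# prints:
#   must: 3 condition(s)
#   must_not: 1 condition(s)
printf "%s: %d condition(s)\n", $_, scalar @{ $query->{$_} } for sort keys %$query;
```

Repeated calls accumulate conditions instead of replacing them, which is the difference between add_bool and the set_must style setters.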

stash( section => condition )

Allows a replaceable query element to exist in the query body sections: must, must_not, should, and/or filter. This is useful for moving through a data-set preserving everything in a query except one piece that shifts. Imagine:

    my $query = App::ElasticSearch::Utilities::Query->new();
    $query->add_bool(must => { terms => {src_ip => [qw(1.2.3.4)]} });
    $query->add_bool(must => { range => { attack_score => { gt => 10 }} });

    while( 1 ) {
        $query->stash( must => { range => { timestamp => { gt => time() } } } );
        my @results = make_es_request( $query->request_body, $query->uri_params );

        # Long processing
    }

This allows re-use of the query object inside of loops like this.
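
The replace-in-place idea can be modelled in core Perl as a set of fixed conditions plus one replaceable slot per section. The stash_sketch and section_conditions helpers are hypothetical and only illustrate the semantics described here; how the module stores and merges the stash internally is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical model of a query: fixed conditions plus one replaceable
# (stashed) condition per section.
my %fixed = ( must => [ { terms => { src_ip => ['1.2.3.4'] } } ] );
my %stash;

# Replace whatever was stashed for the section; fixed conditions are untouched.
sub stash_sketch {
    my ( $section, $condition ) = @_;
    $stash{$section} = $condition;
    return;
}

# Combine the fixed conditions with the current stash entry, if any.
sub section_conditions {
    my ($section) = @_;
    my @conditions = @{ $fixed{$section} || [] };
    push @conditions, $stash{$section} if defined $stash{$section};
    return \@conditions;
}

for my $since ( 100, 200, 300 ) {    # stand-ins for successive time() values
    stash_sketch( must => { range => { timestamp => { gt => $since } } } );
    my $must = section_conditions('must');
    printf "%d conditions, timestamp gt %d\n",
        scalar @{$must}, $must->[-1]{range}{timestamp}{gt};
}
```

Each pass through the loop reports two conditions: the stashed range replaces the previous one instead of piling up the way repeated add_bool calls would.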

AUTHOR

Brad Lhotsky <brad@divisionbyzero.net>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2021 by Brad Lhotsky.

This is free software, licensed under:

  The (three-clause) BSD License