NAME

Net::API::CPAN::Filter - Meta CPAN API

SYNOPSIS

    use Net::API::CPAN::Filter;
    my $this = Net::API::CPAN::Filter->new(
        query => {
            regexp => { name => 'HTTP.*' },
        },
    ) || die( Net::API::CPAN::Filter->error, "\n" );

VERSION

    v0.1.0

DESCRIPTION

This class is designed to facilitate the forming of an Elastic Search query and store its various components as an object of this class, so it can possibly be re-used or shared.

You can pass arguments to the methods "aggs", "fields", "filter", "from", "match_all", "query", "size", "sort", "source" to affect the production of the query.

Alternatively, you can pass a hash reference of a fully formed Elastic Search query directly to "es", which takes precedence over all the other methods.

Calling "as_hash" will collate all the components and cache the result. If any information is changed using any of the methods in this class, the cached hash produced by "as_hash" is removed.

You can get the resulting JSON by calling "as_json", which, in turn, calls "as_hash".

As far as the API documentation states, Meta CPAN uses version 2.4 of Elastic Search, and the method documentation herein reflects that.

METHODS

aggregations

This is an alias for "aggs"

aggs

Sets or gets a hash reference of query aggregations (post filter). It returns a hash object, or undef if nothing was set.

Example from Elastic Search documentation

    {
        aggs => {
            models => {
                terms => { field => "model" },
            },
        },
        query => {
            bool => {
                filter => [
                    {
                        term => { color => "red" },
                    },
                    {
                        term => { brand => "gucci" },
                    },
                ],
            },
        },
    }

See also the Elastic Search documentation.

apply

Provided with a hash or hash reference of parameters, this will pass each value to the method matching its corresponding key, if that method exists.

It returns the current object for chaining.

as_hash

Read-only. Returns the various components of the query as a hash reference.

The resulting hash of data is cached so you can call it multiple times without additional overhead. Any change made through any of the methods here will reset that cache.

as_json

    my $json = $filter->as_json;
    my $json_in_utf8 = $filter->as_json( encoding => 'utf-8' );

Read-only. Returns the various components of the query as JSON data encoded in Perl internal utf-8 encoding.

If a hash or hash reference of options is provided with a property encoding set to utf-8 or utf8, then the JSON data returned will be encoded in utf-8.

es

This takes a hash reference of Elastic Search query parameters.

See "ELASTIC SEARCH QUERY" for a brief overview of valid parameters.

Otherwise you are encouraged to call "query" which will format the Elastic Search query for you.

Returns a hash object.

fields

Sets or gets an array of fields onto which the query will be applied.

It returns an array object

    {
        query => {
            terms => { name => "Japan Folklore" }
        },
        fields => [qw( name abstract distribution )],
    }

Field names can also contain wildcards:

    {
        query => {
            terms => { name => "Japan Folklore" }
        },
        fields => [qw( name abstract dist* )],
    }

Importance of some fields can also be boosted using the caret notation ^

    {
        query => {
            terms => { name => "Japan Folklore" }
        },
        fields => [qw( name^3 abstract dist* )],
    }

Here, the field name is treated as 3 times as important as the others.

See Elastic Search documentation for more information.

filter

Sets or gets a hash of filters to affect the Elastic Search query result.

    {
        query => {
            bool => {
                must => [
                    { match => { name     => "Folklore-Japan-v1.2.3"       }},
                    { match => { abstract => "Japan Folklore Object Class" }}
                ],
                filter => [
                    { term =>  { status => "latest" }}, 
                    { range => { date => { gte => "2023-07-01" }}} 
                ]
            }
        }
    }

It returns a hash object.

from

Sets or gets a positive integer to return the desired results page. It returns the current value, if any, as a number object, or undef if there is no value set.

    {
        from => 0,
        query => {
            term => { user => "kimchy" },
        },
        size => 10,
    }

As per the Elastic Search documentation, "[p]agination of results can be done by using the from and size parameters. The from parameter defines the offset from the first result you want to fetch. The size parameter allows you to configure the maximum amount of hits to be returned".

For example, with a size of 10 elements per page, the first page starts at offset (a.k.a. from) 0 and ends at offset 9, and the second page runs from offset 10 to 19. Thus, to get the second page, you would set the value of from to 10.
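The arithmetic above can be sketched as follows. Note that page_to_from is a hypothetical helper written for illustration; it is not a method of this class, and it assumes pages are numbered from 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::API::CPAN::Filter): convert a
# 1-based page number and a page size into the 0-based "from" offset.
sub page_to_from
{
    my( $page, $size ) = @_;
    die( "Page number must be 1 or greater\n" ) if( $page < 1 );
    return( ( $page - 1 ) * $size );
}

my $size = 10;
foreach my $page ( 1..3 )
{
    printf( "page %d => from %d\n", $page, page_to_from( $page, $size ) );
}
```

The resulting value could then be passed to "from", together with the page size passed to "size".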

See also the more efficient scroll approach to pagination of query results.

Keep in mind this is different from the from option supported in some endpoints of the MetaCPAN API, which typically starts at 1 instead of 0.

See Elastic Search documentation for more information.

match_all

    # Enabled
    $filter->match_all(1);
    # Disabled (default)
    $filter->match_all(0);
    # or
    $filter->match_all(undef);
    # or with explicit score
    $filter->match_all(1.12);

Boolean. If true, Elastic Search will match all documents, giving each an identical score of 1.0.

If the value provided is a number other than 1 or 0, then it will be interpreted as an explicit score to use instead of the default 1.0

For example:

    $filter->match_all(1.12)

would produce:

    { match_all => { boost => 1.12 }}
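The rule described above (false disables, 1 enables with the default score, any other number becomes an explicit boost) can be sketched as below. This is only an illustration under those stated assumptions: match_all_clause is a hypothetical function, not the module's actual implementation, and JSON::PP is used here merely to display the result.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical sketch of the rule described above; this is NOT the
# module's own code.
sub match_all_clause
{
    my $value = shift( @_ );
    # undef, 0 or an empty string: match_all is disabled
    return if( !$value );
    # 1: enabled with the default score
    return( { match_all => {} } ) if( $value == 1 );
    # Any other number: used as an explicit boost
    return( { match_all => { boost => $value } } );
}

my $json = JSON::PP->new->canonical;
print( $json->encode( match_all_clause( 1 ) ), "\n" );
print( $json->encode( match_all_clause( 1.12 ) ), "\n" );
```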

See Elastic Search documentation for more information.

name

Sets or gets the optional query name. It always returns a scalar object

If set, it will be added to the filter

    {
        bool => {
            filter => {
                terms => { _name => "test", "name.last" => [qw( banon kimchy )] },
            },
            should => [
                {
                    match => { "name.first" => { _name => "first", query => "shay" } },
                },
                {
                    match => { "name.last" => { _name => "last", query => "banon" } },
                },
            ],
        },
    }

See Elastic Search documentation for more information.

query

This takes a hash reference of parameters and formats the query in compliance with Elastic Search. Alternatively, you can provide the Elastic Search structure directly by calling "es" with the proper hash reference of parameters.

Queries can be straightforward such as:

    { name => 'Taro Momo' }

or

    { pauseid => 'MOMOTARO' }

or using a simple regular expression:

    { name => 'Taro *' }

This would find all the people whose name starts with Taro.

To produce more complex search queries, you can use the special keywords all, either and not, which correspond respectively to the Elastic Search keywords must, should and must_not. You can use the Elastic Search keywords interchangeably if you prefer. Thus:

    {
        either => [
            { name => 'John *'  },
            { name => 'Peter *' },
        ]
    }

is the same as:

    {
        should => [
            { name => 'John *'  },
            { name => 'Peter *' },
        ]
    }

and

    {
        all => [
            { name  => 'John *'     },
            { email => '*gmail.com' },
        ]
    }

is the same as:

    {
        must => [
            { name  => 'John *'     },
            { email => '*gmail.com' },
        ]
    }

Likewise

    {
        either => [
            { name => 'John *'  },
            { name => 'Peter *' },
        ],
        not => [
            { email => '*gmail.com' },
        ],
    }

can also be expressed as:

    {
        should => [
            { name => 'John *'  },
            { name => 'Peter *' },
        ],
        must_not => [
            { email => '*gmail.com' },
        ],
    }
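The keyword correspondence described above can be sketched as a simple key renaming. This is a hedged illustration only: normalise_keywords and %ALIASES are hypothetical names, the function only renames top-level keys, and it does not reflect how this class actually builds the final Elastic Search query.

```perl
use strict;
use warnings;

# Alias => Elastic Search keyword, as described above.
my %ALIASES = ( 'all' => 'must', 'either' => 'should', 'not' => 'must_not' );

# Hypothetical helper: return a shallow copy of the query hash with the
# aliases renamed to their Elastic Search equivalents. Other keys are
# copied unchanged.
sub normalise_keywords
{
    my $query = shift( @_ );
    my %out;
    foreach my $key ( keys( %$query ) )
    {
        my $es_key = exists( $ALIASES{ $key } ) ? $ALIASES{ $key } : $key;
        $out{ $es_key } = $query->{ $key };
    }
    return( \%out );
}

my $normalised = normalise_keywords({
    'either' => [ { name => 'John *' }, { name => 'Peter *' } ],
    'not'    => [ { email => '*gmail.com' } ],
});
print( join( ', ', sort( keys( %$normalised ) ) ), "\n" );
```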

reset

When called with some arguments, no matter their value, this will reset the cached hash reference computed by "as_hash".

It returns the current object for chaining.

size

Sets or gets a positive integer to set the maximum number of hits of query results. It returns the current value, if any, as a number object, or undef if there is no value set.

See "from" for more information.

    {
        from => 0,
        query => {
            term => { user => "kimchy" },
        },
        size => 10,
    }

See also the Elastic Search documentation

sort

Sets or gets an array reference of sort parameters to affect the order of the query results.

It always returns an array object, which might be empty if nothing was specified.

    {
        query => {
            term => { user => "kimchy" },
        },
        sort => [
            {
                post_date => { order => "asc" },
            },
            "user",
            { name => "desc" },
            { age => "desc" },
            "_score",
        ],
    }

The order option can have the following values:

  • asc

    Sort in ascending order

  • desc

    Sort in descending order

Elastic Search supports sorting by array or multi-valued fields. The mode option controls what array value is picked for sorting the document it belongs to. For example:

    {
        query => {
            term => { user => "kimchy" },
        },
        sort => [
            {
                price => {
                    order => "asc",
                    mode => "avg"
                }
            }
        ]
    }

The mode option can have the following values:

  • min

    Pick the lowest value.

  • max

    Pick the highest value.

  • sum

    Use the sum of all values as sort value. Only applicable for number based array fields.

  • avg

    Use the average of all values as sort value. Only applicable for number based array fields.

  • median

    Use the median of all values as sort value. Only applicable for number based array fields.

You can also sort by geo distance with _geo_distance, such as:

    {
        query => {
            term => { user => "kimchy" },
        },
        sort => [
            {
                _geo_distance => {
                    distance_type => "sloppy_arc",
                    mode => "min",
                    order => "asc",
                    "pin.location" => [-70, 40],
                    # or, as lat/long
                    # "pin.location" => {
                    #     lat => 40,
                    #     lon => -70
                    # },
                    # or, as string
                    # "pin.location" => "40,-70",
                    # or, as GeoHash
                    # "pin.location" => "drm3btev3e86",
                    unit => "km",
                },
            },
        ],
    }

See also Elastic Search documentation

source

This sets or gets a string or an array reference of query source filtering.

It returns the current value, which may be undef if nothing was specified.

By default Elastic Search returns the contents of the _source field unless you have used the fields parameter or if the _source field is disabled.

You can set it to false to disable it. A false value can be 0, or an empty string "", but not undef, which will disable this option entirely.

    $filter->query({
        user => 'kimchy'
    });
    $filter->source(0);

would produce the following hash returned by "as_hash":

    {
        _source => \0,
        query => {
            term => { user => "kimchy" },
        },
    }
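The scalar reference \0 is how a JSON false is commonly expressed in Perl data. As a sketch of what the hash above looks like once serialised, the following uses the core module JSON::PP; whether "as_json" relies on that particular encoder internally is an assumption not confirmed here.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts the keys so the output is stable across runs
my $json = JSON::PP->new->canonical;
my $text = $json->encode({
    _source => \0,
    query   => {
        term => { user => "kimchy" },
    },
});
# The reference \0 is rendered as the JSON literal false
print( $text, "\n" );
```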

For complete control, you can specify both include and exclude patterns:

    $filter->query({
        user => 'kimchy'
    });
    $filter->source({
        exclude => ["*.description"],
        include => ["obj1.*", "obj2.*"],
    });

would produce the following hash returned by "as_hash":

    {
        _source => { exclude => ["*.description"], include => ["obj1.*", "obj2.*"] },
        query => {
            term => { user => "kimchy" },
        },
    }

See Elastic Search documentation for more information.

ELASTIC SEARCH QUERY

Query and Filter

Example:

The following will instruct Meta CPAN Elastic Search to find module releases for which all the following conditions are met:

  • The name field contains the word Folklore-Japan-v1.2.3.

  • The abstract field contains Japan Folklore Object Class.

  • The status field contains the exact word latest.

  • The date field contains a date from 1 July 2023 onwards.

    {
        query => {
            bool => {
                must => [
                    { match => { name     => "Folklore-Japan-v1.2.3"       }},
                    { match => { abstract => "Japan Folklore Object Class" }}
                ],
                filter => [
                    { term =>  { status => "latest" }}, 
                    { range => { date => { gte => "2023-07-01" }}} 
                ]
            }
        }
    }

Match all

    { match_all => {} }

or with an explicit score of 1.12

    { match_all => { boost => 1.12 } }

Match Query

    {
        match => { name => "Folklore-Japan-v1.2.3" }
    }

or

    {
        match => {
            name => {
                query => "Folklore-Japan-v1.2.3",
                # Defaults to 'or'
                operator => 'and',
                # The minimum number of optional 'should' clauses to match
                minimum_should_match => 1,
                # Set to true (\1 is translated as 'true' in JSON) to ignore exceptions caused by data-type mismatches
                lenient => \1,
                # Set the fuzziness value: 0, 1, 2 or AUTO
                fuzziness => 'AUTO',
                # True by default
                fuzzy_transpositions => 1,
                # 'none' or 'all'; defaults to 'none'
                zero_terms_query => 'all',
                cutoff_frequency => 0.001,
            }
        }
    }

See also the Elastic Search documentation on match query for more information on its valid parameters.

Match Phrase

    {
        match_phrase => {
            abstract => "Japan Folklore Object Class",
        }
    }

which is the same as:

    {
        match => {
            abstract => {
                query => "Japan Folklore Object Class",
                type => 'phrase',
            }
        }
    }

Match Phrase Prefix

As per Elastic Search documentation, this is a poor-man’s autocomplete.

    {
        match_phrase_prefix => {
            abstract => "Japan Folklore O"
        }
    }

It is designed to allow expansion on the last term of the query. The maximum number of expansions is controlled with the parameter max_expansions:

    {
        match_phrase_prefix => {
            abstract => {
                query => "Japan Folklore O",
                max_expansions => 10,
            }
        }
    }

The documentation recommends the use of the completion suggester instead.

Multi Match Query

This performs a query on multiple fields:

    {
        multi_match => {
            query => 'Japan Folklore',
            fields => [qw( name abstract distribution )],
        }
    }

Field names can contain wildcards:

    {
        multi_match => {
            query => 'Japan Folklore',
            fields => [qw( name abstract dist* )],
        }
    }

Importance of some fields can also be boosted using the caret notation ^

    {
        multi_match => {
            query => 'Japan Folklore',
            fields => [qw( name^3 abstract dist* )],
        }
    }

Here, the field name is treated as 3 times as important as the others.

To affect the way the multiple match query is performed, you can set the type value to best_fields, most_fields, cross_fields, phrase or phrase_prefix

    {
        multi_match => {
            query => 'Japan Folklore',
            fields => [qw( name^3 abstract dist* )],
            type => 'best_fields',
        }
    }

It also accepts the same other parameters as the match query; see "Match Query".

See Elastic Search documentation for more details.

Common Terms Query

As per Elastic Search documentation, the "common terms query is a modern alternative to stopwords which improves the precision and recall of search results (by taking stopwords into account), without sacrificing performance."

    {
        common => {
            abstract => {
                query => 'Japan Folklore',
                cutoff_frequency => 0.001,
            }
        }
    }

The number of terms which should match can be controlled with the minimum_should_match parameter.

See the Elastic Search documentation for more information.

Query String Query

This uses a query parser to parse the content of the query string.

    {
        query_string => {
            default_field => "abstract",
            query => "this AND that OR thus",
            fields => [qw( abstract name )],
            # Default is 'OR'
            default_operator => 'AND',
            # \1 (true) or \0 (false)
            allow_leading_wildcard => \1,
            # Default to true
            lowercase_expanded_terms => \1,
            # Default to true
            enable_position_increments => \1,
            # Defaults to 50
            fuzzy_max_expansions => 10,
            # Defaults to 'AUTO'
            fuzziness => 'AUTO',
            # Defaults to 0
            fuzzy_prefix_length => 0,
            # Defaults to 0
            phrase_slop => 0,
            # Defaults to 1.0
            boost => 0,
            # Defaults to false
            analyze_wildcard => \1,
            # Defaults to false
            auto_generate_phrase_queries => \0,
            # Defaults to 10000
            max_determinized_states => 10000,
            minimum_should_match => 2,
            # Defaults to true,
            lenient => \1,
            locale => 'ROOT',
            time_zone => 'Asia/Tokyo',
        }
    }

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters:

    qu?ck bro*

Regular expression can also be used:

As per the Elastic Search documentation, "regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/")":

    name:/joh?n(ath[oa]n)/

Fuzziness, i.e., terms that are similar to, but not exactly like our search terms, can be expressed with the fuzziness operator:

    quikc~ brwn~ foks~

An edit distance can be specified:

    quikc~1
    "fox quick"~5

A range can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.

All days in 2023:

    date:[2023-01-01 TO 2023-12-31]

Numbers 1..5

    count:[1 TO 5]

Tags between alpha and omega, excluding alpha and omega:

    tag:{alpha TO omega}

Numbers from 10 upwards

    count:[10 TO *]

Dates before 2023

    date:{* TO 2023-01-01}

Numbers from 1 up to but not including 5

    count:[1 TO 5}

Ranges with one side unbounded can use the following syntax:

    age:>10
    age:>=10
    age:<10
    age:<=10

    age:(>=10 AND <20)
    age:(+>=10 +<20)

However, it is better to use a range query:

    {
        range => {
            age => {
                gte => 10,
                lte => 20,
                boost => 2.0
            }
        }
    }
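To illustrate how the bracket notation relates to the range query parameters, here is a minimal sketch that converts an expression such as [1 TO 5} into the corresponding gte / gt / lte / lt hash. The function range_to_hash is hypothetical and written only for this example; it handles simple, unquoted bounds and nothing more.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "[min TO max]" style notation into the
# parameters of a range query. Square brackets are inclusive, curly
# brackets are exclusive, and "*" means the side is unbounded.
sub range_to_hash
{
    my $expr = shift( @_ );
    my( $open, $min, $max, $close ) =
        $expr =~ /^([\[\{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]\}])$/
        or die( "Unsupported range expression: $expr\n" );
    my %range;
    $range{ $open  eq '[' ? 'gte' : 'gt' } = $min if( $min ne '*' );
    $range{ $close eq ']' ? 'lte' : 'lt' } = $max if( $max ne '*' );
    return( \%range );
}

foreach my $expr ( '[1 TO 5]', '[1 TO 5}', '[10 TO *]', '{* TO 2023-01-01}' )
{
    my $range = range_to_hash( $expr );
    print( $expr, ' => ', join( ', ', map( "$_ => $range->{$_}", sort( keys( %$range ) ) ) ), "\n" );
}
```

The returned hash could then be used as the value of a field inside a range query, for example { range => { count => range_to_hash( '[1 TO 5}' ) } }.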

Boolean operators:

    quick brown +fox -news

  • fox must be present

  • news must not be present

  • quick and brown are optional — their presence increases the relevance

Grouping

    (quick OR brown) AND fox

    status:(active OR pending) title:(full text search)^2

See the Elastic Search documentation and the query string syntax for more information.

Multi field

    {
        query_string => {
            fields => [qw( abstract name )],
            query => "this AND that"
        }
    }

is equivalent to:

    {
        query_string => {
            query => "(abstract:this OR name:this) AND (abstract:that OR name:that)"
        }
    }
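The equivalence above can be reproduced with a short sketch that expands a list of fields over a list of terms joined by AND. The function expand_fields is hypothetical and only covers this simple case; the real expansion is performed by Elastic Search itself.

```perl
use strict;
use warnings;

# Hypothetical helper: for each term, build "(field1:term OR field2:term)"
# and join the groups with AND.
sub expand_fields
{
    my( $fields, $terms ) = @_;
    my @groups;
    foreach my $term ( @$terms )
    {
        my @parts = map( $_ . ':' . $term, @$fields );
        push( @groups, '(' . join( ' OR ', @parts ) . ')' );
    }
    return( join( ' AND ', @groups ) );
}

print( expand_fields( [qw( abstract name )], [qw( this that )] ), "\n" );
```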

"Simple wildcard can also be used to search "within" specific inner elements of the document":

    {
        query_string => {
            fields => ["metadata.*"],
            # or, even, to give 5 times more importance to sub elements of metadata:
            # fields => [qw( abstract metadata.*^5 )],
            query => "this AND that OR thus",
            use_dis_max => \1,
        }
    }

Field names

Fields can be specified in the query syntax, such as:

where the status field contains latest

    status:latest

where the abstract field contains quick or brown. If you omit the OR operator the default operator will be used

    abstract:(quick OR brown)
    abstract:(quick brown)

where the author field contains the exact phrase john smith

    author:"John Smith"

where any of the fields metadata.abstract, metadata.name or metadata.date contains quick or brown (note how we need to escape the * with a backslash):

    metadata.\*:(quick brown)

where the field resources.bugtracker has no value (or is missing):

    _missing_:resources.bugtracker

where the field resources.repository has any non-null value:

    _exists_:resources.repository

Simple Query String Query

See Elastic Search documentation for more information.

These queries never throw an exception; invalid parts of the query are simply discarded.

    {
        simple_query_string => {
            query => "\"fried eggs\" +(eggplant | potato) -frittata",
            analyzer => "snowball",
            fields => [qw( body^5 _all )],
            default_operator => "and",
        }
    }

Supported special characters:

  • + signifies AND operation

  • | signifies OR operation

  • - negates a single token

  • " wraps a number of tokens to signify a phrase for searching

  • * at the end of a term signifies a prefix query

  • ( and ) signify precedence

  • ~N after a word signifies edit distance (fuzziness)

  • ~N after a phrase signifies slop amount

Flags can be specified to indicate which features to enable when parsing:

    {
        simple_query_string => {
            query => "foo | bar + baz*",
            flags => "OR|AND|PREFIX",
        }
    }

The available flags are: ALL, NONE, AND, OR, NOT, PREFIX, PHRASE, PRECEDENCE, ESCAPE, WHITESPACE, FUZZY, NEAR, and SLOP

Term Queries

    {
        term => { author => "John Doe" }
    }

A boost parameter can also be used to give a term more importance:

    {
        query => {
            bool => {
                should => [
                {
                    term => {
                        status => {
                            value => "latest",
                            boost => 2.0 
                        }
                    }
                },
                {
                    term => {
                        status => "deprecated"
                    }
                }]
            }
        }
    }

See Elastic Search documentation for more information.

Terms Query

    {
        constant_score => {
            filter => {
                terms => { pauseid => [qw( momotaro kintaro )]}
            }
        }
    }

See Elastic Search documentation for more information.

Range Query

    {
        range => {
            age => {
                gte => 10,
                lte => 20,
                boost => 2.0,
            }
        }
    }

The range query accepts the following parameters:

  • gte

    Greater-than or equal to

  • gt

    Greater-than

  • lte

    Less-than or equal to

  • lt

    Less-than

  • boost

    Sets the boost value of the query, defaults to 1.0

When using range on a date, ranges can be specified using Date Math:

  • +1h

    Add one hour

  • -1d

    Subtract one day

  • /d

    Round down to the nearest day

Supported time units are: y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second).

For example:

  • now+1h

    The current time plus one hour, with ms resolution.

  • now+1h+1m

    The current time plus one hour plus one minute, with ms resolution.

  • now+1h/d

    The current time plus one hour, rounded down to the nearest day.

  • 2023-01-01||+1M/d

    2023-01-01 plus one month, rounded down to the nearest day.

Date formats in range queries can be specified with the format argument:

    {
        range => {
            born => {
                gte => "01/01/2022",
                lte => "2023",
                format => "dd/MM/yyyy||yyyy",
                # With a time zone
                # alternatively: Asia/Tokyo
                time_zone => "+09:00",
            }
        }
    }

See Elastic Search documentation for more information.

Exists Query

Search for values that are non-null.

    {
        exists => { field => "author" }
    }

You can change the definition of what is null with the null_value parameter.

The following is equivalent to the missing query:

    bool => {
        must_not => {
            exists => {
                field => "author"
            }
        }
    }

See Elastic Search documentation for more information.

Prefix Query

Search for documents that have fields containing terms with a specified prefix.

For example, the author field that contains a term starting with ta:

    {
        prefix => { author => "ta" }
    }

or, using the boost parameter:

    {
        prefix => {
            author => {
                value => "ta",
                boost => 2.0,
            }
        }
    }

See Elastic Search documentation for more information.

Wildcard Query

    {
        wildcard => { pauseid => "momo*o" }
    }

or

    {
        wildcard => {
            pauseid => {
                value => "momo*o",
                boost => 2.0,
            }
        }
    }

See Elastic Search documentation for more information.

Regexp Query

This enables the use of regular expressions syntax

    {
        regexp => {
            "metadata.author" => "Ta.*o"
        }
    }

or

    {
        regexp => {
            "metadata.author" => {
                value => "Ta.*o",
                boost => 1.2,
                flags => "INTERSECTION|COMPLEMENT|EMPTY",
            }
        }
    }

Possible flags values are: ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE

Check the regular expression syntax

See Elastic Search documentation for more information.

Fuzzy Query

    {
        fuzzy => { pauseid => "momo" }
    }

With more advanced parameters:

    {
        fuzzy => {
            user => {
                value => "momo",
                boost => 1.0,
                fuzziness => 2,
                prefix_length => 0,
                max_expansions => 100
            }
        }
    }

With number fields:

    {
        fuzzy => {
            price => {
                value => 12,
                fuzziness => 2,
            }
        }
    }

With date fields:

    {
        fuzzy => {
            created => {
                value => "2023-07-29T12:05:07",
                fuzziness => "1d"
            }
        }
    }

See Elastic Search documentation for more information.

Constant Score Query

As per the Elastic Search documentation, this is a "query that wraps another query and simply returns a constant score equal to the query boost for every document in the filter".

    {
        constant_score => {
            filter => {
                term => { pauseid => "momotaro"}
            },
            boost => 1.2,
        }
    }

See Elastic Search documentation for more information.

Bool Query

As per the Elastic Search documentation, this is a "query that matches documents matching boolean combinations of other queries."

The occurrence types are:

  • must

    The clause (query) must appear in matching documents and will contribute to the score.

  • filter

    The clause (query) must appear in matching documents. However unlike must the score of the query will be ignored.

  • should

    The clause (query) should appear in the matching document. In a boolean query with no must or filter clauses, one or more should clauses must match a document. The minimum number of should clauses to match can be set using the minimum_should_match parameter.

  • must_not

    The clause (query) must not appear in the matching documents.

    {
        bool => {
            must => {
                term => { author => "momotaro" }
            },
            filter => {
                term => { tag => "tech" }
            },
            must_not => {
                range => {
                    age => { from => 10, to => 20 }
                }
            },
            should => [
                {
                    term => { tag => "wow" }
                },
                {
                    term => { tag => "elasticsearch" }
                }
            ],
            minimum_should_match => 1,
            boost => 1.0,
        }
    }

See Elastic Search documentation for more information.

Dis Max Query

As per the Elastic Search documentation, this is a "query that generates the union of documents produced by its subqueries".

    {
        dis_max => {
            tie_breaker => 0.7,
            boost => 1.2,
            queries => [
                {
                    term => { age => 34 }
                },
                {
                    term => { age => 35 }
                }
            ]
        }
    }

See Elastic Search documentation for more information.

Function Score Query

As per the Elastic Search documentation, the "function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

To use function_score, the user has to define a query and one or more functions, that compute a new score for each document returned by the query."

    function_score => {
        query => {},
        boost => "boost for the whole query",
        FUNCTION => {},
        boost_mode => "(multiply|replace|...)"
    }

Multiple functions can also be provided:

    function_score => {
        query => {},
        boost => "boost for the whole query",
        functions => [
            {
                filter => {},
                FUNCTION => {},
                weight => $number,
            },
            {
                FUNCTION => {},
            },
            {
                filter => {},
                weight => $number,
            }
        ],
        max_boost => $number,
        score_mode => "(multiply|max|...)",
        boost_mode => "(multiply|replace|...)",
        min_score => $number
    }

score_mode can have the following values:

  • multiply

    Scores are multiplied (default)

  • sum

    Scores are summed

  • avg

    Scores are averaged

  • first

    The first function that has a matching filter is applied

  • max

    Maximum score is used

  • min

    Minimum score is used

boost_mode can have the following values:

  • multiply

    Query score and function score are multiplied (default)

  • replace

    Only function score is used, the query score is ignored

  • sum

    Query score and function score are added

  • avg

    Average

  • max

    Max of query score and function score

  • min

    Min of query score and function score

To exclude documents that do not meet a certain score threshold the min_score parameter can be set to the desired score threshold.

See the Elastic Search documentation for the list of functions that can be used.

See Elastic Search documentation for more information.

Boosting Query

As per the Elastic Search documentation, the "boosting query can be used to effectively demote results that match a given query. Unlike the 'NOT' clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score".

    {
        boosting => {
            positive => {
                term => {
                    field1 => "value1",
                },
            },
            negative => {
                term => {
                    field2 => "value2",
                },
            },
            negative_boost => 0.2,
        }
    }

See Elastic Search documentation for more information.

Indices Query

    {
        indices => {
            indices => [qw( index1 index2 )],
            query => {
                term => { tag => "wow" }
            },
            no_match_query => {
                term => { tag => "kow" }
            }
        }
    }

See Elastic Search documentation for more information.

Joining Queries

Elastic Search provides two types of joins that are "designed to scale horizontally": nested and has_child / has_parent.

See Elastic Search documentation for more information.

Nested Query

As per the Elastic Search documentation, the "nested query allows to query nested objects / docs".

    {
        nested => {
            path => "obj1",
            score_mode => "avg",
            query => {
                bool => {
                    must => [
                        {
                            match => { "obj1.name" => "blue" }
                        },
                        {
                            range => { "obj1.count" => { gt => 5 } }
                        },
                    ]
                }
            }
        }
    }

The score_mode parameter sets how the scores of the matching inner children affect the score of the parent. It defaults to avg, but can also be sum, min, max or none.
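The following sketch wraps an arbitrary query in a nested query and rejects any score_mode outside the values listed above. The helper nested_query is hypothetical and is not part of Net::API::CPAN::Filter; it only illustrates the expected structure.

```perl
use strict;
use warnings;

# The modes listed above; "avg" is the default
my %NESTED_SCORE_MODES = map { ( $_ => 1 ) } qw( avg sum min max none );

# Hypothetical helper, not provided by Net::API::CPAN::Filter
sub nested_query
{
    my( $path, $query, $score_mode ) = @_;
    $score_mode = 'avg' unless( defined( $score_mode ) );
    die( "Unsupported score_mode '$score_mode'\n" ) unless( $NESTED_SCORE_MODES{ $score_mode } );
    return({
        nested => {
            path       => $path,
            score_mode => $score_mode,
            query      => $query,
        },
    });
}

my $nested = nested_query(
    'obj1',
    { match => { 'obj1.name' => 'blue' } },
    'max',
);
```

The resulting hash reference has the same shape as the example above, and could be used wherever a query hash reference is expected.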

See Elastic Search documentation for more information.

Geo Queries

Elastic Search supports two types of geo data: geo_point and geo_shape.

See Elastic Search documentation for more information.

Geo Bounding Box Query

This query filters hits based on a point location, using a bounding box.

    {
        bool => {
            must => {
                match_all => {},
            },
            filter => {
                geo_bounding_box => {
                    "author.location" => {
                        top_left => {
                            lat => 40.73,
                            lon => -74.1,
                        },
                        # or, using an array reference [lon, lat]
                        # top_left => [-74.1, 40.73],
                        # or, using a string "lat, lon"
                        # top_left => "40.73, -74.1",
                        # or, using GeoHash:
                        # top_left => "dr5r9ydj2y73",
                        bottom_right => {
                            lat => 40.01,
                            lon => -71.12,
                        },
                        # or, using an array reference [lon, lat]
                        # bottom_right => [-71.12, 40.01],
                        # or, using a string "lat, lon"
                        # bottom_right => "40.01, -71.12",
                        # or, using GeoHash:
                        # bottom_right => "drj7teegpus6",
                    },
                    # Set to true to accept invalid latitude or longitude (default to false)
                    ignore_malformed => \1,
                }
            }
        }
    }

or, using vertices

    {
        bool => {
            must => {
                match_all => {},
            },
            filter => {
                geo_bounding_box => {
                    "author.location" => {
                        top => 40.73,       # latitude
                        left => -74.1,      # longitude
                        bottom => 40.01,    # latitude
                        right => -71.12,    # longitude
                    },
                    # Set to true to accept invalid latitude or longitude (default to false)
                    ignore_malformed => \1,
                }
            }
        }
    }
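When such a structure is serialised to JSON, the way the values are written in Perl matters. The sketch below uses the core JSON::PP module purely as an illustration (this documentation does not state which encoder "as_json" relies on). It shows how a scalar reference such as \1 is serialised, and how coordinates given as numeric literals differ from coordinates produced by qw(), which are strings.

```perl
use strict;
use warnings;
use JSON::PP;

my $encoder = JSON::PP->new->canonical;

# \1 and \0 are serialised as the JSON booleans true and false
my $flag = $encoder->encode( { ignore_malformed => \1 } );

# Numeric literals remain JSON numbers, here in [lon, lat] order
my $numeric = $encoder->encode( [ -74.1, 40.73 ] );

# qw() yields strings, so the values are quoted in the resulting JSON
my $quoted = $encoder->encode( [qw( -74.1 40.73 )] );

print( "$flag\n$numeric\n$quoted\n" );
```

Writing coordinates as numeric literals avoids sending quoted values where Elastic Search expects numbers.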

See Elastic Search documentation for more information.

Geo Distance Query

As per the Elastic Search documentation, this "filters documents that include only hits that exists within a specific distance from a geo point."

    {
        bool => {
            must => {
                match_all => {},
            },
            filter => {
                geo_distance => {
                    distance => "200km",
                    "author.location" => {
                        lat => 40,
                        lon => -70,
                    },
                    # or, using an array reference [lon, lat]
                    # "author.location" => [-70, 40],
                    # or, using a string "lat, lon"
                    # "author.location" => "40, -70",
                    # or, using GeoHash
                    # "author.location" => "drm3btev3e86",
                }
            }
        }
    }

See Elastic Search documentation for more information.

Geo Distance Range Query

As per the Elastic Search documentation, this "filters documents that exists within a range from a specific point".

    {
        bool => {
            must => {
                match_all => {}
            },
            filter => {
                geo_distance_range => {
                    from => "200km",
                    to => "400km",
                    "pin.location" => {
                        lat => 40,
                        lon => -70,
                    }
                }
            }
        }
    }

This supports the same geo point options as "Geo Distance Query"

It also supports "the common parameters for range (lt, lte, gt, gte, from, to, include_upper and include_lower)."

See Elastic Search documentation for more information.

Geo Polygon Query

This allows "to include hits that only fall within a polygon of points".

    {
        bool => {
            must => {
                match_all => {}
            },
            filter => {
                geo_polygon => {
                    "person.location" => {
                        points => [
                            { lat => 40, lon => -70 },
                            { lat => 30, lon => -80 },
                            { lat => 20, lon => -90 }
                            # or, as an array [long, lat]
                            # [-70, 40],
                            # [-80, 30],
                            # [-90, 20],
                            # or, as a string "lat, long"
                            # "40, -70",
                            # "30, -80",
                            # "20, -90"
                            # or, as GeoHash
                            # "drn5x1g8cu2y",
                            # "30, -80",
                            # "20, -90"
                        ]
                    },
                    # Set to true to ignore invalid geo points (defaults to false)
                    ignore_malformed => \1,
                }
            }
        }
    }

See Elastic Search documentation for more information.

GeoHash Cell Query

See Elastic Search documentation for more information.

More Like This Query

As per the Elastic Search documentation, the "More Like This Query (MLT Query) finds documents that are 'like' a given set of documents".

"The simplest use case consists of asking for documents that are similar to a provided piece of text".

For example, querying for all module releases that have some text similar to "Application Programming Interface" in their "abstract" and in their "description" fields, limiting the number of selected terms to 12.

    {
        more_like_this => {
            fields => [qw( abstract description )],
            like => "Application Programming Interface",
            min_term_freq => 1,
            max_query_terms => 12,
            # optional
            # unlike => "Python",
            # Defaults to 30%
            # minimum_should_match => 2,
            # boost_terms => 1,
            # Defaults to false
            # include => \1,
            # Defaults to 1.0
            # boost => 1.12
        }
    }

See Elastic Search documentation for more information.

Template Query

As per the Elastic Search documentation, this "accepts a query template and a map of key/value pairs to fill in template parameters".

    {
        query => {
            template => {
                inline => { match => { text => "{{query_string}}" }},
                params => {
                    query_string => "all about search",
                }
            }
        }
    }

would be translated to:

    {
        query => {
            match => {
                text => "all about search",
            }
        }
    }
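To make the translation step concrete, here is a small sketch of the substitution. Elastic Search performs the rendering on the server with a Mustache template engine; the hypothetical render_template helper below only handles plain {{name}} placeholders and is meant as an illustration, not as a replacement for it.

```perl
use strict;
use warnings;

# Illustration only: the real rendering happens server-side.
# This hypothetical helper replaces {{name}} with the matching parameter.
sub render_template
{
    my( $node, $params ) = @_;
    if( ref( $node ) eq 'HASH' )
    {
        my %copy;
        $copy{ $_ } = render_template( $node->{ $_ }, $params ) for( keys( %$node ) );
        return( \%copy );
    }
    elsif( ref( $node ) eq 'ARRAY' )
    {
        return( [ map { render_template( $_, $params ) } @$node ] );
    }
    elsif( defined( $node ) && !ref( $node ) )
    {
        # A missing parameter is rendered as an empty string
        ( my $text = $node ) =~ s/\{\{\s*(\w+)\s*\}\}/defined( $params->{ $1 } ) ? $params->{ $1 } : ''/ge;
        return( $text );
    }
    return( $node );
}

my $template = {
    inline => { match => { text => '{{query_string}}' } },
    params => { query_string => 'all about search' },
};
my $rendered = { query => render_template( $template->{inline}, $template->{params} ) };
```

Here $rendered holds the same structure as the translated query shown above.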

See Elastic Search documentation for more information.

Script Query

As per the Elastic Search documentation, this is used "to define scripts as queries. They are typically used in a filter context". For example:

    bool => {
        must => {
            # query details go here
            # ...
        },
        filter => {
            script => {
                script => "doc['num1'].value > 1"
            }
        }
    }

See Elastic Search documentation for more information.

Span Term Query

As per the Elastic Search documentation, this matches "spans containing a term".

    {
        span_term => { pauseid => "momotaro" }
    }

See Elastic Search documentation for more information.

Span Multi Term Query

The span_multi query allows you to wrap a multi term query (one of wildcard, fuzzy, prefix, term, range or regexp query) as a span query, so it can be nested.

    {
        span_multi => {
            match => {
                prefix => { pauseid => { value => "momo" } }
            }
        }
    }

See Elastic Search documentation for more information.

Span First Query

As per the Elastic Search documentation, this matches "spans near the beginning of a field".

    {
        span_first => {
            match => {
                span_term => { pauseid => "momotaro" }
            },
            end => 3,
        }
    }

See Elastic Search documentation for more information.

Span Near Query

As per the Elastic Search documentation, this matches "spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order".

    {
        span_near => {
            clauses => [
                { span_term => { field => "value1" } },
                { span_term => { field => "value2" } },
                { span_term => { field => "value3" } },
            ],
            collect_payloads => \0,
            in_order => \0,
            slop => 12,
        },
    }

The clauses element is a list of one or more other span type queries and the slop controls the maximum number of intervening unmatched positions permitted.
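As a sketch of how such a query might be assembled programmatically, the hypothetical helper below (not part of Net::API::CPAN::Filter) turns a list of plain terms on a single field into a span_near query. The option names and the default slop of 0 are choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a span_near query from plain terms on one field
sub span_near_terms
{
    my( $field, $terms, %opts ) = @_;
    return({
        span_near => {
            # One span_term clause per term, all on the same field
            clauses  => [ map { +{ span_term => { $field => $_ } } } @$terms ],
            # Scalar references are serialised as JSON booleans
            in_order => ( $opts{in_order} ? \1 : \0 ),
            slop     => ( defined( $opts{slop} ) ? $opts{slop} : 0 ),
        },
    });
}

my $span = span_near_terms(
    'abstract',
    [qw( http client )],
    slop     => 2,
    in_order => 1,
);
```

In this usage, "http" must be followed by "client" with at most 2 intervening unmatched positions.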

See Elastic Search documentation for more information.

Span Or Query

As per the Elastic Search documentation, this matches "the union of its span clauses".

    {
        span_or => {
            clauses => [
                { span_term => { field => "value1" } },
                { span_term => { field => "value2" } },
                { span_term => { field => "value3" } },
            ],
        },
    }

The clauses element is a list of one or more other span type queries.

See Elastic Search documentation for more information.

Span Not Query

As per the Elastic Search documentation, this removes "matches which overlap with another span query".

    {
        span_not => {
            exclude => {
                span_near => {
                    clauses => [
                        { span_term => { field1 => "la" } },
                        { span_term => { field1 => "hoya" } },
                    ],
                    in_order => \1,
                    slop => 0,
                },
            },
            include => { span_term => { field1 => "hoya" } },
        },
    }

The include and exclude clauses can be any span type query.

See Elastic Search documentation for more information.

Span Containing Query

As per the Elastic Search documentation, this returns "matches which enclose another span query".

    {
        span_containing => {
            big => {
                span_near => {
                    clauses => [
                        { span_term => { field1 => "bar" } },
                        { span_term => { field1 => "baz" } },
                    ],
                    in_order => \1,
                    slop => 5,
                },
            },
            little => { span_term => { field1 => "foo" } },
        },
    }

The big and little clauses can be any span type query. Matching spans from big that contain matches from little are returned.

See Elastic Search documentation for more information.

Span Within Query

As per the Elastic Search documentation, this returns "matches which are enclosed inside another span query".

    {
        span_within => {
            big => {
                span_near => {
                    clauses => [
                        { span_term => { field1 => "bar" } },
                        { span_term => { field1 => "baz" } },
                    ],
                    in_order => \1,
                    slop => 5,
                },
            },
            little => { span_term => { field1 => "foo" } },
        },
    }

The big and little clauses can be any span type query. Matching spans from little that are enclosed within big are returned.

See Elastic Search documentation for more information.

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Net::API::CPAN::Scroll, Net::API::CPAN::List

COPYRIGHT & LICENSE

Copyright(c) 2023 DEGUEST Pte. Ltd.

All rights reserved

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.