NAME

Search::Query::Parser - convert query strings into query objects

SYNOPSIS

 use Search::Query;
 my $parser = Search::Query->parser(
    term_regex  => qr/[^\s()]+/,
    field_regex => qr/\w+/,
    op_regex    => qr/==|<=|>=|!=|=~|!~|[:=<>~#]/,

    # ops that admit an empty left operand
    op_nofield_regex => qr/=~|!~|[~:#]/,

    # case insensitive
    and_regex        => qr/\&|AND|ET|UND|E/i,
    or_regex         => qr/\||OR|OU|ODER|O/i,
    not_regex        => qr/NOT|PAS|NICHT|NON/i,

    default_field  => 'myfield',  # or ['myfield', 'myfield2']
    phrase_delim   => q/"/,
    default_boolop => '+',
    query_class    => 'Search::Query::Dialect::Native',
    field_class    => 'Search::Query::Field',
    query_class_opts => {
        default_field => 'foo', # or ['foo', 'bar']
    },
    
    # a generous mode, overlooking boolean-parser syntax errors
    sloppy              => 0,
    sloppy_term_regex   => qr/[\.\w]+/,
    fixup               => 0,
    
    # if set, this special term indicates a NULL query
    null_term           => 'NULL',
 );

 my $query = $parser->parse('+hello -world now');
 print $query;

DESCRIPTION

Search::Query::Parser is a fork of Search::QueryParser that supports multiple query dialects.

The Parser class transforms a query string into a Dialect object structure to be handled by external search engines.

The query string can contain simple terms, "exact phrases", field names and comparison operators, '+/-' prefixes, parentheses, and boolean connectors.

The parser can be customized using regular expressions for specific notions of "term", "field name" or "operator" -- see the new method.

The Dialect object resulting from a parsed query is a tree of terms and operators. Each Dialect can be re-serialized as a string using the stringify() method, or simply by printing the Dialect object, since the string-related Perl operations are overloaded using stringify().
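A minimal sketch of the two serialization routes described above, assuming a parser built with default settings. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Search::Query;

# Default parser: no fields or custom regexes configured.
my $parser = Search::Query->parser();
my $query  = $parser->parse('+hello -world now');
die "parse failed: " . $parser->error unless defined $query;

# Explicit serialization.
my $explicit = $query->stringify;

# Implicit serialization through the overloaded string operators.
my $implicit = "$query";

print "explicit: $explicit\n";
print "implicit: $implicit\n";
```

Both variables should hold the same string, since the overloading is described as delegating to stringify().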

QUERY STRING

The query string is decomposed into Clause objects, where each Clause has an optional sign prefix, an optional field name and comparison operator, and a mandatory value.

Sign prefix

Prefix '+' means that the item is mandatory. Prefix '-' means that the item must be excluded. No prefix means that the item will be searched for, but is not mandatory.

See also section "Boolean connectors" below, which is another way to combine items into a query.

Field name and comparison operator

Internally, each query item has a field name and comparison operator; if not written explicitly in the query, these take default values '' (empty field name) and ':' (colon operator).

Operators have a left operand (the field name) and a right operand (the value to be compared with); for example, foo:bar means "search documents containing term 'bar' in field 'foo'", whereas foo=bar means "search documents where field 'foo' has exact value 'bar'".

Here is the list of admitted operators with their intended meaning:

:

treat value as a term to be searched within field. This is the default operator.

~ or =~

treat value as a regex; match field against the regex.

Note that ~ after a phrase indicates a proximity assertion:

 "foo bar"~5

means "match 'foo' and 'bar' within 5 positions of each other."

!~

negation of above

== or =, <=, >=, !=, <, >

classical relational operators

#

Inclusion in the set of comma-separated integers supplied on the right-hand side.

Operators :, ~, =~, !~ and # admit an empty left operand (so the field name will be ''). Search engines will usually interpret this as "any field" or "the whole data record". But see the default_field feature.
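To make the field/operator/value split concrete, here is a rough sketch in plain Perl that applies the regexes from the SYNOPSIS to single clauses. The helper split_clause() is hypothetical and is not how the parser is implemented; it only illustrates why the two-character operators come first in the op_regex alternation, so that >= is not read as > followed by a value starting with =.

```perl
use strict;
use warnings;

# Regexes copied from the SYNOPSIS.
my $field_regex = qr/\w+/;
my $op_regex    = qr/==|<=|>=|!=|=~|!~|[:=<>~#]/;
my $term_regex  = qr/[^\s()]+/;

# Hypothetical helper: split one clause into (field, operator, value).
sub split_clause {
    my ($clause) = @_;
    if ( $clause =~ /^($field_regex)($op_regex)($term_regex)$/ ) {
        return ( $1, $2, $3 );
    }

    # No explicit field or operator: use the defaults named in the text,
    # an empty field name and the ':' operator.
    return ( '', ':', $clause );
}

for my $clause (qw( foo:bar foo=bar size>=10 name!~^tmp hello )) {
    my ( $field, $op, $value ) = split_clause($clause);
    printf "%-12s field=%-8s op=%-2s value=%s\n",
        $clause, "'$field'", $op, $value;
}
```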

Value

A value (right operand to a comparison operator) can be

  • A term (as recognized by regex term_regex, see new method below).

  • A quoted phrase, i.e. a collection of terms within single or double quotes.

    Quotes can be used not only for "exact phrases", but also to prevent misinterpretation of some values: for example, -2 would mean "value '2' with prefix '-'", in other words "exclude term '2'", so if you want to search for value -2, you should write "-2" instead.

    Note that ~ after a phrase indicates a proximity assertion:

     "foo bar"~5

    means "match 'foo' and 'bar' within 5 positions of each other."

  • A subquery within parentheses. Field names and operators distribute over parentheses, so for example foo:(bar bie) is equivalent to foo:bar foo:bie.

    Nested field names such as foo:(bar:bie) are not allowed.

    Sign prefixes do not distribute: +(foo bar) +bie is not equivalent to +foo +bar +bie.
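The value forms above can be tried out with a short script like the following. It is only a sketch, assuming a default parser with no fields configured; it prints whatever the Native dialect serializes, and makes no claim about the exact output.

```perl
use strict;
use warnings;
use Search::Query;

my $parser = Search::Query->parser();

my @strings = (
    'foo:(bar bie)',      # field and operator distribute over the parentheses
    'foo:bar foo:bie',    # the equivalent spelled out
    '"-2"',               # quoted so '-' is not read as a sign prefix
);

my %parsed;
for my $string (@strings) {
    my $query = $parser->parse($string);
    $parsed{$string} = $query;
    if ( defined $query ) {
        print "$string => $query\n";
    }
    else {
        print "$string => error: ", $parser->error, "\n";
    }
}
```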

Boolean connectors

Queries can contain boolean connectors 'AND', 'OR', 'NOT' (or their equivalent in some other languages -- see the *_regex features in new()). This is mere syntactic sugar for the '+' and '-' prefixes : a AND b is equivalent to +a +b; a OR b is equivalent to (a b); NOT a is equivalent to -a. +a OR b does not make sense, but it is translated into (a b), under the assumption that the user understands "OR" better than a '+' prefix. -a OR b does not make sense either, but has no meaningful approximation, so it is rejected.

Combinations of AND/OR clauses must be surrounded by parentheses, i.e. (a AND b) OR c or a AND (b OR c) are allowed, but a AND b OR c is not.
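As a quick check of the rules above, the sketch below feeds a few connector combinations to a default parser and reports which ones are accepted. The query strings are made-up examples.

```perl
use strict;
use warnings;
use Search::Query;

my $parser = Search::Query->parser();

my @strings = (
    'a AND b',           # sugar for: +a +b
    '(a AND b) OR c',    # parenthesized mix: allowed
    'a AND (b OR c)',    # parenthesized mix: allowed
    'a AND b OR c',      # unparenthesized mix: not allowed
);

my %result;
for my $string (@strings) {
    my $query = $parser->parse($string);
    $result{$string} = $query;
    printf "%-16s => %s\n", $string,
        defined $query ? "$query" : 'rejected: ' . $parser->error;
}
```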

The NEAR connector is treated like the proximity phrase assertion.

 foo NEAR5 bar

is treated as if it were:

 "foo bar"~5

See the near_regex option.
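The equivalence can be pictured as a textual rewrite. The sketch below is plain Perl and does not use the parser at all: the pattern in $near_regex is an assumed stand-in (the module's actual default may differ), and near_to_phrase() is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed pattern for illustration; not necessarily the module's default.
my $near_regex = qr/NEAR(\d+)/;

# Hypothetical helper: rewrite "X NEARn Y" as a proximity phrase.
sub near_to_phrase {
    my ($string) = @_;

    # $1 = left term, $2 = distance (captured inside $near_regex), $3 = right term
    $string =~ s/(\S+)\s+$near_regex\s+(\S+)/"$1 $3"~$2/g;
    return $string;
}

print near_to_phrase('foo NEAR5 bar'), "\n";    # prints: "foo bar"~5
```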

METHODS

new

The following attributes may be initialized in new(). These are also available as get/set methods on the returned Parser object.

default_boolop
term_regex
field_regex
op_regex
op_nofield_regex
and_regex
or_regex
not_regex
near_regex
range_regex
default_field

Applied to all terms where no field is defined. The default value is undef (no default).

default_op

The operator used when default_field is applied.

fields
phrase_delim
query_class

dialect is an alias for query_class.

field_class
clause_class
query_class_opts

Passed to the query_class new() method each time a query is parsed.

dialect_opts

Alias for query_class_opts.

croak_on_error

Default value is false (0). Set to true to automatically throw an exception via Carp::croak() if parse() would return undef.
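With croak_on_error enabled, failures surface as exceptions rather than as an undef return value. A minimal sketch, using an unparenthesized AND/OR mix as the deliberately invalid query:

```perl
use strict;
use warnings;
use Search::Query;

my $parser = Search::Query->parser( croak_on_error => 1 );

# The query mixes AND and OR without parentheses, which is a syntax
# error, so parse() throws instead of returning undef.
my $query     = eval { $parser->parse('a AND b OR c') };
my $exception = $@;

if ( defined $query ) {
    print "parsed: $query\n";
}
else {
    print "caught: $exception";
}
```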

term_expander

A function reference for transforming query terms after they have been parsed. Examples might include adding alternate spellings, synonyms, or expanding wildcards based on lexicon listings.

Example:

 my $parser = Search::Query->parser(
    term_expander => sub {
        my ($term, $field) = @_;
        return ($term) if ref $term;    # skip ranges
        return ( qw( one two three ), $term );
    }
 );

 my $query = $parser->parse("foo=bar");
 print "$query\n";  # +foo=(one OR two OR three OR bar)

The term_expander reference should expect two arguments: the term value and, if available, the term field name. It should return an array of values.

The term_expander reference is called internally during the parse() method, before any field alias expansion or validation is performed.

sloppy( 0|1 )

If the string passed to parse() has any incorrect or unsupported syntax in it, the default behavior is for parsing to stop immediately, error() to be set, and for parse() to return undef.

In certain cases (as on a web form) this is undesirable. Set sloppy mode to true to fall back to non-boolean evaluation of the string, which in most cases should still return a Dialect object.

Example:

 $parser->parse('foo -- OR bar');  # if sloppy==0, returns undef
 $parser->parse('foo -- OR bar');  # if sloppy==1, equivalent to 'foo bar'

sloppy_term_regex

The regex definition used to match a term when sloppy==1.

fixup( 0|1 )

Attempt to fix syntax errors such as a missing closing parenthesis or a missing double quote. This differs from sloppy(), which does not attempt to fix broken syntax. Use the two together if you do not care about strict syntax checking.

null_term

If null_term is set to term, the parser treats a field value of term as if it were undefined. Example:

 $parser->parse('foo=');     # throws fatal error
 $parser->null_term('NULL');
 $parser->parse('foo=NULL'); # field foo has NULL value

This feature is most useful with the SQL dialect, where you might want to find NULL values. Use it like:

 my $parser = Search::Query->parser(
     dialect    => 'SQL',
     null_term  => 'NULL'
 );
 my $query = $parser->parse('foo!=NULL');
 print $query;  # prints "foo is not NULL"

BUILDARGS

Internal method for mangling constructor params.

BUILD

Called internally to initialize the object.

error

Returns the last error message.

clear_error

Sets error message to undef.

get_field( name )

Returns Field object for name or undef if there isn't one defined.

set_fields( fields )

Set the fields structure. Called internally by BUILD() if you pass a fields key/value pair to new().

The structure of fields may be one of the following:

 my $fields = {
    field1 => 1,
    field2 => { alias_for => 'field1' },
    field3 => Search::Query::Field->new( name => 'field3' ),
    field4 => { alias_for => [qw( field1 field3 )] },
 };

 # or

 my $fields = [
    'field1',
    { name => 'field2', alias_for => 'field1' },
    Search::Query::Field->new( name => 'field3' ),
    { name => 'field4', alias_for => [qw( field1 field3 )] },
 ];
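A short usage sketch of the hash form together with get_field(). The field names title and heading are invented for the example.

```perl
use strict;
use warnings;
use Search::Query;

# Hash form: keys are field names; values are 1, a hash of Field
# attributes, or a ready-made Search::Query::Field object.
my $parser = Search::Query->parser(
    fields => {
        title   => 1,
        heading => { alias_for => 'title' },
    },
);

# get_field() returns the Field object, or undef for unknown names.
for my $name (qw( title heading nosuchfield )) {
    my $field = $parser->get_field($name);
    printf "%-12s %s\n", $name,
        defined $field ? ref($field) : '(not defined)';
}
```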

set_field( name => field_object )

Sets field name to Field object field_object.

parse( string )

Returns a Search::Query::Dialect object of type query_class.

If there is a syntax error in string, parse() will return undef and set error().
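A typical calling pattern, sketched under the default settings (croak_on_error and sloppy both off), checks the return value and reads error() on failure:

```perl
use strict;
use warnings;
use Search::Query;

my $parser = Search::Query->parser();

# Unparenthesized AND/OR mix: a syntax error, so parse() returns undef.
my $query = $parser->parse('a AND b OR c');

my $message;
if ( !defined $query ) {
    $message = $parser->error;
    print "syntax error: $message\n";

    # Reset the stored message once it has been handled.
    $parser->clear_error;
}
```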

AUTHOR

Peter Karman, <karman at cpan.org>

BUGS

Please report any bugs or feature requests to bug-search-query at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Query. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Search::Query

ACKNOWLEDGEMENTS

This module started as a fork of Search::QueryParser by Laurent Dami.

COPYRIGHT & LICENSE

Copyright 2010 Peter Karman.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.