
NAME

Search::Tools::QueryParser - convert string queries into objects

SYNOPSIS

 use Search::Tools::QueryParser;
 my $qparser = Search::Tools::QueryParser->new(

     # regex to define a query term (word)
     term_re                 => qr/\w+(?:'\w+)*/,

     # or assemble a definition from the following
     word_characters         => q/\w\'\-/,
     ignore_first_char       => q/\+\-/,
     ignore_last_char        => q/\+\-/,
     term_min_length         => 1,

     # words to ignore
     stopwords               => [qw( the )],

     # query operators
     and_word                => q(and),
     or_word                 => q(or),
     not_word                => q(not),
     phrase_delim            => q("),
     treat_uris_like_phrases => 1,
     ignore_fields           => [qw( site )],
     wildcard                => quotemeta(q(*)),

     # language-specific settings (pass a code reference, not a call)
     stemmer                 => \&your_stemmer_here,
     charset                 => 'iso-8859-1',
     lang                    => 'en_US',
     locale                  => 'en_US.iso-8859-1',

     # development help
     debug                   => 0,
 );
    
 my $query    = $qparser->parse(q(the quick color:brown "fox jumped"));
 my $terms    = $query->terms; # ['quick', 'brown', '"fox jumped"']
 
 # a Search::Tools::RegEx object
 my $regexp   = $query->regexp_for($terms->[0]); 
 
 # the Search::Query::Dialect tree()
 my $tree     = $query->tree;
 
 print "$query\n";  # the quick color:brown "fox jumped"
 print $query->str . "\n"; # same thing
 
 

DESCRIPTION

Search::Tools::QueryParser turns search queries into objects that can be used for highlighting, spell-checking, and extracting matching snippets from source documents.

METHODS

new( %opts )

The new() method instantiates a QueryParser object. With the exception of parse(), all of the following methods can also be passed as key/value pairs to new().

BUILD

Called internally by new().

parse( query )

The parse() method parses query and returns a Search::Tools::Query object.

query must be a scalar string.

NOTE: All queries are converted to UTF-8. See the charset param.

stemmer

The stemmer function is used to find the root 'stem' of a word. There are many stemming algorithms available, including many on CPAN. The stemmer function should expect to receive two parameters: the QueryParser object and the word to be stemmed. It should return exactly one value: the stemmed word.

Example stemmer function:

 use Lingua::Stem;
 my $stemmer = Lingua::Stem->new;
 
 sub mystemfunc {
     my ($parser, $word) = @_;
     return $stemmer->stem($word)->[0];
 }
 
 # and pass to the new() method:
 
 my $qparser = Search::Tools::QueryParser->new(stemmer => \&mystemfunc);
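If Lingua::Stem is not installed, the calling contract can still be tried out with a toy function. The sketch below is illustrative only: toy_stem is a hypothetical name, and its naive suffix stripping is not a real stemming algorithm such as Porter's. Like any stemmer passed to new(), it receives the parser object and the word, and returns exactly one value.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer following the documented contract:
# called with (parser object, word), returns exactly one stemmed word.
# The suffix rules are deliberately naive and only for illustration.
sub toy_stem {
    my ( $parser, $word ) = @_;
    return $word if length($word) <= 3;    # leave short words alone
    $word =~ s/(?:ing|ed|s)$//;            # strip one common English suffix
    return $word;
}

# This stemmer ignores the parser object, so undef is passed here.
print toy_stem( undef, $_ ), "\n" for qw( jumped jumping words fox );
```

A real application would pass \&toy_stem (or a wrapper around a proper stemming library) as the stemmer option.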
     

stopwords

A list of common words that should be ignored when parsing out keyword terms. May be either a string, which will be split on whitespace, or an array ref.

NOTE: If a stopword is contained in a phrase, then the phrase will be tokenized into words based on whitespace, then the stopwords removed.
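The two accepted forms of stopwords, and the phrase behavior in the note, can be illustrated with a small standalone sketch. normalize_stopwords and remove_stopwords are hypothetical helpers written for this example; they imitate the documented behavior but are not the module's internal code, and the exact (case-sensitive) comparison is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: accept stopwords either as a whitespace-separated
# string or as an array ref, and return a hash ref for fast lookup.
sub normalize_stopwords {
    my ($stopwords) = @_;
    my @list = ref $stopwords eq 'ARRAY' ? @$stopwords : split ' ', $stopwords;
    return { map { $_ => 1 } @list };
}

# Hypothetical helper: drop stopwords from a list of terms. A phrase
# (wrapped in double quotes) is split on whitespace first, the stopwords
# are removed, and the surviving words are rejoined into a phrase.
sub remove_stopwords {
    my ( $stop, @terms ) = @_;
    my @kept;
    for my $term (@terms) {
        if ( $term =~ /^"(.*)"$/ ) {
            my @words = grep { !$stop->{$_} } split ' ', $1;
            push @kept, qq{"@words"} if @words;
        }
        elsif ( !$stop->{$term} ) {
            push @kept, $term;
        }
    }
    return @kept;
}

my $from_string = normalize_stopwords('the a an');
my $from_array  = normalize_stopwords( [qw( the a an )] );
print join( '|', remove_stopwords( $from_string, 'the', 'quick', '"the fox"' ) ), "\n";
```

Both calls to normalize_stopwords build the same lookup table, which is why the option can accept either form.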

end_bound

get_defaults

html_phrase_bound

phrase_delim

Character used to delimit phrases in a query. The SYNOPSIS sets it to a double quote (").

plain_phrase_bound

start_bound

tag_re

term_re
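term_re is the regular expression that defines a single query term (word); the SYNOPSIS uses qr/\w+(?:'\w+)*/. As a rough illustration of what such a pattern matches, the sketch below simply applies that regex globally to a string. extract_terms is a hypothetical helper: the real parser also handles fields, phrases, operators and stopwords, so this is not how parse() works internally.

```perl
use strict;
use warnings;

# The term regex from the SYNOPSIS: a run of word characters,
# optionally followed by apostrophe-joined parts (e.g. "don't").
my $term_re = qr/\w+(?:'\w+)*/;

# Hypothetical helper: return every substring matching $term_re.
sub extract_terms {
    my ($text) = @_;
    my @terms = $text =~ /($term_re)/g;
    return @terms;
}

print join( ',', extract_terms(q{don't stop the fox's run!}) ), "\n";
```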

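term_min_length

Minimum length of a term; used together with word_characters to assemble a term definition (the SYNOPSIS uses 1).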

whitespace

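word_characters

Characters considered part of a word, given as a string suitable for a regex character class (the SYNOPSIS uses q/\w\'\-/). Used to assemble a term definition as an alternative to term_re.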

ignore_first_char

String of characters to strip from the beginning of all words.

ignore_last_char

String of characters to strip from the end of all words.
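Because the SYNOPSIS passes these options as regex-escaped strings (q/\+\-/), they fit naturally inside a character class. The sketch below shows that idea; strip_ignored is a hypothetical helper, and stripping a whole run of ignored characters rather than a single one is an assumption, not documented module behavior.

```perl
use strict;
use warnings;

# Values in the style of the SYNOPSIS: each string is a set of
# regex-escaped characters meant to go inside a character class.
my $ignore_first_char = q/\+\-/;
my $ignore_last_char  = q/\+\-/;

# Hypothetical helper: strip leading and trailing ignored characters.
# Removing a whole run (not just one character) is an assumption.
sub strip_ignored {
    my ($word) = @_;
    $word =~ s/^[$ignore_first_char]+//;
    $word =~ s/[$ignore_last_char]+$//;
    return $word;
}

print strip_ignored($_), "\n" for qw( +foo bar- --baz++ e-mail );
```

Note that the hyphen inside e-mail survives, because only the start and end of each word are touched.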

ignore_case

If true, all queries are run through Perl's built-in lc() function before parsing. The default is 1 (true). Set to 0 (false) to preserve case.

ignore_fields

Value may be a hash ref or an array ref of field names to ignore when parsing the query. Example:

 ignore_fields => [qw( site )]

would parse the query:

 site:foo.bar AND baz   # terms = baz
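Since ignore_fields accepts either a hash ref or an array ref, a caller-side helper might normalize both forms before filtering. The sketch below is hypothetical: field_lookup, filter_terms and the simple whitespace tokenizing are assumptions made for illustration. It reproduces the documented result for the example query, leaving only baz as a term.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash ref or array ref of field names
# into a lookup hash.
sub field_lookup {
    my ($fields) = @_;
    return ref $fields eq 'HASH' ? {%$fields} : { map { $_ => 1 } @$fields };
}

# Hypothetical helper: drop field:value clauses whose field is ignored
# and drop upper-case boolean operators; keep every other token.
sub filter_terms {
    my ( $ignore, $query ) = @_;
    my @terms;
    for my $token ( split ' ', $query ) {
        next if $token =~ /^(?:AND|OR|NOT)$/;
        if ( $token =~ /^(\w+):/ ) {
            next if $ignore->{$1};
        }
        push @terms, $token;
    }
    return @terms;
}

print join( ',', filter_terms( field_lookup( [qw( site )] ), 'site:foo.bar AND baz' ) ), "\n";
```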

default_field

Set the default field to be used in parsing the query, if no field is specified. The default is the empty string (the Search::Query::Parser default).

treat_uris_like_phrases

Boolean. Default: 1 (true).

If set to true, queries like foo@bar.com will be treated like a single phrase "foo bar com" instead of being split into three separate terms.
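The URI-to-phrase rewrite described above can be sketched in plain Perl. uri_to_phrase is a hypothetical helper, and the set of separator characters it treats as URI punctuation (@ . : and /) is an assumption, not the module's actual rule.

```perl
use strict;
use warnings;

# Hypothetical helper: if a token looks like a URI or e-mail address
# (word characters joined by @ . : or /), return it as a quoted phrase
# of its word parts; otherwise return the token unchanged.
sub uri_to_phrase {
    my ($token) = @_;
    return $token unless $token =~ m{^\w+(?:[\@.:/]+\w+)+$};
    my @parts = $token =~ /(\w+)/g;
    return '"' . join( ' ', @parts ) . '"';
}

print uri_to_phrase($_), "\n" for qw( foo@bar.com http://example.com/page fox );
```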

and_word

Default: and|near\d*
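The and_word default is a regular expression rather than a single word: and|near\d* also matches proximity operators such as near or near5. The short check below applies that pattern to a few tokens; anchoring it to the whole token and matching case-insensitively are assumptions made for illustration, not a description of how the parser applies it.

```perl
use strict;
use warnings;

# Default and_word pattern as documented: "and", or "near" plus optional digits.
my $and_word = 'and|near\d*';

# Anchor the alternation so the whole token must match (assumption)
# and ignore case (assumption).
my $and_re = qr/^(?:$and_word)$/i;

for my $token (qw( and AND near near5 nearby or )) {
    printf "%-6s %s\n", $token, ( $token =~ $and_re ? 'and-operator' : 'other' );
}
```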

or_word

Default: or

not_word

Default: not

wildcard

Default: *
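The SYNOPSIS passes quotemeta(q(*)), which suggests the wildcard value is used as a regex fragment. The sketch below shows one way a wildcard term could be turned into a matching regex; wildcard_to_regex is hypothetical, and expanding each wildcard to \w* is an assumption rather than documented module behavior.

```perl
use strict;
use warnings;

# Wildcard as passed in the SYNOPSIS: a regex-escaped "*".
my $wildcard = quotemeta(q(*));

# Hypothetical helper: escape the literal parts of a term and let each
# wildcard stand for zero or more word characters (assumption).
sub wildcard_to_regex {
    my ($term) = @_;
    my @pieces  = split /$wildcard/, $term, -1;    # -1 keeps a trailing empty piece
    my $pattern = join '\w*', map { quotemeta } @pieces;
    return qr/^$pattern$/;
}

my $re = wildcard_to_regex('qui*k*');
print "$_: ", ( /$re/ ? 'match' : 'no match' ), "\n" for qw( quick quirk quack );
```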

locale

Set a locale explicitly. If not set, the locale is inherited from the LC_CTYPE environment variable.

LC_CTYPE

Locale constant imported from the POSIX module. Documented only to satisfy POD coverage tests.

lang

Base language. If not set, extracted from locale or defaults to en_US.

charset

Base charset used for converting queries to UTF-8. If not set, extracted from locale or defaults to iso-8859-1.
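The fallbacks for locale, lang and charset can be sketched as parsing a POSIX-style locale name of the form language_TERRITORY.charset. split_locale is a hypothetical helper: it reads $ENV{LC_CTYPE} when no locale is given and falls back to the documented defaults en_US and iso-8859-1. The module's own parsing may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: derive (locale, lang, charset) from an explicit
# locale name, falling back to $ENV{LC_CTYPE} and then to the documented
# defaults en_US and iso-8859-1.
sub split_locale {
    my ($locale) = @_;
    $locale //= $ENV{LC_CTYPE} // '';
    my ( $lang, $charset ) =
        $locale =~ /^([a-z]{2,3}(?:_[A-Z]{2})?)(?:\.([\w-]+))?(?:\@\w+)?$/;
    $lang    ||= 'en_US';
    $charset ||= 'iso-8859-1';
    return ( $locale, $lang, $charset );
}

my @parts = split_locale('en_US.iso-8859-1');
print join( ' | ', @parts ), "\n";
```

Locale names such as C or POSIX carry no language or charset, so the defaults apply to them.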

query_class

The default is Search::Tools::Query, but you can set it to your own subclass of the Query object.

query_dialect

The default is Search::Query::Dialect::Native, but you can set your own. See the Search::Query::Dialect documentation.

LIMITATIONS

The special HTML characters &, < and > can cause problems in regexps run against markup, so if you include them in word_characters in new(), they are ignored when creating regular expressions.

AUTHOR

Peter Karman <karman@cpan.org>

BUGS

Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Search::Tools

You can also look for information at:

COPYRIGHT

Copyright 2009 by Peter Karman.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Search::Query::Parser