NAME

Search::Tools::RegExp - build regular expressions from search queries

SYNOPSIS

 my $regexp = Search::Tools::RegExp->new();
 
 my $kw = $regexp->build('the quick brown fox');
 
 for my $w ($kw->keywords)
 {
    my $r = $kw->re( $w );
    
    # the word itself
    printf("the word is %s\n", $r->word);
    
    # is it flagged as a phrase?
    print "the word is a phrase\n" if $r->phrase;
    
    # each of these is a regular expression
    print $r->plain;
    print $r->html;
 }
 

DESCRIPTION

Build regular expressions for a string of text.

All text is converted to UTF-8 automatically, if it isn't already, via the Search::Tools::Keywords module.

VARIABLES

The following package variables are defined:

UTF8Char

Regexp defining a valid UTF-8 word character. Default \w.

WordChar

Default word_characters regexp. Defaults to UTF8Char plus ', . and -.

IgnFirst

Default ignore_first_char regexp. Defaults to ' and -.

IgnLast

Default ignore_last_char regexp. Defaults to ', . and -.

PhraseDelim

Phrase delimiter character. Default is double-quote '"'.

Wildcard

Character to use as a wildcard. Default is asterisk '*'.
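
As a rough illustration of how the documented defaults might fit together, the sketch below builds character classes from stand-in values and uses them to split text into words. The variable names ($utf8_char, $word_char, $ign_first, $ign_last) and the tokens() helper are hypothetical; the module's own values and matching logic may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-ins mirroring the documented defaults.
# These are NOT read from Search::Tools::RegExp.
my $utf8_char = '\w';
my $word_char = $utf8_char . quotemeta(q{'.-});   # \w plus ' . -
my $ign_first = quotemeta(q{'-});                 # ' and -
my $ign_last  = quotemeta(q{'.-});                # ' . and -

# Split text into runs of word characters, then trim the characters
# that should be ignored at the start and end of each run.
sub tokens {
    my ($text) = @_;
    my @out;
    while ( $text =~ /([$word_char]+)/g ) {
        my $t = $1;
        $t =~ s/^[$ign_first]+//;
        $t =~ s/[$ign_last]+$//;
        push @out, $t if length $t;
    }
    return @out;
}

print join( '|', tokens("'tis the U.S.A. -- don't stop.") ), "\n";
```

Under these assumptions, interior apostrophes and periods survive (don't, U.S.A) while leading and trailing punctuation is trimmed, and a run made up only of ignored characters is dropped.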

METHODS

new

Create a new object. The following parameters are also accessors:

kw

A Search::Tools::Keywords object, if you want to pass in one instead of having one made for you.

wildcard

The wildcard character. Default is $Wildcard.

word_characters

Regexp for what characters constitute a 'word'. Default is $WordChar.

ignore_first_char

Default is $IgnFirst.

ignore_last_char

Default is $IgnLast.

stemmer

Stemming code ref passed through to the default Search::Tools::Keywords object.

phrase_delim

Phrase delimiter. Defaults to $PhraseDelim.

stopwords

Words to be ignored.

debug

Print debugging information to STDERR.

isHTML( str )

Returns true if str contains anything that looks like HTML markup:

 < > or &[#\w]+;

This is a naive check but useful for internal purposes.
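
A minimal sketch of such a check, using only the pattern described above. The looks_like_html() function is a hypothetical stand-in written for illustration, not the module's own implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented check: true if the string
# contains < or > or something shaped like an entity (&name; or &#123;).
sub looks_like_html {
    my ($str) = @_;
    return 0 unless defined $str;
    return $str =~ /[<>]|&[#\w]+;/ ? 1 : 0;
}

for my $s ( '<b>bold</b>', 'AT&amp;T', 'AT&T', '1 < 2' ) {
    printf "%-12s %s\n", $s, looks_like_html($s) ? 'markup' : 'plain';
}
```

The last case shows why the check is naive: a bare < in plain text such as '1 < 2' is also reported as markup.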

build( str )

Returns a Search::Tools::RegExp::Keywords object.

BUGS and LIMITATIONS

The special HTML chars &, < and > can pose problems in regexps against markup, so they are ignored if you include them in word_characters in new().

AUTHOR

Peter Karman perl@peknet.com

Based on the HTML::HiLiter regular expression building code, originally by the same author, copyright 2004 by Cray Inc.

Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.

COPYRIGHT

Copyright 2006 by Peter Karman. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

HTML::HiLiter, Search::Tools, Search::Tools::RegExp::Keywords, Search::Tools::RegExp::Keyword