
Changes for version 0.24

  • thanks to Henry at zen for prompting the bug fixes and improvements in this release.
  • fix Data::Dump calls by changing pp() to its fully-qualified form.
  • Snipper->snip() will always return UTF-8 encoded text.
  • rename Snipper methods snipper_name, snipper_force and snipper_type to type_used, force and type.
  • document Snipper->type().
  • fix some off-by-one errors in all the snip() algorithms.
  • fix the debugging code in Snipper.
  • add a sanity-check fallback to the plain() hiliter so that it can recover when the plain regex obviously fails.
  • add ignore_fields feature.
  • add treat_uris_like_phrases feature.
  • RegExp, RegExp::Keywords, RegExp::Keyword and Keywords are all deprecated in favor of the new, cleaner QueryParser, Query and RegEx classes. Backwards compatibility is preserved for existing code, but users should move to the new API as documented in Search::Tools. RegExp will carp every time you build() with it.
  • add new Tokenizer, Token and TokenList XS code for much faster snipping.
  • add pure-Perl (PP) versions of the tokenizing code, for benchmarking and comparison. As expected, XS is much faster, and the extra speed makes it possible to extract snippets more accurately without sacrificing performance.

Modules

high-performance tools for building search applications
locate the best matches in a snippet extract
highlight terms in text
extract keywords from a search query
Class::Accessor::Fast-compatible accessors
base class for Search::Tools objects
objectified string for highlighting, snipping, etc.
convert string queries into objects
regular expressions for terms
build regular expressions from search queries
access regular expressions for a keyword
access regular expressions for keywords
extract terms in context
offer spelling suggestions
a token object returned from a TokenList
a bunch of tokens from a Tokenizer
a bunch of tokens from a Tokenizer
mixin methods for TokenList and TokenListPP
a token object returned from a TokenList
split a string into meaningful tokens
transliterations of UTF-8 chars
UTF8 string wrangling
methods for playing nice with XML and HTML