Search::WuManber -- A fast algorithm for multi-pattern searching
use Search::WuManber; my $search = Search::WuManber->new([qw(tribute reserved serve distribute)]); my @matches = $search->all( lc "All rights reserved. Distribute freely."); my $match = $search->first(lc "All rights reserved. Distribute freely."); $match = $search->next();
This module implements the Wu-Manber multi-pattern parallel search algorithm.
The list of search patterns passed to
new() are prepared for parallel lookup. A perl reference pointing to all internal data is returned. Treat this reference as opaque.
next() iterate over all text positions where matches occur. Pattern matches in this context are substring matches. Each match is returned as a reference to a two-element array representing text offset and list index of the matching pattern. Options for
return_string => 1 make the return value of the iterator a three-element array, containing also the search string itself. *
case_sensitive => 0 run the search in case insensitive mode (slightly slower).
The matches are returned roughly sorted by offset. Offset usually increments, but may jump backwards by the length difference of neighbouring search strings.
New() allocates a constant amount of memory (between ca. 130k and 2MB). This memory can be returned by
The iterator is inefficient.
first() just calls
This implementation uses internal hash-functions, which may not be optimal.
Efficiency of WuManber depends on the minimum length of search strings. Suggested minimum length is 5.
new() switches to a slower algorithm if one of the strings has less than 3 characters.
Changes in the list of search patterns are not seen by the search algorithm after
new() was called. Changes in the text are undefined. Call
first() to restart with a new text.
Sun Wu and Udi Manber (1994) A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona. http://webglimpse.net/pubs/TR94-17.pdf
Rolf Stiebe, Textalgorithmen, WS 2005/6
Juergen Weigert <firstname.lastname@example.org>
Copyright (c) 2007-2011, Juergen Weigert, openSUSE.org This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.