package Text::Prefix::XS;
use XSLoader;
use strict;
use warnings;
our $VERSION = '0.01-TRIAL';
XSLoader::load __PACKAGE__, $VERSION;
use base qw(Exporter);
our @EXPORT = qw(
    prefix_search_build
    prefix_search_create
    prefix_search);
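# Sort a copy of the prefixes longest-first (ties broken alphabetically)
# and pass the sorted list to prefix_search_build as an array reference.
sub prefix_search_create(@)
{
    my @copy = @_;
    @copy = sort { length $b <=> length $a || $a cmp $b } @copy;
    return prefix_search_build(\@copy);
}

1;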
__END__

=head1 NAME

Text::Prefix::XS - Fast prefix searching

=head1 SYNOPSIS

    use Text::Prefix::XS;

    my @haystacks = qw(
        garbage
        blarrgh
        FOO
        meh
        AA-ggrr
        AB-hi!
    );

    my @needles = qw(AAA AB FOO FOO-BAR);

    # prefix_search_create() takes a plain list of prefixes
    my $search = prefix_search_create( map uc($_), @needles );

    my %seen_hash;

    foreach my $haystack (@haystacks) {
        if(my $prefix = prefix_search($search, $haystack)) {
            $seen_hash{$prefix}++;
        }
    }

    # Now $seen_hash{'FOO'} == 1

    # Compare to:
    my $re = join('|', map quotemeta $_, @needles);
    $re = qr/^($re)/;

    %seen_hash = ();    # start the count afresh for the comparison

    foreach my $haystack (@haystacks) {
        my ($match) = ($haystack =~ $re);
        if($match) {
            $seen_hash{$match}++;
        }
    }

    # Again $seen_hash{'FOO'} == 1

=head1 DESCRIPTION

This module implements a I<trie>-like algorithm for matching (and extracting)
prefixes from text strings.

A common application I had was to pre-filter large volumes of text for a small
number of preset prefixes.

Interestingly enough, the quickest solution until I wrote this module was to use
a large regular expression (as in the synopsis).
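
To make the trie idea concrete, here is a minimal pure-Perl sketch of a
prefix trie built from nested hashes. It is an illustration only and not the
module's C implementation, whose internals are not shown here; the names
C<build_trie> and C<trie_search> and the C<-end-> marker key are assumptions
made up for this example.

```perl
use strict;
use warnings;

# Build a hash-of-hashes trie. Every edge is a single character, so the
# multi-character key '-end-' can safely mark "a prefix ends here".
sub build_trie {
    my @prefixes = @_;
    my %root;
    for my $prefix (@prefixes) {
        my $node = \%root;
        for my $char (split //, $prefix) {
            $node->{$char} ||= {};
            $node = $node->{$char};
        }
        $node->{'-end-'} = 1;
    }
    return \%root;
}

# Walk the haystack one character at a time, remembering the longest
# complete prefix seen so far. Returns undef when nothing matches.
sub trie_search {
    my ($trie, $haystack) = @_;
    my ($node, $best, $len) = ($trie, undef, 0);
    for my $char (split //, $haystack) {
        $node = $node->{$char} or last;
        $len++;
        $best = substr($haystack, 0, $len) if $node->{'-end-'};
    }
    return $best;
}

my $trie = build_trie(qw(AAA AB FOO FOO-BAR));
print trie_search($trie, 'FOO-BAR-baz'), "\n";    # FOO-BAR
print trie_search($trie, 'FOO-BAZ'), "\n";        # FOO
print defined trie_search($trie, 'AA-ggrr') ? "hit\n" : "no match\n";    # no match
```

The walk stops at the first character with no outgoing edge, so the cost is
bounded by the length of the longest prefix rather than by the number of
prefixes.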

=head1 FUNCTIONS

The interface is relatively simple. This is alpha software and the API is subject
to change.

=head2 prefix_search_create(@prefixes)

Create an opaque prefix search handle. It returns a thingy, which you should
keep around and pass to L</prefix_search>.

Internally it sorts the prefixes so that the longest comes first (ties are
broken alphabetically).

It then constructs a search trie using a variety of caching and lookup layers.
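
The ordering step can be tried on its own. The sketch below reuses the sort
comparator from C<prefix_search_create> and pairs it with a naive first-match
scan to show why longest-first ordering matters when one prefix is itself a
prefix of another. The C<first_match> helper is hypothetical and only stands
in for the real lookup, which happens in C.

```perl
use strict;
use warnings;

my @prefixes = qw(AB FOO AAA FOO-BAR);

# Same comparator as prefix_search_create: longest first, ties alphabetical.
my @ordered = sort { length $b <=> length $a || $a cmp $b } @prefixes;
print "@ordered\n";    # FOO-BAR AAA FOO AB

# Hypothetical helper: return the first candidate that the haystack starts
# with, or undef if none does.
sub first_match {
    my ($haystack, @candidates) = @_;
    for my $prefix (@candidates) {
        return $prefix if substr($haystack, 0, length $prefix) eq $prefix;
    }
    return undef;
}

print first_match('FOO-BAR-1', @prefixes), "\n";    # FOO (shorter prefix wins)
print first_match('FOO-BAR-1', @ordered), "\n";     # FOO-BAR
```

With the unsorted list the scan stops at C<FOO> and never sees C<FOO-BAR>;
sorting longest-first makes the most specific prefix win.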

=head2 prefix_search($thingy, $haystack)

Checks whether C<$haystack> begins with any of the prefixes passed to
L</prefix_search_create>. If it does, the matching prefix is returned;
otherwise, the return value is C<undef>.
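
Because a miss is signalled by C<undef>, callers whose prefixes might be
false-valued strings such as C<0> may prefer a C<defined> test over a plain
truth test. The sketch below demonstrates the difference with
C<prefix_search_pp>, a hypothetical pure-Perl stand-in that only mimics the
documented return contract; it is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical stand-in with the documented contract: return the matching
# prefix, or undef when the haystack starts with none of them.
sub prefix_search_pp {
    my ($prefixes, $haystack) = @_;
    for my $prefix (@$prefixes) {
        return $prefix if substr($haystack, 0, length $prefix) eq $prefix;
    }
    return undef;
}

my @prefixes = ('0', 'AB');
my $hit = prefix_search_pp(\@prefixes, '0123');

print "truth test misses the match\n" unless $hit;
print "defined test finds '$hit'\n"   if defined $hit;
```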

=head1 PERFORMANCE

This module performs better than a regex under any circumstance. A benchmark
table will be posted in the future, but on average it is about 30-50% quicker
than a regex.

This module would be even quicker if there were some way to implement this as an
actual C<OP> rather than an C<xsub> call, but the performance is quite nice
anyway.
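
Until a benchmark table exists, a comparison can be run locally. The sketch
below uses the core L<Benchmark> module with the data from the synopsis; the
iteration count and the C<%contenders> layout are arbitrary choices for
illustration, and the XS contender is added only if this module can be
loaded. No particular result is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @needles   = qw(AAA AB FOO FOO-BAR);
my @haystacks = qw(garbage blarrgh FOO meh AA-ggrr AB-hi!);

my $alternation = join '|', map { quotemeta } @needles;
my $re          = qr/^($alternation)/;

# Each contender counts how many haystacks start with a known prefix.
my %contenders = (
    regex => sub {
        my $hits = 0;
        for my $haystack (@haystacks) {
            $hits++ if $haystack =~ $re;
        }
        return $hits;
    },
);

# Add the XS contender only when Text::Prefix::XS is installed.
if (eval { require Text::Prefix::XS; 1 }) {
    my $search = Text::Prefix::XS::prefix_search_create(@needles);
    $contenders{xs} = sub {
        my $hits = 0;
        for my $haystack (@haystacks) {
            $hits++
                if defined Text::Prefix::XS::prefix_search($search, $haystack);
        }
        return $hits;
    };
}

# Run each contender a fixed number of times and print a comparison table.
cmpthese(200_000, \%contenders);
```

Six short haystacks are a toy workload; for a meaningful comparison,
substitute input that resembles the text you actually filter.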

=head1 SEE ALSO

There are quite a few modules out there which aim for a Trie-like search, but
they are all either not written in C, or would not be performant enough for this
application.

L<Text::Trie>

L<Regexp::Trie>

L<Regexp::Optimizer>

=head1 NOTES / TODO

While my implementation is probably sloppy, the simplicity of the search itself
makes it very quick and cruftless. When doing a prefix search on a large amount
of text, but with a small number of prefixes, the reduction of overhead is the
most important optimization for gaining performance.

=head1 CAVEATS

Private perl data structures are allocated internally, so this module should
not be used across threads. Also, memory leaks will ensue if you destroy the
search object. However, search handles are expensive to create, and are assumed
to be made for relatively static 'needles'.

This is only because the developer is lazy and tired. Most of this should be
fixed in a stable release.

=head1 AUTHOR AND COPYRIGHT

Copyright (C) 2011 M. Nunberg

You may use and distribute this software under the same terms, conditions, and
licensing as Perl itself.
