Text::DAWG - directed acyclic word graphs
    use Text::DAWG;

    my $dawg=Text::DAWG::->new([qw(one two three)]);

    print "one\n" if $dawg->match("one");    # prints something
    print "four\n" if $dawg->match("four");  # prints nothing
Text::DAWG implements string set recognition by way of directed acyclic word graphs.
Creates a new DAWG matching the strings in an array.
Creates a new DAWG from a compact representation stored in a file, or dies if anything goes wrong. The filehandle must be opened for reading and binmoded before the call.
Returns a true value if the DAWG contains the string.
Stores a compact representation of the DAWG in a file. The filehandle must be opened for writing and binmoded before the call.
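The filehandle requirements for storing and loading can be sketched with core Perl alone. This is a minimal sketch of preparing the handles, not of the module's own calls: the temporary file and the three placeholder bytes are assumptions made for illustration, and the points where the DAWG would be stored and loaded are marked with comments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; a real script would use its own path.
my (undef, $path) = tempfile(UNLINK => 1);

# Writing side: open for writing, then binmode before handing it over.
open(my $out, '>', $path) or die "cannot open $path for writing: $!";
binmode($out);
# The DAWG would be stored through $out here; three raw bytes
# stand in for the compact representation in this sketch.
print {$out} "\x00\x0d\x0a";
close($out) or die "cannot close $path: $!";

# Reading side: open for reading, then binmode before loading.
open(my $in, '<', $path) or die "cannot open $path for reading: $!";
binmode($in);
# The DAWG would be loaded from $in here; the sketch just reads the bytes back.
my $bytes = do { local $/; <$in> };
close($in) or die "cannot close $path: $!";
```

Calling binmode on both sides keeps the bytes intact on platforms that would otherwise translate line endings.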
Outputs a dot language representation of the DAWG (see http://www.graphviz.org/). The filehandle must be opened for writing before the call. If the DAWG contains any non-ASCII characters, you must set an appropriate encoding as well.
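One way to satisfy the encoding requirement is to open the output handle with an encoding layer. The sketch below assumes UTF-8 is the wanted encoding and writes to an in-memory buffer so that it needs no files; the printed line is a hand-written stand-in for dot output, not something produced by the module.

```perl
use strict;
use warnings;

# In-memory target for the sketch; a real script would open a file path.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "cannot open: $!";

# The DAWG's dot output would be written to $fh here.
# A label containing U+00E9 stands in for that output.
print {$fh} qq{1 -> 2 [label="\x{e9}"];\n};
close($fh) or die "cannot close: $!";

# $buffer now holds encoded bytes: U+00E9 became the two bytes C3 A9.
```

An already-open handle can be given the same layer with binmode($fh, ':encoding(UTF-8)').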
You can pass a reference to a hash of options for tweaking the output. The following keys are recognised:
The value must be a hash reference specifying global attributes for the generated digraph.
The value must be a hash reference specifying default attributes for subgraphs.
The value must be a hash reference specifying default attributes for edges.
The value must be a hash reference specifying default attributes for nodes. Defaults to { shape => 'circle' }.
The value must be a hash reference specifying attributes for the start node.
The value must be a hash reference specifying attributes for a matching node. Defaults to { shape => 'doublecircle' }.
The value must be a hash reference specifying attributes for a matching start node. Defaults to the combination of the start and match options, with match given priority.
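The "match given priority" rule for a matching start node behaves like a plain hash merge in which the match attributes are listed last. The sketch below only illustrates that rule; it is not taken from the module's source, and the attribute values in %start are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical attributes for the start node.
my %start = (shape => 'box', color => 'blue');

# Attributes for a matching node (the documented default).
my %match = (shape => 'doublecircle');

# Later pairs win in a list assignment, so match overrides start
# wherever both define the same attribute.
my %matchstart = (%start, %match);

# %matchstart is now (shape => 'doublecircle', color => 'blue').
```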
The value must be a hash reference with single characters for keys and hash references for values. It specifies attributes for edges representing the given characters. The default has an entry for the space character containing { label => 'SP' }, since an edge label consisting of a single space is hard to notice.
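As an illustration of the shape this option expects, the following sketch builds an options hash with a chars entry. The tab entry and its 'TAB' label are hypothetical additions, and whether a user-supplied chars value replaces the default or is merged with it is not stated here, so the space entry is repeated explicitly.

```perl
use strict;
use warnings;

my %options = (
    chars => {
        ' '  => { label => 'SP' },   # same as the documented default
        "\t" => { label => 'TAB' },  # hypothetical extra entry
    },
);

# Sanity check: every key must be exactly one character long.
for my $char (keys %{ $options{chars} }) {
    die "chars keys must be single characters" unless length($char) == 1;
}
```

A reference to this hash (\%options) is what would be passed as the options argument.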
An id for the digraph itself.
If true, certain optimisations that reduce both the size and the readability of the output are not performed.
Node ids are positive integers, with the start node always 1.
Each edge has a default label equal to the character it represents. You can override this with the chars option.
You can pass extra arguments to the constructor to output a dot language representation of the trie that is the un-optimised version of the DAWG. Groups of trie nodes that correspond to the same DAWG node will be clustered.
A Text::DAWG is always slower than a built-in Perl hash.
A Text::DAWG containing a set of strings with many common prefixes and suffixes (e.g. a dictionary of English words) may use less memory than a built-in Perl hash. However, the unoptimised trie and the optimisation process itself use many times as much memory as the final result. Loading a stored DAWG from a file uses very little extra memory.
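For comparison, the built-in hash alternative referred to above looks like this for the same three words as the synopsis. It is a plain core Perl sketch of set membership and makes no claim about the relative speed or memory use of either approach.

```perl
use strict;
use warnings;

# A hash used as a string set: keys are the words, values are unused.
my %set = map { $_ => 1 } qw(one two three);

print "one\n"  if exists $set{one};   # prints "one"
print "four\n" if exists $set{four};  # prints nothing
```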
Bo Lindbergh <blgl@stacken.kth.se>
Copyright 2011, Bo Lindbergh
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.9 or, at your option, any later version of Perl 5 you may have available.