Text::DocumentCollection - a collection of documents
The constructor; arguments must be passed as key/value pairs. The file key is mandatory.
my $c = Text::DocumentCollection->new( file => 'coll.db' );
Documents in the collection are saved in the specified file, which is currently handled by a DB_File hash.
Add a document to the collection, tagging it with a unique key.
$c->Add( $key, $doc );
Add dies if the key is already present.
To change an existing key, use Delete and then Add.
Delete discards a document from the collection.
Loads the collection from the given DB file:
my $c = Text::DocumentCollection->NewFromDB( file => 'coll.db' );
The file must be either empty or created by an earlier invocation of new or NewFromDB, followed by any number of Add and/or Delete calls.
Currently, every document in the collection is revived (by calling NewFromString) when the collection is loaded. This poses performance problems for huge collections; a caching mechanism would be an option in that case.
Inverse document frequency (IDF) of a given term.
Given a term t, a set of documents DOC, and the binary relation has-term, the definition used is:
IDF(t) = log2( #DOC / #{ d in DOC | d has-term t } )
The logarithm is in base 2, since this is related to an information measurement, and # is the cardinality operator.
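As a worked illustration of the formula above, here is a minimal sketch in plain Perl. It does not use Text::DocumentCollection: the %docs hash and the idf function are hypothetical stand-ins, and returning undef for a term that appears in no document is an assumption of this sketch, not documented behavior of the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each key maps to the set of terms in that document.
my %docs = (
    d1 => { apple => 1, banana => 1 },
    d2 => { apple => 1 },
    d3 => { cherry => 1 },
    d4 => { apple => 1, cherry => 1 },
);

# IDF(t) = log2( #DOC / #{ d in DOC | d has-term t } )
sub idf {
    my ( $docs, $term ) = @_;
    my $total = keys %$docs;                                     # #DOC
    my $with  = grep { exists $docs->{$_}{$term} } keys %$docs;  # documents having the term
    return undef unless $with;    # assumption: ratio is undefined if no document has the term
    return log( $total / $with ) / log(2);    # Perl's log is the natural logarithm
}

printf "%s: %.3f\n", $_, idf( \%docs, $_ ) for qw(apple banana cherry);
# prints:
# apple: 0.415
# banana: 2.000
# cherry: 1.000
```

A term found in few documents (banana, 1 of 4) scores higher than one found in most of them (apple, 3 of 4), which is the point of the measure.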
Enumerates all the documents in the collection. Called as:
my @result = $c->EnumerateV( \&Callback, 'the rock' );
The function Callback will be called on each element of the collection as:
my @l = Callback( $c, $key, $doc, $rock );
where $rock is the second argument to EnumerateV.
Since $c is the first argument, the callback may be an instance method of Text::DocumentCollection.
The final result is obtained by concatenating all the partial results (@l in the example above). If you do not want a result, simply return the empty list ().
There is no particular order of enumeration, so there is no particular order in which results are concatenated.
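The calling convention described above can be sketched in plain Perl. This is not the module's implementation: the %collection hash, enumerate_v and matching_keys are hypothetical stand-ins that only mimic the documented behavior (callback arguments $c, $key, $doc, $rock; partial results concatenated; an empty list contributes nothing).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the collection: a plain hash of key => document text.
my %collection = (
    k1 => 'the quick brown fox',
    k2 => 'lazy dog',
    k3 => 'quick silver',
);

# Stand-in enumerator: calls the callback as ( $c, $key, $doc, $rock )
# for every element and concatenates the lists it returns.
sub enumerate_v {
    my ( $c, $callback, $rock ) = @_;
    my @result;
    for my $key ( keys %$c ) {    # hash order is unspecified, like the enumeration order
        push @result, $callback->( $c, $key, $c->{$key}, $rock );
    }
    return @result;
}

# Callback: return the key if the document contains the word in $rock,
# otherwise the empty list, which adds nothing to the final result.
sub matching_keys {
    my ( $c, $key, $doc, $rock ) = @_;
    return $doc =~ /\b\Q$rock\E\b/ ? ($key) : ();
}

# Sort explicitly, since the enumeration order cannot be relied upon.
my @hits = sort { $a cmp $b } enumerate_v( \%collection, \&matching_keys, 'quick' );
print "@hits\n";    # prints "k1 k3"
```

Because results arrive in no particular order, a caller that needs a stable order has to sort the concatenated list itself, as the last lines do.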
spinellia@acm.org walter@humans.net