Search::VectorSpace - a very basic vector-space search engine
    use Search::VectorSpace;

    my @docs = ...;
    my $engine = Search::VectorSpace->new( docs => \@docs, threshold => .04 );
    $engine->build_index();

    while ( my $query = <> ) {
        my %results = $engine->search( $query );
        print join "\n", keys %results;
    }
This module takes a list of documents (in English) and builds a simple in-memory search engine using a vector space model. Documents are stored as PDL objects, and after the initial indexing phase, the search should be very fast. This implementation applies a rudimentary stop list to filter out very common words, and uses a cosine measure to calculate document similarity. All documents above a user-configurable similarity threshold are returned.
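The cosine measure mentioned above can be illustrated with a minimal sketch. It uses plain Perl arrays rather than PDL objects, and the function name `cosine` and the sample vectors are assumptions made for illustration, not part of this module's interface.

```perl
use strict;
use warnings;

# Cosine of the angle between two equal-length numeric vectors (array refs).
# Returns 0 if either vector has zero length, to avoid dividing by zero.
sub cosine {
    my ( $u, $v ) = @_;
    my ( $dot, $sq_u, $sq_v ) = ( 0, 0, 0 );
    for my $i ( 0 .. $#$u ) {
        $dot  += $u->[$i] * $v->[$i];
        $sq_u += $u->[$i] ** 2;
        $sq_v += $v->[$i] ** 2;
    }
    return 0 unless $sq_u && $sq_v;
    return $dot / ( sqrt($sq_u) * sqrt($sq_v) );
}

# Hypothetical term counts over a shared word list (cat, dog, fish)
my $doc   = [ 2, 1, 0 ];
my $query = [ 1, 0, 0 ];
printf "%.3f\n", cosine( $doc, $query );    # prints 0.894
```

For non-negative term counts the result falls between zero and one, which is why a threshold in that range can act as a relevance cutoff.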
Object constructor. Argument hash must contain a key 'docs' whose value is a reference to an array of documents. The hash can also contain an optional threshold setting, between zero and one, to serve as a relevance cutoff for search results.
build_index() creates the document vectors and stores them in memory, along with a master word list for the document collection.
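As a rough illustration of what an indexing step like this involves, the sketch below builds a master word list and one count vector per document. It is not the module's code: it uses plain arrays instead of PDL, and the name `build_vectors` and the sample counts are hypothetical.

```perl
use strict;
use warnings;

# Build a master word list and one count vector per document.
# Takes a list of WORD => COUNT hash refs, one per document.
sub build_vectors {
    my @doc_counts = @_;

    # Master word list: every distinct word, mapped to a fixed position
    my %seen;
    $seen{$_} = 1 for map { keys %$_ } @doc_counts;
    my @word_list = sort keys %seen;
    my %position;
    @position{@word_list} = 0 .. $#word_list;

    # One vector per document, with a slot for every word in the collection
    my @vectors;
    for my $counts (@doc_counts) {
        my @vec = (0) x @word_list;
        $vec[ $position{$_} ] = $counts->{$_} for keys %$counts;
        push @vectors, \@vec;
    }
    return ( \@word_list, \@vectors );
}

my ( $words, $vectors ) = build_vectors(
    { cat => 2, dog  => 1 },
    { dog => 1, fish => 3 },
);
print "@$words\n";               # cat dog fish
print "@{ $vectors->[0] }\n";    # 2 1 0
print "@{ $vectors->[1] }\n";    # 0 1 3
```

Because every vector shares the same word positions, a query mapped onto the same word list can be compared against each document directly.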
search( QUERY ) returns all documents that match the QUERY string with a relevance above the configured threshold. Unlike regular search engines, the query can be arbitrarily long and contain pretty much anything; it is mapped into a query vector in the same way as the documents in the collection. The return value is a hash of the form RESULT => RELEVANCE, where each relevance value is between zero and one.
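Since the results come back as a hash, they arrive in no particular order. A small sketch of sorting them by relevance follows; the helper name `rank_results` is hypothetical, and the sample hash only stands in for real search output.

```perl
use strict;
use warnings;

# Order result keys from most to least relevant, breaking ties alphabetically.
sub rank_results {
    my %results = @_;
    return sort { $results{$b} <=> $results{$a} or $a cmp $b } keys %results;
}

# Stand-in for the hash returned by a search; these values are made up.
my %results = ( 'doc one' => 0.12, 'doc two' => 0.87, 'doc three' => 0.45 );
printf "%.2f  %s\n", $results{$_}, $_ for rank_results(%results);
```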
Rudimentary parser: splits a string on whitespace and removes punctuation. Returns a hash of the form WORD => NUMBER, where NUMBER is how many times the word was found.
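A parser along these lines might look like the sketch below. The name `count_words` is hypothetical, lowercasing is an assumption that the description above does not state, and the stop list is left out for brevity.

```perl
use strict;
use warnings;

# Split on whitespace, strip punctuation, lowercase, and count occurrences.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $token ( split ' ', $text ) {
        $token =~ s/[[:punct:]]//g;    # drop punctuation characters
        next unless length $token;     # skip tokens that were only punctuation
        $count{ lc $token }++;         # lowercasing is an assumption
    }
    return %count;
}

my %words = count_words("The cat saw the other cat. Twice!");
print "$_ => $words{$_}\n" for sort keys %words;
# cat => 2
# other => 1
# saw => 1
# the => 2
# twice => 1
```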
Convenience wrapper for Lingua::Stem::stem()
Maciej Ceglowski <maciej@ceglowski.com>
This program is free software, released under the GNU public license