Lingua::StopWords - Stop words for several languages. River stage two • 14 direct dependents • 27 total dependents

In keyword search, it is common practice to suppress a collection of "stopwords": words such as "the", "and", "maybe", etc., which exist in a large number of documents and do not tell you anything important about any document which contains them. T...

CREAMYG/Lingua-StopWords-0.09 - 22 Aug 2008 15:34:58 GMT
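Suppressing stopwords amounts to a set-membership filter applied after tokenization. The following is a minimal Python sketch of that idea, not the Lingua::StopWords API; the tiny `STOPWORDS` set and the `remove_stopwords` helper are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# real per-language lists are much longer.
STOPWORDS = frozenset({"the", "and", "maybe", "a", "of", "in", "is"})


def remove_stopwords(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


print(remove_stopwords("The cat and the hat"))  # ['cat', 'hat']
```

A `frozenset` keeps each lookup O(1), which matters when the filter runs over every token of a large corpus.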

Lingua::EN::StopWords - Typical stop words for an English corpus River stage one • 1 direct dependent • 1 total dependent

See synopsis....

SPLICE/Lingua-EN-Segmenter-0.1 - 03 Mar 2005 03:20:54 GMT

Lingua::PTD - Module to handle PTD files in Dumper Format River stage one • 2 direct dependents • 2 total dependents

PTD files in Perl Dumper format are simple hash references, but they use a specific structure, and this module provides a simple interface to manipulate it. "new": The "new" constructor returns a new Lingua::PTD object. This constructor receives a P...

AMBS/Lingua-PTD-1.16 - 20 Aug 2017 17:50:33 GMT

Pod::Spell - a formatter for spellchecking Pod River stage three • 9 direct dependents • 217 total dependents

Pod::Spell is a Pod formatter whose output is good for spellchecking. Pod::Spell is rather like Pod::Text, except that it doesn't put much effort into actual formatting, and it suppresses things that look like Perl symbols or Perl jargon (so that your s...

DOLMEN/Pod-Spell-1.20 - 22 Apr 2016 07:36:12 GMT

HTML::Index::Store - subclass'able module for storing inverted index files for the HTML::Index modules. River stage zero No dependents

The HTML::Index::Store module is a generic interface to provide storage for the inverted indexes used by the HTML::Index modules. The reference implementation uses in-memory storage, so it is not suitable for persistent applications (where the search / in...

AWRIGLEY/HTML-Index-0.15 - 30 Jun 2003 15:35:32 GMT

Redis::Bayes - Bayesian classification on Redis River stage zero No dependents

This module is an implementation of naive Bayes on Redis....

TRSKI/Redis-Bayes-0.024 - 17 Jan 2015 19:11:02 GMT
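Naive Bayes classification reduces to keeping per-label token counts and scoring a new text by log prior plus summed log likelihoods. The Python sketch below illustrates the technique with in-memory dicts standing in for Redis; the `NaiveBayes` class, the add-one smoothing choice, and the example labels are assumptions for illustration, not Redis::Bayes's actual API or key layout.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Counters live in in-memory dicts here; a Redis-backed variant would keep
    the same counters in Redis instead (an assumption, not Redis::Bayes's schema).
    """

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every token seen in training

    @staticmethod
    def _tokens(text: str) -> list[str]:
        return re.findall(r"\w+", text.lower())

    def train(self, label: str, text: str) -> None:
        tokens = self._tokens(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text: str) -> str:
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of log P(token | label); add-one smoothing
            # keeps an unseen token from driving the score to -infinity.
            score = math.log(n_docs / total_docs)
            for tok in self._tokens(text):
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("spam", "buy cheap pills now")
nb.train("ham", "meeting agenda for monday")
print(nb.classify("cheap pills"))  # spam
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied.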

Text::TermExtract - Extract terms from text River stage zero No dependents

Text::TermExtract takes a simple approach to extracting the most interesting terms from documents of arbitrary length. There are more scientific methods of term extraction, like Yahoo's online term extraction API (but you can't have it locally) and the...

MSCHILLI/Text-TermExtract-0.02 - 10 Mar 2008 05:14:28 GMT

Text::Compare - Language sensitive text comparison River stage zero No dependents

Text::Compare is an attempt to write a high-speed text comparison tool based on vector comparison, which uses language-dependent stopwords. Text::Compare uses Lingua::Identify to find the language of the given texts, then uses Lingua::StopWords to get t...

STRO/Text-Compare-1.03 - 23 Jun 2007 05:44:31 GMT
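Vector-based text comparison is commonly done with cosine similarity over term-count vectors after stopword removal. The excerpt does not say which similarity measure Text::Compare uses, so the Python sketch below assumes cosine similarity; the fixed English `STOPWORDS` set and the `cosine_similarity` helper are hypothetical (the module itself picks a stopword list per detected language).

```python
import math
import re
from collections import Counter

# Hypothetical stopword set; a language-aware tool would choose one per text.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "on", "is"})


def term_vector(text: str) -> Counter:
    """Count non-stopword tokens, giving a sparse term-frequency vector."""
    return Counter(t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS)


def cosine_similarity(a: str, b: str) -> float:
    """Return dot(va, vb) / (|va| * |vb|); 0.0 if either vector is empty."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


print(cosine_similarity("the cat sat on the mat", "a cat sat on a mat"))  # ~1.0
```

Because stopwords are removed first, the two example sentences differ only in function words and score as (nearly) identical.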

Lingua::EN::Bigram - Extract n-grams from a text and list them according to frequency and/or T-Score River stage zero No dependents

This module is designed to: 1) pull out all of the ngrams (multi-word phrases) in a given text, and 2) list these phrases according to their frequency. Using this module, it is possible to create lists of the most common phrases in a text as well as o...

EMORGAN/Lingua-EN-Bigram-0.03 - 24 Aug 2010 02:01:46 GMT
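The two steps described above (pull out the n-grams, then list them by frequency) can be sketched in a few lines of Python. This is an illustration of the technique only, not Lingua::EN::Bigram's interface; `ngram_frequencies` and its naive `\w+` tokenization are assumptions.

```python
import re
from collections import Counter


def ngram_frequencies(text: str, n: int = 2) -> list[tuple[str, int]]:
    """Return every n-word phrase in text with its count, most frequent first."""
    words = re.findall(r"\w+", text.lower())
    # Slide a window of n words across the token list.
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases).most_common()


print(ngram_frequencies("the cat sat on the cat mat")[0])  # ('the cat', 2)
```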

CPAN::Nearest - find the nearest module to a given name. River stage zero No dependents

This module provides a way of searching for CPAN modules whose name may be misspelt. For example, if a user accidentally types "Lingua::Stopwords" when looking for the module "Lingua::StopWords", common CPAN clients will not be able to function: ...

BKB/CPAN-Nearest-0.13 - 13 Jan 2017 01:26:37 GMT
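One standard way to find "the nearest module to a given name" is to pick the candidate with the smallest edit distance. The excerpt does not describe CPAN::Nearest's matching method, so the Python sketch below simply assumes plain Levenshtein distance; `levenshtein`, `nearest`, and the candidate list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def nearest(name: str, candidates: list[str]) -> str:
    """Return the candidate closest to name by edit distance."""
    return min(candidates, key=lambda c: levenshtein(name, c))


modules = ["Lingua::StopWords", "Lingua::PTD", "Pod::Spell"]
print(nearest("Lingua::Stopwords", modules))  # Lingua::StopWords
```

The misspelling in the example differs only in the case of one letter, so its distance to the intended name is 1.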

Text::WordCounter - counting words in multilingual texts River stage one • 1 direct dependent • 1 total dependent

It is quite heuristic: for example, '-' and digits surrounded by word characters are treated as word characters (see the tests to find out how all the special cases are resolved). The features parameter should be a hashref and is an accumulator for found fea...

ZBY/Text-WordCounter-0.001 - 18 Jan 2013 11:26:47 GMT

Search::Tokenizer - Decompose a string into tokens (words) River stage one • 1 direct dependent • 1 total dependent

This module builds an iterator function that will progressively extract terms from a given input string. Terms are defined by a regular expression (for example "\w+"). Term matching relies on the builtin "global match" operator of Perl (the 'g' flag)...

DAMI/Search-Tokenizer-1.01 - 15 Feb 2013 18:57:49 GMT
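The idea of building an iterator that progressively extracts regex-defined terms has a close Python analogue in `re.finditer`, which resumes scanning where the previous match ended, much as Perl's `//g` does. The sketch below is hypothetical: `make_tokenizer` and its `(term, start, end)` tuples are illustrative and do not reproduce Search::Tokenizer's actual return values.

```python
import re
from typing import Callable, Iterator


def make_tokenizer(pattern: str = r"\w+") -> Callable[[str], Iterator[tuple[str, int, int]]]:
    """Build a tokenizer that lazily yields (term, start, end) for each regex match."""
    regex = re.compile(pattern)

    def tokenize(text: str) -> Iterator[tuple[str, int, int]]:
        # finditer picks up after the previous match, so terms come out in
        # order without building the whole list up front.
        for match in regex.finditer(text):
            yield match.group(), match.start(), match.end()

    return tokenize


tokenize = make_tokenizer()
print(list(tokenize("Hello, world")))  # [('Hello', 0, 5), ('world', 7, 12)]
```

Returning a closure over a precompiled pattern keeps the per-call cost to the scan itself.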

Lingua::EN::Ngram - Extract n-grams from texts and list them according to frequency and/or T-Score River stage zero No dependents

This module is designed to extract n-grams from texts and list them according to frequency and/or T-Score. To elaborate, the purpose of Lingua::EN::Ngram is to: 1) pull out all of the ngrams (multi-word phrases) in a given text, and 2) list these phr...

EMORGAN/Lingua-EN-Ngram-0.03 - 29 Mar 2018 03:28:09 GMT
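The T-Score ranking mentioned above measures how much more often a pair of words co-occurs than chance would predict. The excerpt does not give Lingua::EN::Ngram's exact formula, so this Python sketch assumes one common collocation definition, t = (O - E) / sqrt(O), where O is the observed bigram count and E = f(w1) * f(w2) / N is the count expected if the two words were independent; `bigram_t_scores` is hypothetical.

```python
import math
import re
from collections import Counter


def bigram_t_scores(text: str) -> list[tuple[tuple[str, str], float]]:
    """Rank bigrams by t = (O - E) / sqrt(O), with E = f(w1) * f(w2) / N."""
    words = re.findall(r"\w+", text.lower())
    n = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected bigram count if w1 and w2 occurred independently.
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(bigram_t_scores("new york new york city")[0])  # ('new', 'york') with t of about 0.8485
```

Unlike a raw frequency list, the T-score discounts pairs that are frequent only because each word is frequent on its own.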

Search::Tools::QueryParser - convert string queries into objects River stage two • 12 direct dependents • 27 total dependents

Search::Tools::QueryParser turns search queries into objects that can be applied for highlighting, spelling, and extracting matching snippets from source documents....

KARMAN/Search-Tools-1.007 - 01 May 2018 16:14:48 GMT

Text::TFIDF::Ngram - Compute the TF-IDF measure for ngram phrases River stage zero No dependents

This module computes the TF-IDF ("term frequency-inverse document frequency") measure for a corpus of text documents. For a working example program, please see the eg/analyze file in the distribution....

GENE/Text-TFIDF-Ngram-0.0207 - 09 Apr 2018 04:05:45 GMT
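TF-IDF weights a term by how frequent it is in one document and how rare it is across the corpus. Several variants exist and the excerpt does not say which one Text::TFIDF::Ngram uses; this Python sketch assumes tf = count / document length and idf = log(N / df). Each "term" can just as well be an n-gram phrase string; `tf_idf` is a hypothetical helper.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: tf * idf} dict per tokenized document."""
    n_docs = len(docs)
    df = Counter()  # number of documents containing each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


scores = tf_idf([["cat", "sat"], ["cat", "ran"]])
print(scores[0])  # "cat" gets 0.0 because it appears in every document
```

A term that occurs in every document has idf = log(1) = 0, so it carries no weight regardless of its frequency.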

NNexus::StopWordList - A stop word list for mathematical texts River stage zero No dependents

This class provides an example stopword list for the specific domain of mathematical texts. It builds on the excellent list from Lingua::EN::StopWordList with a number of modifications particular to mathematical discourse. The modifications have been...

DGINEV/NNexus-2.0.3 - 13 Apr 2015 23:17:27 GMT

Text::Language::Guess - Trained module to guess a document's language River stage one • 2 direct dependents • 2 total dependents

Text::Language::Guess guesses a document's language. Its implementation is simple: Using "Text::ExtractWords" and "Lingua::StopWords" from CPAN, it determines how many of the known stopwords the document contains for each language supported by "Lingu...

MSCHILLI/Text-Language-Guess-0.02 - 20 Nov 2005 04:08:56 GMT
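The stopword-counting approach described for Text::Language::Guess can be sketched as: tokenize the document, count how many tokens appear in each language's stopword list, and pick the language with the most hits. The Python below is an illustration only; the three heavily truncated `STOPWORDS` lists and `guess_language` are hypothetical stand-ins for the full Lingua::StopWords lists.

```python
import re

# Hypothetical, heavily truncated stopword lists for illustration.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "nicht"},
    "fr": {"le", "la", "et", "est", "les"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopword list matches the most tokens."""
    words = re.findall(r"\w+", text.lower())
    hits = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    # Ties (including zero hits everywhere) resolve to the first language listed.
    return max(hits, key=hits.get)


print(guess_language("Der Hund und die Katze"))  # de
```

Stopwords work well as language signals because they are short, extremely frequent, and largely disjoint between languages.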

AI::Categorizer::Document - Embodies a document River stage one • 2 direct dependents • 2 total dependents

The Document class embodies the data in a single document, and contains methods for turning this data into a FeatureVector. Usually documents are plain text, but subclasses of the Document class may handle any kind of data....

KWILLIAMS/AI-Categorizer-0.09 - 24 Mar 2007 02:39:15 GMT

Lingua::EN::Keywords - Automatically extracts keywords from text River stage zero No dependents

This is a very simple algorithm which removes stopwords from a summarized version of a text (generated with Lingua::EN::Summarize) and then counts up what it considers to be the most important "keywords". The "keywords" subroutine returns a list of f...

SIMON/Lingua-EN-Keywords-2.0 - 28 Apr 2003 10:23:29 GMT

Lingua::ZH::Keywords - Extract keywords from Chinese text River stage zero No dependents

This is a very simple algorithm which removes stopwords from the text, and then counts up what it considers to be the most important keywords. The "keywords" subroutine returns a list of keywords in order of relevance. The stopwords list is accessibl...

AUTRIJUS/Lingua-ZH-Keywords-0.04 - 20 Jan 2003 22:42:35 GMT
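Both keyword extractors above share a core step: remove stopwords, then rank the remaining words by how often they occur. The Python sketch below shows only that step and takes an already-tokenized list, on the assumption that any language-specific work (word segmentation for Chinese, or summarization first as Lingua::EN::Keywords does) happens before it. `keywords` and its `STOPWORDS` set are hypothetical, and plain frequency is a simplification of whatever weighting the modules actually use.

```python
from collections import Counter

# Hypothetical stopword set; a real extractor would load a full list.
STOPWORDS = frozenset({"the", "a", "and", "of", "is"})


def keywords(tokens: list[str], top: int = 5) -> list[str]:
    """Drop stopwords, then return the `top` most frequent remaining tokens."""
    counts = Counter(t for t in tokens if t.lower() not in STOPWORDS)
    return [word for word, _ in counts.most_common(top)]


print(keywords(["the", "cat", "cat", "hat", "the"], top=2))  # ['cat', 'hat']
```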
