Lingua::EN::StopWords - Typical stop words for an English corpus (River stage one • 1 direct dependent • 1 total dependent)

See synopsis....

SPLICE/Lingua-EN-Segmenter-0.1 - 03 Mar 2005 03:20:54 UTC

lib/Lingua/StopWords/EN.pm (River stage two • 15 direct dependents • 32 total dependents)

WOLLMERS/Lingua-StopWords-0.12 - 18 Apr 2021 08:32:07 UTC

Pod::Spell - a formatter for spellchecking Pod (River stage two • 11 direct dependents • 77 total dependents)

Pod::Spell is a Pod formatter whose output is good for spellchecking. Pod::Spell is rather like Pod::Text, except that it doesn't put much effort into actual formatting, and it suppresses things that look like Perl symbols or Perl jargon (so that your s...

DOLMEN/Pod-Spell-1.20 - 22 Apr 2016 07:36:12 UTC

Text::Compare - Language sensitive text comparison (River stage zero • No dependents)

Text::Compare is an attempt to write a high-speed text comparison tool based on vector comparison, using language-dependent stopwords. Text::Compare uses Lingua::Identify to find the language of the given texts, then uses Lingua::StopWords to get t...

STRO/Text-Compare-1.03 - 23 Jun 2007 05:44:31 UTC
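
Text::Compare's exact weighting is not described above. As a rough, hedged illustration of stopword-filtered vector comparison, here is a Python sketch that assumes plain term-frequency vectors compared by cosine similarity. The STOPWORDS set and both function names are hypothetical; the set stands in for a list such as Lingua::StopWords would supply.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a per-language stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "on"}

def term_vector(text):
    """Lowercase, split on word characters, drop stopwords, count the rest."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    va, vb = term_vector(text_a), term_vector(text_b)
    # Counter returns 0 for missing terms, so absent words add nothing.
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one text contained only stopwords
    return dot / (norm_a * norm_b)
```

Because "the", "on" and "a" are filtered out, "The cat sat on the mat" and "A cat is on a mat" are compared only on {cat, sat, mat} versus {cat, mat}.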

Lingua::EN::Bigram - Extract n-grams from a text and list them according to frequency and/or T-Score (River stage zero • No dependents)

This module is designed to: 1) pull out all of the ngrams (multi-word phrases) in a given text, and 2) list these phrases according to their frequency. Using this module it is possible to create lists of the most common phrases in a text as well as o...

EMORGAN/Lingua-EN-Bigram-0.03 - 24 Aug 2010 02:01:46 UTC
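
In the collocation literature the T-Score is usually t = (O - E) / sqrt(O). Here O is the observed bigram count, and E = f(w1) f(w2) / N is the count expected if the two words occurred independently. A minimal Python sketch of that formula follows. It assumes N is the total word count; Lingua::EN::Bigram's own implementation may differ, and bigram_t_scores is a hypothetical name.

```python
import math
from collections import Counter

def bigram_t_scores(words):
    """Map each adjacent word pair (w1, w2) to its collocation t-score:
    t = (O - E) / sqrt(O), with O the observed bigram count and
    E = count(w1) * count(w2) / N under an independence assumption."""
    n = len(words)                          # assumption: N = total word count
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
    return scores
```

Sorting the returned items by value, highest first, gives a T-Score ranking. Ranking by the raw bigram count instead gives the frequency list.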

Search::Tokenizer - Decompose a string into tokens (words) (River stage one • 3 direct dependents • 6 total dependents)

This module builds an iterator function that will progressively extract terms from a given input string. Terms are defined by a regular expression (for example "\w+"). Term matching relies on the builtin "global match" operator of Perl (the 'g' flag)...

DAMI/Search-Tokenizer-1.02 - 11 May 2021 22:08:23 UTC
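
The iterator idea described above can be approximated in Python, where re.finditer plays the role of Perl's global match. This is a sketch, not Search::Tokenizer's API. make_tokenizer is a hypothetical name, and the (term, start, end) return shape is an assumption chosen for the example.

```python
import re

def make_tokenizer(text, pattern=r"\w+"):
    """Return a closure that yields the next (term, start, end) on each call,
    or None once the string is exhausted."""
    matches = re.finditer(pattern, text)   # lazy, like repeated m//g in Perl

    def next_term():
        m = next(matches, None)
        if m is None:
            return None
        return m.group(0), m.start(), m.end()

    return next_term
```

Calling `tok = make_tokenizer("Hello, wide world")` and then `tok()` repeatedly yields "Hello", "wide" and "world" with their offsets, then None.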

Text::TFIDF::Ngram - Compute the TF-IDF measure for ngram phrases (River stage zero • No dependents)

This module computes the TF-IDF ("term frequency - inverse document frequency") measure for a corpus of text documents. This module will only work when given more than one document. Because the idf method is computed based on all documents, a single ...

GENE/Text-TFIDF-Ngram-0.0508 - 15 Feb 2021 19:39:36 UTC
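
A single document is not enough because idf = log(N / df). With N = 1, every term has df = 1 and therefore idf = log(1) = 0, so every score collapses to zero. A minimal Python sketch follows, assuming raw counts for tf and the plain log(N / df) idf. Text::TFIDF::Ngram may use a different variant and scores n-gram phrases rather than single tokens; tf_idf is a hypothetical name.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: score} dict per
    document, with tf = raw count and idf = log(N / df)."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))                # each document counts a term once
    result = []
    for doc in documents:
        tf = Counter(doc)
        result.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return result
```

A term that appears in every document (df = N) also scores zero. That is why the measure favours terms that distinguish one document from the rest of the corpus.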

Lingua::EN::Ngram - Extract n-grams from texts and list them according to frequency and/or T-Score (River stage one • 2 direct dependents • 2 total dependents)

This module is designed to extract n-grams from texts and list them according to frequency and/or T-Score. To elaborate, the purpose of Lingua::EN::Ngram is to: 1) pull out all of the ngrams (multi-word phrases) in a given text, and 2) list these phr...

EMORGAN/Lingua-EN-Ngram-0.03 - 29 Mar 2018 03:28:09 UTC
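
Step 1 above (pulling out every n-gram) and step 2 (listing by frequency) reduce to a sliding window plus a counter. The following is a hedged Python sketch; ngram_counts is a hypothetical helper, not part of Lingua::EN::Ngram.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every run of n consecutive words (an n-gram)."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

For `"to be or not to be".split()`, `ngram_counts(words, 2).most_common(1)` returns `[(('to', 'be'), 2)]`.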

NNexus::StopWordList - A stop word list for mathematical texts (River stage zero • No dependents)

This class provides an example stopword list for the specific domain of mathematical texts. It builds on the excellent list from Lingua::EN::StopWordList with a number of modifications particular to mathematical discourse. The modifications have been...

DGINEV/NNexus-2.0.3 - 13 Apr 2015 23:17:27 UTC

Text::Language::Guess - Trained module to guess a document's language (River stage one • 2 direct dependents • 3 total dependents)

Text::Language::Guess guesses a document's language. Its implementation is simple: Using "Text::ExtractWords" and "Lingua::StopWords" from CPAN, it determines how many of the known stopwords the document contains for each language supported by "Lingu...

MSCHILLI/Text-Language-Guess-0.02 - 20 Nov 2005 04:08:56 UTC
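
The stopword-counting approach described above can be sketched in Python as follows. The three miniature stopword lists are hypothetical placeholders for the much longer per-language lists that Lingua::StopWords provides. Ties go to the first language listed.

```python
import re

# Hypothetical miniature stopword lists (the real lists are far longer).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "zu"},
    "fr": {"le", "la", "et", "est", "de"},
}

def guess_language(text):
    """Return the language whose stopwords occur most often in text."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOPWORDS.items()}
    return max(scores, key=scores.get)
```

Stopwords work well as a signal here because they are frequent in almost any text, even a short one, and differ sharply between languages.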

Lingua::ZH::Keywords - Extract keywords from Chinese text (River stage zero • No dependents)

This is a very simple algorithm which removes stopwords from the text, and then counts up what it considers to be the most important keywords. The "keywords" subroutine returns a list of keywords in order of relevance. The stopwords list is accessibl...

AUTRIJUS/Lingua-ZH-Keywords-0.04 - 20 Jan 2003 22:42:35 UTC

Lingua::EN::Keywords - Automatically extracts keywords from text (River stage zero • No dependents)

This is a very simple algorithm which removes stopwords from a summarized version of a text (generated with Lingua::EN::Summarize) and then counts up what it considers to be the most important "keywords". The "keywords" subroutine returns a list of f...

SIMON/Lingua-EN-Keywords-2.0 - 28 Apr 2003 10:23:29 UTC
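
The summarization step aside, Lingua::EN::Keywords and Lingua::ZH::Keywords share a remove-stopwords-then-count idea, sketched roughly below in Python. The STOPWORDS set and the keywords function are hypothetical. The regex word splitting used here would not work for unsegmented Chinese text.

```python
import re
from collections import Counter

# Hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "it"}

def keywords(text, limit=5):
    """Drop stopwords, then rank the remaining words by frequency."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(limit)]
```

For "The cat chased the mouse. The cat caught the mouse and the cat ate it.", `keywords(text, 2)` returns `['cat', 'mouse']`.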

Lingua::EN::StopWordList - A sorted list of English stop words (River stage one • 1 direct dependent • 1 total dependent)

"Lingua::EN::StopWordList" is a pure Perl module. It returns a sorted arrayref of 659 English stop words....

RSAVAGE/Lingua-EN-StopWordList-1.02 - 16 Aug 2015 04:55:38 UTC

Plucene::Plugin::Analyzer::SnowballAnalyzer - Stemmed analyzer with Lingua::Stem::Snowball and Lingua::StopWords (River stage zero • No dependents)

Filters StandardTokenizer with SnowballAnalyzer. Change $Plucene::Plugin::Analysis::SnowballAnalyzer::LANG to the language of your choice. (see Lingua::Stem::Snowball documentation for all available languages)....

FABPOT/Plucene-Plugin-Analyzer-SnowballAnalyzer-1.1 - 01 May 2004 09:12:49 UTC

Acme::CPANModules::Import::RSAVAGE::StopWordLists - CPAN modules which offer stopword lists (2015) (River stage zero • No dependents)

CPAN modules which offer stopword lists (2015). This list is generated by extracting module names mentioned in the article [http://savage.net.au/Perl-modules/html/stopwordlists.report.html] (retrieved on 2016-02-21). For the full article, visit the U...

PERLANCAR/Acme-CPANModulesBundle-Import-RSAVAGE-0.001 - 22 Sep 2018 01:18:00 UTC
15 results (0.054 seconds)