NAME

Text::WordCounter - counting words in multilingual texts

VERSION

version 0.001

SYNOPSIS

my $counter = Text::WordCounter->new();

my $word_count = $counter->word_count( $text );

DESCRIPTION

The counting is quite heuristic. For example, '-' and digits inside a word are treated as word characters. See the tests to find out how all the special cases are resolved.
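
To illustrate the kind of heuristic described above, here is a minimal sketch of a tokenizer that keeps hyphens and digits only when they sit inside a word. The `tokenize` helper and its regular expression are assumptions made for illustration, not the module's actual implementation. In particular, skipping standalone numbers is a choice made here, and the module's tests remain the authority on the real special cases.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, not part of Text::WordCounter.
# A token starts with a letter, continues with letters or digits,
# and may contain hyphens as long as each hyphen is followed by
# more letters or digits (so a lone " - " is not a token).
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\p{L}[\p{L}\p{N}]*(?:-[\p{L}\p{N}]+)*)/g;
}

my @words = tokenize('A well-known B2B shop - opened in 2010');
print join('|', @words), "\n";
```

In this sketch "well-known" and "B2B" each survive as a single token, while the free-standing hyphen and the bare number are dropped.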

The features parameter should be a hashref; it acts as an accumulator for the features found.

ATTRIBUTES

stemming

If set, stemming via Lingua::Stem is performed on the words. We never managed to make it work sanely in multilingual texts.

stopwords

A hashref with words to discard.
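
As a rough sketch of how such a hashref could be used, the example below filters a word list against a stop-word table. The shape of the hash (words as keys, any true value) and the `is_stop` helper are assumptions made for illustration. The module's own `is_stop_word` method may behave differently.

```perl
use strict;
use warnings;

# Assumed shape: keys are the words to discard, values are any true value.
my $stopwords = { the => 1, a => 1, of => 1 };

# Hypothetical helper, not the module's is_stop_word method.
sub is_stop {
    my ($stop, $word) = @_;
    return exists $stop->{$word} ? 1 : 0;
}

# Keep only the words that are not in the stop-word table.
my @kept = grep { !is_stop($stopwords, $_) } qw(the history of a word);
print "@kept\n";
```

A hash lookup keeps the check constant-time per word, which is presumably why a hashref rather than a list is used for the attribute.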

INSTANCE METHODS

`is_stop_word`

`normalize`

Lowercases words and stems them if the stemming attribute is true.
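
The two steps can be sketched as follows. The `normalize_word` function and the toy stemmer are hypothetical stand-ins: the toy stemmer only strips a trailing "s", whereas the module delegates to Lingua::Stem, and this sketch makes no claim about the module's internals.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for a real stemmer: strips one trailing "s".
my $toy_stemmer = sub {
    my ($w) = @_;
    $w =~ s/s\z//;
    return $w;
};

# Hypothetical helper, not the module's normalize method.
sub normalize_word {
    my ($word, $stemmer) = @_;
    my $norm = lc $word;                     # lowercase first
    $norm = $stemmer->($norm) if $stemmer;   # stem only when a stemmer is given
    return $norm;
}

print normalize_word('Words'), "\n";
print normalize_word('Words', $toy_stemmer), "\n";
```

Lowercasing before stemming means the stemmer only ever sees one case variant of each word.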

`split_scripts`

`word_count`

Returns a hashref with word counts.
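
A hashref of this kind maps each word to the number of times it occurs. The sketch below builds and reads one with a deliberately naive counter. `count_words` is a hypothetical function that splits on whitespace only and ignores stop words, stemming and script handling, so it stands in for the shape of the result, not for the module's logic.

```perl
use strict;
use warnings;

# Hypothetical counter: whitespace splitting and lowercasing only.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{ lc $_ }++ for split ' ', $text;
    return \%count;
}

my $counts = count_words('the cat and the hat');

# Walk the hashref in alphabetical order of the words.
printf "%s: %d\n", $_, $counts->{$_} for sort keys %$counts;
```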

LIMITATIONS

Of the languages that do not separate words with spaces, only Chinese is currently supported (using Lingua::ZH::MMSEG).
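
Text written without spaces has to be routed to a segmenter before it can be counted. One possible way to isolate such text is sketched below, using Perl's `\p{Han}` Unicode script property to split a string into runs of Han and non-Han characters. The `split_han_runs` helper is hypothetical, and this is not a description of how the module itself detects Chinese text.

```perl
use strict;
use warnings;
use utf8;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: returns alternating runs of Han and non-Han characters.
# A Han run would be handed to a segmenter, the rest to ordinary tokenizing.
sub split_han_runs {
    my ($text) = @_;
    return $text =~ /(\p{Han}+|\P{Han}+)/g;
}

for my $run ( split_han_runs('I like 北京 very much') ) {
    my $kind = $run =~ /\p{Han}/ ? 'han' : 'other';
    print "$kind: [$run]\n";
}
```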

AUTHORS

Zbigniew Lukasiak <zlukasiak@opera.com>
Tadeusz Sośnierz <tsosnierz@opera.com>

COPYRIGHT AND LICENSE

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)
