NAME
Text::WordCounter - counting words in multilingual texts
VERSION
version 0.001
SYNOPSIS
use Text::WordCounter;
my $counter = Text::WordCounter->new();
my $word_count = $counter->word_count( $text );
DESCRIPTION
The counting is quite heuristic. For example, '-' and digits inside a word are treated as word characters. See the tests to find out how all the special cases are resolved.
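To give a feel for this kind of heuristic, here is a minimal sketch of a tokenizer that keeps hyphens and digits when they occur inside a word. The rough_words helper and its regex are assumptions made for illustration only. They are not the module's actual rules, which are defined by its tests.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::WordCounter.
# A word starts with a letter and may continue with letters or digits;
# a hyphen is kept only when a letter or digit follows it.
sub rough_words {
    my ($text) = @_;
    return $text =~ /(\p{L}(?:[\p{L}\p{N}]|-(?=[\p{L}\p{N}]))*)/g;
}

my @words = rough_words('The well-known T-34 tank - and more');
print join( '|', @words ), "\n";
```

In this sketch the free-standing '-' is dropped, while 'well-known' and 'T-34' each survive as a single word.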
The features parameter should be a hashref. It serves as an accumulator for the features found.
ATTRIBUTES
stemming
If set, stemming via Lingua::Stem is performed on the words. We never managed to make it work sanely on multilingual texts.
stopwords
A hashref with words to discard.
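A stopword hashref of this kind might be built and applied as in the sketch below. It assumes that only the keys matter and that any true value marks a word as a stopword. The word list is made up, and the filtering shown is a plain grep, not the module's own code.

```perl
use strict;
use warnings;

# Assumption: keys are the words to discard, values are just true flags.
my $stopwords = { map { $_ => 1 } qw(the a an of and) };

my @words = qw(the history of the word and the count);

# Keep only the words that are not listed in the hashref.
my @kept = grep { !$stopwords->{$_} } @words;

print "@kept\n";    # prints "history word count"
```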
INSTANCE METHODS
is_stop_word
normalize
Lowercases words and stems them if the stemming attribute is true.
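The behaviour described above can be sketched roughly as follows. Both normalize_sketch and the toy stemmer are hypothetical stand-ins. The real method uses Lingua::Stem, which this sketch replaces with an optional code reference so that it runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for normalize(); $stemmer is an optional code ref
# playing the role of Lingua::Stem.
sub normalize_sketch {
    my ( $word, $stemmer ) = @_;
    my $normalized = lc $word;
    $normalized = $stemmer->($normalized) if $stemmer;
    return $normalized;
}

# Toy stemmer that only strips a trailing "s"; real stemming is language-specific.
my $toy_stemmer = sub { my ($w) = @_; $w =~ s/s\z//; return $w };

print normalize_sketch('Words'), "\n";                  # words
print normalize_sketch( 'Words', $toy_stemmer ), "\n";  # word
```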
split_scripts
word_count
Returns a hashref with word counts.
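A hashref of word counts is typically consumed by sorting its keys by frequency. The sketch below uses a made-up hashref in place of a real return value, and by_frequency is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: words ordered by descending count, ties broken alphabetically.
sub by_frequency {
    my ($counts) = @_;
    return sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
}

# Stand-in for the hashref returned by word_count(); the data is made up.
my $word_count = { apple => 3, pear => 1, plum => 2 };

for my $word ( by_frequency($word_count) ) {
    printf "%-6s %d\n", $word, $word_count->{$word};
}
```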
LIMITATIONS
Of the languages that do not separate words with spaces, only Chinese is currently supported (using Lingua::ZH::MMSEG).
SEE ALSO
AUTHORS
Zbigniew Lukasiak <zlukasiak@opera.com>
Tadeusz Sośnierz <tsosnierz@opera.com>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Opera Software ASA.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)