WordNet::BestStem -- get the best guess stem of a word.
0.2.2
my $best = best_stem( 'roses', {V=>1} );
Based on the assumption that the stem is the form with the highest occurrence frequency in a text corpus. Of course this is not always true, but for certain purposes it may be justifiable to treat the most frequent form as the stem.
Find a word's variant forms. Returns the highest frequency (part-of-speech) form according to ICFinder's "information content file", which comes by default with WordNet but can be customized.
ICFinder has frequency counts for the n and v parts of speech, but not for a or r. When a or r is involved, best_stem uses the number of senses for each part of speech, instead of the frequency of the word and part of speech, to choose the form.
Alternatively, best_stem can use a custom word variant frequency table.
Returns in list context the best guess stem form, part-of-speech, and frequency; returns in scalar context the stem form.
Note: WordNet does not at the moment have variant forms for very high frequency words, like "what", "the", "would". best_stem returns an empty string in such cases.
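The selection rule and the list/scalar return convention described above can be sketched without WordNet. This is only an illustration under stated assumptions: pick_most_frequent and %candidates_for are hypothetical names, not part of WordNet::BestStem, and the hard-coded counts are copied from the usage examples in this document rather than read from an information content file.

```perl
use strict;
use warnings;

# Hypothetical candidate table: surface form => list of [ stem, pos, frequency ].
# The counts mirror the usage examples in this document.
my %candidates_for = (
    rose  => [ [ 'rose', 'n', 5 ], [ 'rise', 'v', 17 ] ],
    roses => [ [ 'rose', 'n', 5 ] ],
);

# Illustrative helper (an assumption, not the module's implementation):
# return the candidate with the highest frequency.
sub pick_most_frequent {
    my ($word) = @_;

    # No known variants (e.g. very high frequency words): empty string.
    my $candidates = $candidates_for{ lc $word } or return '';

    # Sort candidates by descending frequency and keep the first one.
    my ($best) = sort { $b->[2] <=> $a->[2] } @$candidates;

    # List context: stem, part of speech, frequency. Scalar context: stem only.
    return wantarray ? @$best : $best->[0];
}

my ( $stem, $pos, $fre ) = pick_most_frequent('rose');    # list context
my $only_stem = pick_most_frequent('rose');               # scalar context
print "$stem $pos $fre\n";
print "$only_stem\n";
```

The wantarray check is what lets one function serve both calling styles; how the module itself implements this is not stated in this document.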
Default options (case insensitive):
    V   => 0,       # verbose. for debugging / checking
    FRE => undef,   # % ref to custom word variant frequency table
Usage:
    use WordNet::BestStem qw( best_stem );
    print best_stem('misgivings');   # misgiving n 8
    print best_stem('roses');        # rose n 5
    print best_stem('rose');         # rise v 17
Compared to WordNet::stem,
    use WordNet::QueryData;
    use WordNet::stem;
    $WN = WordNet::QueryData->new();
    $stemmer = WordNet::stem->new($WN);
    print $stemmer->stemWord('misgivings');   # misgiving
    print $stemmer->stemWord('roses');        # rose
    print $stemmer->stemWord('rose');         # rose rise
Compared to Lingua::Stem::En,
    use Lingua::Stem::En qw( stem );
    $stems = stem( { -words => ['misgivings'] } );
    print @$stems;   # misgiv
    $stems = stem( { -words => ['roses'] } );
    print @$stems;   # rose
    $stems = stem( { -words => ['rose'] } );
    print @$stems;   # rose
deluxe_stems uses contextual information, i.e. the appearances of word forms in the paragraph or corpus, to help choose the stem form.
    V    => 0,
    FRE  => undef,   # % ref to custom word variant frequency table
    STEM => undef,   # % ref to stem_of{string} table per best_stem
    use WordNet::BestStem qw( deluxe_stems );
    my $stemmed_text = deluxe_stems \@text;
or in list context
    # ref to @, %, %, %
    my ($stemmed, $stem_of, $stem_fre, $str_fre) = deluxe_stems \@paragraph;
For two paragraphs / sentences,
    a) beautiful roses i would like a long stem rose
    b) he thinks that average salary rose in the last few years
deluxe_stems,
    $a_ = deluxe_stems \@a;
    print @$a_;
    # beautiful rose i would like a long stem rose
    # he think that average salary rise in the last few year
Compared to best_stem,
    @a_ = map { scalar( best_stem $_ ) || $_ } @a;
    print "@a_\n";
    # beautiful rose i would like a long stem rise
    # he think that average salary rise in the last few year
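One way to picture why context changes the result for "rose" is the two-pass sketch below. It is not the module's algorithm, which this document does not describe; context_stems and %candidates_for are hypothetical names. The counts for rose and rise come from the usage examples above, while the counts for think and year are placeholders. The assumed rule is: prefer the candidate stem supported by the most word forms in the text, and fall back to the general frequency on a tie.

```perl
use strict;
use warnings;

# Hypothetical candidate table: surface form => { stem => general frequency }.
# rose/rise counts are from this document; think/year counts are placeholders.
my %candidates_for = (
    roses  => { rose  => 5 },
    rose   => { rose  => 5, rise => 17 },
    thinks => { think => 1 },
    years  => { year  => 1 },
);

# Illustrative helper (an assumption, not the module's deluxe_stems).
sub context_stems {
    my ($words) = @_;

    # Pass 1: count how many word forms in the text support each candidate stem.
    my %support;
    for my $word (@$words) {
        my $cands = $candidates_for{$word} or next;
        $support{$_}++ for keys %$cands;
    }

    # Pass 2: choose the stem with the most support in the text,
    # then the higher general frequency, then alphabetical order.
    my @stemmed;
    for my $word (@$words) {
        my $cands = $candidates_for{$word};
        if ( !$cands ) { push @stemmed, $word; next; }    # unknown words pass through
        my ($best) = sort {
                 $support{$b} <=> $support{$a}
              || $cands->{$b} <=> $cands->{$a}
              || $a cmp $b
        } keys %$cands;
        push @stemmed, $best;
    }
    return \@stemmed;
}

my @a = qw( beautiful roses i would like a long stem rose );
my @b = qw( he thinks that average salary rose in the last few years );
print "@{ context_stems( \@a ) }\n";
print "@{ context_stems( \@b ) }\n";
```

In paragraph a) the form "roses" adds support for the stem rose, so "rose" stays rose; in paragraph b) nothing else supports either candidate, so the general frequency decides and "rose" becomes rise.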
WordNet ( http://wordnet.princeton.edu )
WordNet::QueryData
WordNet::Similarity::ICFinder
~~~~~~~~~~~~ ~~~~~ ~~~~~~~~ ~~~~~ ~~~ `` ><(((">
Copyright (C) 2009 Maggie J. Xiong < maggiexyz users.sourceforge.net >
All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation under the same terms as Perl itself.