19 Nov 2012 17:17:31 UTC
- Distribution: Text-WordGrams
- Module version: 0.07
- Author: Alberto Simões (AMBS)
Text::WordGrams - Calculates statistics on word ngrams.
    use Text::WordGrams;

    my $data = word_grams( $text );

    my $data = word_grams_from_files( $file1, $file2 );
The word_grams function returns a reference to a hash table with word n-gram counts for the given string. Options, if needed, are passed as a hash reference in the first argument.
Set this option to ignore text case.
Set this option to the n-gram size you want. The value should be greater than or equal to two. Keep in mind that the larger the size you ask for, the larger the hash becomes.
This option is enabled by default. Set it to zero if your document is already tokenized; in that case the text is split on space characters.
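To illustrate what the options above control, here is a minimal, self-contained sketch of word n-gram counting in plain Perl. It is not the module's implementation: the function name `count_word_grams` and the option keys `size` and `ignore_case` are hypothetical, tokenization is a plain split on whitespace (the already-tokenized case), and using the n-gram's words joined by a single space as the hash key is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::WordGrams.
# Counts word n-grams in a string that is already tokenized
# (tokens separated by whitespace).
sub count_word_grams {
    my ($opts, $text) = @_;
    my $size = $opts->{size} || 2;          # n-gram size; this sketch defaults to 2
    $text = lc $text if $opts->{ignore_case};

    my @tokens = split ' ', $text;          # split on runs of whitespace
    my %count;
    # Slide a window of $size tokens over the token list.
    for my $i (0 .. $#tokens - $size + 1) {
        my $gram = join ' ', @tokens[$i .. $i + $size - 1];
        $count{$gram}++;
    }
    return \%count;
}

my $data = count_word_grams({ size => 2, ignore_case => 1 },
                            "The cat saw the cat");

# Print each bigram with its count, most frequent first.
for my $gram (sort { $data->{$b} <=> $data->{$a} || $a cmp $b } keys %$data) {
    print "$gram\t$data->{$gram}\n";
}
```

With case ignored, the bigram `the cat` is counted twice here, while `cat saw` and `saw the` are counted once each. The number of distinct keys grows quickly with the window size, which is why larger n-gram sizes produce larger hashes.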
The word_grams_from_files function supports the same options as the word_grams function, but receives a list of file names instead of a string.
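As a rough picture of counting over several files, the sketch below accumulates bigram counts file by file in plain Perl. It does not use the module: `count_bigrams_from_files` is a hypothetical name, the UTF-8 encoding and whitespace tokenization are assumptions, and so is the choice that n-grams never span a file boundary.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::WordGrams.
# Accumulates bigram counts over several files, one file at a time.
sub count_bigrams_from_files {
    my @files = @_;
    my %count;
    for my $file (@files) {
        open my $fh, '<:encoding(UTF-8)', $file
            or die "Cannot open $file: $!";
        my $text = do { local $/; <$fh> };   # slurp the whole file
        close $fh;

        my @tokens = split ' ', $text;       # split on runs of whitespace
        # Bigrams are formed within a file only, never across files.
        $count{"$tokens[$_] $tokens[$_ + 1]"}++ for 0 .. $#tokens - 1;
    }
    return \%count;
}

# Usage: perl script.pl file1.txt file2.txt
if (@ARGV) {
    my $data = count_bigrams_from_files(@ARGV);
    print "$_\t$data->{$_}\n" for sort keys %$data;
}
```

All files share one hash, so a bigram that appears in several files gets a single combined count.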
The current method is very slow. If you find a faster one, please let me know. I think the bottleneck is in the tokenisation step.
Please report any bugs or feature requests to firstname.lastname@example.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-WordGrams. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
Copyright 2005-2009 Alberto Simões, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Module Install Instructions
To install Text::WordGrams, start the CPAN shell from your terminal and run the install command at its prompt.

    perl -MCPAN -e shell
    install Text::WordGrams