Ted Pedersen
and 1 contributor

Documentation

Count the frequency of Ngrams in text
Measure the association of Ngrams in text
Combine frequency counts to determine co-occurrence
Combine two trigram files created by count.pl into single file
Divide huge text into pieces and run huge-count3.pl for trigrams separately on each (and then combine)
Sort output from count.pl or statistic.pl in descending order based on frequency or association score
Convert the output of count.pl to the format of huge-count.pl.
Combine two bigram files created by count.pl into single file
Divide huge text into pieces and run count.pl separately on each (and then combine)
Sort output from count.pl or statistic.pl in descending order based on frequency or association score
Divide a text file in N approximately equal parts
Find compound words in a text that are specified in a list.
Count all the bigrams in a huge text without using huge amounts of memory.
Delete bigrams found by huge-count.pl based on low/high frequency.
Merge the results of multiple huge-sort generated files into a single sorted file.
Sort a --tokenlist of bigrams from huge-count.pl in alphabetical order.
Split bigram files from huge-count.pl into pieces.
Find the Kth order co-occurrences of a word
Calculate Spearman's Correlation on two ranked lists output by count.pl or statistic.pl
FAQ
Installation instructions for Text-NSP
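
As a rough illustration of what counting bigrams in text involves, here is a minimal Python sketch. It is not the count.pl implementation: the function name `count_bigrams` is hypothetical, and the whitespace tokenization is an assumption made only to keep the example short.

```python
from collections import Counter


def count_bigrams(text):
    """Count adjacent token pairs (bigrams) in a string.

    Assumption: tokens are separated by whitespace. This is a
    simplification for illustration, not how count.pl tokenizes.
    """
    tokens = text.split()
    # Pair each token with its successor and tally the pairs.
    return Counter(zip(tokens, tokens[1:]))


counts = count_bigrams("the cat sat on the mat the cat slept")
# Show the most frequent bigram and its frequency.
for (first, second), freq in counts.most_common(1):
    print(first, second, freq)
```

Sorting such counts in descending order of frequency, as several of the entries above describe, corresponds to calling `counts.most_common()` in this sketch.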

Modules

Extract collocations and Ngrams from text
Perl module for computing association scores of Ngrams. This module provides the basic framework for these measures.
Perl module that provides basic framework for building measures of association for bigrams.
Perl module that provides error checks for the Pearson's chi squared, phi coefficient and the T-score measures.
Perl module that implements Phi coefficient measure for bigrams.
Perl module that implements T-score measure of association for bigrams.
Perl module that implements Pearson's chi squared measure of association for bigrams.
Perl module that provides the framework to implement the Dice and Jaccard coefficients.
Perl module to compute Dice coefficient for bigrams.
Perl module that implements the Jaccard coefficient.
Perl module that provides methods to compute the Fisher's exact tests.
Perl module implementation of the left-sided Fisher's exact test.
Perl module implementation of the right-sided Fisher's exact test.
Perl module implementation of the two-sided Fisher's exact test.
Perl module that provides methods to compute the Fisher's exact tests.
Perl module implementation of the left-sided Fisher's exact test (Deprecated).
Perl module implementation of the right-sided Fisher's exact test (Deprecated).
Perl module implementation of the two-sided Fisher's exact test (Deprecated).
Perl module that provides error checks for Loglikelihood, True Mutual Information, Pointwise Mutual Information and Poisson-Stirling Measure.
Perl module that implements Loglikelihood measure of association for bigrams.
Perl module that implements Pointwise Mutual Information.
Perl module that implements Poisson-Stirling measure of association for bigrams.
Perl module that implements True Mutual Information.
Perl module to compute the Odds ratio for bigrams.
Perl module that provides basic framework for building measures of association for trigrams.
Perl module that provides error checks and framework to implement Loglikelihood, True Mutual Information, Pointwise Mutual Information and Poisson-Stirling Measure for trigrams.
Perl module that implements Loglikelihood measure of association for trigrams.
Perl module that implements Pointwise Mutual Information for trigrams.
Perl module that implements Poisson-Stirling Measure for trigrams.
Perl module that implements True Mutual Information for trigrams.
Perl module that provides basic framework for building measures of association for 4-grams.
Perl module that provides error checks and framework to implement Loglikelihood for 4-grams.
Perl module that implements Loglikelihood measure of association for 4-grams.
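
To make a few of the bigram measures listed above concrete, the following Python sketch computes the Dice coefficient, the Jaccard coefficient, and Pointwise Mutual Information from the counts of a 2x2 contingency table. It uses the standard textbook definitions and is not taken from the Perl modules. The function names and the argument names (`n11` for the joint count, `n1p` and `np1` for the marginal counts of the first and second word, `npp` for the total number of bigrams) are assumptions chosen for this illustration.

```python
import math


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / sum of the two marginals."""
    return 2.0 * n11 / (n1p + np1)


def jaccard(n11, n1p, np1):
    """Jaccard coefficient: joint count / count of bigrams containing
    either word in its position (n11 + n12 + n21 = n1p + np1 - n11)."""
    return n11 / (n1p + np1 - n11)


def pmi(n11, n1p, np1, npp):
    """Pointwise Mutual Information: log2 of observed over expected
    joint count, where expected assumes the two words are independent."""
    expected = n1p * np1 / npp
    return math.log2(n11 / expected)


# Hypothetical counts, chosen only for illustration.
n11, n1p, np1, npp = 10, 20, 30, 1000
print(dice(n11, n1p, np1))
print(jaccard(n11, n1p, np1))
print(pmi(n11, n1p, np1, npp))
```

The Dice and Jaccard coefficients are monotonic transformations of each other (Jaccard = Dice / (2 - Dice)), which may be why a shared framework for the two is a natural design.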