NAME CHANGES Changelog for the Ngram Statistics Package (Text-NSP) SYNOPSIS Revision history for Perl module Text::NSP DESCRIPTION 1.25 Release Jan 15, 2012 all changes by BTM * Added tscore for 3D and 4D along with test cases * Updated rank.pl to work with ties * Added --N option to rank.pl to return the number of ngrams being used to calculate the correlation. 1.23 Release March 31, 2011 all changes by YL * Changed printf to print in huge-split.pl, huge-sort.pl, huge-merge.pl, and count2huge.pl. Replaced the tail hash of huge-merge.pl by without use hash. 1.21 Released November 12, 2010 all changes by BTM * Added the Log Likelihood Measure for 4-grams >>>>>>> 1.24 =item 1.19 Released November 1, 2010 all changes by YL * Created find-compounds.pl and its testing files. find-compounds.pl helps to pick out the compound words in the text file. 1.17 Released April 26, 2010 all changes by YL * Created count2huge.pl and its testing files. count2huge.pl helps to convert the output of count.pl to huge-count.pl. 1.15 Released April 7, 2010 all changes by YL * Created huge-split.pl and huge-delete.pl in order to remove this functionality from huge-count.pl and huge-merge.pl (and make it easier to use these different components in more flexible ways). 1.13 Released March 5, 2010 all changes by TDP and YL * Replaced huge-count.pl with a more efficient version that counts large number of bigrams by creating multiple files, sorting, and merging them. The sorting and merging are carried out by huge-sort and huge-merge.pl. Note that the previous versions of huge-count.pl and associated utilities can be found in /Text-NSP/bin/utils/deprecated and will remain there for at least one more release. They will not however be installed automatically. (YL) * Added --uremove and --ufrequency options to count.pl. This allow for frequency cutoffs based on ngrams occuring more than a given number of times (rather than just less than, which is what --remove and --frequency enable). This is a long standing item on the NSP Todo list that has finally been checked off! (YL) * Introduced /bin/utils/contributed to allow for the distribution of user contributed programs that might be useful to other users. These programs do not get installed automatically with NSP, and are not included in our standard testing streams, but could still prove very useful to users. Please let us know if you have code you might like to include here. (TDP) * Added nsp-stoplist.regex to distribution (in /Text-NSP/bin/utils), to serve as a default stoplist. (TDP) Reported here : <L http://tech.groups.yahoo.com/group/ngram/message/280> This was not added in 1.11 due to failure to rebuilt MANIFEST. * Added support for 4-d log-likelihood (Text::NSP::Measures::4D::MI:ll). (TDP) 1.11 Released Nov 5, 2009 all changes by TDP * Fixed bug in statistic.pl which caused long form of pmi (Text::NSP::Measures::3D::MI::pmi) not to be handled correctly on the command line, and that caused pmi_exp not to be properly initialize when using the long form of pmi. Reported here : <L http://tech.groups.yahoo.com/group/ngram/message/240> * Added nsp-stoplist.regex to distribution (in /Text-NSP/bin/utils), to serve as a default stoplist. Reported here : <L http://tech.groups.yahoo.com/group/ngram/message/280> * Fixed link to class diagram in FAQ.pod. Reported here : <L http://tech.groups.yahoo.com/group/ngram/message/230> * Fixed documentation Text::NSP::Measures::3D::MI::pmi to correctly show how we are computing expected values. Reported here : <L http://tech.groups.yahoo.com/group/ngram/message/290> * Fixed a few broken links in README.pod that were discovered while preparng this release 1.09 Released March 26, 2008 all changes by TDP * Spell checked the modules * Relaxed test cases 27, 29 for ll, 20 for x2, and 13 for phi, due to arithmetic differences on 64 bit architectures * Modified Makefile.PL to go back to more standard methods of testing and installation. * Modified structure of /t directory for 'make test'. It appears that the use of subdirectories in /t with test cases might have been causing problems for Windows testing, so we have moved all test files to the top level of /t, and also removed the TEST program so that things are called in a more standard or generic fashion. 1.07 Released March 24, 2008 all changes by TDP * Updated Makefile.PL to no longer require 5.8.5 - have dropped back to 5.6 * Updated FAQ with some explanation of ALL-TESTS.sh * Renamed /docs as /doc to be consistent with other packages * Added descriptive labels in POD in NAME field of .pl programs to provide that info on CPAN display * Fixed duplicate Copyright message bug in documentation of Measures.pm * Removed "help" messages from Makefile.PL execution so as to (hopefully) avoid problems with installations on Windows. * Corrected error in INSTALL instructions - csh ./ALL-TESTS.sh must be performed after 'make install' 1.05 Released March 20, 2008 all changes by TDP * Fixed problem with file Testing/statistic/t2 would appear (mysteriously) but not be in the MANIFEST. This file was left behind during /Testing/statistic/normal-op.sh and is now being removed. * Fixed problem in /Testing where .sh files are sometimes not executable. Those files are now invoked via 'csh test.sh' rather than './test.sh', meaning that they no longer need to be executable. * Fixed ticket number 24061 from rt.cpan.org regarding incorrect version information coming from Measures.pm * Archived all old ChangeLogs to doc/ChangeLogs directory. Began to use pod in CHANGES directory instead * Added doc/update-pod.sh to automatically refresh top level read only documentation including README, CHANGES, TODO and INSTALL * Fixed Makefile.PL to avoid problems during Windows install. This problem and fix was reported by Richard Churchill to the ngram mailing list. This may also address ticket #20371 from rt.cpan.org. * Modified Makefile.PL to allow for use of 'make dist' and also creation of META.yml BUGS There is a limitation in huge-count.pl. When the size of the corpus is very large (>16G) and the some of the terms of the bigrams is very long (>30 chars), the program could run out of memory at huge-merge.pl step. This is because huge-merge use two hashes to count the frequencies of the first and second term of the bigrams. These two hashes could use up the memory with the increase of the length of the terms and the increase of the number of the terms. If just for normal text, terms are within limited length and numbers, the software won't use up the memory. AUTHORS Ying Liu, University of Minnesota, Twin Cities liux0395 at umn.edu Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu This document last modified by : $Id: CHANGES.pod,v 1.32 2012/01/15 14:10:31 btmcinnes Exp $ SEE ALSO <http://ngram.sourceforge.net> COPYRIGHT AND LICENSE Copyright (c) 2004-2011 Ted Pedersen Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. Note: a copy of the GNU Free Documentation License is available on the web at <http://www.gnu.org/copyleft/fdl.html> and is included in this distribution as FDL.txt. Ted Pedersen, University of Minnesota, Duluth tpederse at d.umn.edu This document last modified by : $Id: CHANGES.pod,v 1.32 2012/01/15 14:10:31 btmcinnes Exp $ SEE ALSO <http://ngram.sourceforge.net> COPYRIGHT AND LICENSE Copyright (c) 2004-2011 Ted Pedersen Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. Note: a copy of the GNU Free Documentation License is available on the web at <http://www.gnu.org/copyleft/fdl.html> and is included in this distribution as FDL.txt.