- trainlid - build transition matrix for Lingua::Ident module
- Lingua::Ident - Statistical language identification
Changes for version 1.7
- Tests now work with the data files located at either ../data or data.
- make test now always generates the data/data.* files; previously this did not work on Darwin and MSWin32.
- Added calculate() method, which returns the probabilities for all languages. identify() now just calls calculate() and returns the most probable language.
- When neither a trigram nor a bigram is found, use the average alphabet size instead of the individual language's alphabet size, because using the per-language size penalizes Asian languages.
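
The last two changes can be illustrated with a toy model. The sketch below is not the module's Perl code and does not reproduce its API or data format: the `models` dictionary layout, the probability values, and the function signatures are all assumptions made for illustration. It only shows the relationship described above, with identify() as a thin wrapper around calculate(), and a fallback that uses the alphabet size averaged over all languages when neither a trigram nor a bigram is known.

```python
import math

def calculate(text, models):
    """Return (language, log-probability) pairs, most probable first.

    `models` is a hypothetical layout: language -> dict with
    'alphabet_size', 'trigrams' and 'bigrams' (n-gram -> probability).
    """
    # Average alphabet size over all languages, used for unseen n-grams.
    avg_alphabet = sum(m["alphabet_size"] for m in models.values()) / len(models)

    scores = []
    for lang, model in models.items():
        logp = 0.0
        for i in range(len(text) - 2):
            trigram = text[i:i + 3]
            bigram = text[i + 1:i + 3]
            if trigram in model["trigrams"]:
                logp += math.log(model["trigrams"][trigram])
            elif bigram in model["bigrams"]:
                logp += math.log(model["bigrams"][bigram])
            else:
                # Same fallback for every language, so a language with a
                # large alphabet is not penalized more than the others.
                logp += math.log(1.0 / avg_alphabet)
        scores.append((lang, logp))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores

def identify(text, models):
    """Return only the most probable language, via calculate()."""
    return calculate(text, models)[0][0]

# Hypothetical toy data: one known trigram for "en", nothing for "zh".
models = {
    "en": {"alphabet_size": 26, "trigrams": {"the": 0.5}, "bigrams": {}},
    "zh": {"alphabet_size": 3000, "trigrams": {}, "bigrams": {}},
}
print(identify("the", models))
print(calculate("xyz", models))
```

With a per-language fallback of 1/alphabet_size, the "zh" model in this toy data would score far lower than "en" on text that neither model has seen; with the averaged size, both receive the same fallback score.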