AI::Categorizer::FeatureSelector::ChiSquare - ChiSquare Feature Selection class


 # the recommended way to use this class is to let the KnowledgeSet
 # instantiate it

 use AI::Categorizer::KnowledgeSetSMART;
 my $ksetCHI = AI::Categorizer::KnowledgeSetSMART->new(
   tfidf_notation    => 'Categorizer',
   feature_selection => 'chi_square',
   ...other parameters...
 );

 # however, it is also possible to pass an instance to the KnowledgeSet

 use AI::Categorizer::KnowledgeSet;
 use AI::Categorizer::FeatureSelector::ChiSquare;
 my $ksetCHI = AI::Categorizer::KnowledgeSet->new(
   feature_selector => AI::Categorizer::FeatureSelector::ChiSquare->new(
     features_kept => 2000,
     verbose       => 1,
   ),
   ...other parameters...
 );


Feature selection with the chi-square statistic, which measures the lack of independence between a term t and a category ci:

                           N.(AD-CB)^2
  Chi-Square(t,ci) = -------------------------
                     (A+C).(B+D).(A+B).(C+D)

where

  t  = term
  ci = category i
  N  = number of documents in the collection
  A  = number of times t and ci co-occur
  B  = number of times t occurs without ci
  C  = number of times ci occurs without t
  D  = number of times neither ci nor t occurs
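The following is a minimal Perl sketch of the formula above, not this module's implementation. Because A, B, C and D partition the collection, each is taken here as a number of documents and N is computed as A+B+C+D. The document representation (hash refs with hypothetical `terms` and `cats` keys) and the helper names `chi_square`, `contingency` and `select_features` are illustrative assumptions. Scoring each term by its maximum over all categories is one of the combinations discussed by Yang and Pedersen; this class may combine per-category scores differently.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Chi-square statistic for term t and category ci from the four
# contingency counts A, B, C, D; N is their sum (every document falls
# in exactly one cell).
sub chi_square {
    my ($A, $B, $C, $D) = @_;
    my $N     = $A + $B + $C + $D;
    my $denom = ($A + $C) * ($B + $D) * ($A + $B) * ($C + $D);
    # Degenerate table: term or category present in all or none of the
    # documents, so the statistic is undefined; score it as 0.
    return 0 if $denom == 0;
    return $N * ($A * $D - $C * $B) ** 2 / $denom;
}

# Hypothetical document layout (assumption, not AI::Categorizer's):
#   { terms => { term => count, ... }, cats => { category => 1, ... } }
# Returns the counts (A, B, C, D) for one term/category pair.
sub contingency {
    my ($docs, $term, $cat) = @_;
    my ($A, $B, $C, $D) = (0, 0, 0, 0);
    for my $doc (@$docs) {
        my $has_t = exists $doc->{terms}{$term};
        my $has_c = exists $doc->{cats}{$cat};
        if    ($has_t && $has_c) { $A++ }
        elsif ($has_t)           { $B++ }
        elsif ($has_c)           { $C++ }
        else                     { $D++ }
    }
    return ($A, $B, $C, $D);
}

# Score every term by its maximum chi-square over all categories, then
# keep the $features_kept highest-scoring terms (ties broken
# alphabetically so the result is deterministic).
sub select_features {
    my ($docs, $features_kept) = @_;
    my (%terms, %cats);
    for my $doc (@$docs) {
        $terms{$_} = 1 for keys %{ $doc->{terms} };
        $cats{$_}  = 1 for keys %{ $doc->{cats} };
    }
    my %score;
    for my $t (keys %terms) {
        $score{$t} = max map { chi_square(contingency($docs, $t, $_)) } keys %cats;
    }
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    splice @ranked, $features_kept if @ranked > $features_kept;
    return \@ranked;
}
```

As a quick check, `chi_square(40, 10, 10, 40)` gives 100 x 1500^2 / 50^4 = 36, while a table with no association such as `chi_square(25, 25, 25, 25)` gives 0. The `$features_kept` argument plays the same role as the `features_kept` option shown in the synopsis.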

For more details, see: Yiming Yang and Jan O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization", in Proceedings of ICML-97, the 14th International Conference on Machine Learning, 1997 (available on citeseer.nj.nec.com).



Francois Paradis, paradifr@iro.umontreal.ca, with inspiration from Ken Williams's AI::Categorizer code.

