This is a module in the DESIRE automatic classification system. Copyright 1999.
Exported routines:

1. Fetching text: These routines all extract text from a document (either a Combine record, a Combine XWI datastructure, or a WWW page identified by a URL). They all return: $meta, $head, $text, $url, $title, $size

  $meta: Metadata from the document
  $head: Important text from the document
  $text: Plain text from the document
  $url: URL of the document
  $title: HTML title of the document
  $size: The size of the document
Common input parameters:

  $DoStem: 1=do stemming; 0=no stemming
  $stoplist: object pointer to a LoadTermList object with a stoplist loaded
  $simple: 1=do simple loading; 0=advanced loading (might induce errors)

getTextXWI
  parameters: $xwi, $DoStem, $stoplist, $simple
  $xwi is a Combine XWI datastructure

getTextURL
  parameters: $url, $DoStem, $stoplist, $simple
  $url is the URL of the page to extract text from
2. Term matcher: accepts a text as a (reference) parameter and matches each term in the term list against the text. Matches are recorded in an associative array with class as key and summed weight as value.

Match
  parameters: $text, $termlist
  $text: text to match against the termlist
  $termlist: object pointer to a LoadTermList object with a termlist loaded
  output: %score: an associative array with classifications as keys and scores as values
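The matching step can be illustrated with a minimal, standalone sketch. It does not use this module or LoadTermList: the %termlist hash layout (term => [class, weight]), the match_terms name, and the class codes are hypothetical, and it assumes that every case-insensitive whole-word occurrence of a term adds that term's weight to its class.

```perl
use strict;
use warnings;

# Hypothetical term list: term => [ class, weight ].
# The real LoadTermList object has its own interface, not shown here.
my %termlist = (
    'heat transfer' => [ '641.2', 10 ],
    'turbine'       => [ '617',    5 ],
    'steam'         => [ '617',    2 ],
);

# match_terms(\$text, \%termlist): returns a hash of class => summed weight.
sub match_terms {
    my ($text_ref, $terms) = @_;
    my %score;
    for my $term (keys %$terms) {
        my ($class, $weight) = @{ $terms->{$term} };
        # Count case-insensitive whole-word occurrences of the term.
        my $hits = () = $$text_ref =~ /\b\Q$term\E\b/gi;
        $score{$class} += $hits * $weight if $hits;
    }
    return %score;
}

my $text  = 'Steam drives the turbine; a second turbine recovers heat.';
my %score = match_terms(\$text, \%termlist);
print "$_ => $score{$_}\n" for sort keys %score;   # prints "617 => 12"
```

The text is passed by reference, as in the description above, so that a large document is not copied on each call.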
3. Heuristics: sum scores down the classification tree to the leaves.

cleanEiTree
  parameters: %res - an associative array from Match
  output: %res - same array
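One plausible reading of "sum scores down the tree" is sketched below; it is not the code of cleanEiTree. The %parent map, the sum_down_to_leaves name, and the class codes are hypothetical. The sketch assumes that each scored leaf class receives its own score plus the scores of all its ancestors, and that non-leaf entries are dropped from the result.

```perl
use strict;
use warnings;

# Hypothetical classification tree: child => parent.
# Root classes have no entry.
my %parent = (
    '61'    => '6',
    '617'   => '61',
    '617.1' => '617',
    '617.2' => '617',
);

# sum_down_to_leaves(\%res, \%parent): returns leaf class => accumulated score.
sub sum_down_to_leaves {
    my ($res, $parent) = @_;
    # Any class that appears as a parent is not a leaf.
    my %has_child = map { $_ => 1 } values %$parent;
    my %out;
    for my $class (keys %$res) {
        next if $has_child{$class};
        my $total = $res->{$class};
        my $node  = $class;
        # Walk up to the root, adding each ancestor's score (0 if unscored).
        while (defined($node = $parent->{$node})) {
            $total += $res->{$node} // 0;
        }
        $out{$class} = $total;
    }
    return %out;
}

my %res = ('6' => 1, '617' => 12, '617.1' => 4, '617.2' => 3);
%res = sum_down_to_leaves(\%res, \%parent);
print "$_ => $res{$_}\n" for sort keys %res;
# prints "617.1 => 17" and "617.2 => 16"
```

Only leaves that already have a score in %res appear in the output; a leaf with no matched terms of its own is left out under this assumption.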
Anders Ardö, <firstname.lastname@example.org>
COPYRIGHT AND LICENSE
Copyright (C) 2005,2006 Anders Ardö
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
See the file LICENCE included in the distribution at http://combine.it.lth.se/