Lingua::YaTeA::TestifiedTerm - Perl extension for Testified Term
  use Lingua::YaTeA::TestifiedTerm;
  Lingua::YaTeA::TestifiedTerm->new($num_content_words,$words_a,$tag_set,$source,$match_type);
The module implements a representation of testified terms, i.e. terms from a terminological resource. Testified terms are used to find corresponding terms in the corpus. Each testified term is described by its identifier (ID), its inflected form (IF), its list of part-of-speech tags (POS), its lemma (LF), the terminological source (SOURCE), the list of word components (WORDS), the regular expression used to identify it in the corpus (REG_EXP), the indication of whether the testified term has been found (FOUND), its list of occurrences (OCCURRENCES) and the list of the word index entries (INDEX_SET).
The three pieces of information IF, POS and LF are computed from the information provided by the word components.
This method creates a new object representing a testified term and sets its fields. $words_a and $tag_set are used to initialise the linguistic information (IF, POS and LF). $source initialises the SOURCE field. $match_type defines the type of matching used to find the terms in the corpus.
This method checks whether all the words of a testified term appear in the lexicon of the text ($filtering_lexicon_h), according to the matching type:
- loose (each word matches either an inflected form or a lemmatised form)
- strict (each word matches an inflected form with the correct part-of-speech tag)
- default (each word matches an inflected form)
The method returns 1 if all the words of the testified term are found in the lexicon, otherwise it returns 0.
$filtering_lexicon_h is a hash table containing the inflected form, the lemmatised form, and the concatenation of the inflected form and the part-of-speech tag (separated by a ~ character) of each word in the text.
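The three matching types can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: the function name all_words_in_lexicon, the representation of each word as a hash reference with IF, POS and LF keys, and the sample data are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YaTeA): returns 1 if every word of
# a term is found in the filtering lexicon for the given matching type, else 0.
# Each word is assumed to be a hash reference with IF, POS and LF keys.
sub all_words_in_lexicon {
    my ($words_a, $filtering_lexicon_h, $match_type) = @_;
    foreach my $word (@$words_a) {
        my $found;
        if ($match_type eq 'loose') {
            # loose: the inflected form or the lemma is a lexicon key
            $found = exists $filtering_lexicon_h->{ $word->{IF} }
                  || exists $filtering_lexicon_h->{ $word->{LF} };
        }
        elsif ($match_type eq 'strict') {
            # strict: the inflected form and its tag, joined by '~', is a key
            $found = exists $filtering_lexicon_h->{ $word->{IF} . '~' . $word->{POS} };
        }
        else {
            # default: the inflected form alone is a lexicon key
            $found = exists $filtering_lexicon_h->{ $word->{IF} };
        }
        return 0 unless $found;
    }
    return 1;
}

# Assumed lexicon of a text that only contains the singular "stem cell".
my %lexicon = map { $_ => 1 } ('stem', 'stem~NN', 'cell', 'cell~NN');

# A testified term in the plural form.
my @plural_term = (
    { IF => 'stem',  POS => 'NN',  LF => 'stem' },
    { IF => 'cells', POS => 'NNS', LF => 'cell' },
);

foreach my $type ('loose', 'strict', 'default') {
    print "$type: ", all_words_in_lexicon(\@plural_term, \%lexicon, $type), "\n";
}
```

With this sample data, only the loose type accepts the plural term, because the lemma "cell" is in the lexicon while the inflected form "cells" is not.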
The method returns the inflected form, the part-of-speech tag list and the lemma of the term candidate as an array. Each piece of information is the concatenation of the corresponding word information found in the array $words and the part-of-speech tags.
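As a rough sketch of this concatenation, the term-level values can be derived from the word-level ones as follows. The function name build_linguistic_infos, the hash-based word representation and the single-space separator are assumptions for the example; the documentation does not state which separator the module uses.

```perl
use strict;
use warnings;

# Hypothetical sketch: derive the term-level inflected form, tag list and
# lemma by joining the per-word values (the space separator is an assumption).
sub build_linguistic_infos {
    my ($words_a) = @_;
    my $if  = join ' ', map { $_->{IF} }  @$words_a;
    my $pos = join ' ', map { $_->{POS} } @$words_a;
    my $lf  = join ' ', map { $_->{LF} }  @$words_a;
    return ($if, $pos, $lf);
}

my @words = (
    { IF => 'stem',  POS => 'NN',  LF => 'stem' },
    { IF => 'cells', POS => 'NNS', LF => 'cell' },
);

my ($if, $pos, $lf) = build_linguistic_infos(\@words);
print "$if | $pos | $lf\n";
```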
The method returns the list of the words that are components of the term candidate.
The method sets the inflected form of the term candidate.
The method sets the list of the part-of-speech tags of the term candidate.
The method sets the canonical form (lemma) of the term candidate.
The method returns the inflected form of the term candidate.
The method returns the list of the part-of-speech tags of the term candidate.
The method returns the canonical form (lemma) of the term candidate.
This method returns the identifier of the term candidate.
This method builds the key of the testified term, i.e. the concatenation of the inflected form, the postag list and the lemma (separated by the character '~').
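The key construction amounts to a single join. In the sketch below, build_key is a hypothetical stand-alone function (not the module's method), and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch: the key is the inflected form, the tag list and the
# lemma, joined by the '~' character.
sub build_key {
    my ($if, $pos, $lf) = @_;
    return join '~', $if, $pos, $lf;
}

my $key = build_key('stem cells', 'NN NNS', 'stem cell');
print "$key\n";
```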
The method returns the terminological resource from which the testified term originates.
The method computes the regular expression corresponding to the term according to the type of matching defined by $match_type. This regular expression will be used to find the term in the corpus.
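The documentation does not say how the expression is built, so the following is only one plausible sketch. It assumes that words are separated by whitespace in the corpus, that a loose match accepts either the inflected form or the lemma of each word, and that any other matching type uses the inflected form only. The function name build_term_regexp and the word representation are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's algorithm): build a pattern for a
# term from its words, escaping each form so it is matched literally.
sub build_term_regexp {
    my ($words_a, $match_type) = @_;
    my @parts;
    foreach my $word (@$words_a) {
        if ($match_type eq 'loose') {
            # assumed: accept the inflected form or the lemma
            push @parts, '(?:' . quotemeta($word->{IF}) . '|' . quotemeta($word->{LF}) . ')';
        }
        else {
            # assumed: accept the inflected form only
            push @parts, quotemeta($word->{IF});
        }
    }
    # assumed: consecutive words are separated by one or more whitespace characters
    my $pattern = join '\s+', @parts;
    return qr/\b$pattern\b/;
}

my @words = (
    { IF => 'stem',  POS => 'NN',  LF => 'stem' },
    { IF => 'cells', POS => 'NNS', LF => 'cell' },
);

my $default_re = build_term_regexp(\@words, 'default');
my $loose_re   = build_term_regexp(\@words, 'loose');

foreach my $text ('human stem cells are studied', 'a single stem cell') {
    printf "%-30s default=%d loose=%d\n", $text,
        ($text =~ $default_re ? 1 : 0), ($text =~ $loose_re ? 1 : 0);
}
```

Tag-sensitive (strict) matching would need a tagged corpus representation, which this sketch deliberately leaves out.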
The method returns the regular expression corresponding to the testified term (field REG_EXP).
The method returns the word at the position index in the list of the components of the term candidate.
This method looks for the current testified term in the occurrence $phrase_occurrence of the phrase $phrase (according to the key $key). The occurrence is then recorded in the list of occurrences. $fh is the file handle of a debugging file.
The method returns the position (start and end offsets) of the phrase $phrase according to the index array. $fh is the file handle of a debugging file.
This method initialises the index set with the numbers between 0 and $size (usually the number of words).
This method returns the index set (field INDEX_SET) of the word components.
This method returns the list of the occurrences of the term candidate, as an array reference.
Sophie Aubin and Thierry Hamon. Improving Term Extraction with Terminological Resources. In Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006). pages 380-387. Tapio Salakoski, Filip Ginter, Sampo Pyysalo, Tapio Pahikkala (Eds). August 2006. LNAI 4139.
Thierry Hamon <firstname.lastname@example.org> and Sophie Aubin <email@example.com>
Copyright (C) 2005 by Thierry Hamon and Sophie Aubin
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.