- BUG : (line 690) if the output si TERMANDROOTHEAD and termlist is not set to "multi", term head of single term are tried to be printed.
- Correcting the Input (TreeTagger Output) to be always conformed (use a method defined in YaTeA - todo)
- Add POD directive =encoding in each file
- Delete the Options_EN and Options_FR.
Replace it by the loading of the file share/YaTeA/yatea.conf
(function LoadOptions)
use of Config::General for config management
- in ParsingPatterns_EN, take into account to the tag NP (?)
- Add a prefix while making RESULTS directory
- Other method to load config data (rather than change ROOT at the
installation step) :
sub openunicode {
my ($rfh, @path) = @_;
my $f;
unless (defined $$rfh) {
for my $d (@INC) {
use File::Spec;
$f = File::Spec->catfile($d, "unicore", @path);
last if open($$rfh, $f);
undef $f;
}
croak __PACKAGE__, ": failed to find ",
File::Spec->catfile(@path), " in @INC"
unless defined $f;
}
return $f;
}
- Move bin/.config information in yatea.conf (?)
- Write a more explicit message when corpus file does not exist
- Move the Option MESSAGE_DISPLAY in the yatea.rc
- Decription of the terminology format
- rewrite the method
Lingua::YaTeA::ForbiddenStructureAny::setSplitAfter and its calls
Check the XML conformity, especially < and > and &
- Check the computing of the offset. it is assumed that every token
are preceded and follow by a space character. But it is not always
the case (e.g. commas)
- Check that special XML characters are well encoded as XML entities
- Homogenize the output of the file and file handler (print / debug -> file handler, printUnparseable -> file name)
- remove the dtd line in the output (to check ?)
- UTF-8
- tests : chunking
progressively parsing
displaying results
process French corpus (tagged with TreeTagger)
process French corpus (tagged with Flemm)
- to check that Edge.pm is useful
- review WordFromcorpus and LexiconItem (the word form is concated and then split !)
- integrate message display in submodule (for instance TestifiedTermSet)
- add HYPH in the list of syntactic borders
- check and correct forbidden structures
order of the colums:
sein\LF de\LF ANY delete 2
NOM\POS DET:ART\POS NOM\POS ANY split 3 DET:ART
du\LF à\LF un\LF ANY split 3 du
au\LF sein\LF de\LF ANY split 3 sein
ADJ\POS PRP\POS ADJ\POS ANY split 3 PRP
PRP\LF NUM\POS aide\LF PRP\POS ANY split 4 aide
and rules:
à\LF DET:ART\POS aide\LF de\LF ANY delete 2 aide
- add an output with the term/head/modifier in a tabular file
- change Parse::Yapp for Parse::Eyapp