Bibliome Team


bioyatea - Perl script for extracting terms from a corpus of biomedical texts (based on the module Lingua::YaTeA).


bioyatea [-help] [-man] [--rcfile=file] file


--help, -h, -? brief help message
--man, -m full documentation
--rcfile=file load the given configuration file
--extraction perform the term extraction
--post-processing=file, -C file set the filename for the output in case of post-processing
--pre-processing=file, -f file set the filename for the output in case of pre-processing
--post-processing-config=file set the configuration file for the post-processing
file corpus of texts in TreeTagger output format. If only post-processing is set, the file is a YaTeA XML output


BioYaTeA is an adaptation of YaTeA (Lingua::YaTeA) for biomedical text. The tuning concerns the configuration files (in the directory share/BioYaTeA), the pre-processing of the input file and the post-processing of the XML output.


Using BioYaTeA requires the output of TreeTagger or GeniaTagger. This output is the input of BioYaTeA.
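As a rough sanity check before running the extractor, the sketch below verifies that a tagged file has the layout TreeTagger usually produces: one token per line, with three tab-separated columns (word form, part-of-speech tag, lemma). That three-column layout is an assumption about how the tagger was run, and the function name check_ttg is hypothetical; it is not part of bioyatea.

```shell
# Hypothetical helper (not part of bioyatea): report lines of a tagged file
# that do not have exactly three tab-separated fields.
# Assumption: the file is in the common TreeTagger layout "form<TAB>POS<TAB>lemma".
check_ttg() {
    awk -F '\t' '
        NF != 3 {
            printf "line %d: expected 3 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example use (commented out so that sourcing this file runs nothing):
#   check_ttg TreeTaggerOutputFile.ttg && bioyatea -e TreeTaggerOutputFile.ttg
```

The function returns a non-zero status as soon as one line deviates, so it can guard the bioyatea call. Files produced with other tagger settings may legitimately use a different layout, in which case the check should be adapted or skipped.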

To run bioyatea, a configuration file is needed (usually bioyatea.rc in /etc/bioyatea). This file describes the behaviour of the term extractor. You have to indicate in it the language of the configuration files you use (see the section CONFIGURATION FILE FORMAT of Lingua::YaTeA for more details). It also indicates the path of the configuration files for the linguistic analysis. You have to adapt this path if your installation is not standard.

An example of the configuration file is available in etc/bioyatea/bioyatea.rc from the archive directory.

The most common command line to run BioYaTeA is:

bioyatea -e TreeTaggerOutputFile.ttg

It is assumed that the directory containing the program bioyatea is in your PATH variable and that the configuration file is /etc/bioyatea/bioyatea.rc.

If you are not allowed to copy the configuration file bioyatea.rc into the directory /etc/bioyatea (or to create this directory), or if you want to use your own configuration file, you can specify the file with its path by using the option --rcfile:

bioyatea -e --rcfile MyBioYaTeAConfig.rc TreeTaggerOutputFile.ttg
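The choice between the default configuration file and a personal one can be scripted. The following is a minimal sketch, not part of bioyatea: the function name pick_rcfile and its two-argument interface are hypothetical. It prints the user-supplied file if it is readable, otherwise the default file, and fails if neither can be read.

```shell
# Hypothetical helper (not part of bioyatea): print the configuration file to use.
#   $1  optional user-supplied configuration file
#   $2  optional default file (falls back to /etc/bioyatea/bioyatea.rc)
pick_rcfile() {
    pick_user_rc="${1:-}"
    pick_default_rc="${2:-/etc/bioyatea/bioyatea.rc}"
    if [ -n "$pick_user_rc" ] && [ -r "$pick_user_rc" ]; then
        printf '%s\n' "$pick_user_rc"
    elif [ -r "$pick_default_rc" ]; then
        printf '%s\n' "$pick_default_rc"
    else
        echo "no readable bioyatea configuration file found" >&2
        return 1
    fi
}

# Example use (commented out so that sourcing this file runs nothing):
#   rc=$(pick_rcfile MyBioYaTeAConfig.rc) &&
#       bioyatea -e --rcfile "$rc" TreeTaggerOutputFile.ttg
```

Passing the result explicitly through --rcfile keeps the command line the same whichever file is selected.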

More examples of the use of the bioyatea script are given below.


See Documentation in Lingua::YaTeA


Processing of a file without post-processing, with the default configuration file (/etc/bioyatea/bioyatea.rc):

   bioyatea -e sampleEN.ttg

Processing of a file without post-processing. The configuration file is given in the option --rcfile:

   bioyatea -e --rcfile etc/bioyatea.rc sampleEN.ttg

Processing of a file with post-processing:

   bioyatea -e --rcfile etc/bioyatea.rc --post-processing-config etc/post-processing-filtering.conf --post-processing sampleEN-PP.xml sampleEN.ttg

Only post-processing a file (XML YaTeA output format):

   bioyatea --post-processing-config etc/post-processing-filtering.conf --post-processing sampleEN-PP.xml sampleEN-output.xml

Processing of a file with pre-processing:

   bioyatea -e --rcfile etc/bioyatea.rc --pre-processing sampleEN-prepro.ttg sampleEN.ttg

Only pre-processing a file (TreeTagger output format):

   bioyatea --pre-processing sampleEN-prepro.ttg sampleEN.ttg

Processing of a file with pre-processing and post-processing:

   bioyatea -e --rcfile etc/bioyatea.rc --post-processing-config etc/post-processing-filtering.conf --post-processing sampleEN-PP.xml --pre-processing sampleEN-prepro.ttg sampleEN.ttg
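When several files have to be processed in the same way, the last command above can be generated from the input file name. The sketch below is only an illustration: the function name bioyatea_full_cmd is hypothetical, and the output names (BASE-prepro.ttg, BASE-PP.xml) and configuration paths simply follow the examples above. It prints the command instead of running it, so the result can be inspected first.

```shell
# Hypothetical helper (not part of bioyatea): print the command line for a run
# with pre-processing and post-processing on the TreeTagger file given as $1.
# Output file names are derived from the input name, as in the examples above.
bioyatea_full_cmd() {
    full_base="${1%.ttg}"    # strip the .ttg suffix, if present
    printf 'bioyatea -e --rcfile %s --post-processing-config %s --post-processing %s --pre-processing %s %s\n' \
        "etc/bioyatea.rc" \
        "etc/post-processing-filtering.conf" \
        "${full_base}-PP.xml" \
        "${full_base}-prepro.ttg" \
        "${full_base}.ttg"
}

# Example use (commented out so that sourcing this file runs nothing):
#   for f in sampleEN.ttg sampleFR.ttg; do bioyatea_full_cmd "$f"; done
```

Called with sampleEN.ttg, the function prints exactly the command shown in the last example. The printed lines can be reviewed and then passed to a shell; file names containing spaces would need additional quoting.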


Documentation of Lingua::YaTeA


Wiktoria Golik, Zorana Ratkovic, Robert Bossy, Claire Nédellec, Thierry Hamon


Copyright (C) 2012 Wiktoria Golik, Zorana Ratkovic, Robert Bossy, Claire Nédellec and Thierry Hamon

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.