
Treex::Core::Run + treex - applying Treex blocks and/or scenarios on data


version 0.08399


In bash:

 > treex myscenario.scen -- data/*.treex
 > treex My::Block1 My::Block2 -- data/*.treex

In Perl:

 use Treex::Core::Run q(treex);
 treex([qw(myscenario.scen -- data/*.treex)]);
 treex([qw(My::Block1 My::Block2 -- data/*.treex)]);


Treex::Core::Run applies a block, a scenario, or a mixture of both to a set of data files. It is designed to be used primarily from the bash command line, via a thin front-end script called treex. However, the same list of arguments can be passed as an array reference to the function treex() imported from Treex::Core::Run.

Note that this module supports distributed processing: simply add the -p switch. There are then two ways to process the data in parallel. By default, an SGE cluster's qsub is expected to be available. If you have no cluster but still want to parallelize the computation on a multicore machine, add the --local switch.
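
As a hedged illustration of the two modes, the sketch below only assembles and prints the command lines instead of running them, so it works without Treex or a cluster installed. The helper name treex_parallel_cmd and the job count of 4 are assumptions made for this example; the switches -p, --jobs and --local are the ones listed in the usage text.

```shell
# Hypothetical helper (not part of Treex): prints a parallel treex command line.
# $1 = mode ("cluster" or "local"), $2 = number of jobs,
# remaining arguments = scenario and, optionally, "--" followed by data files.
treex_parallel_cmd() {
    mode=$1
    njobs=$2
    shift 2
    # -p turns on parallel processing; --jobs sets the number of jobs.
    opts="-p --jobs $njobs"
    # --local runs the jobs on this machine instead of submitting them via qsub.
    if [ "$mode" = "local" ]; then
        opts="$opts --local"
    fi
    printf '%s\n' "treex $opts $*"
}

# Default mode: jobs are expected to be submitted to an SGE cluster.
treex_parallel_cmd cluster 4 myscenario.scen -- 'data/*.treex'
# No cluster available: the same jobs are run locally.
treex_parallel_cmd local 4 myscenario.scen -- 'data/*.treex'
```

To actually run one of the printed commands, paste it into the shell on a machine where treex is installed.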



Creates a new runner and runs the scenario given in the parameters.


 usage: treex [-?dEegjLmpqSsv] [long options...] scenario [-- treex_files]
 scenario is a sequence of blocks or *.scen files
        -? --usage --help            Prints this usage information.
        -s --save                    save all documents
        -q --quiet                   Warning, info and debug messages are
                                     suppressed. Only fatal errors are
        --cleanup                    Delete all temporary files.
        -e --error_level             Possible values: ALL, DEBUG, INFO, WARN,
        -E --forward_error_level     messages with this level or higher will
                                     be forwarded from the distributed jobs
                                     to the main STDERR
        -L --language --lang         shortcut for adding "Util::SetGlobal
                                     language=xy" at the beginning of the
        -S --selector                shortcut for adding "Util::SetGlobal
                                     selector=xy" at the beginning of the
        -g --glob                    Input file mask whose expansion is to
                                     Perl, e.g. --glob '*.treex'
        -p --parallel                Parallelize the task on SGE cluster
                                     (using qsub).
        -j --jobs                    Number of jobs for parallelization,
                                     default 10. Requires -p.
        --jobindex                   Not to be used manually. If number of
                                     jobs is set to J and modulo set to M,
                                     only I-th files fulfilling I mod J == M
                                     are processed.
        --outdir                     Not to be used manually. Directory for
                                     collecting standard and error outputs in
                                     parallelized processing.
        --local                      Run jobs locally (might help with
                                     multi-core machines). Requires -p.
        --priority                   Priority for qsub, an integer in the
                                     range -1023 to 0 (or 1024 for admins),
                                     default=-100. Requires -p.
        --memory -m --mem            How much memory should be allocated for
                                     cluster jobs, default=2G. Requires -p.
                                     Translates to "qsub -hard -l
                                     mem_free=$mem -l act_mem_free=$mem -l
        --qsub                       Additional parameters passed to qsub.
                                     Requires -p. See --priority and --mem.
        --watch                      re-run when the given file is changed
                                     TODO better doc
        --workdir                    working directory for temporary files in
                                     parallelized processing (if not
                                     specified, directories such as
                                     001-cluster-run, 002-cluster-run etc.
                                     are created)
        -d --dump_scenario           Just dump (print to STDOUT) the given
                                     scenario and exit.
        --survive                    Continue collecting jobs' outputs even
                                     if some of them crashed (risky, use with
        -v --version                 Print treex and perl version
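
The file selection rule described under --jobindex (only the I-th files fulfilling I mod J == M are processed) can be sketched as follows. This is an illustration of the rule only, not code taken from Treex: the helper name files_for_job is hypothetical, and counting files from 1 is an assumption, since the usage text does not say whether I starts at 0 or 1.

```shell
# Hypothetical helper (not part of Treex): list the files one job would process.
# $1 = J (number of jobs), $2 = M (modulo), remaining arguments = input files.
files_for_job() {
    njobs=$1
    modulo=$2
    shift 2
    i=0
    for f in "$@"; do
        i=$((i + 1))    # assumption: files are counted from 1
        # Keep the I-th file only if I mod J == M.
        if [ $((i % njobs)) -eq "$modulo" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# With J=3 and M=1, the 1st and 4th files are selected.
files_for_job 3 1 a.treex b.treex c.treex d.treex e.treex
```

Under this rule every file is assigned to exactly one of the J jobs, so the jobs can run independently of each other.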


Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>


Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.