NAME

Treex::Core::Run + treex - applying Treex blocks and/or scenarios to data

VERSION

version 0.06903_1

SYNOPSIS

In bash:

 > treex myscenario.scen -- data/*.treex
 > treex My::Block1 My::Block2 -- data/*.treex

In Perl:

 use Treex::Core::Run q(treex);
 treex([qw(myscenario.scen -- data/*.treex)]);
 treex([qw(My::Block1 My::Block2 -- data/*.treex)]);

DESCRIPTION

Treex::Core::Run applies a block, a scenario, or a mixture of both to a set of data files. It is designed to be used primarily from the bash command line, through a thin front-end script called treex. However, the same list of arguments can be passed as an array reference to the function treex() imported from Treex::Core::Run.

This module also supports distributed processing, enabled by the -p switch. There are two ways to process the data in parallel. By default, the qsub command of an SGE cluster is expected to be available. If you have no cluster but still want to parallelize the computation on a multicore machine, add the --local switch.
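
The two parallel modes differ only in the --local switch. The sketch below is a dry run: it assembles the command lines and prints them without executing treex, so it can be tried on a machine where treex is not installed. The helper name build_treex_cmd and the file names are hypothetical, and passing the job count as "--jobs N" assumes the usual Getopt-style option syntax.

```shell
# Dry-run sketch: print treex command lines for parallel processing.
# build_treex_cmd is a hypothetical helper, not part of Treex.
#
# build_treex_cmd MODE JOBS SCENARIO FILE...
#   MODE "sge"   -> cluster run through qsub (the default with -p)
#   MODE "local" -> adds --local to run the jobs on this machine
build_treex_cmd() {
    btc_mode=$1
    btc_jobs=$2
    btc_scenario=$3
    shift 3
    btc_cmd="treex -p --jobs $btc_jobs"
    if [ "$btc_mode" = "local" ]; then
        btc_cmd="$btc_cmd --local"
    fi
    # Options come first, then the scenario, then "--" and the data files.
    printf '%s\n' "$btc_cmd $btc_scenario -- $*"
}

build_treex_cmd sge 10 myscenario.scen data/a.treex data/b.treex
build_treex_cmd local 4 myscenario.scen data/a.treex data/b.treex
```

Copying a printed line into the shell runs the corresponding job set. File names containing spaces would need quoting, which this sketch does not handle.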

SUBROUTINES

treex

Creates a new runner and runs the scenario given in the parameters.

USAGE

 usage: treex [-?dEegjLlpqSsv] [long options...] scenario [-- treex_files]
 scenario is a sequence of blocks or *.scen files
 options:
        -? --usage --help            Prints this usage information.
        -s --save                    save all documents
        -q --quiet                   Warning, info and debug messages are
                                     suppressed. Only fatal errors are
                                     reported.
        --cleanup                    Delete all temporary files.
        -e --error_level             Possible values: ALL, DEBUG, INFO, WARN,
                                     FATAL
        -E --forward_error_level     messages with this level or higher will
                                     be forwarded from the distributed jobs
                                     to the main STDERR
        -L --language --lang         shortcut for adding "Util::SetGlobal
                                     language=xy" at the beginning of the
                                     scenario
        -S --selector                shortcut for adding "Util::SetGlobal
                                     selector=xy" at the beginning of the
                                     scenario
        -l --filelist                TODO load a list of treex files from a
                                     file
        -g --glob                    Input file mask whose expansion is left
                                     to Perl, e.g. --glob '*.treex'
        -p --parallel                Parallelize the task on SGE cluster
                                     (using qsub).
        -j --jobs                    Number of jobs for parallelization,
                                     default 10. Requires -p.
        --jobindex                   Not to be used manually. If number of
                                     jobs is set to J and modulo set to M,
                                     only I-th files fulfilling I mod J == M
                                     are processed.
        --outdir                     Not to be used manually. Directory for
                                     collecting standard and error outputs in
                                     parallelized processing.
        --qsub                       Additional parameters passed to qsub.
                                     Requires -p.
        --local                      Run jobs locally (might help with
                                     multi-core machines). Requires -p.
        --watch                      re-run when the given file is changed
                                     TODO better doc
        --workdir                    working directory for temporary files in
                                     parallelized processing (if not
                                     specified, directories such as
                                     001-cluster-run, 002-cluster-run etc.
                                     are created)
        -d --dump_scenario           Just dump (print to STDOUT) the given
                                     scenario and exit.
        --survive                    Continue collecting jobs' outputs even
                                     if some of them crashed (risky, use with
                                     care!).
        -v --version                 Print treex and perl version
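
The --jobindex description says that each job handles only the files whose position I satisfies I mod J == M. The sketch below illustrates that rule in isolation. It is not the Treex implementation: the function name select_files_for_job is hypothetical, and the sketch assumes that file positions are counted from 1 and that M ranges over 0..J-1, which the help text does not specify.

```shell
# Sketch of the "I mod J == M" file selection rule (assumptions noted above).
#
# select_files_for_job JOBS MODULO FILE...
#   Prints every file whose 1-based position I satisfies I mod JOBS == MODULO.
select_files_for_job() {
    sfj_jobs=$1
    sfj_modulo=$2
    shift 2
    sfj_i=0
    for sfj_file in "$@"; do
        sfj_i=$((sfj_i + 1))
        if [ $((sfj_i % sfj_jobs)) -eq "$sfj_modulo" ]; then
            printf '%s\n' "$sfj_file"
        fi
    done
}

# With 3 jobs, the job with modulo 1 gets the 1st and 4th file.
select_files_for_job 3 1 a.treex b.treex c.treex d.treex e.treex
```

Under these assumptions every file is assigned to exactly one of the J jobs, so the jobs can run independently without coordinating over the file list.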

AUTHOR

Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.