NAME

ME.wrapper.pl - a wrapper around Statistics::MaxEntropy and Statistics::Candidates

SYNOPSIS

 ME.wrapper.pl --help
               --debug
               --i_events <filename>
               --i_candidates <filename>
               --i_dump <filename>
               --o_events <filename>
               --o_candidates <filename>
               --o_parameters <filename>
               --special <filename>
               --o_dump <filename>
               --integer
               --KL_max_it <integer>
               --NEWTON_max_it <integer>
               --KL_min <float>
               --NEWTON_min <float>
               --nr_to_add <integer>
               --SAMPLE <integer>
               --GIS
               --IIS
               --MC
               --CORPUS
               --ENUM

DESCRIPTION

ME.wrapper.pl is a command-line interface to Statistics::MaxEntropy and Statistics::Candidates. The wrapper and its command line options provide an easy-to-use and transparent connection to the MaxEntropy modules. Below we explain the meaning of the options.

COMMAND LINE ARGUMENTS

For each command-line option we explain its meaning and state at which moment it is applied or executed. For this we assume that the main program of ME.wrapper.pl has the following form:

 prologue();
 run();
 epilogue();

If both candidates and events are specified, the feature induction algorithm is called. If only events are specified, a scaling algorithm is called (GIS by default).
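The choice made by run() can be pictured with the following Perl sketch. It is only an illustration under stated assumptions: the subroutine choose_action and its return labels are hypothetical, the option hash keys simply mirror the command-line names, and it is assumed that --IIS selects IIS while anything else falls back to GIS.

```perl
use strict;
use warnings;

# Hypothetical sketch of the decision made in run(). The hash keys mirror
# the command-line options; the returned labels are illustrative only.
sub choose_action {
    my (%opt) = @_;
    # Events may come from --i_events or from a dump given with --i_dump.
    my $have_events = $opt{i_events} || $opt{i_dump};
    return unless $have_events;    # behaviour without events is not documented
    # Candidates and events: run feature induction.
    return 'feature_induction' if $opt{i_candidates};
    # Only events: a scaling algorithm (assumed: --IIS selects IIS, else GIS).
    return $opt{IIS} ? 'IIS' : 'GIS';
}
```

For example, choose_action(i_events => 'train.events') returns 'GIS'.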

--integer

If set, the feature functions are interpreted as integer functions; otherwise they are interpreted as binary functions.

--KL_max_it integer

(set in prologue) The maximum number of iterations performed by the scaling algorithms.

--NEWTON_max_it integer

(set in prologue) The maximum number of iterations in Newton's method (IIS only).

--KL_min float

(set in prologue) The minimum difference in Kullback-Leibler divergence that a new scaling iteration should bring; otherwise scaling is stopped.
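The way --KL_max_it and --KL_min bound the scaling loop can be sketched as follows. This is a minimal Perl illustration, not code from Statistics::MaxEntropy: the subroutine scale and the callback $step, which is assumed to perform one scaling iteration and return the new Kullback-Leibler divergence, are hypothetical.

```perl
use strict;
use warnings;

# Minimal sketch of the two stopping rules of a scaling loop.
# $step is a hypothetical callback: it performs one scaling iteration
# and returns the new Kullback-Leibler divergence.
sub scale {
    my ($step, $kl, $kl_max_it, $kl_min) = @_;
    my $it = 0;
    while ($it < $kl_max_it) {          # bound set by --KL_max_it
        my $new_kl = $step->();
        $it++;
        my $gain = $kl - $new_kl;       # improvement brought by this iteration
        $kl = $new_kl;
        last if $gain < $kl_min;        # --KL_min: too little progress, stop
    }
    return ($kl, $it);
}
```

With divergences 1.0, 0.5, 0.45 and 0.449 after successive iterations, a start value of 2.0, --KL_max_it 10 and --KL_min 0.01, the loop stops after the fourth iteration, because that iteration improves the divergence by only 0.001.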

--NEWTON_min float

(set in prologue) The minimum difference between the new x and the old x in Newton's method (IIS only).
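Newton's method with the two stopping rules set by --NEWTON_max_it and --NEWTON_min can be sketched in Perl as below. The subroutine newton is hypothetical; in IIS the function g and its derivative would come from the update equation for a single feature weight, which is not reproduced here.

```perl
use strict;
use warnings;

# Generic Newton's method for a root of $g. $dg is the derivative of $g.
# Iteration stops after $max_it steps (--NEWTON_max_it) or as soon as the
# new x differs from the old x by less than $min_diff (--NEWTON_min).
sub newton {
    my ($g, $dg, $x, $max_it, $min_diff) = @_;
    for (1 .. $max_it) {
        my $x_new = $x - $g->($x) / $dg->($x);
        my $diff  = abs($x_new - $x);
        $x = $x_new;
        last if $diff < $min_diff;      # new x close enough to old x
    }
    return $x;
}

# Usage: solve x**2 - 2 = 0 starting from x = 1.
my $root = newton(sub { $_[0]**2 - 2 }, sub { 2 * $_[0] }, 1, 50, 1e-12);
```

The example call converges to the square root of 2; with a maximum of one iteration the same call returns 1.5.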

--nr_to_add integer

(used in run) Passed to the feature induction algorithm (if called). It states the number of candidates that should be added.

--SAMPLE integer

(used in run) Passed to the feature induction algorithm (if called). It determines the size of the Monte Carlo sample. Only makes sense if --MC is set.

--GIS

(used in run) Sets the scaling algorithm to Generalised Iterative Scaling.
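For reference, the textbook form of the GIS update is given below. It assumes a model of the usual exponential form with feature functions f_i and weights lambda_i, and the standard GIS requirement that the features of every event x sum to the same constant C; it is not taken from the Statistics::MaxEntropy source, whose details may differ.

```latex
p_\lambda(x) = \frac{1}{Z_\lambda} \exp\Big(\sum_{i=1}^{n} \lambda_i f_i(x)\Big),
\qquad
\sum_{i=1}^{n} f_i(x) = C \ \text{for all } x,
\qquad
\lambda_i^{(t+1)} = \lambda_i^{(t)}
  + \frac{1}{C} \log \frac{E_{\tilde p}[f_i]}{E_{p_{\lambda^{(t)}}}[f_i]}
```

Here E with subscript p~ is the empirical expectation of f_i over the events, and the denominator is its expectation under the current model.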

--IIS

(used in run) Sets the scaling algorithm to Improved Iterative Scaling.
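For comparison, in the textbook form of IIS each weight lambda_i is increased by an amount gamma_i that solves the equation below, where p_lambda is the current model and E with subscript p~ is the empirical expectation of feature f_i. Unlike GIS, IIS does not require the feature sum f^#(x) to be constant; when it is not, the equation generally has no closed-form solution and must be solved numerically, which is presumably where Newton's method and the --NEWTON_max_it and --NEWTON_min options come in. This is the textbook statement, not code or notation taken from Statistics::MaxEntropy.

```latex
\sum_{x} p_\lambda(x)\, f_i(x)\, e^{\gamma_i f^{\#}(x)} = E_{\tilde p}[f_i],
\qquad
f^{\#}(x) = \sum_{j=1}^{n} f_j(x),
\qquad
\lambda_i \leftarrow \lambda_i + \gamma_i
```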

--MC

(used in run) Sets the sampling method to Monte Carlo. See also the --SAMPLE option.

--CORPUS

(used in run) Tells the scaling algorithm to treat the event space as a good sample. This is risky, since it may lead to overtraining.

--ENUM

(used in run) For scaling, the complete event space (all bitvectors) is enumerated. This is done in memory, so beware: with n binary features there are 2^n bitvectors.
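The following Perl sketch illustrates why --ENUM is memory-hungry: it builds the full event space for n binary features, which contains 2^n bitvectors of n entries each. The subroutine enumerate_bitvectors and its array-of-arrays representation are assumptions made for this example, not the module's internal data structure.

```perl
use strict;
use warnings;

# Builds every bitvector of length $n. Vector number $v holds the bits of
# $v, least significant bit first, so the result has 2**$n entries.
sub enumerate_bitvectors {
    my ($n) = @_;
    my @space;
    for my $v (0 .. 2**$n - 1) {
        push @space, [ map { ($v >> $_) & 1 } 0 .. $n - 1 ];
    }
    return \@space;
}
```

enumerate_bitvectors(3) returns 8 vectors; at 20 features the count is already 1,048,576.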

--help

(done in prologue) Exits after showing the name of the program and the list of command-line options.

--debug

(set in prologue) Tells the MaxEntropy and Candidates modules to output a lot of text.

--i_events filename

(done in prologue) The events are read from <filename>.

--i_candidates filename

(done in prologue) The candidates are read from <filename>.

--i_dump filename

(done in prologue) The event space is read from the dump in <filename>. This option overrules the --i_events option.

--o_events filename

(done in epilogue) The events (including candidates that were added) are written to filename.

--o_candidates filename

(done in epilogue) The candidates (if present) are written to filename. Only candidates that were not added to the event space are written.

--o_parameters filename

(done in epilogue) The parameters are written to filename.

--special filename

(done in epilogue) The parameters are written to filename in a special format I like.

--o_dump filename

(done in epilogue) The event space is dumped to filename. It can be read in again using --i_dump (the next time you use ME.wrapper.pl).

BUGS

Options --MC, --CORPUS, --ENUM should be put under one argument that has a parameter, for instance --sample_type [corpus, enum, mc].
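A single option along the lines proposed above could be parsed with the core Getopt::Long module, as in the following sketch. The subroutine parse_sample_type is hypothetical and not part of ME.wrapper.pl; it accepts the values case-insensitively and returns undef when the option is absent, since no default sampling method is documented here.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical parser for the proposed --sample_type option.
# Returns 'corpus', 'enum' or 'mc', or undef if the option is absent.
sub parse_sample_type {
    my @args = @_;
    my $sample_type;
    GetOptionsFromArray(\@args, 'sample_type=s' => \$sample_type)
        or die "invalid command line\n";
    return unless defined $sample_type;
    $sample_type = lc $sample_type;
    die "unknown --sample_type '$sample_type' (use corpus, enum or mc)\n"
        unless $sample_type =~ /\A(?:corpus|enum|mc)\z/;
    return $sample_type;
}
```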

SEE ALSO

perl(1), Statistics::SparseVector(3), Statistics::Candidates(3), Statistics::MaxEntropy(3).

VERSION

Version 0.2.

AUTHOR

COPYRIGHT

ME.wrapper.pl comes with ABSOLUTELY NO WARRANTY and may be copied only under the terms of the GNU Library General Public License (version 2, or later), which may be found in the distribution.