NAME

Treex::Tool::Parser::MSTperl::Labeller - pure Perl implementation of a dependency tree labeller for the MST parser

VERSION

version 0.11949

DESCRIPTION

This is a Perl implementation of the labeller for the MST parser, (most probably) the one described in McDonald, Ryan: Discriminative Learning And Spanning Tree Algorithms For Dependency Parsing, 2006 (chapter 3.3.3, Two-Stage Labelling).

Given a dependency parse tree - presumably, but not necessarily, obtained using the MST parser (Treex::Tool::Parser::MSTperl::Parser), and possibly non-projective - the labeller assigns the most probable labels to the edges of the tree, using the given model (Treex::Tool::Parser::MSTperl::ModelLabelling).

Assigning labels is implemented as sequence labelling, where each sequence to be labelled consists of the edges between a parent node and its children. First-order Markov factorization is used; however, because labelling is done as a separate second stage after parsing, many non-local features can be used, exploiting the knowledge of the structure of the whole tree. Also, we proceed from the root downwards and are therefore able to use the labels already assigned to ancestor edges (especially the edge leading to the common parent node).

Each label is technically stored with the child node of the edge it belongs to, so sometimes I will talk about a node label, which actually means the label of the edge between the node and its parent.

A variant of the Viterbi algorithm is used to find the best-scoring sequence of labels. However, instead of probabilities we use (real-valued) scores, and therefore addition is used in place of multiplication in the Viterbi recursion.
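
To illustrate, here is a minimal, self-contained sketch of one such additive Viterbi step. It is not the module's actual code; the label set and the scoring subs are hypothetical placeholders for what Treex::Tool::Parser::MSTperl::ModelLabelling provides.

    use strict;
    use warnings;

    # Hypothetical label set and scoring subs (placeholders only).
    my @labels = qw(Sb Obj Atr);
    sub emission_score   { my ($edge, $label)       = @_; return length($label) * 0.1; }
    sub transition_score { my ($prev_label, $label) = @_; return $prev_label eq $label ? 0.2 : 0.0; }

    # One Viterbi step: extend every active state (keyed by the label of the
    # previous edge) with every candidate label of the current edge, adding
    # scores instead of multiplying probabilities, and keep only the best
    # incoming score per new label.
    sub viterbi_step {
        my ($states, $edge) = @_;
        my %new_states;
        for my $label (@labels) {
            my $emission = emission_score($edge, $label);
            for my $prev_label (keys %$states) {
                my $score = $states->{$prev_label}
                          + transition_score($prev_label, $label)
                          + $emission;
                $new_states{$label} = $score
                    if !exists $new_states{$label} || $score > $new_states{$label};
            }
        }
        return \%new_states;
    }

    # Usage: start from the sequence-boundary state and process two edges.
    my $states = { 'SEQUENCE_BOUNDARY' => 0 };
    $states = viterbi_step($states, $_) for 1, 2;

A full implementation would additionally record, for each state, the best label sequence leading to it, so that the winning sequence can be read off at the end.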

For details on the features and training, see Treex::Tool::Parser::MSTperl::TrainerLabelling.

I have used several sources of information to implement it, especially:

Kevin Gimpel and Shay Cohen: Discriminative Online Algorithms for Sequence Labeling - A Comparative Study (2007)

And also parts of these:

Jun’ichi Kazama and Kentaro Torisawa: A New Perceptron Algorithm for Sequence Labeling with Non-local Features (2007)

Wenliang Chen, Yujie Zhang, Hitoshi Isahara: A Two-stage Parser for Multilingual Dependency Parsing (2007)

Binyamin Rozenfeld, Ronen Feldman, Moshe Fresko: A Systematic Cross-Comparison of Sequence Classifiers (2004?)

FIELDS

config

Instance of Treex::Tool::Parser::MSTperl::Config containing settings to be used with the labeller.

Currently the settings most relevant to the Labeller are the following:

VITERBI_STATES_NUM_THRESHOLD

See "VITERBI_STATES_NUM_THRESHOLD" in Treex::Tool::Parser::MSTperl::Config.

labeller_algorithm

See "labeller_algorithm" in Treex::Tool::Parser::MSTperl::Config.

labelledFeaturesControl

See "labelledFeaturesControl" in Treex::Tool::Parser::MSTperl::Config.

SEQUENCE_BOUNDARY_LABEL

See "SEQUENCE_BOUNDARY_LABEL" in Treex::Tool::Parser::MSTperl::Config.

model

An instance of Treex::Tool::Parser::MSTperl::ModelLabelling used to label the trees. It can be changed if needed (it is usually needed when training the labeller).

METHODS

my $labeller = Treex::Tool::Parser::MSTperl::Labeller->new( config => $self->config);

Creates an instance of the labeller using the given configuration, also initializing the model.

$labeller->load_model('modelfile.lmodel');
$labeller->load_model_tsv('modelfile.tsv');

Loads a labelling model using "load" in Treex::Tool::Parser::MSTperl::ModelBase or "load_tsv" in Treex::Tool::Parser::MSTperl::ModelBase.

A model has to be loaded or created in another way before sentences can be labelled.

$labeller->label_sentence($sentence);

Labels a sentence (instance of Treex::Tool::Parser::MSTperl::Sentence). It finds the best labels for the sentence and returns them as an array reference.

The given sentence is left intact and any labelling information already contained in the sentence is disregarded.
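
Putting the pieces together, a typical calling sequence might look like the following sketch. The model file name is only an example, and obtain_config_and_sentence() is a hypothetical stand-in for however you construct the Treex::Tool::Parser::MSTperl::Config instance and the parsed Treex::Tool::Parser::MSTperl::Sentence.

    use strict;
    use warnings;
    use Treex::Tool::Parser::MSTperl::Labeller;

    # Hypothetical helper: returns a Treex::Tool::Parser::MSTperl::Config
    # instance and an already parsed Treex::Tool::Parser::MSTperl::Sentence.
    my ($config, $sentence) = obtain_config_and_sentence();

    my $labeller = Treex::Tool::Parser::MSTperl::Labeller->new(config => $config);
    $labeller->load_model('modelfile.lmodel');    # example file name

    my $labels = $labeller->label_sentence($sentence);    # array reference of labels
    print join(' ', @$labels), "\n";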

$labeller->label_sentence_internal($sentence);

Does the actual labelling, returning a labelled instance of Treex::Tool::Parser::MSTperl::Sentence. The label_sentence sub is actually only a wrapper for this method: it extracts the labels of the nodes and returns them. If you are only interested in the labels, use label_sentence; if it is handier for you to get a whole (labelled) instance of Treex::Tool::Parser::MSTperl::Sentence, use label_sentence_internal. The subroutines are otherwise equivalent.

The given sentence is left intact and any labelling information already contained in the sentence is disregarded. All other information from the given sentence is copied to the returned sentence; the sentences differ only in the labels assigned.

$labeller->label_subtree($parent);

Assigns labels to the edges going from the $parent node to its children (and recurses on the children). Directly modifies the sentence (or, more precisely, the nodes within the sentence).
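
The control flow can be pictured roughly as in the following sketch. It is not the module's real code: the nodes here are plain hashes with made-up fields, and the Viterbi sequence labelling itself is abstracted into a callback.

    use strict;
    use warnings;

    # Label the edges $parent -> child_1, ..., $parent -> child_n as one
    # sequence, then recurse into each child so that the labels of ancestor
    # edges are already known when a child's own subtree is being labelled.
    sub label_subtree_sketch {
        my ($parent, $label_children) = @_;
        my @children = @{ $parent->{children} || [] };
        return unless @children;

        # In the real labeller this is the Viterbi pass over label_edge calls.
        $label_children->($parent, \@children);

        label_subtree_sketch($_, $label_children) for @children;
    }

    # Tiny usage example with a dummy assigner that labels every edge 'Atr'.
    my $root = { children => [ { children => [ {}, {} ] }, {} ] };
    label_subtree_sketch($root, sub {
        my ($parent, $children) = @_;
        $_->{label} = 'Atr' for @$children;
    });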

$labeller->label_edge($edge, $states);

Used as an internal part of label_subtree to get all probable labels for the edge $edge, i.e. to make one step of the Viterbi algorithm. $states is a hash reference of all currently active Viterbi states.

$labeller->prune($states);

Prunes the states (passed as a hash ref). Currently does n-best pruning with n = config->VITERBI_STATES_NUM_THRESHOLD, keeping at most the n best-scoring states (all states if there are fewer than n). It is called after each Viterbi step, i.e. at the end of each call of label_edge. Returns the new pruned states (does not modify its input).
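
A minimal sketch of such n-best pruning over a states hash (keys are state identifiers, values their scores); it returns a new hash ref and leaves the input untouched, as described above. The threshold of 2 is only illustrative.

    use strict;
    use warnings;

    # Keep at most the $n best-scoring states.
    sub prune_sketch {
        my ($states, $n) = @_;
        my @best = sort { $states->{$b} <=> $states->{$a} } keys %$states;
        splice @best, $n if @best > $n;    # drop everything beyond the n best
        return { map { $_ => $states->{$_} } @best };
    }

    my $pruned = prune_sketch({ Sb => 1.3, Obj => 0.7, Atr => 2.1 }, 2);
    # $pruned now contains only Atr (2.1) and Sb (1.3)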

$labeller->get_possible_labels($edge, $previous_label, $previous_label_score);

Computes possible labels for an edge, using the emission scores, the transition scores and the last state's score. (Because of the first-order Markov factorization used, the states correspond to labels assigned to the edges they represent.)

$labeller->get_possible_labels_internal($emission_scores, $transition_scores, $last_state_score);

Does the actual generation of possible labels for an edge, where the emission and transition scores are already computed for this particular edge.
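
Schematically, the scoring can be pictured as in the following sketch with made-up score hashes (this is not the real method, which works with the model's data structures): the total score of a candidate label is the previous state's score plus the transition score from the previous label plus the emission score of the label on this edge, added together because these are scores rather than probabilities.

    use strict;
    use warnings;

    # Returns the candidate labels ordered from best to worst total score.
    sub possible_labels_sketch {
        my ($emission_scores, $transition_scores, $last_state_score) = @_;
        my %total = map {
            $_ => $last_state_score + $transition_scores->{$_} + $emission_scores->{$_}
        } keys %$emission_scores;
        return [ sort { $total{$b} <=> $total{$a} } keys %total ];
    }

    # Illustrative scores only.
    my $ordered = possible_labels_sketch(
        { Sb => 0.9, Obj => 0.4 },    # emission scores for this edge
        { Sb => 0.1, Obj => 0.5 },    # transition scores from the previous label
        1.2,                          # score of the previous Viterbi state
    );
    # $ordered is [ 'Sb', 'Obj' ] (total scores 2.2 vs 2.1)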

AUTHORS

Rudolf Rosa <rosa@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.