NAME
Treex::Tool::Parser::MSTperl::Sentence
VERSION
version 0.11949
DESCRIPTION
Represents a sentence, both parsed and unparsed. Contains an array of nodes which represent the words in the sentence.
The nodes are ordered; the ord of a node is its 1-based position in the sentence. The ord value 0 is reserved for the (technical) sentence root.
FIELDS
- id (Int)
-
An integer id unique for each sentence (in its proper sense, where a sentence is a sequence of tokens, i.e. the id stays the same for copies of the same sentence).
- nodes (ArrayRef[Treex::Tool::Parser::MSTperl::Node])
-
(A reference to) an array of nodes (Treex::Tool::Parser::MSTperl::Node) of the sentence.
A node represents both a token of the sentence (usually a word) and a node in the parse tree of the sentence (if the sentence has been parsed).
- nodes_with_root (ArrayRef[Treex::Tool::Parser::MSTperl::Node])
-
A copy of the nodes field with a root node (Treex::Tool::Parser::MSTperl::RootNode) added at the beginning. As the root node's ord is 0 by definition, the position of each node in this array corresponds exactly to its ord.
- edges (Maybe[ArrayRef[Treex::Tool::Parser::MSTperl::Edge]])
-
If the sentence is parsed (i.e. the nodes know their parents), this field contains (a reference to) an array of all edges (Treex::Tool::Parser::MSTperl::Edge) in the parse tree of the sentence.
This field is set by the sub fill_fields_after_parse.
If the sentence is not parsed, this field is undef.
- features (Maybe[ArrayRef[Str]])
-
If the sentence is parsed, this field contains (a reference to) an array of all features of all edges in the parse tree of the sentence. If some of the features are repeated in the sentence (i.e. they are present in several edges or even repeated in one edge), they are repeated here as well, i.e. this is not a set in the mathematical sense but a (generally unordered) list.
This field is set by the sub fill_fields_after_parse.
If the sentence is not parsed, this field is undef.
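How nodes_with_root lines up with ord, and why features is a list rather than a set, can be sketched in plain Perl. This is only an illustration under stated assumptions: hash references stand in for the node objects and the feature strings are made up, so neither reflects the module's actual representation.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for nodes (the real ones are Node/RootNode objects).
my @nodes = map { +{ form => $_ } } qw(Tom is big);
$nodes[$_]{ord} = $_ + 1 for 0 .. $#nodes;

# Prepending a root with ord 0 makes array index and ord coincide.
my $root            = { form => 'ROOT', ord => 0 };
my @nodes_with_root = ($root, @nodes);

my $index_matches_ord = 1;
for my $i (0 .. $#nodes_with_root) {
    $index_matches_ord = 0 if $nodes_with_root[$i]{ord} != $i;
}

# Made-up feature strings, one inner array per edge; repeats are kept.
my @features_per_edge = (
    [ 'POS|POS:VB|NN', 'FORM:is' ],
    [ 'POS|POS:VB|NN', 'FORM:big' ],
);
my @features = map { @$_ } @features_per_edge;

# Counting occurrences shows that the list is not a set.
my %count;
$count{$_}++ for @features;

print "index matches ord: $index_matches_ord\n";
print "features: ", scalar(@features), ", distinct: ", scalar(keys %count), "\n";
```

In this toy data the flattened list has four entries but only three distinct strings, which is the situation the features field description refers to.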
METHODS
Constructor
- my $sentence = Treex::Tool::Parser::MSTperl::Sentence->new( id => 12, nodes => [$node1, $node2, $node3, ...]);
-
Creates a new sentence. The id must be unique (but copies of the same sentence are to share the same id). It is used for edge signature generation ("signature" in Treex::Tool::Parser::MSTperl::Edge) in edge features caching (and therefore does not have to be set if caching is disabled).
The order of the nodes denotes their order in the sentence, starting from the node with ord 1, i.e. the technical root (Treex::Tool::Parser::MSTperl::RootNode) is not to be included, as it is generated automatically in the constructor. The ords of the nodes ("ord" in Treex::Tool::Parser::MSTperl::Node) do not have to (and actually shouldn't) be filled in. If they are, they are checked and a warning is issued on STDERR if they do not correspond to the position of the nodes in the array. If they are not, they are filled in automatically during sentence creation.
Other fields (nodes_with_root, edges and features) should usually not be set. nodes_with_root is set automatically during sentence creation (and any value set to it is discarded). edges and features are to be set only if the sentence is parsed (i.e. the nodes know their parents, see "parent" in Treex::Tool::Parser::MSTperl::Node and "parentOrd" in Treex::Tool::Parser::MSTperl::Node) by calling the fill_fields_after_parse method.
So, if the sentence is already parsed, you should call the fill_fields_after_parse method immediately after creation of the sentence.
- my $unparsed_sentence_copy = $sentence->copy_nonparsed();
-
Creates a new instance of the same sentence with the same id and with copies of the nodes but without any parsing information (like after calling clear_parse). The nodes are copied by calling "copy_nonparsed" in Treex::Tool::Parser::MSTperl::Node.
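The ord handling described for the constructor can be sketched as follows. This is an illustration in plain Perl, not the module's implementation: nodes are hash references, check_or_fill_ords is a hypothetical helper name, and leaving a mismatching ord unchanged after the warning is an assumption, as the text above only says that a warning is issued.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in missing ords, warn about mismatching ones.
# Returns the number of mismatches found.
sub check_or_fill_ords {
    my ($nodes) = @_;
    my $mismatches = 0;
    for my $i (0 .. $#{$nodes}) {
        my $expected = $i + 1;    # ords are 1-based; 0 belongs to the root
        my $node     = $nodes->[$i];
        if (!defined $node->{ord}) {
            $node->{ord} = $expected;
        }
        elsif ($node->{ord} != $expected) {
            # Assumption: only warn, do not overwrite the given ord.
            warn "ord $node->{ord} at position $expected\n";
            $mismatches++;
        }
    }
    return $mismatches;
}

# Nodes without ords: they get filled in silently.
my @unfilled = map { +{ form => $_ } } qw(Tom is big);
my $clean    = check_or_fill_ords(\@unfilled);

# Nodes with a wrong ord: a warning goes to STDERR.
my @wrong = ({ form => 'Tom', ord => 1 }, { form => 'is', ord => 3 });
my $bad   = check_or_fill_ords(\@wrong);

print "mismatches: $clean and $bad\n";
```

Leaving the ords unset, as the first call does, is the case the documentation recommends.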
Action methods
- $sentence->setChildParent(5, 3)
-
Sets the parent of the node with the first ord to be the node with the second ord, e.g. here the 3rd node becomes the parent of the 5th node. It only sets the parent and parentOrd fields in the child node (i.e. it does not create or modify any edges).
When all nodes' parents have been set, fill_fields_after_parse can be called.
- $sentence->fill_fields_after_parse()
-
Fills the fields of the sentence and of its nodes which can be filled only for a sentence that has already been parsed (i.e. if the nodes' parent or parentOrd fields are filled).
The fields filled by this subroutine are edges and features for the sentence, and parent or parentOrd for each node that does not have the field set.
- $sentence->clear_parse()
-
Is a kind of inverse of the fill_fields_after_parse method. It clears the edges and features fields and also unsets the parents of all nodes (by setting their parent field to undef and parentOrd to 0).
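The interplay of the three action methods can be modelled in plain Perl. The sketch below is a toy model under stated assumptions (hash-based nodes, edges stored as [parent ord, child ord] pairs, lower-case helper names, no features); it mirrors the documented behaviour but is not the module's code.

```perl
use strict;
use warnings;

# Toy sentence: index in nodes_with_root equals ord; parentOrd starts at 0.
my @nodes_with_root = map { +{ ord => $_, parentOrd => 0 } } 0 .. 3;
my %sentence = (nodes_with_root => \@nodes_with_root, edges => undef);

# Like setChildParent: touches only the child node, never the edges.
sub set_child_parent {
    my ($s, $child_ord, $parent_ord) = @_;
    my $child = $s->{nodes_with_root}[$child_ord];
    $child->{parentOrd} = $parent_ord;
    $child->{parent}    = $s->{nodes_with_root}[$parent_ord];
    return;
}

# Like fill_fields_after_parse: derive the edges from the stored parents.
sub fill_edges {
    my ($s) = @_;
    my @n = @{ $s->{nodes_with_root} };
    $s->{edges} = [ map { [ $_->{parentOrd}, $_->{ord} ] } @n[1 .. $#n] ];
    return;
}

# Like clear_parse: drop the edges and reset every parent.
sub clear_parse {
    my ($s) = @_;
    $s->{edges} = undef;
    for my $node (@{ $s->{nodes_with_root} }) {
        $node->{parent}    = undef;
        $node->{parentOrd} = 0;
    }
    return;
}

set_child_parent(\%sentence, 1, 2);    # "Tom" -> "is"
set_child_parent(\%sentence, 2, 0);    # "is"  -> root
set_child_parent(\%sentence, 3, 2);    # "big" -> "is"
my $edges_before_fill = $sentence{edges};    # still undef at this point
fill_edges(\%sentence);
my $edge_count = scalar @{ $sentence{edges} };
clear_parse(\%sentence);
```

The point of the model is the ordering: setting parents leaves edges untouched, so the fill step has to be called explicitly once all parents are known.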
Information methods
- $sentence->len()
-
Returns length of the sentence, i.e. number of nodes in the sentence. Each node corresponds to one word (one token to be more precise).
- $sentence->count_errors_attachement($correct_sentence)
-
Compares the parse tree of the sentence with its correct parse tree, represented by an instance of the same sentence containing its correct parse.
An error is considered to be an incorrectly assigned governing node. So, the parents of all nodes (obviously not including the root node) are compared and if they are different, it is counted as an error. This leads to a minimum number of errors equal to 0 and maximum number equal to the length of the sentence.
- $sentence->count_errors_labelling($correct_sentence)
-
Compares the labelling of the sentence with its correct labelling, represented by an instance of the same sentence containing the correct labels.
An error is considered to be an incorrectly assigned label. So, the labels of all edges (technically stored in the child nodes) are compared and if they are different, it is counted as an error. This leads to a minimum number of errors equal to 0 and maximum number equal to the length of the sentence.
- $sentence->getNodeByOrd(6)
-
Returns the node with this ord (it can also be the root node if the ord is 0) or undef if the ord is out of range.
- $sentence->toString()
-
Returns forms of the nodes joined by spaces (i.e. the sentence as a text but with a space between each two adjacent tokens).
- $sentence->toParentOrdsArray()
-
Returns (a reference to) an array of node parent ords, i.e. for the sentence "Tom is big", where "is" is a child of the root node and "Tom" and "big" are children of "is", this method returns [2, 0, 2].
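The information methods are simple enough to model directly. The sketch below uses plain Perl with hash-based nodes; the helper names and the labels (Sb, Pred, Pnom) are assumptions chosen for illustration, and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical node stand-ins for "Tom is big" (ords 1..3).
my @predicted = (
    { form => 'Tom', parentOrd => 2, label => 'Sb' },
    { form => 'is',  parentOrd => 0, label => 'Pred' },
    { form => 'big', parentOrd => 1, label => 'Pnom' },    # wrong parent
);
my @gold = (
    { form => 'Tom', parentOrd => 2, label => 'Sb' },
    { form => 'is',  parentOrd => 0, label => 'Pred' },
    { form => 'big', parentOrd => 2, label => 'Pnom' },
);

# Like toString: forms joined by single spaces.
sub to_string { return join ' ', map { $_->{form} } @{ $_[0] } }

# Like toParentOrdsArray: one parent ord per node, in sentence order.
sub to_parent_ords { return [ map { $_->{parentOrd} } @{ $_[0] } ] }

# Count nodes whose value under $key differs from the correct sentence.
sub count_errors {
    my ($sentence, $correct, $key) = @_;
    return scalar grep { $sentence->[$_]{$key} ne $correct->[$_]{$key} }
        0 .. $#{$sentence};
}

my $text              = to_string(\@gold);
my $parent_ords       = to_parent_ords(\@gold);
my $attachment_errors = count_errors(\@predicted, \@gold, 'parentOrd');
my $label_errors      = count_errors(\@predicted, \@gold, 'label');

print "$text\n";                               # Tom is big
print join(', ', @$parent_ords), "\n";         # 2, 0, 2
print "$attachment_errors $label_errors\n";    # 1 0
```

Because every non-root node contributes at most one mismatch, the error count in this model is bounded by the sentence length, as stated for both counting methods.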
AUTHORS
Rudolf Rosa <rosa@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.