Lingua::TreeTagger::Filter::Result - store and display the matching sequences.
Version 0.01
    use Lingua::TreeTagger;
    use Lingua::TreeTagger::Filter;

    # Tagging a trial text.
    my $tagger = Lingua::TreeTagger->new(
        'language' => 'english',
    );
    my $text        = 'This is a trial';
    my $tagged_text = $tagger->tag_text( \$text );

    # Creating a filter.
    my $filter = Lingua::TreeTagger::Filter->new('tag=DT#tag=NN');

    # Applying the filter to the tagged text.
    my $result = $filter->apply($tagged_text);

    # Displaying matching sequences as raw text.
    print $result->as_text();

    # Displaying matching sequences as XML.
    print $result->as_XML();
This module is part of the Lingua::TreeTagger::Filter distribution. It defines a class that stores the matching sequences and handles the display and extraction of results. See also Lingua::TreeTagger::Filter and Lingua::TreeTagger::Filter::Result::Hit.
new()
This constructor is normally called by the apply method of Lingua::TreeTagger::Filter rather than directly by the user. It has two required parameters.
hits
A reference to an array of Lingua::TreeTagger::Filter::Result::Hit objects.
taggedtext
A Lingua::TreeTagger::TaggedText object: the text to which the filter was applied.
as_text()
    # Outputs the matching token sequences in standard TreeTagger format.
    print $result->as_text();

    # Custom formatting.
    print $result->as_text(
        {
            'fields'             => [ qw( lemma original ) ],
            'field_delimiter'    => q{:},
            'token_delimiter'    => q{ },
            'sequence_delimiter' => q{;},
        }
    );
Outputs the matching token sequences of a TaggedText object as raw text. The only (optional) argument is a reference to a hash containing the following optional named parameters:
fields
A reference to the list of token attributes to be included in the output, in the requested appearance order. Three such attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other attributes (or attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to [ qw( original tag lemma ) ], which corresponds to the standard output of TreeTagger.
field_delimiter
The string that will be inserted between token attributes. Defaults to "\t", which corresponds to the standard output of TreeTagger.
token_delimiter
The string that will be inserted between tokens. Defaults to "\n", which corresponds to the standard output of TreeTagger.
sequence_delimiter
The string that will be inserted between matching sequences. Defaults to "\n".
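The way the three delimiters nest can be illustrated without calling the module at all. The following is only a sketch in plain Perl: the format_sequences helper and the @sequences data structure are hypothetical and are not part of Lingua::TreeTagger::Filter. They mimic the documented role of each parameter, and whether the real method appends a trailing delimiter is not modelled here.

```perl
use strict;
use warnings;

# Hypothetical data: two matching sequences, each a list of tokens.
# This layout is illustrative only, not the module's internal format.
my @sequences = (
    [
        { original => 'This',  tag => 'DT', lemma => 'this' },
        { original => 'trial', tag => 'NN', lemma => 'trial' },
    ],
    [
        { original => 'a',    tag => 'DT', lemma => 'a' },
        { original => 'text', tag => 'NN', lemma => 'text' },
    ],
);

# Hypothetical helper: joins fields, then tokens, then sequences,
# using the defaults documented for as_text().
sub format_sequences {
    my ( $sequences_ref, $options_ref ) = @_;
    $options_ref ||= {};

    my @fields = @{ $options_ref->{'fields'} || [ qw( original tag lemma ) ] };
    my $field_delimiter    = $options_ref->{'field_delimiter'}    // "\t";
    my $token_delimiter    = $options_ref->{'token_delimiter'}    // "\n";
    my $sequence_delimiter = $options_ref->{'sequence_delimiter'} // "\n";

    my @formatted_sequences;
    for my $sequence_ref ( @{$sequences_ref} ) {
        my @formatted_tokens;
        for my $token_ref ( @{$sequence_ref} ) {
            # Innermost level: attributes of one token.
            push @formatted_tokens,
                join $field_delimiter, map { $token_ref->{$_} } @fields;
        }
        # Middle level: tokens of one sequence.
        push @formatted_sequences, join $token_delimiter, @formatted_tokens;
    }
    # Outermost level: the matching sequences themselves.
    return join $sequence_delimiter, @formatted_sequences;
}

my $output = format_sequences(
    \@sequences,
    {
        'fields'             => [ qw( lemma original ) ],
        'field_delimiter'    => q{:},
        'token_delimiter'    => q{ },
        'sequence_delimiter' => q{;},
    }
);
print "$output\n";    # prints "this:This trial:trial;a:a text:text"
```

With the default settings, token and sequence boundaries both use "\n", so a distinct sequence_delimiter is needed if the boundaries between sequences are to remain visible in the output.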
as_XML()
    # Outputs the matching token sequences in XML format.
    print $result->as_XML();

    # Custom XML formatting
    # (e.g. <foo_bis><foo bar="men" baz="man">NN</foo></foo_bis>).
    print $result->as_XML(
        {
            'element'    => 'foo',
            'sequence'   => 'foo_bis',
            'attributes' => {
                'original' => 'bar',
                'lemma'    => 'baz',
            },
            'content' => 'tag',
        }
    );
Outputs the matching token sequences of a TaggedText object as a list of XML tags, with one tag per line. The only (optional) argument is a reference to a hash containing the following optional named parameters:
element
The string that will be used as the name of the XML tag corresponding to a token. Defaults to 'w'.
sequence
The string that will be used as the name of the XML tag corresponding to a sequence. Defaults to 'seq'.
attributes
A reference to a hash where (i) each key is a token attribute to be included in the output as an XML attribute and (ii) each value is the desired name for this XML attribute. As with method as_text(), three token attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other token attributes (or attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to { 'lemma' => 'lemma', 'tag' => 'type' }.
content
A string specifying the token attribute that will be used as the content of the XML element. Defaults to 'original'.
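To make the mapping between these parameters and the resulting markup concrete, here is a minimal sketch in plain Perl that rebuilds the example element quoted above. The token_as_xml and sequence_as_xml helpers are hypothetical and do not belong to the module. Sorting attributes by their XML name is an assumption made for deterministic output, the one-tag-per-line layout is not reproduced, and no XML escaping is performed.

```perl
use strict;
use warnings;

# Hypothetical helper: renders one token as an XML element, using the
# defaults documented for as_XML().
sub token_as_xml {
    my ( $token_ref, $options_ref ) = @_;
    $options_ref ||= {};

    my $element        = $options_ref->{'element'} // 'w';
    my $content        = $options_ref->{'content'} // 'original';
    my $attributes_ref = $options_ref->{'attributes'}
        // { 'lemma' => 'lemma', 'tag' => 'type' };

    # Pair each XML attribute name with the value of the token attribute.
    # Sorting by XML name is an assumption; Perl hash order is not stable.
    my @pairs = sort { $a->[0] cmp $b->[0] }
        map { [ $attributes_ref->{$_}, $token_ref->{$_} ] }
        keys %{$attributes_ref};

    my $attribute_string = join q{}, map { sprintf ' %s="%s"', @{$_} } @pairs;

    return sprintf '<%s%s>%s</%s>',
        $element, $attribute_string, $token_ref->{$content}, $element;
}

# Hypothetical helper: wraps the tokens of one sequence in the sequence tag.
sub sequence_as_xml {
    my ( $sequence_ref, $options_ref ) = @_;
    $options_ref ||= {};
    my $sequence_tag = $options_ref->{'sequence'} // 'seq';

    my $tokens_xml = join q{},
        map { token_as_xml( $_, $options_ref ) } @{$sequence_ref};

    return "<$sequence_tag>$tokens_xml</$sequence_tag>";
}

my @sequence = ( { original => 'men', tag => 'NN', lemma => 'man' } );

my $custom_xml = sequence_as_xml(
    \@sequence,
    {
        'element'    => 'foo',
        'sequence'   => 'foo_bis',
        'attributes' => { 'original' => 'bar', 'lemma' => 'baz' },
        'content'    => 'tag',
    }
);
print "$custom_xml\n";    # prints <foo_bis><foo bar="men" baz="man">NN</foo></foo_bis>

my $default_xml = sequence_as_xml( \@sequence );
print "$default_xml\n";   # prints <seq><w lemma="man" type="NN">men</w></seq>
```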
add_element()
Adds an element to the list of matching sequences. This method is normally called by the apply method of the Lingua::TreeTagger::Filter class. It takes the following parameters:
begin_index
An Int giving the index at which the matching sequence begins in the token sequence of the tagged text (the array stored in the 'sequence' attribute of the TaggedText object).
sequence_length
An Int giving the number of tokens that make up the matching sequence.
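How these two integers identify a match can be pictured as an array slice. The sketch below uses a plain Perl array in place of the token sequence and assumes zero-based indexing, as for any Perl array. The variable names are illustrative, and the position of the match is written by hand rather than computed by a filter.

```perl
use strict;
use warnings;

# Stand-in for the token sequence of 'This is a trial'.
my @tokens = qw( This is a trial );

# Hand-written position of a determiner + noun match ("a trial").
my $begin_index     = 2;    # zero-based index of "a" (assumption)
my $sequence_length = 2;    # the match spans two tokens

# The matching sequence is the slice starting at begin_index and
# covering sequence_length tokens.
my $end_index = $begin_index + $sequence_length - 1;
my @matching  = @tokens[ $begin_index .. $end_index ];

print "@matching\n";    # prints "a trial"
```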
get_hits()
Read-only accessor for the 'hits' attribute.
get_taggedtext()
Read-only accessor for the 'taggedtext' attribute.
This exception is raised when method as_text() is called with a reference to an empty list as value for parameter 'fields'.
This exception is raised when method as_XML() is called with a value for parameter 'attributes' such that one or more attributes are associated with an empty string.
This exception is raised when method as_XML() is called with an empty string as value for the 'element' parameter.
This exception is raised when the 'fields' parameter of method as_text() or the 'attributes' or 'content' parameters of method as_XML() specify one or more token attributes that are not available for this TaggedText object (because they were not part of the creator TreeTagger object's output).
This is part of the Lingua::TreeTagger::Filter distribution. It is not intended to be used as an independent module.
This module requires module Moose and was developed using version 1.09. Please report incompatibilities with earlier versions to the author.
There are no known bugs in this module.
Please report problems to Benjamin Gay (Benjamin.Gay@unil.ch).
Patches are welcome.
Benjamin Gay, <Benjamin.Gay at unil.ch>
Copyright 2011 Benjamin Gay.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.
To install Lingua::TreeTagger::Filter, copy and paste the appropriate command into your terminal.
cpanm
    cpanm Lingua::TreeTagger::Filter
CPAN shell
    perl -MCPAN -e shell
    install Lingua::TreeTagger::Filter
For more information on module installation, please visit the detailed CPAN module installation guide.