NAME
Lingua::TreeTagger::Filter::Result - store and display the matching sequences.
VERSION
Version 0.01
SYNOPSIS
use Lingua::TreeTagger;
use Lingua::TreeTagger::Filter;
# Tagging a trial text.
my $tagger = Lingua::TreeTagger->new(
'language' => 'english',
);
my $text = 'This is a trial';
my $tagged_text = $tagger->tag_text(\$text);
# Creating a filter.
my $filter = Lingua::TreeTagger::Filter->new('tag=DT#tag=NN');
# Apply the filter to the tagged text.
my $result = $filter->apply($tagged_text);
# Display matching sequences as raw text.
print($result->as_text());
# Display matching sequences as XML.
print($result->as_XML());
DESCRIPTION
This module is part of the Lingua::TreeTagger::Filter distribution. It defines a class that stores the sequences matched by a filter and handles the display and extraction of these results. See also Lingua::TreeTagger::Filter and Lingua::TreeTagger::Filter::Result::Hit.
METHODS
new()
-
This constructor is normally called by the apply method of Lingua::TreeTagger::Filter, not directly by the user. It has two required parameters.
hits
-
A reference to an array containing Lingua::TreeTagger::Filter::Result::Hit objects.
taggedtext
-
A Lingua::TreeTagger::TaggedText object: the text to which the filter was applied.
as_text()
-
# Outputs the matching token sequences in standard TreeTagger format.
print $result->as_text();
# Custom formatting.
print $result->as_text(
    {
        'fields'             => [ qw( lemma original ) ],
        'field_delimiter'    => q{:},
        'token_delimiter'    => q{ },
        'sequence_delimiter' => q{;},
    }
);
Outputs the matching token sequences found in the TaggedText object as raw text. The only (optional) argument is a reference to a hash containing the following optional named parameters:
fields
-
A reference to the list of token attributes to be included in the output, in the requested order of appearance. Three such attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other attributes (or of attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to [ qw( original tag lemma ) ], which corresponds to the standard output of TreeTagger.
field_delimiter
-
The string that will be inserted between token attributes. Defaults to "\t", which corresponds to the standard output of TreeTagger.
token_delimiter
-
The string that will be inserted between tokens. Defaults to "\n", which corresponds to the standard output of TreeTagger.
sequence_delimiter
-
The string that will be inserted between matching sequences. Defaults to "\n".
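The way the four parameters combine can be illustrated with a minimal, self-contained sketch on plain Perl data. It does not call the module: the helper format_sequences() and the sample tokens are hypothetical, and the real method may differ in details such as trailing delimiters or the wording of its exceptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hits: each matching sequence is an array
# of tokens, and each token is a hash of attributes.
my @sequences = (
    [
        { original => 'a',     tag => 'DT', lemma => 'a' },
        { original => 'trial', tag => 'NN', lemma => 'trial' },
    ],
    [
        { original => 'the',  tag => 'DT', lemma => 'the' },
        { original => 'text', tag => 'NN', lemma => 'text' },
    ],
);

# Illustrative helper (not part of the module): joins attributes into
# tokens, tokens into sequences, and sequences into one string, using
# the documented default values.
sub format_sequences {
    my ( $sequences_ref, $param_ref ) = @_;
    $param_ref ||= {};

    my @fields = @{ $param_ref->{'fields'} || [qw( original tag lemma )] };
    my $field_delimiter    = $param_ref->{'field_delimiter'}    // "\t";
    my $token_delimiter    = $param_ref->{'token_delimiter'}    // "\n";
    my $sequence_delimiter = $param_ref->{'sequence_delimiter'} // "\n";

    # An empty field list is treated as an error in this sketch.
    die "Empty 'fields' parameter\n" if !@fields;

    my @formatted_sequences;
    foreach my $sequence_ref ( @{$sequences_ref} ) {
        my @formatted_tokens;
        foreach my $token_ref ( @{$sequence_ref} ) {
            push @formatted_tokens,
                join( $field_delimiter, @{$token_ref}{@fields} );
        }
        push @formatted_sequences,
            join( $token_delimiter, @formatted_tokens );
    }
    return join( $sequence_delimiter, @formatted_sequences );
}

# Same custom parameters as in the example above.
print format_sequences(
    \@sequences,
    {
        'fields'             => [qw( lemma original )],
        'field_delimiter'    => q{:},
        'token_delimiter'    => q{ },
        'sequence_delimiter' => q{;},
    }
), "\n";    # prints "a:a trial:trial;the:the text:text"
```

With the default token and sequence delimiters both set to "\n", sequence boundaries are not visible in the output, so a distinct sequence_delimiter is useful whenever the result is parsed again.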
as_XML()
-
# Outputs the matching token sequences in XML format.
print $result->as_XML();
# Custom XML formatting
# (e.g. <foo_bis><foo bar="men" baz="man">NN</foo></foo_bis>).
print $result->as_XML(
    {
        'element'    => 'foo',
        'sequence'   => 'foo_bis',
        'attributes' => {
            'original' => 'bar',
            'lemma'    => 'baz',
        },
        'content' => 'tag',
    }
);
Outputs the matching token sequences found in the TaggedText object as a list of XML tags, with one tag per line. The only (optional) argument is a reference to a hash containing the following optional named parameters:
element
-
The string that will be used as the name of the XML tag corresponding to a token. Defaults to 'w'.
sequence
-
The string that will be used as the name of the XML tag corresponding to a sequence. Defaults to 'seq'.
attributes
-
A reference to a hash where (i) each key is a token attribute to be included in the output as an XML attribute and (ii) each value is the desired name for this XML attribute. As with method as_text(), three token attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other token attributes (or of attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to { 'lemma' => 'lemma', 'tag' => 'type' }.
content
-
A string specifying the token attribute that will be used as the content of the XML element. Defaults to 'original'.
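How these parameters map token attributes onto XML can be sketched on plain Perl data. The helper sequences_as_xml() below is hypothetical and does not call the module. The exact line layout, the ordering of XML attributes (sorted by name here, only to make the output deterministic) and the absence of XML escaping are assumptions of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one matching sequence containing one token.
my @sequences = (
    [ { original => 'men', tag => 'NN', lemma => 'man' } ],
);

# Illustrative helper (not part of the module): wraps each token in an
# 'element' tag and each sequence in a 'sequence' tag.
sub sequences_as_xml {
    my ( $sequences_ref, $param_ref ) = @_;
    $param_ref ||= {};

    my $element    = $param_ref->{'element'}  // 'w';
    my $sequence   = $param_ref->{'sequence'} // 'seq';
    my $content    = $param_ref->{'content'}  // 'original';
    my %attributes = %{ $param_ref->{'attributes'}
            || { 'lemma' => 'lemma', 'tag' => 'type' } };

    die "Empty 'element' parameter\n" if $element eq q{};

    # Sort token attributes by the XML attribute name they map to.
    my @token_attributes
        = sort { $attributes{$a} cmp $attributes{$b} } keys %attributes;

    my @lines;
    foreach my $sequence_ref ( @{$sequences_ref} ) {
        my $tokens = q{};
        foreach my $token_ref ( @{$sequence_ref} ) {
            my $attribute_string = q{};
            foreach my $token_attribute (@token_attributes) {
                $attribute_string .= q{ }
                    . $attributes{$token_attribute} . q{="}
                    . $token_ref->{$token_attribute} . q{"};
            }
            $tokens .= "<$element$attribute_string>"
                . $token_ref->{$content}
                . "</$element>";
        }
        push @lines, "<$sequence>$tokens</$sequence>";
    }
    return join( "\n", @lines );
}

# Default parameters.
print sequences_as_xml( \@sequences ), "\n";
# prints <seq><w lemma="man" type="NN">men</w></seq>

# Same custom parameters as in the example above.
print sequences_as_xml(
    \@sequences,
    {
        'element'    => 'foo',
        'sequence'   => 'foo_bis',
        'attributes' => { 'original' => 'bar', 'lemma' => 'baz' },
        'content'    => 'tag',
    }
), "\n";
# prints <foo_bis><foo bar="men" baz="man">NN</foo></foo_bis>
```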
add_element()
-
Adds an element to the sequences. This method is normally called by the apply method of the class Lingua::TreeTagger::Filter. It takes the following parameters:
begin_index
-
An Int giving the index at which the matching sequence begins in the tagged text's token sequence (the array stored in the 'sequence' attribute of the TaggedText object).
sequence_length
-
An Int giving the number of tokens in the matching sequence.
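A start index and a length are enough to recover a matching sequence from the token array. The sketch below shows this on a plain Perl array. The helper extract_hit() is hypothetical, the array merely stands in for the 'sequence' attribute, and zero-based indexing is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'sequence' attribute of a TaggedText object.
my @sequence = qw( This is a trial );

# Illustrative helper (not part of the module): returns the tokens of a
# hit described by its start index (assumed zero-based) and its length.
sub extract_hit {
    my ( $sequence_ref, $begin_index, $sequence_length ) = @_;
    my $end_index = $begin_index + $sequence_length - 1;
    return @{$sequence_ref}[ $begin_index .. $end_index ];
}

my @hit = extract_hit( \@sequence, 2, 2 );
print "@hit\n";    # prints "a trial"
```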
ACCESSORS
get_hits()
-
Read-only accessor for the 'hits' attribute.
get_taggedtext()
-
Read-only accessor for the 'taggedtext' attribute.
DIAGNOSTICS
- Attempt to call as_text with empty 'field' parameter
-
This exception is raised when method as_text() is called with a reference to an empty list as the value of parameter 'fields'.
- Empty attribute names are not allowed
-
This exception is raised when method as_XML() is called with a value for parameter 'attributes' such that one or more attributes are associated with an empty string.
- Attempt to call as_XML with empty 'element' parameter
-
This exception is raised when method as_XML() is called with an empty string as value for the 'element' parameter.
-
This exception is raised when the 'fields' parameter of method as_text() or the 'attributes' or 'content' parameters of method as_XML() specify one or more token attributes that are not available for this TaggedText object (because they were not part of the creator TreeTagger object's output).
DEPENDENCIES
This is part of the Lingua::TreeTagger::Filter distribution. It is not intended to be used as an independent module.
This module requires module Moose and was developed using version 1.09. Please report incompatibilities with earlier versions to the author.
BUGS AND LIMITATIONS
There are no known bugs in this module.
Please report problems to Benjamin Gay (Benjamin.Gay@unil.ch).
Patches are welcome.
AUTHOR
Benjamin Gay, <Benjamin.Gay at unil.ch>
LICENSE AND COPYRIGHT
Copyright 2011 Benjamin Gay.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.