NAME

Lingua::TreeTagger::Filter::Result - store and display the matching sequences.

VERSION

Version 0.01

SYNOPSIS

  use Lingua::TreeTagger;
  use Lingua::TreeTagger::Filter;

  # Tagging a trial text.
  my $tagger = Lingua::TreeTagger->new(
      'language' => 'english',
  );

  my $text        = 'This is a trial';
  my $tagged_text = $tagger->tag_text(\$text);

  # Creating a filter.
  my $filter = Lingua::TreeTagger::Filter->new('tag=DT#tag=NN');

  # Applying the filter to the tagged text.
  my $result = $filter->apply($tagged_text);

  # Displaying matching sequences as raw text.
  print $result->as_text();

  # Displaying matching sequences as XML.
  print $result->as_XML();

DESCRIPTION

This module is part of the Lingua::TreeTagger::Filter distribution. It defines a class that stores the matching sequences and handles the display and extraction of results. See also Lingua::TreeTagger::Filter and Lingua::TreeTagger::Filter::Result::Hit.

METHODS

new()

This constructor is normally called by the apply method of Lingua::TreeTagger::Filter rather than directly by the user. It has two required parameters:

hits

A reference to an array of Lingua::TreeTagger::Filter::Result::Hit objects.

taggedtext

A Lingua::TreeTagger::TaggedText object: the text to which the filter was applied.

as_text()

    # Outputs the matching token sequences in standard TreeTagger format.
    print $result->as_text();

    # Custom formatting.
    print $result->as_text( {
        'fields'             => [ qw( lemma original ) ],
        'field_delimiter'    => q{:},
        'token_delimiter'    => q{ },
        'sequence_delimiter' => q{;},
    } );

Outputs the matching token sequences found in the TaggedText object as raw text. The only (optional) argument is a reference to a hash containing the following optional named parameters:

fields

A reference to the list of token attributes to be included in the output, in the requested appearance order. Three such attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other attributes (or attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to [ qw( original tag lemma ) ], which corresponds to the standard output of TreeTagger.

field_delimiter

The string that will be inserted between token attributes. Defaults to "\t", which corresponds to the standard output of TreeTagger.

token_delimiter

The string that will be inserted between tokens. Defaults to "\n", which corresponds to the standard output of TreeTagger.

sequence_delimiter

The string that will be inserted between matching sequences. Defaults to "\n".
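The way the four parameters combine can be sketched in plain Perl. This is only an illustration, not the module's implementation: format_sequences and the token hashes are hypothetical stand-ins, and whether the real method emits a trailing delimiter is not specified in this documentation.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Hypothetical stand-in for two matching sequences; each token is a plain hash.
my @sequences = (
    [
        { original => 'a',     tag => 'DT', lemma => 'a' },
        { original => 'trial', tag => 'NN', lemma => 'trial' },
    ],
    [
        { original => 'the',  tag => 'DT',  lemma => 'the' },
        { original => 'cats', tag => 'NNS', lemma => 'cat' },
    ],
);

# Illustrative helper (not part of the distribution): joins fields into
# tokens, tokens into sequences, and sequences into one string, falling
# back on the documented defaults.
sub format_sequences {
    my ( $seqs, $opt ) = @_;
    $opt //= {};
    my @fields  = @{ $opt->{fields} // [qw( original tag lemma )] };
    my $field_d = $opt->{field_delimiter}    // "\t";
    my $token_d = $opt->{token_delimiter}    // "\n";
    my $seq_d   = $opt->{sequence_delimiter} // "\n";

    my @formatted;
    for my $seq ( @{$seqs} ) {
        my @tokens;
        for my $token ( @{$seq} ) {
            push @tokens, join $field_d, map { $token->{$_} } @fields;
        }
        push @formatted, join $token_d, @tokens;
    }
    return join $seq_d, @formatted;
}

print format_sequences(
    \@sequences,
    {
        fields             => [qw( lemma original )],
        field_delimiter    => q{:},
        token_delimiter    => q{ },
        sequence_delimiter => q{;},
    }
), "\n";
# prints: a:a trial:trial;the:the cat:cats
```

With these settings each token becomes lemma:original, tokens within a sequence are separated by a space, and sequences are separated by a semicolon.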

as_XML()

    # Outputs the matching token sequences in XML format.
    print $result->as_XML();

    # Custom XML formatting
    # (e.g. <foo_bis><foo bar="men" baz="man">NN</foo></foo_bis>).
    print $result->as_XML( {
        'element'       => 'foo',
        'sequence'      => 'foo_bis',
        'attributes'    => {
            'original'      => 'bar',
            'lemma'         => 'baz',
        },
        'content'       => 'tag',
    } );

Outputs the matching token sequences found in the TaggedText object as a list of XML tags, with one tag per line. The only (optional) argument is a reference to a hash containing the following optional named parameters:

element

The string that will be used as the name of the XML tag corresponding to a token. Defaults to 'w'.

sequence

The string that will be used as the name of the XML tag corresponding to a sequence. Defaults to 'seq'.

attributes

A reference to a hash where (i) each key is a token attribute to be included in the output as an XML attribute and (ii) each value is the desired name for this XML attribute. As with method as_text(), three token attributes are supported: original (the original word token), tag (the part-of-speech tag), and lemma (the lemma). Inclusion of other token attributes (or attributes not present in the TaggedText because they are not part of the output of the creator TreeTagger object) raises a fatal exception. The value of this parameter defaults to { 'lemma' => 'lemma', 'tag' => 'type' }.

content

A string specifying the token attribute that will be used as the content of the XML element. Defaults to 'original'.
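The mapping from these parameters to XML output can be sketched in plain Perl. This is an illustration under stated assumptions, not the module's code: sequences_as_xml and the token hash are hypothetical, the attribute order (sorted by XML attribute name) and the line layout are guesses, and no XML escaping is performed.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Hypothetical stand-in for one matching sequence holding a single token.
my @sequences = ( [ { original => 'men', tag => 'NN', lemma => 'man' } ] );

# Illustrative helper (not part of the distribution): wraps each token in
# an element and each sequence in an enclosing element, one tag per line.
sub sequences_as_xml {
    my ( $seqs, $opt ) = @_;
    $opt //= {};
    my $element  = $opt->{element}    // 'w';
    my $sequence = $opt->{sequence}   // 'seq';
    my $attrs    = $opt->{attributes} // { lemma => 'lemma', tag => 'type' };
    my $content  = $opt->{content}    // 'original';

    my @lines;
    for my $seq ( @{$seqs} ) {
        push @lines, "<$sequence>";
        for my $token ( @{$seq} ) {
            # Assumption: sort by XML attribute name for deterministic output.
            my $attr_string = q{};
            for my $key ( sort { $attrs->{$a} cmp $attrs->{$b} } keys %{$attrs} ) {
                $attr_string .= qq{ $attrs->{$key}="$token->{$key}"};
            }
            push @lines, "<$element$attr_string>$token->{$content}</$element>";
        }
        push @lines, "</$sequence>";
    }
    return join "\n", @lines;
}

print sequences_as_xml(
    \@sequences,
    {
        element    => 'foo',
        sequence   => 'foo_bis',
        attributes => { original => 'bar', lemma => 'baz' },
        content    => 'tag',
    }
), "\n";
```

Each key of 'attributes' selects a token attribute and each value renames it in the output, while 'content' selects the attribute that becomes the element text.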

add_element()

Adds an element to the stored sequences. This method is normally called by the apply method of the Filter class. It takes the following parameters:

begin_index

An Int giving the index at which the matching sequence begins in the token sequence of the tagged text (the array held in the 'sequence' attribute of the TaggedText object).

sequence_length

An Int giving the number of tokens in the matching sequence.
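As a rough illustration of how these two values identify a match, the following plain-Perl sketch slices a token array. It assumes a zero-based begin_index, which this documentation does not state explicitly, and the array is a hypothetical stand-in for the 'sequence' attribute.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the token sequence of a tagged text.
my @sequence = qw( This is a trial );

# Assumption: begin_index is zero-based.
my ( $begin_index, $sequence_length ) = ( 2, 2 );

# The matching tokens run from begin_index for sequence_length tokens.
my @match = @sequence[ $begin_index .. $begin_index + $sequence_length - 1 ];

print "@match\n";    # prints: a trial
```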

ACCESSORS

get_hits()

Read-only accessor for the 'hits' attribute.

get_taggedtext()

Read-only accessor for the 'taggedtext' attribute.

DIAGNOSTICS

Attempt to call as_text with empty 'field' parameter

This exception is raised when method as_text() is called with a reference to an empty list as the value of the 'fields' parameter.

Empty attribute names are not allowed

This exception is raised when method as_XML() is called with a value for parameter 'attributes' such that one or more attributes are associated with an empty string.

Attempt to call as_XML with empty 'element' parameter

This exception is raised when method as_XML() is called with an empty string as value for the 'element' parameter.

Unavailable field(s) (...) requested

This exception is raised when the 'fields' parameter of method as_text() or the 'attributes' or 'content' parameters of method as_XML() specify one or more token attributes that are not available for this TaggedText object (because they were not part of the creator TreeTagger object's output).

DEPENDENCIES

This is part of the Lingua::TreeTagger::Filter distribution. It is not intended to be used as an independent module.

This module requires Moose and was developed using version 1.09. Please report incompatibilities with earlier versions to the author.

BUGS AND LIMITATIONS

There are no known bugs in this module.

Please report problems to Benjamin Gay (Benjamin.Gay@unil.ch).

Patches are welcome.

AUTHOR

Benjamin Gay, <Benjamin.Gay at unil.ch>

LICENSE AND COPYRIGHT

Copyright 2011 Benjamin Gay.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.