NAME

Lingua::FreeLing3::Tokenizer - Interface to FreeLing3 Tokenizer

SYNOPSIS

   use Lingua::FreeLing3::Tokenizer;

   my $pt_tok = Lingua::FreeLing3::Tokenizer->new("pt");

   # compute list of Lingua::FreeLing3::Word
   my $list_of_words = $pt_tok->tokenize("texto e mais texto.");

   # compute list of strings (words)
   my $list_of_strings = $pt_tok->tokenize("texto e mais texto.",
                                           to_text => 1);

DESCRIPTION

Interface to the FreeLing3 tokenizer library.

new

Object constructor. One argument is required: the language code, which Lingua::FreeLing3 uses to locate the tokenization data file for that language.

Returns the tokenizer object for that language, or undef in case of failure.
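Since the constructor returns undef on failure, callers may want to check the result before using it. A minimal sketch, assuming the FreeLing data for "pt" is installed; the error message is illustrative only:

```perl
use strict;
use warnings;
use Lingua::FreeLing3::Tokenizer;

# new() returns undef when the tokenizer cannot be created,
# so stop early instead of calling methods on an undefined value.
my $pt_tok = Lingua::FreeLing3::Tokenizer->new("pt")
    or die "Could not create a tokenizer for 'pt'\n";
```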

tokenize

This is the only available method for the tokenizer object. It receives a string and tokenizes the text, returning a reference to a list of words.

By default, it returns a reference to a list of Lingua::FreeLing3::Word objects. When the to_text option is set to a true value, it returns a reference to a list of strings instead.
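Because the return value is a reference in both modes, it has to be dereferenced before iterating. A short sketch using the to_text option, again assuming the "pt" data is available; the variable names are illustrative:

```perl
use strict;
use warnings;
use Lingua::FreeLing3::Tokenizer;

my $pt_tok = Lingua::FreeLing3::Tokenizer->new("pt")
    or die "Could not create a tokenizer for 'pt'\n";

# With to_text => 1 the result is a reference to a list of plain strings.
my $tokens = $pt_tok->tokenize("texto e mais texto.", to_text => 1);

# Dereference the array reference and print one token per line.
print "$_\n" for @$tokens;
```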

SEE ALSO

Lingua::FreeLing3(3) for the documentation table of contents, the FreeLing library documentation for further information, and perl(1).

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

Jorge Cunha Mendes <jorgecunhamendes@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Projecto Natura