NAME

Lingua::Interset::Tagset::BN::Conll - Driver for the Bengali tagset of the ICON 2009 and 2010 Shared Tasks, as used in the CoNLL data format.

VERSION

version 3.014

SYNOPSIS

  use Lingua::Interset::Tagset::BN::Conll;
  my $driver = Lingua::Interset::Tagset::BN::Conll->new();
  my $fs = $driver->decode("NN\tcat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0");

or

  use Lingua::Interset qw(decode);
  my $fs = decode('bn::conll', "NN\tcat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0");

DESCRIPTION

Interset driver for the Bengali tagset of the ICON 2009 and 2010 Shared Tasks, as used in the CoNLL data format. CoNLL tagsets in Interset traditionally consist of three tab-separated values taken from the CoNLL columns CPOS, POS and FEAT. The ICON shared task data were converted to CoNLL from the native Shakti Standard Format (SSF). In these data the CPOS column contains the so-called chunk tag, which we do not want to decode. This tagset therefore expects only two tab-separated values: the POS column (the part of speech of the headword of the chunk) and part of the FEAT column. Features that should not be considered part of the tag are excluded, e.g. the lex feature, which contains the lemma or word stem.
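The two-value layout described above can be illustrated with a small sketch in plain Perl. It only shows how the tag string from the SYNOPSIS breaks down into a POS value and name-value feature pairs; it is not how the driver itself is implemented. The helper name split_bn_conll_tag is hypothetical and is not part of Lingua::Interset, and the sketch assumes that features are separated by "|" and that the first hyphen in each feature separates its name from its (possibly empty) value.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Lingua::Interset.
# Splits a two-column tag (POS, tab, FEAT) into the POS value and a hash
# reference holding the features.
sub split_bn_conll_tag {
    my ($tag) = @_;
    # The tag is expected to hold exactly two tab-separated values.
    my ($pos, $feat) = split /\t/, $tag, 2;
    my %features;
    for my $pair (split /\|/, $feat // '') {
        # Assumption: the first hyphen separates the name from the value.
        # An empty value (as in "gend-") is stored as the empty string.
        my ($name, $value) = split /-/, $pair, 2;
        $features{$name} = $value // '';
    }
    return ($pos, \%features);
}

my ($pos, $features) = split_bn_conll_tag(
    "NN\tcat-n|gend-|num-sg|pers-|case-d|vib-0|tam-0"
);
print "POS: $pos\n";
print "$_ = '$features->{$_}'\n" for sort keys %$features;
```

In this sample tag the gend and pers features have empty values, which the sketch keeps as empty strings rather than dropping them.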

A short description of the part-of-speech tags can be found at http://ltrc.iiit.ac.in/nlptools2010/documentation.php. More information is available in the annotators' manual at http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr031/posguidelines.pdf.

SEE ALSO

Lingua::Interset, Lingua::Interset::Tagset, Lingua::Interset::FeatureStructure

AUTHOR

Dan Zeman <zeman@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Univerzita Karlova (Charles University).

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.