NAME

Lingua::Interset::Converter - Implements a converter between two physical tagsets via Interset.

VERSION

version 3.014

SYNOPSIS

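  use strict;
  use warnings;
  use Lingua::Interset::Converter;

  # Read a CoNLL file from STDIN, convert the tags in column 5, write to STDOUT.
  my $c = Lingua::Interset::Converter->new ('from' => 'cs::multext', 'to' => 'cs::pdt');
  while (<STDIN>)
  {
      chomp;
      # Copy empty lines (sentence boundaries) and comment lines unchanged.
      if ($_ eq '' || /^#/)
      {
          print ("$_\n");
          next;
      }
      my @fields = split (/\t/, $_, -1);
      $fields[4] = $c->convert ($fields[4]);
      print (join ("\t", @fields), "\n");
  }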

DESCRIPTION

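Converter is a simple class that implements Interset-based conversion of tags between two physical tagsets. It caches the result for every source tag it has already converted, so converting a large corpus, in which the same tags occur many times, does not require running Interset decoding and encoding for every token.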
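Conceptually, Interset-based conversion is a two-step pivot: the source tag is decoded into a tagset-independent feature structure, and that feature structure is then encoded as a tag of the target tagset. The self-contained sketch below illustrates only this idea. The two tiny tag tables, the feature names, and the functions decode_toy, encode_toy and convert_toy are hypothetical stand-ins, not the real Interset drivers or feature inventory.

```perl
use strict;
use warnings;

# Hypothetical source tagset: each tag maps to a feature structure (hash ref).
my %SOURCE = (
    'Ncmsn'  => { pos => 'noun', gender => 'masc', number => 'sing', case => 'nom' },
    'Vmip3s' => { pos => 'verb', mood => 'ind', tense => 'pres', person => '3', number => 'sing' },
);

# Step 1 (hypothetical decoder): source tag -> feature structure.
sub decode_toy
{
    my ($tag) = @_;
    my $fs = $SOURCE{$tag} or die "Unknown source tag '$tag'\n";
    return $fs;
}

# Step 2 (hypothetical encoder): feature structure -> target tag.
# Toy target tags consist of a POS letter followed by a number letter.
sub encode_toy
{
    my ($fs) = @_;
    my %pos    = (noun => 'N', verb => 'V');
    my %number = (sing => 'S', plur => 'P');
    my $p = $pos{ $fs->{pos} // '' }       // 'X';
    my $n = $number{ $fs->{number} // '' } // '-';
    return "$p$n";
}

# Pivot conversion: decode with the source driver, encode with the target one.
sub convert_toy
{
    my ($tag) = @_;
    return encode_toy(decode_toy($tag));
}

print convert_toy('Ncmsn'), "\n";    # NS
print convert_toy('Vmip3s'), "\n";   # VS
```

Because every tagset only needs a decoder and an encoder to the shared feature structure, adding a new tagset makes it convertible to and from all the others.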

ATTRIBUTES

from

Identifier of the source tagset (composed of the language code and the tagset id joined by ::, all lowercase, for example cs::multext). It must be provided upon construction.

to

Identifier of the target tagset (composed of the language code and the tagset id joined by ::, all lowercase, for example cs::pdt). It must be provided upon construction.
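A tagset identifier thus has the form language::tagset. The sketch below shows one possible way to split and sanity-check such an identifier. The parse_tagset_id helper and its regular expression are hypothetical; they are not part of Lingua::Interset and may be stricter or looser than what Interset actually accepts.

```perl
use strict;
use warnings;

# Hypothetical helper: split an identifier such as 'cs::multext' into the
# language code and the tagset id, enforcing the all-lowercase convention.
sub parse_tagset_id
{
    my ($id) = @_;
    my ($lang, $tagset) = $id =~ /^([a-z]+)::([a-z0-9_]+)$/
        or die "Malformed tagset identifier '$id'\n";
    return ($lang, $tagset);
}

my ($lang, $tagset) = parse_tagset_id('cs::multext');
print "language=$lang tagset=$tagset\n";   # language=cs tagset=multext
```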

METHODS

convert()

  my $tag1 = $c->convert ($tag0);

Converts a tag from the source tagset to the target tagset via Interset. Every converted tag is cached, so the (potentially costly) Interset decoding and encoding methods are called only once per distinct source tag.
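The caching described above amounts to memoization keyed by the source tag. The following sketch illustrates that pattern under stated assumptions and is not the module's actual code: the CachingConverter package and its convert_uncached callback are hypothetical, and in real use the callback would perform the Interset decoding and encoding. The counter shows that the expensive callback runs once per distinct tag, however often the tag recurs.

```perl
use strict;
use warnings;

package CachingConverter;

sub new
{
    my ($class, %args) = @_;
    # convert_uncached: hypothetical code ref doing the expensive conversion.
    return bless { convert_uncached => $args{convert_uncached}, cache => {} }, $class;
}

sub convert
{
    my ($self, $tag) = @_;
    # Run the expensive conversion only on the first request for this tag.
    if (!exists $self->{cache}{$tag})
    {
        $self->{cache}{$tag} = $self->{convert_uncached}->($tag);
    }
    return $self->{cache}{$tag};
}

package main;

my $calls = 0;
my $c = CachingConverter->new(
    convert_uncached => sub { $calls++; return uc $_[0] },   # stand-in for Interset
);
my @out = map { $c->convert($_) } qw(nn vb nn nn vb);
print "@out\n";            # NN VB NN NN VB
print "calls=$calls\n";    # calls=2
```

Using exists rather than a defined-or test means that even an undefined conversion result is cached and not recomputed.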

SEE ALSO

Lingua::Interset

AUTHOR

Dan Zeman <zeman@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Univerzita Karlova (Charles University).

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.