NAME

Lingua::Interset::Tagset::Multext - Common code for drivers of tagsets of the Multext-EAST project.

VERSION

version 2.043

SYNOPSIS

  package Lingua::Interset::Tagset::HR::Multext;
  use Moose; # provides the extends keyword used below
  extends 'Lingua::Interset::Tagset::Multext';

  # We must redefine the method that returns tagset identification, used by the
  # decode() method for the 'tagset' feature.
  sub get_tagset_id
  {
      # It should match the last two parts of the package name, lowercased:
      # the ISO 639 language code (here the two-letter code 'hr'), followed by '::multext'.
      return 'hr::multext';
  }

  # We may add or redefine atoms for individual surface features.
  sub _create_atoms
  {
      my $self = shift;
      # Most atoms can be inherited but some have to be redefined.
      my $atoms = $self->SUPER::_create_atoms();
      $atoms->{verbform} = $self->create_atom (...);
      return $atoms;
  }

  # We must define the lists of surface features for all surface parts of speech!
  sub _create_feature_map
  {
      my $self = shift;
      my %features =
      (
          'N' => ['pos', 'nountype', 'gender', 'number', 'case', 'animateness'],
          ...
      );
      return \%features;
  }

  # We must define the list() method.
  sub list
  {
      my $self = shift;
      my $list = <<end_of_list
  Ncmsn
  Ncmsg
  Ncmsd
  ...
  end_of_list
      ;
      my @list = split(/\r?\n/, $list);
      return \@list;
  }

DESCRIPTION

This module provides common code for drivers of tagsets of the Multext-EAST project. All Multext-EAST tagsets share the same inventory of parts of speech and the same inventory of features, although not every feature is used in every language. Each feature value is a single alphanumeric character, and the values are unified across languages: if a feature value appears in several languages, it is always encoded by the same character.

The tagsets are positional, i.e., the position of a value character in the tag determines which feature the value belongs to. The interpretation of the positions is defined separately for every language and for every part of speech. An empty value (for an unknown or irrelevant feature) is encoded by a dash ("-") if at least one of the following features has a non-empty value; otherwise it is simply omitted at the end of the tag.
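
The positional scheme can be illustrated with a minimal, standalone sketch. It is not the code this module uses internally: the helper split_positional_tag() and the variable %feature_map are hypothetical names introduced only for this example, and only the 'N' feature list is taken from the SYNOPSIS. The sketch maps each character of a tag to the surface feature at the same position, treats a dash as an empty value, and treats positions omitted at the end of the tag as empty too.

```perl
use strict;
use warnings;

# Hypothetical feature map for illustration; the 'N' entry is copied from the
# SYNOPSIS, a real driver defines one list per surface part of speech.
my %feature_map =
(
    'N' => ['pos', 'nountype', 'gender', 'number', 'case', 'animateness'],
);

# Hypothetical helper: splits a positional tag into a hash that maps surface
# feature names to value characters. A dash and any position missing at the
# end of the tag both mean "empty", so no key is stored for them.
sub split_positional_tag
{
    my ($tag, $map) = @_;
    die "Empty tag" unless defined($tag) && length($tag);
    my @chars = split(//, $tag);
    my $features = $map->{$chars[0]}
        or die "No feature list for part of speech '$chars[0]'";
    my %values;
    for my $i (0 .. $#chars)
    {
        last if $i > $#{$features};   # ignore characters beyond the known positions
        next if $chars[$i] eq '-';    # dash encodes an empty value
        $values{$features->[$i]} = $chars[$i];
    }
    return \%values;
}

my $full  = split_positional_tag('Ncmsn', \%feature_map);
my $gappy = split_positional_tag('Nc-sn', \%feature_map);

# Print the features of the first tag in positional order.
for my $feature (@{$feature_map{'N'}})
{
    next unless exists $full->{$feature};
    print "$feature=$full->{$feature}\n";
}
```

For 'Ncmsn' the sketch yields pos=N, nountype=c, gender=m, number=s and case=n, with no entry for animateness because that position is omitted at the end of the tag. For 'Nc-sn' the gender position holds a dash, so gender is absent from the result. Translating the value characters into Interset features is the job of the atoms and is not shown here.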

AUTHOR

Dan Zeman <zeman@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
