
NAME

NNexus::Morphology - Basic morphological and canonicalization routines

SYNOPSIS

  use NNexus::Morphology qw(:all);

  # Possessives
  $boolean = is_possessive($phrase);
  $nonpossessive_phrase = get_nonpossessive($phrase);
  $possessive_phrase = get_possessive($word);

  # Plurals
  $boolean = is_plural($word);
  $plural_phrase = pluralize($phrase);
  $singular_phrase = depluralize_phrase($phrase);
  $singular_word = depluralize_word($word);
  
  # Determiners
  $noun = undetermine_word($noun_phrase);

  # Roots
  $root = root($word);

  # Phrase manipulation
  ($firstword,$tailphrase) = firstword_split($phrase);
  
  # Web and NNexus Resources
  $canonical_url = canonicalize_url($raw_url);
  $boolean = admissible_name($word);
  $normalized_word = normalize_word($word);

DESCRIPTION

The NNexus::Morphology module provides basic support for morphological operations on English words and phrases. It makes no claim to high linguistic precision or recall, but it serves its intended purpose: normalizing candidate concepts in NNexus to a standard, infinitive-like form that is free of basic inflections.

In addition, the module contains normalization routines for web resources, as well as admissibility checks for words it considers grammatical.

METHODS

$boolean = is_possessive($phrase);

Returns true if the given phrase is possessive, false otherwise.

$nonpossessive_phrase = get_nonpossessive($phrase);

Removes the possessive suffix from a phrase, if present. Only the leading word is inspected.

$possessive_phrase = get_possessive($phrase);

Adds a possessive suffix to the leading word of a given phrase, or single word.

$boolean = is_plural($word);

Returns true if the given word is plural, false otherwise.

$plural_phrase = pluralize($phrase);

Returns the plural of a (noun) phrase, e.g. "law of identity" becomes "laws of identity".
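
The example above pluralizes only the head noun of the phrase. The following is a minimal sketch of that idea, not the module's actual implementation: the helper names sketch_pluralize_word and sketch_pluralize_phrase are hypothetical, the suffix rules are a naive assumption, and the head noun is assumed to be the leading word.

```perl
use strict;
use warnings;

# Hypothetical helper: naive English pluralization of a single word.
# The suffix rules are an illustrative assumption, not the module's rules.
sub sketch_pluralize_word {
    my ($word) = @_;
    # Sibilant endings take "es": class -> classes
    return $word . 'es' if $word =~ /(?:s|x|z|ch|sh)$/;
    # Consonant + "y" becomes "ies": identity -> identities
    if ($word =~ /[^aeiou]y$/) {
        (my $plural = $word) =~ s/y$/ies/;
        return $plural;
    }
    # Default: append "s": law -> laws
    return $word . 's';
}

# Hypothetical helper: pluralize only the leading word of a phrase,
# assuming it is the head noun (as in "X of Y" phrases).
sub sketch_pluralize_phrase {
    my ($phrase) = @_;
    my ($head, $tail) = $phrase =~ /^(\S+)(.*)$/s;
    return $phrase unless defined $head;
    return sketch_pluralize_word($head) . $tail;
}

print sketch_pluralize_phrase('law of identity'), "\n";   # laws of identity
```

Treating the leading word as the head noun matches the documented example, but it would mishandle phrases whose head comes last, such as "identity law".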

$singular_phrase = depluralize_phrase($phrase);

Returns the singular of a (noun) phrase, e.g. "laws of identity" becomes "law of identity".

$singular_word = depluralize_word($word);

Returns the singular of a word.

$undetermined_noun_phrase = undetermine_word($noun_phrase);

Removes determiners from a noun phrase.

$root = root($word);

Heuristic stemming algorithm, returns the root of a given word.

($firstword,$tailphrase) = firstword_split($phrase);

Given a phrase, splits out the first word and returns it together with the remaining tail of the phrase.
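
As an illustration of the split described above, here is a minimal sketch assuming that words are delimited by whitespace. The function sketch_firstword_split is a hypothetical stand-in and does not reproduce the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for firstword_split; assumes whitespace-delimited words.
sub sketch_firstword_split {
    my ($phrase) = @_;
    # Capture the first run of non-whitespace, then everything after the
    # whitespace that follows it.
    my ($first, $tail) = $phrase =~ /^\s*(\S+)\s*(.*)$/s;
    # Fall back to empty strings when the phrase contains no word at all.
    return ($first // '', $tail // '');
}

my ($first, $tail) = sketch_firstword_split('law of identity');
print "$first | $tail\n";   # law | of identity
```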

$canonical_url = canonicalize_url($raw_url);

Transforms a URL to a minimized canonical representation, suitable for storage into the NNexus Database.

$boolean = admissible_name($word);

Returns true if the word is admissible and false otherwise. Currently checks for leftover bad markup, such as LaTeX math mode and macros.
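
A check of this kind could look roughly like the sketch below. It is an assumption-laden illustration only: the function sketch_admissible_name is hypothetical, and the two patterns (a dollar sign for math mode, a backslash followed by letters for a macro) are guesses at what "leftover bad markup" might cover.

```perl
use strict;
use warnings;

# Hypothetical stand-in for admissible_name; the patterns are assumptions.
sub sketch_admissible_name {
    my ($word) = @_;
    return 0 if $word =~ /\$/;           # LaTeX math-mode delimiter
    return 0 if $word =~ /\\[A-Za-z]+/;  # LaTeX macro such as \alpha
    return 1;
}

for my $candidate ('abelian group', '$x$-axis', '\emph{ring}') {
    printf "%-15s %s\n", $candidate,
        sketch_admissible_name($candidate) ? 'admissible' : 'rejected';
}
```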

$normalized_word = normalize_word($word);

High-level API that normalizes a word down to a canonical representation, which can then be matched against the NNexus database.

Performs: Unicode-to-ASCII conversion, lowercasing, and removal of determiners, possessives and plurals.
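
The listed steps can be pictured as a small pipeline. The sketch below is a hedged illustration under several assumptions, not the module's implementation: sketch_normalize_word is a hypothetical name, the determiner list and suffix rules are deliberately naive, the steps are applied in the order listed, and only the leading word is inspected for possessives and plurals. It relies on the core Unicode::Normalize module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical stand-in for normalize_word; every rule here is an assumption.
sub sketch_normalize_word {
    my ($word) = @_;

    # 1. Unicode to ASCII: decompose accented characters, drop the combining
    #    marks, then drop anything that is still non-ASCII.
    $word = NFD($word);
    $word =~ s/\p{M}//g;
    $word =~ s/[^\x00-\x7F]//g;

    # 2. Lowercase.
    $word = lc $word;

    # 3. Remove a leading determiner (assumed list: the, a, an).
    $word =~ s/^(?:the|a|an)\s+//;

    # 4. Remove a possessive suffix ('s or a bare ') from the leading word.
    $word =~ s/^(\S+?)'s?(?=\s|$)/$1/;

    # 5. Naively singularize the leading word: "ies" -> "y", else drop a
    #    final "s" unless the word ends in "ss".
    $word =~ s/^(\S+?)ies(?=\s|$)/${1}y/
        or $word =~ s/^(\S*[^s\s])s(?=\s|$)/$1/;

    return $word;
}

print sketch_normalize_word('The Laws of Identity'), "\n";    # law of identity
print sketch_normalize_word("Poincar\x{e9}'s lemma"), "\n";   # poincare lemma
```

The naive plural rule over-strips words that merely end in "s" (for example "analysis"), which is the kind of inaccuracy the description above warns about.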

AUTHOR

Deyan Ginev <d.ginev@jacobs-university.de>

COPYRIGHT

 Research software, produced as part of work done by 
 the KWARC group at Jacobs University Bremen.
 Released under the MIT License (MIT)