NAME

Text::WagnerFischer::Armenian - a variation on Text::WagnerFischer for Armenian-language strings

SYNOPSIS

    use Text::WagnerFischer::Armenian qw( distance );
    use utf8;  # for the Armenian characters in the source code

    print distance("ձեռն", "ձեռան") . "\n";
       # "dzerrn" -> "dzerran"; prints 1
    print distance("ձեռն", "ձերն") . "\n";
       # "dzerrn" -> "dzern"; prints 0.5
    print distance("կինք", "կին") . "\n";
       # "kink'" -> "kin"; prints 0.5

    my @words = qw( զօրսն Զորս զզօրսն );
    my @distances = distance( "զօրս", @words );
    print "@distances\n";
       # "zors" -> "zorsn", "Zors", "zzorsn"
       # prints "0.5 0.25 1"

    # Change the cost of a letter case mismatch to 1
    my $edit_values = [ 0, 1, 1, 1, 0.5, 0.5, 0.5 ];
    print distance( $edit_values, "ձեռն", "Ձեռն" ) . "\n";
       # "dzerrn" -> "DZerrn"; prints 1

DESCRIPTION

This module implements the Wagner-Fischer distance algorithm modified for Armenian strings. The Armenian language has a number of single-letter prefixes and suffixes which, while not changing the basic meaning of the word, function as definite articles, prepositions, or grammatical markers. These changes, and letter substitutions that represent vocalic equivalence, should be counted as a smaller edit distance than a change that is a normal character substitution.

The Armenian weight function recognizes four extra edit types:

                / a: x = y           (cost for letter match)
                | b: x = - or y = -  (cost for letter insertion/deletion)
    w( x, y ) = | c: x != y          (cost for letter mismatch)
                | d: x = X           (cost for case mismatch)
                | e: x ~ y           (cost for letter vocalic equivalence)
                | f: x = (z|y|ts) && y = - (or vice versa)
                |          (cost for grammatic prefix)
                | g: x = (n|k'|s|d) && y = - (or vice versa)
                \          (cost for grammatic suffix)
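
To illustrate how such a weight function feeds the Wagner-Fischer recurrence, here is a minimal, self-contained sketch. It is not this module's implementation: it covers only cases a through d (match, insertion/deletion, mismatch, case mismatch) and ignores vocalic equivalence and the grammatical prefixes and suffixes. The names sketch_cost and sketch_distance are hypothetical, and the 0.25 case-mismatch cost is an assumption inferred from the SYNOPSIS output.

```perl
use strict;
use warnings;
use utf8;                    # Armenian literals appear in this source
use List::Util qw( min );    # core module

# Assumed weights for cases a-d only; not taken from the module source.
my %COST = ( match => 0, indel => 1, mismatch => 1, case => 0.25 );

# Cost of aligning character $x with character $y (cases a, c, d).
sub sketch_cost {
    my ( $x, $y ) = @_;
    return $COST{match} if $x eq $y;
    return $COST{case}  if lc($x) eq lc($y);
    return $COST{mismatch};
}

# Plain Wagner-Fischer dynamic programme over two character strings.
sub sketch_distance {
    my ( $source, $target ) = @_;
    my @s = split //, $source;
    my @t = split //, $target;

    # $d[$i][$j] = cheapest way to turn the first $i characters of
    # $source into the first $j characters of $target.
    my @d = ( [0] );
    $d[$_][0] = $_ * $COST{indel} for 1 .. @s;
    $d[0][$_] = $_ * $COST{indel} for 1 .. @t;

    for my $i ( 1 .. @s ) {
        for my $j ( 1 .. @t ) {
            $d[$i][$j] = min(
                $d[ $i - 1 ][$j] + $COST{indel},    # delete from source
                $d[$i][ $j - 1 ] + $COST{indel},    # insert into source
                $d[ $i - 1 ][ $j - 1 ]              # match or substitute
                    + sketch_cost( $s[ $i - 1 ], $t[ $j - 1 ] ),
            );
        }
    }
    return $d[ scalar @s ][ scalar @t ];
}

print sketch_distance( "ձեռն", "ձեռան" ), "\n";    # one insertion; prints 1
print sketch_distance( "ձեռն", "Ձեռն" ),  "\n";    # case mismatch; prints 0.25
```

Because the sketch has no notion of cases e through g, it would score a pair such as "ձեռն" and "ձերն" as an ordinary mismatch, which is exactly the situation the extra edit types are meant to soften.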

SUBROUTINES

distance( \@editweight, $string1, $string2, [ .. $stringN ] );
distance( $string1, $string2, [ .. $stringN ] );

    The main exported function of this module. Takes a list of two or more strings and returns the edit distance between the first string and each of the others. The optional first argument, \@editweight, is an array reference with which users may override the default edit penalties described above; as the SYNOPSIS example suggests, the costs are listed in the order a through g.

am_lc( $char )

    A small utility function, useful for Armenian text. Returns the lowercase version of the character passed in.

LIMITATIONS

There are many cases of Armenian word equivalence that this module does not handle perfectly; it is meant to be a rough heuristic for comparing transcriptions of handwriting. In particular, multi-letter suffixes and some orthographic equivalences, e.g. "o" -> "aw", are not handled at all.

LICENSE

This package is free software and is provided "as is" without express or implied warranty. You can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Tara L Andrews, aurum@cpan.org
