NAME

Convert::Translit, transliterate, build_substitutes - Perl module for string conversion among numerous character sets

SYNOPSIS

  use Convert::Translit;

  $translator = new Convert::Translit($result_chset);
  $translator = new Convert::Translit($orig_chset, $result_chset);
  $translator = new Convert::Translit($orig_chset, $result_chset, $verbose);

  $result_st = $translator->transliterate($orig_st);
  $result_st = Convert::Translit::transliterate($orig_st);

  build_substitutes Convert::Translit();

  Convert::Translit::build_substitutes();

DESCRIPTION

This module converts strings among the 8-bit character sets defined by IETF RFC 1345 (about 128 sets). The RFC document is included so that you can look up character set names and aliases; the module also reads it when composing conversion maps. Functions and object constructors that fail return undef.
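The RFC's charset tables introduce each set with a "&charset" line followed by "&alias" lines. As a rough illustration of how a name or alias might be resolved to its canonical set, here is a minimal core-Perl sketch. It assumes that "&charset NAME" / "&alias NAME" layout; the subroutines build_alias_index() and resolve() and the short excerpt are hypothetical, and this is not the module's own parser.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's parser: index every charset name
# and alias (lowercased) to the canonical name on its "&charset" line.
sub build_alias_index {
    my ($rfc_text) = @_;
    my %canonical;
    my $current;    # canonical name of the charset being read
    for my $line (split /\n/, $rfc_text) {
        if ($line =~ /^\s*&charset\s+(\S+)/) {
            $current = $1;
            $canonical{lc $current} = $current;
        }
        elsif (defined $current && $line =~ /^\s*&alias\s+(\S+)/) {
            $canonical{lc $1} = $current;
        }
    }
    return \%canonical;
}

# Case-insensitive lookup; undef means the name is unknown.
sub resolve {
    my ($index, $name) = @_;
    return $index->{lc $name};
}

# Illustrative excerpt in the RFC's layout (not copied verbatim).
my $excerpt = <<'END';
  &charset ISO_8859-2:1987
  &alias latin2
  &alias l2
  &charset ISO_8859-5:1988
  &alias cyrillic
END

my $index = build_alias_index($excerpt);
print resolve($index, 'Latin2'), "\n";      # ISO_8859-2:1987
print resolve($index, 'CYRILLIC'), "\n";    # ISO_8859-5:1988
```

Lowercasing both the stored keys and the lookup name is what lets "Latin2", "latin2" and "LATIN2" all reach the same set.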

EXPORT_OK functions:

transliterate(): takes a string in $orig_chset and returns it transliterated into $result_chset, using the map composed by new().
build_substitutes(): rebuilds the file "substitutes", which contains character definitions and the approximate substitutions used when a character in $orig_chset isn't defined in $result_chset. For example, "Latin capital A" may be substituted for "Latin capital A with ogonek". Rebuilding this file takes a long time, but you should never need to do it. Its only source of information is the file "rfc1345".
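RFC 1345 writes most accented Latin letters as two-character mnemonics: a base letter followed by a mark for the diacritic (for example "A;" for A with ogonek, "c<" for c with caron). One plausible simplification rule is to drop the mark and keep the letter. The sketch below assumes that rule and covers only a subset of the RFC's diacritic marks; approximate_substitute() is a hypothetical helper, and the real "substitutes" file may be built differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: a subset of RFC 1345 diacritic marks, namely
# grave, acute, circumflex, tilde, macron, breve, dot above, diaeresis,
# cedilla, double acute, ogonek and caron.
my %diacritic_mark =
    map { ($_ => 1) } ('!', "'", '>', '?', '-', '(', '.', ':', ',', '"', ';', '<');

# Returns the base letter as an approximate substitute, or undef if the
# mnemonic does not look like "letter + diacritic mark".
sub approximate_substitute {
    my ($mnemonic) = @_;
    if ($mnemonic =~ /^([A-Za-z])(.)$/ && $diacritic_mark{$2}) {
        return $1;    # base letter without its diacritic
    }
    return undef;     # no approximate substitute by this rule
}

for my $m ('A;', "e'", 'c<', 'ss') {
    my $sub = approximate_substitute($m);
    printf "%-3s -> %s\n", $m, defined $sub ? $sub : '(none)';
}
```

A mnemonic such as "ss" (sharp s) gets no substitute from this rule, which is where a table of hand-chosen or unrelated substitutes would have to take over.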

Object methods:

new(): creates a new object for converting from $orig_chset to $result_chset, which are names (or aliases) of 8-bit character sets defined in RFC 1345. With only one argument, $orig_chset is assumed to be "ascii". With three arguments, the third is a verbosity flag; verbose output lists approximate substitutions and other compromises.
transliterate(): is the same as the function of that name.
build_substitutes(): is the same as the function of that name.
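The argument rules for new() can be sketched as follows. normalize_new_args() is a hypothetical helper, not part of Convert::Translit; it only shows the default of "ascii" for a single argument and the optional verbosity flag. Treating a wrong argument count as failure (returning undef) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Convert::Translit: normalizes the
# one-, two- and three-argument forms of new(). A wrong argument count
# is treated as failure and yields undef (an assumption).
sub normalize_new_args {
    my @args = @_;
    return undef if @args < 1 || @args > 3;
    unshift @args, 'ascii' if @args == 1;    # orig_chset defaults to ascii
    my ($orig, $result, $verbose) = @args;
    return +{ orig => $orig, result => $result, verbose => $verbose ? 1 : 0 };
}

my $opts = normalize_new_args('Latin2');
printf "%s -> %s (verbose=%d)\n", @{$opts}{qw(orig result verbose)};
# prints: ascii -> Latin2 (verbose=0)
```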

FILES

 Convert/Translit/rfc1345  (IETF RFC 1345, June 1992)
 Convert/Translit/substitutes

METHODOLOGY

Only one-to-one character mapping is done, so characters with diacritics (like A-ogonek) are never converted to (letter character, diacritic character) pairs; instead they are simplified. If no approximate substitute is available, an unrelated substitute is chosen, preferably one with the same code value. Characters undefined in $orig_chset are translated to a chosen indicator character. When substitutions were required, transliteration is not guaranteed to be reversible: converting the result back may not restore the original string. An $orig_chset defined as 7-bit is assumed to be repeated to make an 8-bit set (in the style of "extended ascii"); no such adjustment is made for $result_chset. The few mistakes in the RFC document are corrected in the module.
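The mapping rules above can be illustrated with a 256-entry lookup table. This is a minimal core-Perl sketch, not the module's implementation: make_table() and transliterate_with() are hypothetical names, the toy table maps only ASCII plus Latin2 0xA1 (A with ogonek), and "?" is an assumed indicator character.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Convert::Translit's code: a 256-entry table
# maps each source byte to one result byte (one-to-one, no pairs).
my $INDICATOR = '?';    # assumed marker for undefined source characters

# $pairs maps source code => result code; unlisted codes stay undefined.
sub make_table {
    my ($pairs, $orig_is_7bit) = @_;
    my @table;
    $table[$_] = $pairs->{$_} for keys %$pairs;
    if ($orig_is_7bit) {
        # Repeat the 7-bit set in the upper half ("extended ascii" style).
        $table[$_ + 128] = $table[$_] for 0 .. 127;
    }
    return \@table;
}

sub transliterate_with {
    my ($table, $string) = @_;
    my $out = '';
    for my $char (split //, $string) {
        my $code = $table->[ord $char];
        $out .= defined $code ? chr($code) : $INDICATOR;
    }
    return $out;
}

# Toy Latin2 -> ASCII table: ASCII maps to itself, and 0xA1
# (A with ogonek in Latin2) is approximated by plain "A".
my %ascii_identity = map { ($_ => $_) } 0 .. 127;
my %latin2_to_ascii = %ascii_identity;
my $A_OGONEK = 0xA1;
$latin2_to_ascii{$A_OGONEK} = ord 'A';
my $table = make_table(\%latin2_to_ascii, 0);
print transliterate_with($table, "Z\xA1b\xFF"), "\n";    # ZAb?

# A 7-bit source set is repeated: 0xC1 is treated like 0x41 ("A").
my $seven_bit = make_table(\%ascii_identity, 1);
print transliterate_with($seven_bit, "\xC1"), "\n";      # A
```

Because "\xA1" and "A" both come out as "A", a reverse conversion cannot tell them apart, which is why a round trip may not restore the original string.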

EXAMPLES

  Convert Russian language text from IBM EBCDIC to ISO 8859-5 ("Cyrillic") encoding:
  $xxx = new Convert::Translit("EBCDIC-Cyrillic", "Cyrillic");
  $ascii_cyr_st = $xxx->transliterate($ibm_cyr_st);

  Convert from plain ASCII (default $orig_chset) to Latin2 (Central European):
  $yyy = new Convert::Translit("Latin2");
  $cnt_eur_st = $yyy->transliterate($ascii_st);

  Since plain ASCII is a subset of Latin2, nothing is lost in transliteration.
  But going the other direction requires numerous simplifications:
  $zzz = new Convert::Translit("Latin2", "ascii");
  $ascii_st = $zzz->transliterate($cnt_eur_st);

  Back to Latin2 again, although substitutions probably mean ($again ne $cnt_eur_st):
  $again = $yyy->transliterate($ascii_st);

  The example.pl script converts a Polish language phrase from Latin2 to EBCDIC-US.

PORTABILITY

Requires Perl 5. Developed with MacPerl on a Macintosh 68040 running Mac OS 7.6.1; tested on Sun Unix 4.1.3.

AUTHOR

Genji Schmeder <genji@community.net>

  Enjoy in good health.
  Cieszcie sie dobrym zdrowiem.
  Que gozen con salud.
  Benutze es heilsam gern!
  Genki dewa, yorokobi nasai.

COPYRIGHT

ACKNOWLEDGEMENTS

  Chris Leach, author of EBCDIC.pm
  Keld Simonsen, author of RFC 1345

