SADAHIRO Tomoyuki

NAME

Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode

SYNOPSIS

  use Lingua::KO::Hangul::Util qw(:all);

  decomposeSyllable("\x{AC00}");          # "\x{1100}\x{1161}"
  composeSyllable("\x{1100}\x{1161}");    # "\x{AC00}"
  decomposeJamo("\x{1101}");              # "\x{1100}\x{1100}"
  composeJamo("\x{1100}\x{1100}");        # "\x{1101}"

  getHangulName(0xAC00);                  # "HANGUL SYLLABLE GA"
  parseHangulName("HANGUL SYLLABLE GA");  # 0xAC00

DESCRIPTION

A Hangul syllable consists of Hangul jamo (Hangul letters).

Hangul letters are classified into three classes:

  CHOSEONG  (the initial sound) as a leading consonant (L),
  JUNGSEONG (the medial sound)  as a vowel (V),
  JONGSEONG (the final sound)   as a trailing consonant (T).

Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.

Composition and Decomposition

$resultant_string = decomposeSyllable($string)

It decomposes a precomposed syllable (LV or LVT) into a sequence of conjoining jamo (L + V or L + V + T) and returns the result as a string.

Any characters other than Hangul syllables are not affected.
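
As a rough illustration of what this decomposition involves, here is a minimal sketch of the Hangul syllable arithmetic from the Unicode Standard, written in core Perl. The name sketch_decompose_syllable is hypothetical and not part of this module, and the module's actual implementation may differ.

```perl
use strict;
use warnings;

# Constants of the Hangul syllable arithmetic in the Unicode Standard.
my ($SBase, $LBase, $VBase, $TBase) = (0xAC00, 0x1100, 0x1161, 0x11A7);
my ($VCount, $TCount) = (21, 28);
my $NCount = $VCount * $TCount;   # 588 syllables per leading consonant
my $SCount = 19 * $NCount;        # 11172 precomposed syllables

# Hypothetical helper, NOT part of Lingua::KO::Hangul::Util.
sub sketch_decompose_syllable {
    my ($string) = @_;
    my $out = '';
    for my $ch (split //, $string) {
        my $s_index = ord($ch) - $SBase;
        if ($s_index < 0 || $s_index >= $SCount) {
            $out .= $ch;          # not a precomposed syllable: keep as is
            next;
        }
        my $l = $LBase + int($s_index / $NCount);
        my $v = $VBase + int(($s_index % $NCount) / $TCount);
        my $t = $TBase + $s_index % $TCount;
        $out .= chr($l) . chr($v);
        $out .= chr($t) if $t != $TBase;   # T index 0 means "no final"
    }
    return $out;
}
```

Under these assumptions, sketch_decompose_syllable("\x{AC00}") yields "\x{1100}\x{1161}", matching the SYNOPSIS.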

$resultant_string = composeSyllable($string)

It composes a sequence of conjoining jamo (L + V or L + V + T) into a precomposed syllable (LV or LVT) if possible, and returns the result as a string. A syllable LV followed by a final jamo T is also composed into a syllable LVT.

Any characters other than Hangul jamo and syllables are not affected.
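
The two composition steps (L + V to LV, then LV + T to LVT) can be sketched with the same Unicode arithmetic. This is only an illustration in core Perl; sketch_compose_pair and sketch_compose_syllable are hypothetical names, not functions of this module.

```perl
use strict;
use warnings;

my ($SBase, $LBase, $VBase, $TBase) = (0xAC00, 0x1100, 0x1161, 0x11A7);
my ($LCount, $VCount, $TCount) = (19, 21, 28);
my $SCount = $LCount * $VCount * $TCount;

# Hypothetical helper: composite code point of two code points, or undef.
sub sketch_compose_pair {
    my ($here, $next) = @_;
    my $l = $here - $LBase;
    my $v = $next - $VBase;
    if ($l >= 0 && $l < $LCount && $v >= 0 && $v < $VCount) {
        return $SBase + ($l * $VCount + $v) * $TCount;    # L + V -> LV
    }
    my $s = $here - $SBase;
    my $t = $next - $TBase;
    if ($s >= 0 && $s < $SCount && $s % $TCount == 0
        && $t > 0 && $t < $TCount) {
        return $here + $t;                                # LV + T -> LVT
    }
    return;    # not composable
}

# Hypothetical helper: composes left to right, reusing the last output.
sub sketch_compose_syllable {
    my ($string) = @_;
    my @out;
    for my $cp (map { ord($_) } split //, $string) {
        my $composite = @out ? sketch_compose_pair($out[-1], $cp) : undef;
        if (defined $composite) { $out[-1] = $composite }
        else                    { push @out, $cp }
    }
    return join '', map { chr($_) } @out;
}
```

Because each new code point is tried against the last output code point, an L + V pair first becomes LV, and a following T can then extend it to LVT.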

$resultant_string = decomposeJamo($string)

It decomposes a complex jamo into a sequence of simple jamo if possible, and returns the result as a string. Any characters other than complex jamo are not affected.

  e.g.
      CHOSEONG SIOS-PIEUP to CHOSEONG SIOS + PIEUP
      JUNGSEONG AE        to JUNGSEONG A + I
      JUNGSEONG WE        to JUNGSEONG U + EO + I
      JONGSEONG SSANGSIOS to JONGSEONG SIOS + SIOS
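
A table lookup is one simple way to picture this mapping. The sketch below covers only the examples mentioned in this document; the table name and sketch_decompose_jamo are hypothetical, and the module's own table is far larger.

```perl
use strict;
use warnings;

# Hypothetical partial table: complex jamo => sequence of simple jamo.
my %SKETCH_JAMO_MAP = (
    "\x{1101}" => "\x{1100}\x{1100}",          # CHOSEONG SSANGKIYEOK
    "\x{1132}" => "\x{1109}\x{1107}",          # CHOSEONG SIOS-PIEUP
    "\x{1162}" => "\x{1161}\x{1175}",          # JUNGSEONG AE = A + I
    "\x{1170}" => "\x{116E}\x{1165}\x{1175}",  # JUNGSEONG WE = U + EO + I
    "\x{11BB}" => "\x{11BA}\x{11BA}",          # JONGSEONG SSANGSIOS
);

# Hypothetical helper, NOT part of Lingua::KO::Hangul::Util.
sub sketch_decompose_jamo {
    my ($string) = @_;
    # Characters missing from the table pass through unchanged.
    return join '', map { $SKETCH_JAMO_MAP{$_} // $_ } split //, $string;
}
```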

$resultant_string = composeJamo($string)

It composes a sequence of simple jamo (L1 + L2, V1 + V2 + V3, etc.) into a complex jamo if possible, and returns the result as a string. Any characters other than simple jamo are not affected.

  e.g.
      CHOSEONG SIOS + PIEUP to CHOSEONG SIOS-PIEUP
      JUNGSEONG A + I       to JUNGSEONG AE
      JUNGSEONG U + EO + I  to JUNGSEONG WE
      JONGSEONG SIOS + SIOS to JONGSEONG SSANGSIOS

$resultant_string = decomposeFull($string)

It decomposes a syllable or complex jamo into a sequence of simple jamo. Equivalent to decomposeJamo(decomposeSyllable($string)).

Composition and Decomposition (Old-interface, deprecated!)

$string_decomposed = decomposeHangul($code_point)
@codepoints = decomposeHangul($code_point)

If the specified code point is of a Hangul syllable, it returns a list of code points (in a list context) or a string (in a scalar context) of its decomposition.

   decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA.
      returns "\x{1100}\x{1161}" or (0x1100, 0x1161);

   decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL.
      returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF);

Otherwise, returns false (empty string or empty list).

   decomposeHangul(0x0041) # outside Hangul syllables
      returns empty string or empty list.

$string_composed = composeHangul($src_string)
@code_points_composed = composeHangul($src_string)

Any sequence of an initial jamo L and a medial jamo V is composed to a syllable LV; then any sequence of a syllable LV and a final jamo T is composed to a syllable LVT.

Any characters other than Hangul jamo and syllables are not affected.

   composeHangul("\x{1100}\x{1173}\x{11AF}.")
   # returns "\x{AE00}." or (0xAE00,0x2E);

$code_point_composite = getHangulComposite($code_point_here, $code_point_next)

It returns the code point of the composite if both code points, $code_point_here and $code_point_next, are Hangul and composable.

Otherwise, returns undef.

Hangul Syllable Name

The following functions handle only a precomposed Hangul syllable (from U+AC00 to U+D7A3), but not a Hangul jamo or other Hangul-related character.

Names of Hangul syllables have a format of "HANGUL SYLLABLE %s".

$name = getHangulName($code_point)

If the specified code point is of a Hangul syllable, it returns its name; otherwise it returns undef.

   getHangulName(0xAC00) returns "HANGUL SYLLABLE GA";
   getHangulName(0x0041) returns undef.

$codepoint = parseHangulName($name)

If the specified name is of a Hangul syllable, it returns its code point; otherwise it returns undef.

   parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00;

   parseHangulName("LATIN SMALL LETTER A") returns undef;

   parseHangulName("HANGUL SYLLABLE PERL") returns undef;
    # Regrettably, HANGUL SYLLABLE PERL does not exist :-)
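
The %s part of the name format described in this section can be derived from the code point by concatenating the jamo short names of the Unicode Character Database (Jamo.txt). The following core Perl sketch illustrates that derivation; sketch_hangul_name and the three table names are hypothetical, not part of this module.

```perl
use strict;
use warnings;

# Jamo short names for the 19 leading consonants, 21 vowels and
# 28 trailing slots (slot 0 means "no trailing consonant").
my @SKETCH_L = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @SKETCH_V = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @SKETCH_T = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                       M B BS S SS NG J C K T P H));

# Hypothetical helper, NOT part of Lingua::KO::Hangul::Util.
sub sketch_hangul_name {
    my ($cp) = @_;
    my $s = $cp - 0xAC00;
    return if $s < 0 || $s >= 11172;    # outside U+AC00..U+D7A3
    return 'HANGUL SYLLABLE '
         . $SKETCH_L[ int($s / 588) ]
         . $SKETCH_V[ int(($s % 588) / 28) ]
         . $SKETCH_T[ $s % 28 ];
}
```

For U+AE00 the three indexes are 0, 18 and 8, which select "G", "EU" and "L", giving "HANGUL SYLLABLE GEUL".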

Standard Korean Syllable Block

A standard Korean syllable block consists of L+ V+ T* (a sequence of one or more L, one or more V, and zero or more T) according to the conjoining jamo behavior revised in Unicode 3.2 (cf. UAX #28). A sequence of L followed by T is not a single syllable block, because it lacks V; it consists of two nonstandard syllable blocks: one without V, and another without L and V.

$bool = isStandardForm($string)

It returns true if the string is encoded in the standard form, that is, if it contains no nonstandard sequence; otherwise it returns false.
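
One way to picture this check is to map every character to its syllable type and test the resulting type string against the L+ V+ T* pattern. The sketch below does so in core Perl under two assumptions: only the jamo ranges listed under CAVEAT are recognized, and a precomposed syllable counts as L V or L V T. The helper names are hypothetical, and the module may decide edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper: one type letter per jamo, "x" for anything else.
sub sketch_type_string {
    my ($string) = @_;
    my $types = '';
    for my $cp (map { ord($_) } split //, $string) {
        if (($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F) {
            $types .= 'L';
        }
        elsif ($cp >= 0x1160 && $cp <= 0x11A2) { $types .= 'V' }
        elsif ($cp >= 0x11A8 && $cp <= 0x11F9) { $types .= 'T' }
        elsif ($cp >= 0xAC00 && $cp <= 0xD7A3) {
            # Assumption: a precomposed syllable acts like L V or L V T.
            $types .= ($cp - 0xAC00) % 28 == 0 ? 'LV' : 'LVT';
        }
        else { $types .= 'x' }
    }
    return $types;
}

# Hypothetical helper: true if every jamo run is built from L+ V+ T*.
sub sketch_is_standard_form {
    my ($string) = @_;
    return sketch_type_string($string) =~ /\A(?:x|L+V+T*)*\z/ ? 1 : 0;
}
```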

$resultant_string = insertFiller($string)

It transforms the string into standard form by inserting fillers into each nonstandard syllable block, and returns the result as a string. A choseong filler (Lf, U+115F) is inserted into a syllable block without L. A jungseong filler (Vf, U+1160) is inserted into a syllable block without V.
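
The filler rules can be sketched with a single substitution over conjoining jamo. This is an illustration only: it assumes the input contains no precomposed syllables (for example, after syllable decomposition), it recognizes only the jamo ranges listed under CAVEAT, and sketch_insert_filler is a hypothetical name, not part of this module.

```perl
use strict;
use warnings;

my $L  = qr/[\x{1100}-\x{1159}\x{115F}]/;   # leading consonants, incl. Lf
my $V  = qr/[\x{1160}-\x{11A2}]/;           # vowels, incl. Vf
my $T  = qr/[\x{11A8}-\x{11F9}]/;           # trailing consonants
my ($Lf, $Vf) = ("\x{115F}", "\x{1160}");

# Hypothetical helper, NOT part of Lingua::KO::Hangul::Util.
sub sketch_insert_filler {
    my ($string) = @_;
    $string =~ s{($L+$V+$T*)|($L+)|($V+$T*)|($T+)}{
          defined $1 ? $1                 # already standard
        : defined $2 ? $2 . $Vf           # block without V
        : defined $3 ? $Lf . $3           # block without L
        :              $Lf . $Vf . $4     # block without L and V
    }ge;
    return $string;
}
```

The alternation tries a full L+ V+ T* block first, so an L followed directly by T is treated as two nonstandard blocks, in line with the description of syllable blocks in this section.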

$type = getSyllableType($code_point)

It returns the Hangul syllable type (cf. HangulSyllableType.txt) for the specified code point as a string: "L" for leading jamo, "V" for vowel jamo, "T" for trailing jamo, "LV" for LV syllables, "LVT" for LVT syllables, and "NA" for other code points (as Not Applicable).

EXPORT

By default:

    decomposeHangul
    composeHangul
    getHangulName
    parseHangulName
    getHangulComposite

On request:

    decomposeSyllable
    composeSyllable
    decomposeJamo
    composeJamo
    decomposeFull
    isStandardForm
    insertFiller
    getSyllableType

CAVEAT

This module does not support Hangul jamo assigned in Unicode 5.2.0 (2009).

A list of Hangul characters this module supports:

    1100..1159 ; 1.1 # [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
    115F..11A2 ; 1.1 # [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
    11A8..11F9 ; 1.1 # [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
    AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH

AUTHOR

SADAHIRO Tomoyuki <SADAHIRO@cpan.org>

Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Unicode Normalization Forms (UAX #15)

http://www.unicode.org/reports/tr15/

Conjoining Jamo Behavior (revision) in UAX #28

http://www.unicode.org/reports/tr28/#3_11_conjoining_jamo_behavior

Hangul Syllable Type

http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt

Jamo Decomposition in Old Unicode

http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt

ISO/IEC JTC1/SC22/WG20 N954

Paper by K. KIM: New canonical decomposition and composition processes for Hangeul

http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF

(summary: http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF) (cf. http://std.dkuug.dk/JTC1/SC22/WG20/docs/documents.html)



