
NAME

Lingua::LO::Transform::Syllables - Segment Lao or mixed-script text into syllables.

FUNCTION

This implements a purely regular-expression-based algorithm for segmenting Lao text into syllables, following the one described in PHISSAMAY et al.: Syllabification of Lao Script for Line Breaking.
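The general idea of regular-expression-based syllabification can be sketched in a few lines of Perl. This is a deliberately simplified pattern assumed for illustration, not the grammar from the paper or this module. The character classes (leading vowels, consonants, combining marks, trailing vowels) are approximations, and the helper lao_syllables is hypothetical. The key trick is a negative lookahead that refuses to treat a consonant as a syllable-final when a following vowel or mark shows that it begins the next syllable.

```perl
use strict;
use warnings;

# Simplified Lao character classes (assumed approximations, not the full
# class inventory of the published algorithm).
my $CONS  = qr/[\x{0E81}-\x{0EAE}\x{0EDC}-\x{0EDF}]/;                         # consonants
my $LEAD  = qr/[\x{0EC0}-\x{0EC4}]/;                                           # vowels written before the consonant
my $MARK  = qr/[\x{0EB1}\x{0EB4}-\x{0EB9}\x{0EBB}\x{0EBC}\x{0EC8}-\x{0ECD}]/;  # above/below vowels, tone marks
my $TRAIL = qr/[\x{0EB0}\x{0EB2}\x{0EB3}\x{0EBD}]/;                            # spacing vowels after the consonant

# Optional leading vowel, initial consonant, marks, trailing vowels and an
# optional final consonant. The lookahead rejects a "final" that is directly
# followed by a mark or trailing vowel, because it must start the next syllable.
my $SYLLABLE = qr/$LEAD? $CONS $MARK* $TRAIL* (?: $CONS (?! $MARK | $TRAIL ) )?/x;

# Hypothetical helper: list-context m//g returns every match and skips
# whatever lies between matches.
sub lao_syllables {
    my ($text) = @_;
    return $text =~ /$SYLLABLE/g;
}

binmode STDOUT, ':encoding(UTF-8)';
# "ສະບາຍດີ hello" written with escapes to avoid source-encoding issues
print join(' | ', lao_syllables("\x{0EAA}\x{0EB0}\x{0E9A}\x{0EB2}\x{0E8D}\x{0E94}\x{0EB5} hello")), "\n";
```

This prints the three syllables ສະ, ບາຍ and ດີ; the Latin word matches nothing and is skipped, much like the behaviour documented for get_syllables.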

METHODS

new

new( text => $text, ... )

The constructor takes hash-style named arguments. The only one defined so far is text, whose value is the text to be segmented.
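The hash-style constructor convention can be sketched as follows. LaoText::Sketch is a hypothetical stand-in, not this module's implementation; it only shows how text => $text arrives as a key/value pair in the argument list, and it applies NFC as the real constructor is documented to do. Croaking when text is missing is an assumption made for the sketch.

```perl
package LaoText::Sketch;    # hypothetical class, not Lingua::LO::Transform::Syllables
use strict;
use warnings;
use Carp qw(croak);
use Unicode::Normalize qw(NFC);

sub new {
    # Named arguments arrive as a flat list and are collected into a hash.
    my ($class, %args) = @_;
    croak "'text' argument is required" unless defined $args{text};
    # Normalize once, up front, so later matching sees a single form.
    return bless { text => NFC($args{text}) }, $class;
}

sub text { return $_[0]{text} }

package main;

my $obj = LaoText::Sketch->new(text => "\x{0EAA}\x{0EB0}\x{0E9A}\x{0EB2}\x{0E8D}");
print length($obj->text), "\n";    # prints 5 (code points)
```

Passing arguments by name keeps the call site readable and leaves room for further options without breaking existing callers.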

Note that text is first passed through "NFC" in "Unicode::Normalize" to obtain Normalization Form C (canonical composition). Be aware that LAO VOWEL SIGN AM (U+0EB3) has only a compatibility decomposition, U+0ECD,U+0EB2 (NIGGAHITA followed by VOWEL SIGN AA), so NFC leaves that two-character sequence unchanged rather than composing it to U+0EB3.
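How Unicode::Normalize treats LAO VOWEL SIGN AM can be checked directly. According to the Unicode Character Database, U+0EB3 carries a compatibility (not canonical) decomposition to U+0ECD U+0EB2, so only the NFK* forms relate the two spellings. A caller who wants the decomposed spelling folded into U+0EB3 can do it explicitly; the helper normalize_lao below is hypothetical and not part of this module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKD);

# Hypothetical helper: NFC plus an explicit fold of NIGGAHITA + AA into AM,
# since NFC alone never produces U+0EB3 from the two-character sequence.
sub normalize_lao {
    my ($text) = @_;
    my $nfc = NFC($text);
    $nfc =~ s/\x{0ECD}\x{0EB2}/\x{0EB3}/g;
    return $nfc;
}

my $decomposed = "\x{0E81}\x{0ECD}\x{0EB2}";    # KO + NIGGAHITA + AA
my $composed   = "\x{0E81}\x{0EB3}";            # KO + VOWEL SIGN AM

printf "NFC changes it: %s\n", NFC($decomposed) eq $decomposed ? 'no' : 'yes';
printf "NFKD of composed: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, NFKD($composed);
printf "normalize_lao folds it: %s\n",
    normalize_lao($decomposed) eq $composed ? 'yes' : 'no';
```

The three lines report that NFC leaves the sequence alone, that NFKD expands AM to U+0E81 U+0ECD U+0EB2, and that the explicit substitution produces the composed form.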

get_syllables

get_syllables()

Returns a list of the Lao syllables found in the text passed to the constructor. Anything else mixed in (whitespace, non-Lao script and so on) is silently dropped.

get_fragments

get_fragments()

Returns a complete segmentation of the text passed to the constructor as an array of hashes. Each hash has two keys:

text: the text of the respective fragment
is_lao: if true, the fragment is a single valid Lao syllable. If false, it may be whitespace, non-Lao script, Lao characters that do not form a valid syllable, or anything else that is not a valid syllable.
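A fragment list with this shape can be produced from a syllable regex by splitting with a capturing group, which keeps both the matches and the text between them. The sketch reuses a simplified, assumed syllable pattern (not the module's grammar), and lao_fragments is a hypothetical helper that returns hash references with the documented text and is_lao keys.

```perl
use strict;
use warnings;

# Simplified Lao syllable pattern (an assumed approximation).
my $CONS  = qr/[\x{0E81}-\x{0EAE}\x{0EDC}-\x{0EDF}]/;
my $LEAD  = qr/[\x{0EC0}-\x{0EC4}]/;
my $MARK  = qr/[\x{0EB1}\x{0EB4}-\x{0EB9}\x{0EBB}\x{0EBC}\x{0EC8}-\x{0ECD}]/;
my $TRAIL = qr/[\x{0EB0}\x{0EB2}\x{0EB3}\x{0EBD}]/;
my $SYLLABLE = qr/$LEAD? $CONS $MARK* $TRAIL* (?: $CONS (?! $MARK | $TRAIL ) )?/x;

sub lao_fragments {
    my ($text) = @_;
    # With one capturing group, split alternates gap, match, gap, match, ...
    # so even indices are non-syllable text and odd indices are syllables.
    my @parts = split /($SYLLABLE)/, $text;
    my @frags;
    for my $i (0 .. $#parts) {
        next if $parts[$i] eq '';    # skip empty gaps between adjacent syllables
        push @frags, { text => $parts[$i], is_lao => $i % 2 };
    }
    return @frags;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $f (lao_fragments("ok \x{0EAA}\x{0EB0}\x{0E9A}\x{0EB2}\x{0E8D}!")) {
    printf "%-6s [%s]\n", ($f->{is_lao} ? 'lao' : 'other'), $f->{text};
}
```

This yields four fragments: "ok " and "!" with is_lao false, ສະ and ບາຍ with is_lao true. Concatenating all text values gives back the input, which is what makes the segmentation complete.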