
NAME

Lingua::ZH::TaBE - Chinese processing via libtabe

VERSION

This document describes version 0.02 of Lingua::ZH::TaBE, released January 19, 2003.

SYNOPSIS

    use Lingua::ZH::TaBE;

    my $tabe = Lingua::ZH::TaBE->new;

    # Phrase splitter
    my @phrases = $tabe->split(
        "當我們在電腦中處理中文資訊時，相信其中最惱人的".
        "狀況之一，莫過於想打的字打不出來了。"
    );

    # Chaining various components
    print $tabe->Chu("道可道，非常道。")    # sentence
        ->chunks->[2]       # 非常道        # chunk
        ->tsis->[0]         # 非常          # phrase
        ->zhis->[1]         # 常            # character
        ->yins->[0]         # ㄔㄤˊ         # pronunciation
        ->zuyins->[0];      # ㄔ            # phonetic symbol

DESCRIPTION

This module is a Perl interface to the TaBE (Taiwan and Big5 Encoding) library, a unified interface and library for dealing with Chinese words, phrases, sentences, and phonetic symbols; it is intended to serve as a foundation for Chinese text processing.

Lingua::ZH::TaBE provides an object-oriented interface (preferred), as well as a procedural interface consisting of all C functions in tabe.h.

Object-Oriented Interface

Lingua::ZH::TaBE

new( [tsi_db => $file, tsiyin_db => $file] )

Creates a LibTaBE handle and opens the databases. Any database file that is not specified is looked up automatically in the usual libtabe data directory.

split( $string [, $method] )

Splits the text in $string and returns a list of strings, one per word found. To use an alternate segmentation algorithm, specify Complex or Backward as $method.
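As a minimal sketch of calling split() from a UTF-8 source file: the example below assumes that the module expects Big5 byte strings (libtabe is a Big5 library) and that the method name is passed as a plain string such as 'Complex'; neither detail is spelled out above. The conversion uses the core Encode module, and the call is skipped when Lingua::ZH::TaBE or its databases are unavailable.

```perl
use strict;
use warnings;
use utf8;                          # this source file is UTF-8
use Encode qw(encode decode);

# Assumption: libtabe works on Big5 bytes, so convert at the boundary.
my $text = '道可道，非常道。';
my $big5 = encode('big5-eten', $text);

# Load the module and open the default databases; undef on any failure.
my $tabe = eval { require Lingua::ZH::TaBE; Lingua::ZH::TaBE->new };

my @words;
if ($tabe) {
    # Assumption: the alternate method is given as the string 'Complex'.
    @words = map { decode('big5-eten', $_) } $tabe->split($big5, 'Complex');
}
else {
    warn "Lingua::ZH::TaBE is not available; skipping segmentation\n";
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' | ', @words), "\n";
```

Converting once on the way in and once on the way out keeps the rest of the program in Perl's native character strings.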

Chu(), Chunk(), Tsi(), Zhi(), Yin(), ZuYin()

Constructors for objects at each level; each takes one argument for initialization.

Lingua::ZH::TaBE::Chu

chunks()

Lingua::ZH::TaBE::Chunk

tsis([$method])

Lingua::ZH::TaBE::Tsi

zhis()

yins()

Lingua::ZH::TaBE::Zhi

yins()

ToZhi()

ToZhiCode()

IsBig5Code()

ToPackedBig5Code()

LookupRefCount()

Lingua::ZH::TaBE::Yin

zuyins()

zhis()

ToYin()

ToZuYinSymbolSequence()

Lingua::ZH::TaBE::ZuYin

yin()

zhi()

Procedural Interface

All functions below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

    $TsiDB      = TsiDBOpen($type, $db_name, $flags);
    $num        = TsiInfoLookupPossibleTsiYin($TsiDB, $Tsi);
    $TsiYinDB   = TsiYinDBOpen($type, $db_name, $flags);
    $num        = ChuInfoToChunkInfo($Chu);
    $num        = ChunkSegmentationSimplex($TsiDB, $Chunk);
    $num        = ChunkSegmentationComplex($TsiDB, $Chunk);
    $num        = ChunkSegmentationBackward($TsiDB, $Chunk);
    $num        = TsiInfoLookupZhiYin($TsiDB, $Tsi);
    $string     = YinLookupZhiList($Yin);
    $string     = YinToZuYinSymbolSequence($Yin);
    $yin        = ZuYinSymbolSequenceToYin($string);
    $zhi        = ZuYinIndexToZuYinSymbol($ZuYin);
    $zuyin      = ZuYinSymbolToZuYinIndex($Zhi);
    $zuyin      = ZozyKeyToZuYinIndex($key);
    $num        = ZhiIsBig5Code($Zhi);
    $zhicode    = ZhiToZhiCode($Zhi);
    $zhi        = ZhiCodeToZhi($zhicode);
    $num        = ZhiCodeToPackedBig5Code($zhicode);
    $num        = ZhiCodeLookupRefCount($zhicode);

Constants

All constants below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

    DB_TYPE_DB                  0
    DB_TYPE_LAST                1
    DB_FLAG_OVERWRITE           0x01
    DB_FLAG_CREATEDB            0x02
    DB_FLAG_READONLY            0x04
    DB_FLAG_NOSYNC              0x08
    DB_FLAG_SHARED              0x10
    DB_FLAG_NOUNPACK_YIN        0x20
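The DB_FLAG_* values above are distinct powers of two, which suggests they are meant to be combined as a bit mask for the $flags argument of TsiDBOpen() and TsiYinDBOpen(). That is an assumption, not something stated here. The sketch below copies three values from the table so that it runs without the module; with the module installed, the same names would come from use Lingua::ZH::TaBE ':all'.

```perl
use strict;
use warnings;

# Values copied from the table above (stand-ins for the imported constants).
use constant {
    DB_TYPE_DB       => 0,
    DB_FLAG_READONLY => 0x04,
    DB_FLAG_SHARED   => 0x10,
};

# Assumption: flags are independent bits that may be OR-ed together.
my $flags = DB_FLAG_READONLY | DB_FLAG_SHARED;
printf "flags = 0x%02x\n", $flags;    # prints "flags = 0x14"

# Testing for a single bit in a combined value.
print "read-only requested\n" if $flags & DB_FLAG_READONLY;
```

A combined value like this would then be passed as the third argument, for example TsiDBOpen(DB_TYPE_DB, $db_name, $flags).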

CAVEATS

The TsiYin family of functions is not yet complete.

SEE ALSO

ftp://xcin.linux.org.tw/pub/xcin/libtabe/devel/

http://libtabe.sourceforge.net/

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2003 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
