NAME

Lingua::ZH::TaBE - Chinese processing via libtabe

VERSION

This document describes version 0.07 of Lingua::ZH::TaBE, released December 31, 2005.

SYNOPSIS

    use Lingua::ZH::TaBE;

    my $tabe = Lingua::ZH::TaBE->new;

    # Phrase splitter
    my @phrases = $tabe->split(
        "當我們在電腦中處理中文資訊時,相信其中最惱人的".
        "狀況之一,莫過於想打的字打不出來了。"
    );

    # Chaining various components
    print $tabe->Chu("道可道,非常道。")    # sentence
        ->chunks->[2]       # 非常道           # chunk
        ->tsis->[0]         # 非常            # phrase
        ->zhis->[1]         # 常     # character
        ->yins->[0]         # ㄔㄤˊ           # pronunciation
        ->zuyins->[0];      # ㄔ     # phonetic symbols

DESCRIPTION

This module is a Perl interface to the TaBE (Taiwan and Big5 Encoding) library, a unified interface and library for dealing with Chinese words, phrases, sentences, and phonetic symbols. It is intended to be used as the foundation of Chinese text processing.

Lingua::ZH::TaBE provides an object-oriented interface (preferred), as well as a procedural interface consisting of all C functions in tabe.h.

Object-Oriented Interface

Lingua::ZH::TaBE

new( [tsi_db => $file, tsiyin_db => $file] )

Creates a libtabe handle and opens the databases. If no database files are specified, they are located automatically in the usual libtabe data directory.

split( $string [, $method] )

Splits the text in $string and returns a list of strings representing the words obtained. You may specify Complex or Backward as $method to use an alternative segmentation algorithm.
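
To compare the segmentation algorithms on the same input, something like the following sketch may help. It assumes that $method is passed as the plain strings 'Complex' and 'Backward'; the helper name compare_segmentations is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Segment the same text with the default algorithm and with each
# alternative, returning a hash ref of method name => array ref of words.
# Assumption: $method is given as the plain string 'Complex' or 'Backward'.
sub compare_segmentations {
    my ($tabe, $text) = @_;
    my %result;
    $result{Default} = [ $tabe->split($text) ];
    $result{$_}      = [ $tabe->split($text, $_) ] for qw(Complex Backward);
    return \%result;
}

# Only talk to libtabe when the module is actually installed.
if (eval { require Lingua::ZH::TaBE; 1 }) {
    my $tabe = Lingua::ZH::TaBE->new;
    my $res  = compare_segmentations($tabe, "道可道,非常道。");
    for my $name (sort keys %$res) {
        print "$name: ", join(' | ', @{ $res->{$name} }), "\n";
    }
}
```

Printing the three results side by side makes it easy to see where the algorithms disagree on word boundaries.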

Chu(), Chunk(), Tsi(), Zhi(), Yin(), ZuYin()

Constructors for the various levels of objects, each taking one argument for initialization.

Lingua::ZH::TaBE::Chu

chunks()

Lingua::ZH::TaBE::Chunk

tsis([$method])

Lingua::ZH::TaBE::Tsi

zhis()
yins()

Lingua::ZH::TaBE::Zhi

yins()
ToZhi()
ToZhiCode()
IsBig5Code()
ToPackedBig5Code()
LookupRefCount()

Lingua::ZH::TaBE::Yin

zuyins()
zhis()
ToYin()
ToZuYinSymbolSequence()

Lingua::ZH::TaBE::ZuYin

yin()
zhi()
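
The classes above form a hierarchy (Chu, Chunk, Tsi, Zhi, Yin, ZuYin) that can be walked with ordinary loops instead of fixed indices. The following is a minimal sketch, assuming that chunks() and tsis() return array references as in the SYNOPSIS and that Tsi objects stringify to their text; the helper name phrases_by_chunk is hypothetical.

```perl
use strict;
use warnings;

# Return one array ref per chunk, each holding that chunk's phrases as strings.
# Assumptions: chunks() and tsis() return array refs (as in the SYNOPSIS),
# and Tsi objects stringify to their text when interpolated.
sub phrases_by_chunk {
    my ($tabe, $sentence) = @_;
    my @rows;
    for my $chunk (@{ $tabe->Chu($sentence)->chunks }) {
        push @rows, [ map { "$_" } @{ $chunk->tsis } ];
    }
    return \@rows;
}

# Only talk to libtabe when the module is actually installed.
if (eval { require Lingua::ZH::TaBE; 1 }) {
    my $rows = phrases_by_chunk(Lingua::ZH::TaBE->new, "道可道,非常道。");
    print join(' / ', @$_), "\n" for @$rows;
}
```

The same pattern could be extended one level further down with zhis() and yins() if per-character readings are needed.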

Procedural Interface

All functions below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

    $TsiDB      = TsiDBOpen($type, $db_name, $flags);
    $num        = TsiInfoLookupPossibleTsiYin($TsiDB, $Tsi);
    $TsiYinDB   = TsiYinDBOpen($type, $db_name, $flags);
    $num        = ChuInfoToChunkInfo($Chu);
    $num        = ChunkSegmentationSimplex($TsiDB, $Chunk);
    $num        = ChunkSegmentationComplex($TsiDB, $Chunk);
    $num        = ChunkSegmentationBackward($TsiDB, $Chunk);
    $num        = TsiInfoLookupZhiYin($TsiDB, $Tsi);
    $string     = YinLookupZhiList($Yin);
    $string     = YinToZuYinSymbolSequence($Yin);
    $yin        = ZuYinSymbolSequenceToYin($string);
    $zhi        = ZuYinIndexToZuYinSymbol($ZuYin);
    $zuyin      = ZuYinSymbolToZuYinIndex($Zhi);
    $zuyin      = ZozyKeyToZuYinIndex($key);
    $num        = ZhiIsBig5Code($Zhi);
    $zhicode    = ZhiToZhiCode($Zhi);
    $zhi        = ZhiCodeToZhi($zhicode);
    $num        = ZhiCodeToPackedBig5Code($zhicode);
    $num        = ZhiCodeLookupRefCount($zhicode);
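
As an illustration of the procedural style, the sketch below opens the phrase database with TsiDBOpen using fully qualified names, so nothing has to be imported. The database path and the helper name open_tsi_db are hypothetical, and the behaviour on failure (an exception or an undefined handle) is an assumption rather than something this document specifies.

```perl
use strict;
use warnings;

# Open the phrase database read-only through the procedural interface.
# Returns the handle, or undef when the module cannot be loaded or the
# open call throws (failure behaviour is an assumption, see above).
sub open_tsi_db {
    my ($path) = @_;
    return unless eval { require Lingua::ZH::TaBE; 1 };

    # Fully qualified calls: the functions and constants live in the
    # Lingua::ZH::TaBE package even when they are not imported.
    my $type  = Lingua::ZH::TaBE::DB_TYPE_DB();
    my $flags = Lingua::ZH::TaBE::DB_FLAG_READONLY();
    return eval { Lingua::ZH::TaBE::TsiDBOpen($type, $path, $flags) };
}

my $db = open_tsi_db('/usr/local/lib/tabe/tsi.db');   # hypothetical path
print defined $db ? "opened\n" : "not available\n";
```

With use Lingua::ZH::TaBE ':all' in effect, the same call could be written without the package prefix.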

Constants

All constants below belong to the Lingua::ZH::TaBE class; they are not exported by default, but may be imported explicitly, or implicitly via use Lingua::ZH::TaBE ':all'.

    DB_TYPE_DB                  0
    DB_TYPE_LAST                1
    DB_FLAG_OVERWRITE           0x01
    DB_FLAG_CREATEDB            0x02
    DB_FLAG_READONLY            0x04
    DB_FLAG_NOSYNC              0x08
    DB_FLAG_SHARED              0x10
    DB_FLAG_NOUNPACK_YIN        0x20
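
The DB_FLAG_* values are distinct powers of two, which suggests they are meant to be combined with bitwise OR and tested with bitwise AND. The sketch below uses local copies of three of the values listed above so that it runs without libtabe; whether a given combination is meaningful to the library is an assumption.

```perl
use strict;
use warnings;

# Local copies of values from the table above, so the sketch is
# self-contained; with the module loaded, import the real constants instead.
use constant {
    DB_FLAG_READONLY => 0x04,
    DB_FLAG_NOSYNC   => 0x08,
    DB_FLAG_SHARED   => 0x10,
};

# Combine flags with bitwise OR.
my $flags = DB_FLAG_READONLY | DB_FLAG_SHARED;
printf "flags = 0x%02x\n", $flags;                   # prints "flags = 0x14"

# Test individual flags with bitwise AND.
print "read-only\n" if $flags & DB_FLAG_READONLY;    # printed
print "no-sync\n"   if $flags & DB_FLAG_NOSYNC;      # not printed
```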

CAVEATS

The TsiYin family of functions is not yet complete.

SEE ALSO

ftp://xcin.linux.org.tw/pub/xcin/libtabe/devel/

http://libtabe.sourceforge.net/

AUTHORS

Audrey Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2003, 2004, 2005 by Audrey Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html