Lingua::ZH::TaBE - Chinese processing via libtabe
This document describes version 0.07 of Lingua::ZH::TaBE, released December 31, 2005.
    use Lingua::ZH::TaBE;

    my $tabe = Lingua::ZH::TaBE->new;

    # Phrase splitter
    my @phrases = $tabe->split(
        "當我們在電腦中處理中文資訊時,相信其中最惱人的".
        "狀況之一,莫過於想打的字打不出來了。"
    );

    # Chaining various components
    print $tabe->Chu("道可道,非常道。")  # sentence
               ->chunks->[2]            # 非常道 (chunk)
               ->tsis->[0]              # 非常 (phrase)
               ->zhis->[1]              # 常 (character)
               ->yins->[0]              # ㄔㄤˊ (pronunciation)
               ->zuyins->[0],           # ㄔ (phonetic symbols)
               "\n";
This module is a Perl interface to the TaBE (Taiwan and Big5 Encoding) library, a unified interface and library for handling Chinese words, phrases, sentences, and phonetic symbols. It is intended to serve as a foundation for Chinese text processing.
Lingua::ZH::TaBE provides an object-oriented interface (preferred), as well as a procedural interface consisting of all C functions in tabe.h.
Creates a libtabe handle and opens its databases. If the database locations are not specified, they are looked up automatically in the usual libtabe data directory.
Splits the text in $string and returns a list of strings, one for each word found. You may specify Complex or Backward as $method to use an alternate segmentation algorithm.
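A minimal sketch comparing the segmentation methods on the same text. It assumes that $method is passed as the optional second argument to split, and that libtabe expects Big5-encoded byte strings, so the text is converted with the core Encode module before splitting and each phrase is decoded again for printing. The hash %result and the label 'default' are illustrative names only.

```perl
use strict;
use warnings;
use utf8;                          # the Chinese literal below is UTF-8 source text
use Encode qw(encode decode);
use Lingua::ZH::TaBE;

binmode STDOUT, ':encoding(UTF-8)';

my $tabe = Lingua::ZH::TaBE->new;

# Assumption: libtabe operates on Big5 bytes, so encode the Unicode string first.
my $text = encode('big5', '當我們在電腦中處理中文資訊時');

# Assumption: the segmentation method is the optional second argument to split.
my %result;
for my $method ('default', 'Complex', 'Backward') {
    my @phrases = $method eq 'default'
        ? $tabe->split($text)
        : $tabe->split($text, $method);
    $result{$method} = \@phrases;

    # Decode each Big5 phrase back to Unicode for display.
    print "$method: ", join(' / ', map { decode('big5', $_) } @phrases), "\n";
}
```

Printing the three results side by side is a quick way to see where the algorithms disagree on ambiguous word boundaries.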
Constructors for the various levels of objects (such as Chu in the synopsis); each takes a single argument used for initialization.
All functions below belong to the Lingua::ZH::TaBE class. They are not exported by default, but may be imported individually by name, or all at once via use Lingua::ZH::TaBE ':all'.
    $TsiDB = TsiDBOpen($type, $db_name, $flags);
    $num = TsiInfoLookupPossibleTsiYin($TsiDB, $Tsi);
    $TsiYinDB = TsiYinDBOpen($type, $db_name, $flags);
    $num = ChuInfoToChunkInfo($Chu);
    $num = ChunkSegmentationSimplex($TsiDB, $Chunk);
    $num = ChunkSegmentationComplex($TsiDB, $Chunk);
    $num = ChunkSegmentationBackward($TsiDB, $Chunk);
    $num = TsiInfoLookupZhiYin($TsiDB, $Tsi);
    $string = YinLookupZhiList($Yin);
    $string = YinToZuYinSymbolSequence($Yin);
    $yin = ZuYinSymbolSequenceToYin($string);
    $zhi = ZuYinIndexToZuYinSymbol($ZuYin);
    $zuyin = ZuYinSymbolToZuYinIndex($Zhi);
    $zuyin = ZozyKeyToZuYinIndex($key);
    $num = ZhiIsBig5Code($Zhi);
    $zhicode = ZhiToZhiCode($Zhi);
    $zhi = ZhiCodeToZhi($zhicode);
    $num = ZhiCodeToPackedBig5Code($zhicode);
    $num = ZhiCodeLookupRefCount($zhicode);
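As an illustration of the procedural interface, the sketch below imports three of the listed functions by name and converts a single character to its libtabe character code and back. It assumes that character ($Zhi) arguments are Big5-encoded byte strings, which is what this Big5-based library works with; treat the round-trip check as an expectation to verify, not documented behavior.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Lingua::ZH::TaBE qw(ZhiIsBig5Code ZhiToZhiCode ZhiCodeToZhi);

# Assumption: a Zhi is passed as Big5 bytes; U+5E38 is the character 常.
my $zhi = encode('big5', "\x{5E38}");

my $is_big5 = ZhiIsBig5Code($zhi);      # expected to be true for a Big5 character
my $zhicode = ZhiToZhiCode($zhi);       # libtabe's numeric character code
my $back    = ZhiCodeToZhi($zhicode);   # convert the code back to a character

printf "Big5: %s, ZhiCode: %s, round trip ok: %s\n",
    ($is_big5 ? 'yes' : 'no'), $zhicode, ($back eq $zhi ? 'yes' : 'no');
```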
All constants below belong to the Lingua::ZH::TaBE class. They are not exported by default, but may be imported individually by name, or all at once via use Lingua::ZH::TaBE ':all'.
    DB_TYPE_DB              0
    DB_TYPE_LAST            1
    DB_FLAG_OVERWRITE       0x01
    DB_FLAG_CREATEDB        0x02
    DB_FLAG_READONLY        0x04
    DB_FLAG_NOSYNC          0x08
    DB_FLAG_SHARED          0x10
    DB_FLAG_NOUNPACK_YIN    0x20
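The DB_FLAG_* values are distinct powers of two, so they can be combined with bitwise OR and tested with bitwise AND. The self-contained sketch below copies the values into a hash so it runs without the module installed; in real code you would import the constants via ':all' instead. That TsiDBOpen accepts such an OR-ed combination as $flags is an assumption based on the bit-mask values, not something stated here, and flag_names is a hypothetical helper.

```perl
use strict;
use warnings;

# Flag values copied from the constants table (normally imported via ':all').
my %DB_FLAG = (
    DB_FLAG_OVERWRITE    => 0x01,
    DB_FLAG_CREATEDB     => 0x02,
    DB_FLAG_READONLY     => 0x04,
    DB_FLAG_NOSYNC       => 0x08,
    DB_FLAG_SHARED       => 0x10,
    DB_FLAG_NOUNPACK_YIN => 0x20,
);

# Hypothetical helper: names of the flags set in $value, in ascending bit order.
sub flag_names {
    my ($value) = @_;
    return grep { $value & $DB_FLAG{$_} }
           sort { $DB_FLAG{$a} <=> $DB_FLAG{$b} } keys %DB_FLAG;
}

# A shared, read-only database: OR the individual bits together.
my $flags = $DB_FLAG{DB_FLAG_READONLY} | $DB_FLAG{DB_FLAG_SHARED};

# prints: flags = 0x14 (DB_FLAG_READONLY | DB_FLAG_SHARED)
printf "flags = 0x%02X (%s)\n", $flags, join(' | ', flag_names($flags));

# With the module loaded, the corresponding call would presumably be:
#   my $TsiDB = TsiDBOpen(DB_TYPE_DB, $db_name, DB_FLAG_READONLY | DB_FLAG_SHARED);
```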
The TsiYin family of functions is not yet complete.
ftp://xcin.linux.org.tw/pub/xcin/libtabe/devel/
http://libtabe.sourceforge.net/
Audrey Tang <autrijus@autrijus.org>
Copyright 2003, 2004, 2005 by Audrey Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html
To install Lingua::ZH::TaBE, copy and paste the appropriate command into your terminal.
cpanm

    cpanm Lingua::ZH::TaBE

CPAN shell

    perl -MCPAN -e shell
    install Lingua::ZH::TaBE
For more information on module installation, please visit the detailed CPAN module installation guide.