NAME
Lingua::ZH::TaBE - Chinese processing via libtabe
SYNOPSIS
use Lingua::ZH::TaBE;
my $tabe = Lingua::ZH::TaBE->new(
tsi_db => '/usr/local/share/tabe/tsiyin/tsi.db'
);
# Phrase splitter
my @words = $tabe->split(
"當我們在電腦中處理中文資訊時,相信其中最惱人的".
"狀況之一,莫過於想打的字打不出來了。"
);
# Chaining various components
print $tabe->Chu("道可道,非常道。") # sentence
->chunks->[2] # chunk: 非常道
->tsis->[0] # phrase: 非常
->zhis->[1] # character: 常
->yins->[0] # pronunciation: ㄔㄤˊ
->zuyins->[0]; # phonetic symbol: ㄔ
DESCRIPTION
This module is a Perl interface to the TaBE (Taiwan and Big5 Encoding) library, a unified interface and library for dealing with Chinese words, phrases, sentences, and phonetic symbols. It is intended to serve as the foundation for Chinese text processing.
Lingua::ZH::TaBE provides an object-oriented interface (preferred), as well as a procedural interface consisting of all C functions in tabe.h.
Object-Oriented Interface
Lingua::ZH::TaBE
new( tsi_db => $file )
Creates a LibTaBE handle and opens databases.
split( $string [, $method] )
Splits the text in $string and returns a list of strings representing the words obtained. You may specify Complex or Backward as $method to use an alternate segmentation algorithm.
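To compare the segmentation algorithms side by side, a minimal sketch along the following lines may help. It assumes the library expects Big5 byte strings (suggested by the library's name, but not stated above), so the text is converted with the core Encode module before and after the call. The helper names to_big5, from_big5 and split_all_methods are hypothetical and not part of the module; the database path is the one from the SYNOPSIS.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Assumption: libtabe works on Big5 bytes, so convert at the boundary.
sub to_big5   { my ($text)  = @_; return encode('big5', $text) }
sub from_big5 { my ($bytes) = @_; return decode('big5', $bytes) }

# Hypothetical helper: call split() once without a method and once for
# each of the two documented alternatives, keyed by method name.
sub split_all_methods {
    my ($tabe, $big5_text) = @_;
    my %result = (Default => [ $tabe->split($big5_text) ]);
    for my $method (qw(Complex Backward)) {
        $result{$method} = [ $tabe->split($big5_text, $method) ];
    }
    return \%result;
}

# Only touch the library when both the module and the database are present.
my $db = '/usr/local/share/tabe/tsiyin/tsi.db';
if (-e $db and eval { require Lingua::ZH::TaBE; 1 }) {
    my $tabe  = Lingua::ZH::TaBE->new(tsi_db => $db);
    my $words = split_all_methods($tabe, to_big5('道可道,非常道。'));

    binmode STDOUT, ':encoding(UTF-8)';
    for my $method (sort keys %$words) {
        my @decoded = map { from_big5($_) } @{ $words->{$method} };
        print "$method: ", join(' | ', @decoded), "\n";
    }
}
```

Whether the three algorithms disagree depends on the input and on the contents of the tsi database, so no output is shown here.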
Chu(), Chunk(), Tsi(), Zhi(), Yin(), ZuYin()
Constructors for the various levels of objects, each taking one argument for initialization.
Lingua::ZH::TaBE::Chu
chunks()
Lingua::ZH::TaBE::Chunk
tsis([$method])
Lingua::ZH::TaBE::Tsi
zhis()
yins()
Lingua::ZH::TaBE::Zhi
yins()
ToZhi()
ToZhiCode()
IsBig5Code()
ToPackedBig5Code()
LookupRefCount()
Lingua::ZH::TaBE::Yin
zuyins()
zhis()
ToYin()
ToZuYinSymbolSequence()
Lingua::ZH::TaBE::ZuYin
yin()
zhi()
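The classes listed above nest in the same order as the chain in the SYNOPSIS: a Chu holds chunks, a chunk holds tsis, and a tsi holds zhis. The following is a rough sketch of walking that whole tree rather than picking single elements. It assumes each accessor returns an array reference in scalar context, as the SYNOPSIS usage suggests, and that the objects stringify when printed. The helper walk_chu is hypothetical, and the sentence is given as the Big5 bytes of the SYNOPSIS example.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every character (zhi) of a Chu object,
# level by level, and return how many characters were visited.
sub walk_chu {
    my ($chu, $visit) = @_;
    my $count = 0;
    for my $chunk (@{ $chu->chunks }) {
        for my $tsi (@{ $chunk->tsis }) {
            for my $zhi (@{ $tsi->zhis }) {
                $visit->($chunk, $tsi, $zhi);
                $count++;
            }
        }
    }
    return $count;
}

# Only touch the library when both the module and the database are present.
my $db = '/usr/local/share/tabe/tsiyin/tsi.db';
if (-e $db and eval { require Lingua::ZH::TaBE; 1 }) {
    my $tabe = Lingua::ZH::TaBE->new(tsi_db => $db);

    # Big5 bytes of the second SYNOPSIS sentence.
    my $chu = $tabe->Chu(
        "\xB9\x44\xA5\x69\xB9\x44\xA1\x41\xAB\x44\xB1\x60\xB9\x44\xA1\x43"
    );

    # Assumption: objects stringify to their Big5 text when interpolated.
    walk_chu($chu, sub {
        my ($chunk, $tsi, $zhi) = @_;
        print "$tsi\t$zhi\n";
    });
}
```

Passing a callback keeps the traversal separate from what is done with each character, so the same walker can print, count, or collect readings via yins().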
Procedural Interface
struct TsiDB *TsiDBOpen(int type, const char *db_name, int flags);
int TsiInfoLookupPossibleTsiYin(struct TsiDB *tsidb,
struct TsiInfo *tsi);
struct TsiYinDB *TsiYinDBOpen(int type, const char *db_name,
int flags);
int ChuInfoToChunkInfo(struct ChuInfo *chu);
int ChunkSegmentationSimplex(struct TsiDB *tsidb,
struct ChunkInfo *chunk);
int ChunkSegmentationComplex(struct TsiDB *tsidb,
struct ChunkInfo *chunk);
int ChunkSegmentationBackward(struct TsiDB *tsidb,
struct ChunkInfo *chunk);
int TsiInfoLookupZhiYin(struct TsiDB *tsidb,
struct TsiInfo *z);
ZhiStr YinLookupZhiList(Yin yin);
ZuYinSymbolSequence YinToZuYinSymbolSequence(Yin yin);
Yin ZuYinSymbolSequenceToYin(ZuYinSymbolSequence str);
const Zhi ZuYinIndexToZuYinSymbol(ZuYinIndex idx);
ZuYinIndex ZuYinSymbolToZuYinIndex(ZuYinSymbol sym);
ZuYinIndex ZozyKeyToZuYinIndex(int key);
int ZhiIsBig5Code(Zhi zhi);
ZhiCode ZhiToZhiCode(Zhi zhi);
Zhi ZhiCodeToZhi(ZhiCode code);
int ZhiCodeToPackedBig5Code(ZhiCode code);
unsigned long int ZhiCodeLookupRefCount(ZhiCode code);
CAVEATS
The TsiYin family of functions is not yet complete.
SEE ALSO
ftp://xcin.linux.org.tw/pub/xcin/libtabe/devel/
http://libtabe.sourceforge.net/
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
COPYRIGHT
Copyright 2003 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html