Lingua::ZH::CEDICT - Interface for CEDICT, a Chinese-English dictionary
    use strict;
    use warnings;
    use Lingua::ZH::CEDICT;

    my $dict = Lingua::ZH::CEDICT->new();
    $dict->init();

    $dict->startMatch('house');
    while (my $e = $dict->match()) {
        # Fields: traditional, simplified, pinyin, pinyin without tones, English
        print "$e->[0] $e->[1] [$e->[2] / $e->[3]] $e->[4]\n";
    }
Lingua::ZH::CEDICT is an interface for CEDICT.b5, a Chinese-English dictionary file that may be freely used for non-commercial purposes. This is an alpha release; the API and features are not finalized. If you intend to use this package, please contact me so I can accommodate your needs.
The dictionary is included as a Storable v2.4 file. To import a new version of the dictionary, see the bin/ directory in the distribution.
new(%hash) creates a new dictionary object. It accepts the following keys:
source
(Default: Storable) The type of input the dictionary is read from. The currently available interfaces are Textfile, Storable and MySQL; see the POD of Lingua::ZH::CEDICT::Textfile, Lingua::ZH::CEDICT::Storable and Lingua::ZH::CEDICT::MySQL for details on their configuration.
HanConvert
(Default: Lingua::ZH::CEDICT::HanConvert) The module used to convert simplified characters to traditional characters and vice versa.
numEntries()
Returns the number of entries in the dictionary. An entry is a unique (characters, pinyin) pair together with its English translations.
version()
Returns the version string from the dictionary file used.
entry($number)
Returns entry number $number in the dictionary (0-based).
startMatch($key)
Starts an inexact search for the search key $key.
match()
Returns a reference to the next matching entry, or a false value when there are no more matches.
startFind($key)
Starts an exact search for the search key $key.
find()
Returns a reference to the next exactly matching entry, or a false value when there are no more matches.
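The start/next pairs above keep a search cursor between calls. The following is a minimal pure-Perl sketch of that pattern, not the module's code: make_search, the sample entries, and both matching rules (case-insensitive substring for "inexact", whole '/'-separated translation for "exact") are hypothetical assumptions, since the real rules are not documented here.

```perl
use strict;
use warnings;
use utf8;

# Made-up sample entries in the field order shown in the synopsis:
# [traditional, simplified, pinyin, pinyin without tones, English]
my @entries = (
    [ '房子', '房子', 'fang2 zi5',  'fang zi',  '/house/building/' ],
    [ '家',   '家',   'jia1',       'jia',      '/home/family/' ],
    [ '房屋', '房屋', 'fang2 wu1',  'fang wu',  '/houses/buildings/' ],
    [ '白宮', '白宫', 'Bai2 gong1', 'Bai gong', '/White House/' ],
);

# Hypothetical helper: returns a closure that yields the next entry whose
# English field passes $test, and returns nothing once all entries are used.
sub make_search {
    my ( $entries, $test ) = @_;
    my $i = 0;    # cursor kept between calls, like startMatch/match
    return sub {
        while ( $i < @$entries ) {
            my $e = $entries->[ $i++ ];
            return $e if $test->( $e->[4] );
        }
        return;
    };
}

my $key = 'house';

# Assumed "inexact" rule: case-insensitive substring of the English field.
my $inexact = make_search( \@entries, sub { index( lc $_[0], lc $key ) >= 0 } );

# Assumed "exact" rule: one '/'-separated translation equals the key.
my $exact = make_search( \@entries, sub { grep { $_ eq $key } split m{/}, $_[0] } );

my ( @loose, @strict );
while ( my $e = $inexact->() ) { push @loose,  $e }
while ( my $e = $exact->() )   { push @strict, $e }
```

With these rules, "house" loosely matches three sample entries (including "houses" and "White House") but exactly matches only the first.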
addSimpChar
Calls the simple method of the configured HanConvert module to add the simplified-character form to each entry.
applyPinyinFormat($coderef)
Formats the pinyin for all entries. If no code ref is supplied, uses utf8Pinyin.
applyEnglishFormat($coderef)
Formats the English translation for all entries. If no code ref is supplied, uses formatEnglish.
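Both apply methods take an optional code reference and fall back to a default formatter. A minimal sketch of that contract follows; it assumes the code reference receives the field's text and returns the formatted text, which the documentation does not state. apply_format and drop_neutral_tone are hypothetical names, and drop_neutral_tone is only a stand-in for a default such as utf8Pinyin.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for a default formatter such as utf8Pinyin:
# here it only removes the neutral-tone digit 5.
sub drop_neutral_tone {
    my ($text) = @_;
    $text =~ s/5//g;
    return $text;
}

# Apply $coderef to field $idx of every entry in place,
# falling back to $default when no code reference is given.
sub apply_format {
    my ( $entries, $idx, $coderef, $default ) = @_;
    $coderef ||= $default;
    $_->[$idx] = $coderef->( $_->[$idx] ) for @$entries;
    return;
}

# Made-up entries; field 2 holds the pinyin, as in the synopsis.
my @with_default = ( [ '房子', '房子', 'fang2 zi5' ] );
my @with_custom  = ( [ '房子', '房子', 'fang2 zi5' ] );

apply_format( \@with_default, 2, undef, \&drop_neutral_tone );
apply_format( \@with_custom,  2, sub { join '-', split ' ', $_[0] }, \&drop_neutral_tone );
```

After these calls the first list holds "fang2 zi" (default applied) and the second "fang2-zi5" (custom code reference applied).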
utf8Pinyin($text)
Changes tone numbers to UTF-8-encoded tone marks.
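The conversion follows the usual pinyin placement rule: a or e takes the mark, o takes it in "ou", and otherwise the last vowel does. The sketch below is an independent illustration of that rule, not the module's utf8Pinyin; tone_marks and mark_syllable are hypothetical names, and it assumes CEDICT writes ü as "u:". It adds combining diacritics and composes them with Unicode::Normalize (a core module).

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Combining diacritics for tones 1-4 (macron, acute, caron, grave).
# Tone 5, the neutral tone, gets no mark.
my %MARK = ( 1 => "\x{304}", 2 => "\x{301}", 3 => "\x{30C}", 4 => "\x{300}" );

# Put the tone mark on the right vowel of one syllable, e.g. ("zhou", 1).
sub mark_syllable {
    my ( $syl, $tone ) = @_;
    my $mark = $MARK{$tone} or return $syl;    # neutral tone: just drop the digit
    my $pos;
    if    ( $syl =~ /[ae]/i )                { $pos = $-[0] }        # a or e always wins
    elsif ( $syl =~ /ou/i )                  { $pos = $-[0] }        # o in "ou"
    elsif ( $syl =~ /.*[iou\x{FC}\x{DC}]/i ) { $pos = $+[0] - 1 }    # else the last vowel
    else                                     { return $syl }         # no vowel, e.g. "m2"
    substr( $syl, $pos + 1, 0, $mark );    # insert the combining mark after the vowel
    return $syl;
}

sub tone_marks {
    my ($text) = @_;
    $text =~ s/u:/\x{FC}/g;    # assumption: CEDICT spells u-umlaut as "u:"
    $text =~ s/U:/\x{DC}/g;
    $text =~ s{([A-Za-z\x{FC}\x{DC}]+)([1-5])}{mark_syllable($1, $2)}ge;
    return NFC($text);         # compose base letter + mark into one code point
}
```

For example, tone_marks('liu2 xue2') yields "liú xué". The result is a Perl character string; pass it through Encode::encode('UTF-8', ...) if UTF-8 bytes are needed.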
formatEnglish($text)
Replaces the '/' delimiters with dots and italicizes comments in brackets using HTML.
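A minimal sketch of this kind of formatting, not the module's formatEnglish: format_english is a hypothetical name, and it assumes the English field looks like "/house/building (dwelling)/", that "brackets" means parentheses, and that the dot delimiter is rendered as ". ".

```perl
use strict;
use warnings;

# Hypothetical re-implementation under the assumptions stated above.
sub format_english {
    my ($text) = @_;
    my @parts = grep { length } split m{/}, $text;    # drop empty fields from the outer slashes
    s{(\([^)]*\))}{<i>$1</i>}g for @parts;            # italicize parenthesized comments
    return join '. ', @parts;
}
```

Here format_english('/house/building (dwelling)/') returns "house. building <i>(dwelling)</i>".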
For some applications, a concept of keywords is useful. A keyword is a unique key under which related dictionary entries are grouped. For example, the pinyin keywords have their tone marks removed, so the keyword "zi" encompasses all translations of characters pronounced zi1 through zi5.
generateKeywords()
Generates the keyword hashes. Call this before applying any formatting.
keysEn()
Returns a hash whose keys are the English keywords and whose values are references to arrays of the indices of the entries that mention the keyword.
keysPinyin()
Returns a hash whose keys are the pinyin keywords and whose values are references to arrays of the indices of the entries with the same pronunciation (ignoring tones).
keysZh()
Returns a hash whose keys are the Chinese character keywords and whose values are references to arrays of the indices of the entries in which the term is translated. If the data contains both traditional and simplified characters, this hash includes both forms.
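The three keyword hashes share one shape: keyword => array of entry indices. The following sketch builds all three from made-up sample entries; it is an illustration, not the module's generateKeywords. Treating each '/'-separated, lowercased translation as one English keyword is an assumption, as is deriving the pinyin keyword by deleting the tone digits.

```perl
use strict;
use warnings;
use utf8;

# Made-up entries in the synopsis field order:
# [traditional, simplified, pinyin, pinyin without tones, English]
my @entries = (
    [ '字',   '字',   'zi4',      'zi',     '/letter/symbol/character/' ],
    [ '子',   '子',   'zi3',      'zi',     '/son/child/' ],
    [ '紫',   '紫',   'zi3',      'zi',     '/purple/violet/' ],
    [ '孩子', '孩子', 'hai2 zi5', 'hai zi', '/child/' ],
    [ '漢字', '汉字', 'Han4 zi4', 'Han zi', '/Chinese character/' ],
);

# keyword => [indices of entries], one hash per keyword kind
my ( %en, %pinyin, %zh );
for my $i ( 0 .. $#entries ) {
    my ( $trad, $simp, $toned, undef, $english ) = @{ $entries[$i] };

    # Pinyin keyword: the pinyin with its tone digits removed.
    ( my $toneless = $toned ) =~ tr/1-5//d;
    push @{ $pinyin{$toneless} }, $i;

    # English keywords (assumption): each '/'-separated translation, lowercased.
    push @{ $en{ lc $_ } }, $i for grep { length } split m{/}, $english;

    # Chinese keywords: both character forms, but only once when they coincide.
    my %seen;
    push @{ $zh{$_} }, $i for grep { !$seen{$_}++ } $trad, $simp;
}
```

In this sample, the pinyin keyword "zi" collects entries 0, 1 and 2, and both 漢字 and 汉字 point to entry 4.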
Christian Renz, <crenz@web42.com>
Copyright (C) 2002 Christian Renz. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Lingua::ZH::CEDICT::Textfile
Lingua::ZH::CEDICT::Storable
Lingua::ZH::CEDICT::MySQL
Lingua::ZH::CEDICT::HanConvert
http://www.mandarintools.com/cedict.html
http://www.web42.com/zidian/