Lingua::ZH::HanConvert - convert between Traditional and Simplified Chinese characters
    #!perl -lw
    use Lingua::ZH::HanConvert qw(simple trad);
    use utf8;

    my $t = "國";        # Traditional symbol for "country", unicode 22283
    # or: my $t = v22283;
    print simple($t);    # Simplified "country", 国 (unicode 22269)

    my $s = "鱼";        # Simplified symbol for "fish", unicode 40060
    # or: my $s = v40060;
    print trad($s);      # Traditional "fish", 魚 (unicode 39770)
Perl 5.6
In the 1950s, the Chinese government simplified over 2000 Chinese characters to help promote literacy. Taiwan and Hong Kong still use the traditional characters. The simplified characters are hard to read if you only know the traditional ones, and vice versa.
This module attempts to convert Chinese text between the two forms, using character-by-character transliteration.
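Character-by-character transliteration can be pictured as a table lookup applied to each character in turn. The following is a minimal sketch of that idea, not the module's actual code: the three-entry table and the name to_simplified_sketch are hypothetical, whereas the real module takes its mappings from the Unihan database.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal Chinese characters

# Hypothetical three-entry table, for illustration only.
my %TRAD_TO_SIMP = (
    "國" => "国",    # country
    "魚" => "鱼",    # fish
    "語" => "语",    # language
);

# Replace each character that has a mapping; leave all others untouched.
sub to_simplified_sketch {
    my ($text) = @_;
    my $out = '';
    for my $char (split //, $text) {
        $out .= exists $TRAD_TO_SIMP{$char} ? $TRAD_TO_SIMP{$char} : $char;
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_simplified_sketch("中國語 abc"), "\n";    # prints "中国语 abc"
```

Characters without an entry in the table, such as 中 and the ASCII letters, pass through unchanged.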
Note that this module only handles Unicode text encoded as UTF-8. If you need to convert between the Big5 and GB character sets, please look at Text::Iconv.
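Text held in a legacy encoding has to be turned into Unicode characters before conversion, and back into bytes afterwards. The sketch below shows that round trip using Perl's core Encode module instead of the iconv-based module named above. This is an assumption: Encode ships with Perl 5.8 and later, not with Perl 5.6. The encoding names big5-eten and euc-cn are Encode's names for Big5 and GB2312.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Big5 bytes for the traditional character U+570B ("country").
my $big5_bytes = "\xB0\xEA";

# Decode the legacy bytes into a Perl character string.
my $chars = decode('big5-eten', $big5_bytes);
printf "decoded: U+%04X\n", ord $chars;

# At this point the character string could be passed to simple() or trad().

# Encode a character string (U+56FD, simplified "country") as GB2312 bytes.
my $gb_bytes = encode('euc-cn', "\x{56FD}");
printf "GB2312 bytes: %s\n", uc unpack('H*', $gb_bytes);
```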
simple takes a string, converts any traditional Chinese characters (such as 國, unicode U+570B, meaning "country") to the corresponding simplified characters (like 国, unicode U+56FD, also meaning "country"), and returns the result. Characters which are not traditional Chinese do not change.
trad does the reverse; it converts any simplified Chinese characters to the corresponding traditional characters. Characters which are not simplified Chinese do not change.
Transliteration is not perfect. At the moment, this module only performs character-by-character transliteration, using the one-to-one mappings from the Unicode Consortium's Unihan database. Converted text will contain errors, though it is generally good enough to be readable.
The transliteration mappings could be improved; if anyone knows of another source of mappings then please let me know. Ideally, I'd like to see the module performing word-by-word transliteration, if suitable data sources were available. See http://www.basistech.com/articles/C2C.html for a discussion of transliteration issues.
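One reason a one-to-one table falls short is that some simplified characters stand for more than one traditional character. For example, simplified 发 corresponds to both 發 ("to emit") and 髮 ("hair"). The sketch below uses a hypothetical two-entry table and a hypothetical function name. Which candidate the real module chooses is not stated here, so always taking the first one is purely an assumption for illustration.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal Chinese characters

# Hypothetical data: each simplified character with its traditional candidates.
my %SIMP_TO_TRAD_CANDIDATES = (
    "头" => ["頭"],          # head
    "发" => ["發", "髮"],    # "to emit" or "hair", depending on the word
);

# A one-to-one conversion must commit to a single candidate (here, the first).
sub to_traditional_sketch {
    my ($text) = @_;
    my $out = '';
    for my $char (split //, $text) {
        my $candidates = $SIMP_TO_TRAD_CANDIDATES{$char};
        $out .= $candidates ? $candidates->[0] : $char;
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_traditional_sketch("头发"), "\n";    # prints "頭發"
```

The word 头发 means "hair", so the correct traditional form is 頭髮. Choosing between the candidates requires looking at the surrounding word, which is what word-by-word transliteration would provide.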
The module may take several seconds to initialise. Each subroutine is slow the first time it is run, but is faster when run subsequent times.
The characters in this documentation may not display correctly unless the program you are reading it with is Unicode-aware.
The data used by this module is taken from the Unicode Consortium's Unihan database, available from ftp://ftp.unicode.org. Thanks to them for compiling the data.
David Chan <david@sheetmusic.org.uk>
Copyright (C) 2001, David Chan. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.