16 Sep 2014 10:50:38 UTC
- Distribution: Lingua-Han-Utils
- Module version: 0.13
Lingua::Han::Utils - utility tools for Chinese characters (HanZi)
    use Lingua::Han::Utils qw/Unihan_value csplit cdecode csubstr clength/;

    # cdecode
    # the same as decode('cp936', $word) in ASCII editing mode
    # and decode('utf8', $word) in Unicode editing mode
    $word = cdecode($word);

    # Unihan_value
    # return the first field of Unihan.txt on unicode.org
    my $word   = "我";
    my $unihan = Unihan_value($word);     # return '6211'
    my $words  = "爱你";
    my @unihan = Unihan_value($words);    # return (7231, 4F60)
    $unihan    = Unihan_value($words);    # return 72314F60

    # csplit
    # split the Chinese characters into an array
    $words = "我爱你";
    my @words = csplit($words);           # return ("我", "爱", "你")

    # csubstr
    # treat the Chinese characters as one
    # so it's the same as splice(csplit($words), $offset, $length)
    $words = "我爱你啊";
    my @pair = csubstr($words, 1, 2);     # return ("爱", "你")
    my @rest = csubstr($words, 1);        # return ("爱", "你", "啊")
    my $sub  = csubstr($words, 1, 2);     # 爱你

    # clength
    # treat the Chinese character as one
    $words = "我爱你";
    print clength($words);                # 3
Nothing is exported by default.
cdecode uses Encode::Guess to decode the string. It behaves like decode('cp936', $word) under ASCII editing mode and decode('utf8', $word) under Unicode editing mode.
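As a rough illustration of the two cases named above, the sketch below calls Encode's decode directly on hand-written byte strings. It does no encoding guessing and is not the module's own code; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same character, U+6211, written as raw bytes in each encoding.
my $gbk_bytes  = "\xCE\xD2";        # cp936 (GBK) bytes
my $utf8_bytes = "\xE6\x88\x91";    # UTF-8 bytes

# The two explicit calls that cdecode is described as standing in for.
my $from_gbk  = decode('cp936', $gbk_bytes);
my $from_utf8 = decode('utf8',  $utf8_bytes);

# Both results are the same one-character Perl string.
printf "U+%04X U+%04X\n", ord($from_gbk), ord($from_utf8);
```

With explicit calls the caller has to know the encoding in advance; the point of cdecode is to make that choice for you.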
Unihan_value returns the Unicode scalar value of each character. The first field of Unihan.txt gives this value as U+[x]xxxx; the function returns the [x]xxxx part, without the U+ prefix.
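The [x]xxxx value is the character's code point written in uppercase hexadecimal, so it can be approximated in core Perl with ord and sprintf. The helper below, hex_code_points, is a hypothetical stand-in for illustration and is not how the module is implemented; it assumes its input is already a decoded character string.

```perl
use strict;
use warnings;
use utf8;    # the string literals below are character strings

# Hypothetical helper: hex code point of every character in $text.
# Returns a list in list context, one concatenated string in scalar context.
sub hex_code_points {
    my ($text) = @_;
    my @hex = map { sprintf('%X', ord $_) } split //, $text;
    return wantarray ? @hex : join('', @hex);
}

my $single = hex_code_points("我");      # '6211'
my @codes  = hex_code_points("爱你");    # ('7231', '4F60')
my $joined = hex_code_points("爱你");    # '72314F60'
```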
csplit splits a string of Chinese characters into an array, one character per element. English words can be mixed into the input.
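Splitting by character only works once the bytes have been decoded. The sketch below shows the difference using core Perl alone; it assumes UTF-8 input and does not reproduce how the module treats mixed-in English words.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes of a three-character string (U+6211 U+7231 U+4F60).
my $bytes = "\xE6\x88\x91\xE7\x88\xB1\xE4\xBD\xA0";

# Splitting the raw bytes yields one element per byte.
my @raw = split //, $bytes;

# Splitting the decoded string yields one element per character.
my @chars = split //, decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", scalar(@raw), scalar(@chars);
```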
- csubstr(WORD, OFFSET, LENGTH)
Treats each Chinese character as one unit and takes a substring, as substr does.
(BE CAREFUL! It is NOT an lvalue, so we can't write csubstr($word, 2, 3) = $REPLACEMENT.)
If no LENGTH is specified, it returns everything from OFFSET to the end.
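For comparison, here is a sketch of the same operations in core Perl on a string that is already decoded. It is not the module's code. Because csubstr cannot be assigned to, the last step does the replacement with the built-in substr on a copy.

```perl
use strict;
use warnings;
use utf8;    # the string literals below are character strings

my $words = "我爱你啊";
my @chars = split //, $words;

# List results: OFFSET 1 with LENGTH 2, then OFFSET 1 to the end.
my @pair = @chars[1 .. 2];          # ("爱", "你")
my @rest = @chars[1 .. $#chars];    # ("爱", "你", "啊")

# Scalar result: the built-in substr counts characters on decoded text.
my $sub = substr($words, 1, 2);     # "爱你"

# Replacement: the built-in substr is an lvalue, unlike csubstr.
my $copy = $words;
substr($copy, 1, 2) = "想你";       # $copy is now "我想你啊"
```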
clength treats each Chinese character as one unit (length 1) and returns the number of characters.
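The reason a character-aware length is needed shows up when counting raw bytes. The sketch below uses core Perl only and hand-written cp936 bytes; it illustrates the idea and is not the module's implementation.

```perl
use strict;
use warnings;
use Encode qw(decode);

# cp936 (GBK) bytes of a three-character string: two bytes per character.
my $gbk_bytes = "\xCE\xD2\xB0\xAE\xC4\xE3";

my $byte_count = length($gbk_bytes);    # counts bytes
my $decoded    = decode('cp936', $gbk_bytes);
my $char_count = length($decoded);      # counts characters

printf "%d bytes, %d characters\n", $byte_count, $char_count;
```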
A Chinese version of this document can be found at http://www.fayland.org/journal/Lingua-Han-Utils.html
<fayland at gmail.com>
Please report any bugs or feature requests to bug-lingua-han-utils at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Han-Utils. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command: perldoc Lingua::Han::Utils
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
RT: CPAN's request tracker
the wonderful Encode::Guess
Copyright 2005 Fayland Lam, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.