CP932NEC::R2 - provides minimal CP932NEC I/O subroutines by short name
  use CP932NEC::R2;

  @result = mbeach($utf8str)
  $result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')
  $result = iolen($utf8str)
  $result = iomid($utf8expr, $offset_as_cp932nec, $length_as_cp932nec, $utf8replacement)
  @result = ioget(FILEHANDLE)
  $result = ioput(FILEHANDLE, @utf8str)
  $result = ioputf(FILEHANDLE, $utf8format, @utf8list)
  @result = iosort(@utf8str)
  $result = $utf8str =~ $mb{qr/$utf8regex/imsxogc}
  $result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr
It is useful to have regular expressions in a Perl script match whole UTF-8 code points rather than single octets. The following subroutines and tied hash variable provide UTF-8 semantics for this.
  ------------------------------------------------------------------------------------------------------------------------------------------
  Acts as SBCS             Acts as MBCS
  Octet in Script          Octet in Script                         Note and Limitations
  ------------------------------------------------------------------------------------------------------------------------------------------
  // or m// or qr//        $mb{qr/$utf8regex/imsxogc}              does not support the metasymbol \X that matches a grapheme
                                                                   does not support code point ranges (like "[A-Z]")
                                                                   does not support POSIX character classes (like [:alpha:])
                                                                   does not support named characters
                                                                     (such as \N{GREEK SMALL LETTER EPSILON}, \N{greek:epsilon}, or \N{epsilon})
                                                                   does not support character properties (like \p{PROP} and \P{PROP})

                                                                   Special Escapes in Regex    Support Perl Version
                                                                   ------------------------------------------------
                                                                   $mb{qr/ \x{Unicode} /}      since perl 5.006
                                                                   $mb{qr/ [^ ... ] /}         since perl 5.008  ** CAUTION ** perl 5.006 cannot do this
                                                                   $mb{qr/ \h /}               since perl 5.010
                                                                   $mb{qr/ \v /}               since perl 5.010
                                                                   $mb{qr/ \H /}               since perl 5.010
                                                                   $mb{qr/ \V /}               since perl 5.010
                                                                   $mb{qr/ \R /}               since perl 5.010
                                                                   $mb{qr/ \N /}               since perl 5.012
  ------------------------------------------------------------------------------------------------------------------------------------------
  s/before/after/imsxoegr  s<$mb{qr/before/imsxo}><after>egr
  ------------------------------------------------------------------------------------------------------------------------------------------
  split(//,$_)             mbeach($utf8str)                        splits $utf8str into individual characters
  ------------------------------------------------------------------------------------------------------------------------------------------
  tr/// or y///            mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')    does not support code point ranges (like "tr/A-Z/a-z/")
  ------------------------------------------------------------------------------------------------------------------------------------------
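To illustrate why mbeach() is listed as the replacement for split(//,$_), the Perl sketch below contrasts the two on a plain octet string (no utf8 flag). split(//, ...) yields one element per octet, while grouping octets by UTF-8 lead and continuation bytes yields one element per character. The helper my_mbeach is hypothetical and is not the module's implementation; it only shows the idea.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT CP932NEC::R2's code: split a string of UTF-8
# octets into one element per UTF-8 character, roughly what mbeach()
# is documented to do.
sub my_mbeach {
    my ($octets) = @_;
    return $octets =~ /(
          [\x00-\x7F]                 # 1-octet character (ASCII)
        | [\xC2-\xDF][\x80-\xBF]      # 2-octet sequence
        | [\xE0-\xEF][\x80-\xBF]{2}   # 3-octet sequence
        | [\xF0-\xF4][\x80-\xBF]{3}   # 4-octet sequence
        | [\x00-\xFF]                 # any other single octet (malformed input)
    )/xg;
}

my $str    = "A\xE3\x81\x82";     # "A" then U+3042 HIRAGANA LETTER A, as UTF-8 octets
my @octets = split //, $str;      # one element per octet
my @chars  = my_mbeach($str);     # one element per character
print scalar(@octets), " ", scalar(@chars), "\n";   # prints "4 2"
```

The last alternative keeps malformed input from being silently dropped: any stray octet becomes its own element instead of stopping the match.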
If you use the following subroutines, I/O encoding conversion is performed automatically. These subroutines provide CP932NEC octet semantics.
  ------------------------------------------------------------------------------------------------------------------------------------------
  Acts as SBCS     Acts as MBCS
  Octet in Script  Octet of I/O Encoding                         Note and Limitations
  ------------------------------------------------------------------------------------------------------------------------------------------
  <FILEHANDLE>     ioget(FILEHANDLE)                             reads a CP932NEC file and returns UTF-8 octets
  ------------------------------------------------------------------------------------------------------------------------------------------
  length           iolen($utf8str)                               octet count of $utf8str when encoded as CP932NEC
  ------------------------------------------------------------------------------------------------------------------------------------------
  print            ioput(FILEHANDLE, @utf8str)                   prints @utf8str encoded as CP932NEC
  ------------------------------------------------------------------------------------------------------------------------------------------
  printf           ioputf(FILEHANDLE, $utf8format, @utf8list)    printf of $utf8format with @utf8list, encoded as CP932NEC
  ------------------------------------------------------------------------------------------------------------------------------------------
  sort             iosort(@utf8str)                              sorts @utf8str by their CP932NEC octets
  ------------------------------------------------------------------------------------------------------------------------------------------
  sprintf          (nothing)                                     no "iosputf" is provided: it would be a bad interface, because it
                                                                 would be confusing to bring both internal code and external code
                                                                 into your script
  ------------------------------------------------------------------------------------------------------------------------------------------
  substr           iomid($utf8expr, $offset_as_cp932nec, $length_as_cp932nec, $utf8replacement)
                                                                 substr of $utf8expr, with offset and length counted in CP932NEC octets
  ------------------------------------------------------------------------------------------------------------------------------------------
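iolen() and iosort() both work on the converted form of a string rather than on its UTF-8 octets. The Perl sketch below approximates that idea with the core Encode module. The helpers to_cp932, my_iolen and my_iosort are hypothetical and are not the module's API. Encode ships no CP932NEC encoding, so the sketch substitutes Encode's "cp932"; its mapping of some NEC and IBM extension characters may differ from CP932NEC.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# ASSUMPTION: "cp932" stands in for CP932NEC, which Encode does not provide.
# Converts UTF-8 octets to CP932 octets; unmappable characters are replaced
# with Encode's substitution character.
sub to_cp932 {
    my ($utf8) = @_;
    return encode('cp932', decode('UTF-8', $utf8));
}

# Hypothetical stand-in for iolen(): octet length after conversion.
sub my_iolen {
    my ($utf8) = @_;
    return length(to_cp932($utf8));
}

# Hypothetical stand-in for iosort(): order UTF-8 strings by their
# CP932 octets instead of their UTF-8 octets.
sub my_iosort {
    my @utf8 = @_;
    my %cp932 = map { ($_ => to_cp932($_)) } @utf8;   # cache converted keys
    return sort { $cp932{$a} cmp $cp932{$b} } @utf8;
}

my $hiragana_a = "\xE3\x81\x82";   # U+3042 HIRAGANA LETTER A as UTF-8 octets
my $circled_1  = "\xE2\x91\xA0";   # U+2460 CIRCLED DIGIT ONE as UTF-8 octets

print my_iolen("A$hiragana_a"), "\n";                 # prints 3 (1 + 2 CP932 octets)
my @by_utf8  = sort($hiragana_a, $circled_1);         # circled digit first
my @by_cp932 = my_iosort($hiragana_a, $circled_1);    # hiragana first
```

The two orderings disagree: U+2460 starts with 0xE2 in UTF-8 and U+3042 with 0xE3, but in CP932 they are 0x87 0x40 and 0x82 0xA0 respectively, so a plain sort and a CP932-octet sort put them in opposite order.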
INABA Hitoshi <ina@cpan.org>
This project was originated by INABA Hitoshi.
This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
To install CP932NEC::R2, copy and paste the appropriate command into your terminal.
cpanm

  cpanm CP932NEC::R2

CPAN shell

  perl -MCPAN -e shell
  install CP932NEC::R2
For more information on module installation, please visit the detailed CPAN module installation guide.