NAME
CP932::R2 - provides minimal CP932 I/O subroutines by short name
SYNOPSIS
  use CP932::R2;

    @result = mbeach($utf8str)
    $result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')
    $result = iolen($utf8str)
    $result = iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement)
    @result = ioget(FILEHANDLE)
    $result = ioput(FILEHANDLE, @utf8str)
    $result = ioputf(FILEHANDLE, $utf8format, @utf8list)
    @result = iosort(@utf8str)
    $result = $utf8str =~ $mb{qr/$utf8regex/imsxo}
    $result = $utf8str =~ m<$mb{qr/$utf8regex/imsxo}>gc
    $result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr
MBCS SUBROUTINES for SCRIPTING
It is useful for regular expressions in a Perl script to operate on UTF-8 code points rather than on single octets. The following subroutines and the tied hash variable %mb provide these UTF-8 semantics.
------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS             Acts as MBCS
Octet in Script          Octet in Script                             Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
// or m// or qr//        $mb{qr/$utf8regex/imsxo}                    does not support the metasymbol \X, which matches a grapheme
                         m<$mb{qr/$utf8regex/imsxo}>gc               does not support code point ranges (like "[A-Z]")
                                                                     does not support POSIX character classes (like [:alpha:])
                                                                     does not support named characters
                                                                     (such as \N{GREEK SMALL LETTER EPSILON},
                                                                     \N{greek:epsilon}, or \N{epsilon})
                                                                     does not support character properties (like \p{PROP} and \P{PROP})
                                                                     See the UTF8::R2 documentation for more information
------------------------------------------------------------------------------------------------------------------------------------------
s/before/after/imsxoegr  s<$mb{qr/before/imsxo}><after>egr
------------------------------------------------------------------------------------------------------------------------------------------
split(//, $_)            mbeach($utf8str)                            splits $utf8str into individual characters
------------------------------------------------------------------------------------------------------------------------------------------
tr/// or y///            mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')        does not support code point ranges (like "tr/A-Z/a-z/")
------------------------------------------------------------------------------------------------------------------------------------------
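
The difference between octet semantics and character semantics in the table above can be illustrated with core Perl alone. The following is a minimal sketch, not the implementation used by CP932::R2: the helpers split_utf8_chars and translit_utf8 are hypothetical names invented for this example, and they only approximate what mbeach and mbtr are described as doing (splitting a UTF-8 string into characters, and transliterating character by character without range support).

```perl
use strict;
use warnings;

# Hypothetical helper (not CP932::R2 code): split a UTF-8 octet string
# into characters. Each returned element is itself a UTF-8 octet string.
# The pattern checks lead and trail octet shapes only, not full validity.
sub split_utf8_chars {
    my ($octets) = @_;
    return $octets =~ /\G(?:
          [\x00-\x7F]                 # 1 octet, ASCII
        | [\xC2-\xDF][\x80-\xBF]      # 2 octets
        | [\xE0-\xEF][\x80-\xBF]{2}   # 3 octets
        | [\xF0-\xF4][\x80-\xBF]{3}   # 4 octets
    )/xg;
}

# Hypothetical helper (not CP932::R2 code): transliterate character by
# character, loosely like tr/// with the r modifier. Ranges such as
# "A-Z" are not expanded; a hyphen is treated as an ordinary character.
sub translit_utf8 {
    my ($octets, $from, $to) = @_;
    my @from = split_utf8_chars($from);
    my @to   = split_utf8_chars($to);
    @to = @from unless @to;               # empty replacement: map to itself
    my %map;
    for my $i (0 .. $#from) {
        next if exists $map{ $from[$i] }; # the first mapping wins
        # Reuse the last replacement character when $to is shorter.
        $map{ $from[$i] } = $i <= $#to ? $to[$i] : $to[-1];
    }
    return join '', map { exists $map{$_} ? $map{$_} : $_ }
                    split_utf8_chars($octets);
}

# HIRAGANA LETTER A, "A", HIRAGANA LETTER I, written as UTF-8 octets.
my $utf8str = "\xE3\x81\x82A\xE3\x81\x84";

my @octets  = split //, $utf8str;          # 7 elements: octets
my @chars   = split_utf8_chars($utf8str);  # 3 elements: characters
my $swapped = translit_utf8($utf8str, "\xE3\x81\x82\xE3\x81\x84", 'XY');  # "XAY"
```

With plain split(//, $_) each hiragana letter is broken into three meaningless octets, which is the SBCS behavior in the left column; the character-wise helpers show the MBCS behavior in the middle column.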
MBCS SUBROUTINES for I/O
If you use the following subroutines, I/O encoding conversion is performed automatically. These subroutines give you CP932 octet semantics.
------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS      Acts as MBCS
Octet in Script   Octet of I/O Encoding                                                   Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
<FILEHANDLE>      ioget(FILEHANDLE)                                                       gets UTF-8 octets from a CP932 file
------------------------------------------------------------------------------------------------------------------------------------------
length            iolen($utf8str)                                                         octet count of $utf8str in CP932 encoding
------------------------------------------------------------------------------------------------------------------------------------------
print             ioput(FILEHANDLE, @utf8str)                                             prints @utf8str in CP932 encoding
------------------------------------------------------------------------------------------------------------------------------------------
printf            ioputf(FILEHANDLE, $utf8format, @utf8list)                              printf of @utf8list in CP932 encoding
------------------------------------------------------------------------------------------------------------------------------------------
sort              iosort(@utf8str)                                                        sorts @utf8str in CP932 encoding order
------------------------------------------------------------------------------------------------------------------------------------------
sprintf           (nothing)                                                               an "iosputf" would be a bad interface: it would cause
                                                                                          confusion by bringing both the internal and the
                                                                                          external encoding into your script
------------------------------------------------------------------------------------------------------------------------------------------
substr            iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement)  substr of $utf8expr counted in CP932 octets
------------------------------------------------------------------------------------------------------------------------------------------
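
To make "CP932 octet semantics" concrete, the following sketch reproduces the idea with the core Encode module. It is an illustration under stated assumptions, not CP932::R2's own code: every helper name (utf8_to_cp932, cp932_to_utf8, cp932_length, cp932_substr, cp932_sort, read_cp932_lines) is hypothetical, and the real subroutines may differ in details such as error handling, return values, and the replacement argument of iomid.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical converters between UTF-8 octets (script side) and
# CP932 octets (I/O side). Not part of CP932::R2.
sub utf8_to_cp932 { my ($utf8)  = @_; return encode('cp932', decode('UTF-8', $utf8)) }
sub cp932_to_utf8 { my ($cp932) = @_; return encode('UTF-8', decode('cp932', $cp932)) }

# Roughly what iolen is described as doing: count octets as CP932.
sub cp932_length {
    my ($utf8) = @_;
    return length utf8_to_cp932($utf8);
}

# Roughly what three-argument iomid is described as doing: offset and
# length are counted in CP932 octets, the result is UTF-8 again.
sub cp932_substr {
    my ($utf8, $offset, $length) = @_;
    return cp932_to_utf8(substr(utf8_to_cp932($utf8), $offset, $length));
}

# Roughly what iosort is described as doing: order by CP932 octets.
sub cp932_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ utf8_to_cp932($_), $_ ] } @_;
}

# Roughly what ioget is described as doing: read CP932, return UTF-8.
sub read_cp932_lines {
    my ($fh) = @_;
    return map { cp932_to_utf8($_) } <$fh>;
}

my $hira_a = "\xE3\x81\x82";   # HIRAGANA LETTER A as UTF-8 (CP932: 82 A0)
my $full_a = "\xEF\xBC\xA1";   # FULLWIDTH LATIN CAPITAL LETTER A as UTF-8 (CP932: 82 60)

my $len_utf8  = length "A${hira_a}B";               # 5 octets as UTF-8
my $len_cp932 = cp932_length("A${hira_a}B");        # 4 octets as CP932
my $middle    = cp932_substr("A${hira_a}B", 1, 2);  # the hiragana letter

my @by_utf8  = sort $hira_a, $full_a;               # hiragana first
my @by_cp932 = cp932_sort($hira_a, $full_a);        # fullwidth A first

# An in-memory file stands in for a real CP932 file.
my $cp932_data = "\x82\xA0\n\x82\x60\n";
open my $in, '<', \$cp932_data or die "cannot open in-memory file: $!";
my @lines = read_cp932_lines($in);
close $in;
```

The sort example shows why a separate sort is needed at all: the two characters compare in opposite orders depending on whether their UTF-8 or their CP932 octets are compared.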
AUTHOR
INABA Hitoshi <ina@cpan.org>
This project was originated by INABA Hitoshi.
LICENSE AND COPYRIGHT
This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.