NAME

CP932::R2 - provides minimal CP932 I/O subroutines by short name

SYNOPSIS

  use CP932::R2;

  @result = mbeach($utf8str)
  $result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')
  $result = iolen($utf8str)
  $result = iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement)
  @result = ioget(FILEHANDLE)
  $result = ioput(FILEHANDLE, @utf8str)
  $result = ioputf(FILEHANDLE, $utf8format, @utf8list)
  @result = iosort(@utf8str)

  $result = $utf8str =~ $mb{qr/$utf8regex/imsxo}
  $result = $utf8str =~ m<$mb{qr/$utf8regex/imsxo}>gc
  $result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr

MBCS SUBROUTINES for SCRIPTING

It is useful for regular expressions in a Perl script to operate on UTF-8 code points rather than on single octets. The following subroutines and the tied hash variable %mb provide these UTF-8 semantics.

------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS             Acts as MBCS
Octet in Script          Octet in Script                             Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
// or m// or qr//        $mb{qr/$utf8regex/imsxo}                    does not support metasymbol \X (matches a grapheme)
                         m<$mb{qr/$utf8regex/imsxo}>gc               does not support code point ranges (like "[A-Z]")
                                                                     does not support POSIX character classes (like [:alpha:])
                                                                     does not support named characters
                                                                     (such as \N{GREEK SMALL LETTER EPSILON}, \N{greek:epsilon}, or \N{epsilon})
                                                                     does not support character properties (like \p{PROP} and \P{PROP})
                         See the UTF8::R2 documentation for more information
------------------------------------------------------------------------------------------------------------------------------------------
s/before/after/imsxoegr  s<$mb{qr/before/imsxo}><after>egr
------------------------------------------------------------------------------------------------------------------------------------------
split(//,$_)             mbeach($utf8str)                            split $utf8str into individual characters
------------------------------------------------------------------------------------------------------------------------------------------
tr/// or y///            mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')        does not support code point ranges (like "tr/A-Z/a-z/")
------------------------------------------------------------------------------------------------------------------------------------------
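
The difference between octet semantics and character semantics in the table above can be illustrated with a minimal sketch. It does not use CP932::R2 and is not the module's implementation; split_utf8_chars is a hypothetical helper written only for this example, and its pattern is deliberately loose (it does not reject every ill-formed UTF-8 sequence).

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not part of CP932::R2):
# split a UTF-8 octet string into characters, each still a UTF-8 octet string.
sub split_utf8_chars {
    my ($octets) = @_;
    return $octets =~ /\G(
          [\x00-\x7F]                      # 1-octet character (US-ASCII)
        | [\xC2-\xDF][\x80-\xBF]           # 2-octet character
        | [\xE0-\xEF][\x80-\xBF]{2}        # 3-octet character
        | [\xF0-\xF4][\x80-\xBF]{3}        # 4-octet character
    )/xg;
}

# "A", HIRAGANA LETTER A (U+3042), "B" written as UTF-8 octets
my $str = "A\xE3\x81\x82B";

my @by_octet = split //, $str;            # SBCS view: one element per octet
my @by_char  = split_utf8_chars($str);    # MBCS view: one element per character

print scalar(@by_octet), " octets, ", scalar(@by_char), " characters\n";
```

With CP932::R2 loaded, mbeach($utf8str) plays the role of the character-wise split in this sketch.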

MBCS SUBROUTINES for I/O

If you use the following subroutines, I/O encoding conversion is performed automatically. These subroutines provide CP932 octet semantics.

------------------------------------------------------------------------------------------------------------------------------------------
Acts as SBCS             Acts as MBCS
Octet in Script          Octet of I/O Encoding                       Note and Limitations
------------------------------------------------------------------------------------------------------------------------------------------
<FILEHANDLE>             ioget(FILEHANDLE)                           get UTF-8 octets from a CP932 file
------------------------------------------------------------------------------------------------------------------------------------------
length                   iolen($utf8str)                             octet count of UTF-8 string when encoded as CP932
------------------------------------------------------------------------------------------------------------------------------------------
print                    ioput(FILEHANDLE, @utf8str)                 print @utf8str in CP932 encoding
------------------------------------------------------------------------------------------------------------------------------------------
printf                   ioputf(FILEHANDLE, $utf8format, @utf8list)  printf @utf8list in CP932 encoding
------------------------------------------------------------------------------------------------------------------------------------------
sort                     iosort(@utf8str)                            sort @utf8str in CP932 encoding order
------------------------------------------------------------------------------------------------------------------------------------------
sprintf                  (nothing)                                   "iosputf" would be a bad interface because it causes confusion by
                                                                     bringing both the internal and the external encoding into your script
------------------------------------------------------------------------------------------------------------------------------------------
substr                   iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement)
                                                                     substr of $utf8expr, with offset and length counted in CP932 octets
------------------------------------------------------------------------------------------------------------------------------------------
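
To show why a length counted "as CP932" differs from the length of the UTF-8 string held in the script, here is a minimal sketch using the core Encode module. It is not how CP932::R2 is implemented; cp932_octet_length is a hypothetical helper, and the sketch assumes Encode's "cp932" encoding is a reasonable stand-in for the I/O encoding described above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption for illustration, not part of CP932::R2):
# count the octets a UTF-8 octet string would occupy once converted to CP932.
sub cp932_octet_length {
    my ($utf8_octets) = @_;
    my $chars = decode('UTF-8', $utf8_octets);   # UTF-8 octets -> characters
    return length encode('cp932', $chars);       # characters -> CP932 octets
}

# HIRAGANA LETTER A (U+3042) written as UTF-8 octets
my $hiragana_a = "\xE3\x81\x82";

print length($hiragana_a), "\n";               # 3 octets as UTF-8 inside the script
print cp932_octet_length($hiragana_a), "\n";   # 2 octets when written out as CP932
```

With CP932::R2 loaded, iolen($utf8str) is the subroutine that reports the CP932-side count, and iomid takes its offset and length in the same CP932 octet units.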

AUTHOR

INABA Hitoshi <ina@cpan.org>

This project was originated by INABA Hitoshi.

LICENSE AND COPYRIGHT

This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.