
NAME

CP932::R2 - provides minimal CP932 I/O subroutines by short name

SYNOPSIS

    use CP932::R2;

    @result = mbeach($utf8str);
    $result = mbtr($utf8str, 'ABC', 'XYZ', 'cdsr');
    $result = iolen($utf8str);
    $result = iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement);
    @result = ioget(FILEHANDLE);
    $result = ioput(FILEHANDLE, @utf8str);
    $result = ioputf(FILEHANDLE, $utf8format, @utf8list);
    @result = iosort(@utf8str);

    $result = $utf8str =~ $mb{qr/$utf8regex/imsxogc};
    $result = $utf8str =~ s<$mb{qr/before/imsxo}><after>egr;

MBCS SUBROUTINES for SCRIPTING

It is useful for regular expressions in a Perl script to operate on UTF-8 code points (whole characters) rather than on individual octets. The following subroutines and the tied hash variable %mb provide these UTF-8 character semantics.

  ------------------------------------------------------------------------------------------------------------------------------------------
  Acts as SBCS             Acts as MBCS
  Octet in Script          Octet in Script                             Note and Limitations
  ------------------------------------------------------------------------------------------------------------------------------------------
  // or m// or qr//        $mb{qr/$utf8regex/imsxogc}                  does not support the metasymbol \X (matches a grapheme cluster)
                                                                       does not support code point ranges (such as "[A-Z]")
                                                                       does not support POSIX character classes (such as [:alpha:])
                                                                       does not support named characters
                                                                       (such as \N{GREEK SMALL LETTER EPSILON}, \N{greek:epsilon}, or \N{epsilon})
                                                                       does not support character properties (such as \p{PROP} and \P{PROP})

                           Special Escapes in Regex                    Minimum Perl Version
                           --------------------------------------------------------------------------------------------------
                           $mb{qr/ \x{Unicode} /}                      since perl 5.006
                           $mb{qr/ [^ ... ] /}                         since perl 5.008  ** CAUTION ** not available on perl 5.006
                           $mb{qr/ \h /}                               since perl 5.010
                           $mb{qr/ \v /}                               since perl 5.010
                           $mb{qr/ \H /}                               since perl 5.010
                           $mb{qr/ \V /}                               since perl 5.010
                           $mb{qr/ \R /}                               since perl 5.010
                           $mb{qr/ \N /}                               since perl 5.012

  ------------------------------------------------------------------------------------------------------------------------------------------
  s/before/after/imsxoegr  s<$mb{qr/before/imsxo}><after>egr
  ------------------------------------------------------------------------------------------------------------------------------------------
  split(//,$_)             mbeach($utf8str)                            splits $utf8str into individual characters
  ------------------------------------------------------------------------------------------------------------------------------------------
  tr/// or y///            mbtr($utf8str, 'ABC', 'XYZ', 'cdsr')        does not support code point ranges (such as "tr/A-Z/a-z/")
  ------------------------------------------------------------------------------------------------------------------------------------------
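The difference between split(//) and mbeach can be sketched without CP932::R2. In the Perl below, utf8_chars is a hypothetical helper, not part of the module; it assumes the script holds well-formed UTF-8 octets (no "use utf8") and groups them into whole characters with a regex over UTF-8 lead and continuation bytes, roughly the splitting that mbeach is described as doing.

```perl
use strict;
use warnings;

# Hypothetical helper (not CP932::R2): split a string of UTF-8 octets into
# whole characters, each returned as its own octet string.
# Assumes well-formed UTF-8; a stray byte matches the last alternative
# and is returned on its own instead of being dropped.
sub utf8_chars {
    my ($octets) = @_;
    return $octets =~ /
        (   [\x00-\x7F]                  # 1-byte sequence (ASCII)
          | [\xC2-\xDF][\x80-\xBF]       # 2-byte sequence
          | [\xE0-\xEF][\x80-\xBF]{2}    # 3-byte sequence
          | [\xF0-\xF4][\x80-\xBF]{3}    # 4-byte sequence
          | [\x00-\xFF]                  # fallback: single stray octet
        )
    /gx;
}

# "A", HIRAGANA LETTER A (U+3042), "B" held as UTF-8 octets
my $octets = "A\xE3\x81\x82B";

my @bytes = split //, $octets;     # octet semantics: one element per byte
my @chars = utf8_chars($octets);   # character semantics: one per character

printf "split: %d, utf8_chars: %d\n", scalar @bytes, scalar @chars;
```

With this sample string, split(//) yields 5 pieces (one per octet) while utf8_chars yields 3 (one per character). How mbeach itself treats malformed input is not specified above.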

MBCS SUBROUTINES for I/O

The following subroutines automatically convert between the UTF-8 used in your script and the CP932 I/O encoding. They provide CP932 octet semantics: lengths, offsets, and sort order are measured in CP932 octets.

  ------------------------------------------------------------------------------------------------------------------------------------------
  Acts as SBCS             Acts as MBCS
  Octet in Script          Octet of I/O Encoding                       Note and Limitations
  ------------------------------------------------------------------------------------------------------------------------------------------
  <FILEHANDLE>             ioget(FILEHANDLE)                           reads a CP932 file and returns its contents as UTF-8 octets
  ------------------------------------------------------------------------------------------------------------------------------------------
  length                   iolen($utf8str)                             octet count of $utf8str when encoded as CP932
  ------------------------------------------------------------------------------------------------------------------------------------------
  print                    ioput(FILEHANDLE, @utf8str)                 prints @utf8str encoded as CP932
  ------------------------------------------------------------------------------------------------------------------------------------------
  printf                   ioputf(FILEHANDLE, $utf8format, @utf8list)  printf of $utf8format and @utf8list, output encoded as CP932
  ------------------------------------------------------------------------------------------------------------------------------------------
  sort                     iosort(@utf8str)                            sorts @utf8str in CP932 octet order
  ------------------------------------------------------------------------------------------------------------------------------------------
  sprintf                  (nothing)                                   no "iosputf" is provided: such an interface would be confusing because
                                                                       it would bring both the internal (UTF-8) and external (CP932) encodings
                                                                       into your script
  ------------------------------------------------------------------------------------------------------------------------------------------
  substr                   iomid($utf8expr, $offset_as_cp932, $length_as_cp932, $utf8replacement)
                                                                       substr of $utf8expr, with offset and length counted in CP932 octets
  ------------------------------------------------------------------------------------------------------------------------------------------
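The CP932 octet semantics in the table above can be illustrated with the core Encode module. The sketch below is an approximation under stated assumptions, not the CP932::R2 implementation: cp932_length, cp932_sort, and cp932_getlines are hypothetical names that mimic iolen, iosort, and ioget by converting between UTF-8 octets and CP932 with Encode's decode and encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers illustrating CP932 octet semantics for strings
# held as UTF-8 octets. They are NOT the CP932::R2 implementation.

# CP932 octet count of a UTF-8 octet string (compare iolen).
sub cp932_length {
    my ($utf8) = @_;
    return length encode('cp932', decode('UTF-8', $utf8));
}

# Sort UTF-8 octet strings by their CP932 byte sequences (compare iosort),
# using a Schwartzian transform so each string is encoded only once.
sub cp932_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ encode('cp932', decode('UTF-8', $_)), $_ ] } @_;
}

# Read all lines from a CP932 handle and return them as UTF-8 octets
# (compare ioget).
sub cp932_getlines {
    my ($fh) = @_;
    return map { encode('UTF-8', decode('cp932', $_)) } <$fh>;
}

# "A" + HIRAGANA LETTER A (U+3042): 4 octets in UTF-8, 3 in CP932.
my $str = "A\xE3\x81\x82";
printf "UTF-8: %d, CP932: %d\n", length($str), cp932_length($str);

# FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) is 0x82 0x60 in CP932, so it
# sorts before HIRAGANA LETTER A (0x82 0xA0) although its code point is higher.
my @sorted = cp932_sort("\xE3\x81\x82", "\xEF\xBC\xA1");

# An in-memory file holding CP932 bytes stands in for a real file here.
my $cp932_data = "A\x82\xA0\n";
open my $in, '<', \$cp932_data or die "open: $!";
my @lines = cp932_getlines($in);
close $in;
```

Here $str is 4 octets as UTF-8 but 3 as CP932, @sorted puts U+FF21 first, and @lines holds "A", the UTF-8 octets of U+3042, and a newline. Characters with no CP932 mapping fall back to Encode's default substitution handling; CP932::R2 may behave differently.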

AUTHOR

INABA Hitoshi <ina@cpan.org>

This project was originated by INABA Hitoshi.

LICENSE AND COPYRIGHT

This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.