NAME

Jacode4e::RoundTrip - Jacode4e for round-trip conversion in JIS X 0213

SYNOPSIS

  use FindBin;
  use lib "$FindBin::Bin/lib";
  use Jacode4e::RoundTrip;
 
  $return =
  Jacode4e::RoundTrip::convert(\$line, $OUTPUT_encoding, $INPUT_encoding [, { %option }]);
 
    $return
      Number of characters in $line
 
    $line
      String variable to convert
      After conversion, this variable is overwritten
 
    $OUTPUT_encoding and $INPUT_encoding
      To convert, you must specify both $OUTPUT_encoding and $INPUT_encoding.
      The encodings you can specify are as follows:
 
      mnemonic      means
      -----------------------------------------------------------------------
      cp932x        CP932X, Extended CP932 to JIS X 0213 using 0x9C5A as single shift
      cp00930       IBM CP00930(CP00290+CP00300), CCSID 5026 katakana
      keis78        HITACHI KEIS78
      keis83        HITACHI KEIS83
      keis90        HITACHI KEIS90
      jef           FUJITSU JEF (12 point size for printing with option OUTPUT_SHIFTING)
      jef9p         FUJITSU JEF ( 9 point size for printing with option OUTPUT_SHIFTING)
      jipsj         NEC JIPS(J)
      jipse         NEC JIPS(E)
      letsj         UNISYS LetsJ
      utf8          UTF-8
      utf8jp        UTF-8-SPUA-JP, JIS X 0213 on SPUA ordered by JIS level, plane, row, cell
      -----------------------------------------------------------------------
 
      Round-trip conversion is not possible with the following encodings;
      they are listed for reference only
 
      mnemonic      means
      -----------------------------------------------------------------------
      cp932         Microsoft CP932, IANA Windows-31J
      sjis2004      JISC Shift_JIS-2004
      -----------------------------------------------------------------------
 
    %option
      The options you can specify are as follows:
 
      key mnemonic      value means
      -----------------------------------------------------------------------
      INPUT_LAYOUT      input record layout by 'S' and 'D' sequence
                        'S' means one char as SBCS, 'D' means one char as DBCS
      OUTPUT_SHIFTING   true means output shift codes, false means do not
                        output them; default is false
      SPACE             output space code in DBCS/MBCS
      GETA              output geta code in DBCS/MBCS
      OVERRIDE_MAPPING  hash reference of FROM => TO override mapping
                        { "\x12\x34"=>"\x56\x78", "\x9A\xBC"=>"\xDE\xFE", }
                        (CAUTION! This also overrides the SPACE option)
      -----------------------------------------------------------------------

SAMPLE

  use FindBin;
  use lib "$FindBin::Bin/lib";
  use Jacode4e::RoundTrip;
  Jacode4e::RoundTrip::VERSION('2.13.81.3');
  while (<>) {
      $return =
      Jacode4e::RoundTrip::convert(\$_, 'cp932x', 'cp00930', {
          'INPUT_LAYOUT'     => 'SSSDDDSSDDSDSD',
          'OUTPUT_SHIFTING'  => 0,
          'SPACE'            => "\x81\xA2",
          'GETA'             => "\x81\xA1",
          'OVERRIDE_MAPPING' => { "\x44\x5A" => "\x81\x7C", },
      });
      print $_;
  }
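
  The INPUT_LAYOUT string in the sample above is tedious to type by hand for
  long records. The following is a minimal sketch, not part of the module,
  that expands a list of field definitions into such a string. The helper
  name build_layout and the [type, count] field format are assumptions made
  for this illustration only.

```perl
use strict;

# Hypothetical helper (not part of Jacode4e::RoundTrip):
# expand [type, count] pairs into an INPUT_LAYOUT string of 'S' and 'D'.
sub build_layout {
    my @fields = @_;
    my $layout = '';
    for my $field (@fields) {
        my ($type, $count) = @$field;
        die "type must be 'S' or 'D'" unless $type eq 'S' or $type eq 'D';
        $layout .= $type x $count;    # one letter per character
    }
    return $layout;
}

# Rebuild the layout used in the SAMPLE section.
my $layout = build_layout(
    ['S', 3], ['D', 3], ['S', 2], ['D', 2],
    ['S', 1], ['D', 1], ['S', 1], ['D', 1],
);
print "$layout\n";
```

  The resulting string can be passed as the value of 'INPUT_LAYOUT'.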

INPUT SI/SO code

  Wikipedia lists the Kanji shift codes used by each vendor's encoding.
  Jacode4e::RoundTrip::convert() handles the SI/SO (Shift In and Shift Out)
  codes in $line automatically. If $line contains no SI/SO codes, use the
  option INPUT_LAYOUT to describe the record instead.
  In practice, INPUT_LAYOUT is needed almost always when $INPUT_encoding is
  one of the enterprise encodings.
  
  ---------------------------------------------------------------------------
                     SO(Shift Out)       SI(Shift In)
  $INPUT_encoding    KI(KANJI In)        KO(KANJI Out)
  mnemonic           switch to DBCS      switch to SBCS    note
  ---------------------------------------------------------------------------
  'cp932x'           (nothing)           (nothing)         
  'cp932'            (nothing)           (nothing)         
  'sjis2004'         (nothing)           (nothing)         
  'cp00930'          "\x0E"              "\x0F"            
  'keis78'           "\x0A\x42"          "\x0A\x41"        
  'keis83'           "\x0A\x42"          "\x0A\x41"        
  'keis90'           "\x0A\x42"          "\x0A\x41"        
  'jef'              "\x28" or "\x38"    "\x29"            both 12 and 9 point size are ok
  'jef9p'            "\x28" or "\x38"    "\x29"            both 12 and 9 point size are ok
  'jipsj'            "\x1A\x70"          "\x1A\x71"        
  'jipse'            "\x3F\x75"          "\x3F\x76"        
  'letsj'            "\x93\x70"          "\x93\xF1"        
  'utf8'             (nothing)           (nothing)         
  'utf8jp'           (nothing)           (nothing)         
  ---------------------------------------------------------------------------
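
  To illustrate what the shift codes in the table above mean, here is a
  minimal sketch that walks a cp00930-style record and derives the
  equivalent 'S'/'D' layout from its SO ("\x0E") and SI ("\x0F") codes. It
  is an illustration only and not the module's own implementation; the
  function name layout_from_shift_codes is hypothetical, and the record is
  assumed to start in SBCS mode.

```perl
use strict;

# Illustration only: derive an 'S'/'D' layout from a record that contains
# cp00930 shift codes. SO ("\x0E") switches to DBCS, SI ("\x0F") to SBCS.
sub layout_from_shift_codes {
    my ($record) = @_;
    my $layout = '';
    my $dbcs   = 0;    # assumption: a record starts in SBCS mode
    my $pos    = 0;
    while ($pos < length $record) {
        my $byte = substr($record, $pos, 1);
        if ($byte eq "\x0E") {       # shift out: following bytes are DBCS
            $dbcs = 1;
            $pos += 1;
        }
        elsif ($byte eq "\x0F") {    # shift in: following bytes are SBCS
            $dbcs = 0;
            $pos += 1;
        }
        elsif ($dbcs) {              # one DBCS character occupies two bytes
            $layout .= 'D';
            $pos += 2;
        }
        else {                       # one SBCS character occupies one byte
            $layout .= 'S';
            $pos += 1;
        }
    }
    return $layout;
}

# Two SBCS bytes, a shifted run of two DBCS characters, one SBCS byte.
my $record = "\xC1\xC2\x0E\x45\x41\x45\x42\x0F\xC3";
print layout_from_shift_codes($record), "\n";
```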

OUTPUT SI/SO code

  Jacode4e::RoundTrip::convert() does not output SI/SO codes by default. If
  you need SI/SO codes, use the option 'OUTPUT_SHIFTING' => 1.
  
  ---------------------------------------------------------------------------
                     SO(Shift Out)       SI(Shift In)
  $OUTPUT_encoding   KI(KANJI In)        KO(KANJI Out)
  mnemonic           switch to DBCS      switch to SBCS    %option
  ---------------------------------------------------------------------------
  'cp932x'           (nothing)           (nothing)         
  'cp932'            (nothing)           (nothing)         
  'sjis2004'         (nothing)           (nothing)         
  'cp00930'          "\x0E"              "\x0F"            'OUTPUT_SHIFTING' => 1
  'keis78'           "\x0A\x42"          "\x0A\x41"        'OUTPUT_SHIFTING' => 1
  'keis83'           "\x0A\x42"          "\x0A\x41"        'OUTPUT_SHIFTING' => 1
  'keis90'           "\x0A\x42"          "\x0A\x41"        'OUTPUT_SHIFTING' => 1
  'jef'              "\x28"              "\x29"            'OUTPUT_SHIFTING' => 1
  'jef9p'            "\x38"              "\x29"            'OUTPUT_SHIFTING' => 1
  'jipsj'            "\x1A\x70"          "\x1A\x71"        'OUTPUT_SHIFTING' => 1
  'jipse'            "\x3F\x75"          "\x3F\x76"        'OUTPUT_SHIFTING' => 1
  'letsj'            "\x93\x70"          "\x93\xF1"        'OUTPUT_SHIFTING' => 1
  'utf8'             (nothing)           (nothing)         
  'utf8jp'           (nothing)           (nothing)         
  ---------------------------------------------------------------------------
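
  As a rough picture of what output shifting adds, the sketch below copies
  the shift codes from the table above into a hash and wraps one run of DBCS
  bytes in them. The helper frame_dbcs is hypothetical; exactly where
  convert() places shift codes within a record is not described here, so
  treat this only as an illustration of the byte values involved.

```perl
use strict;

# Shift codes copied from the table above: [ SO/KI, SI/KO ].
my %shift = (
    'cp00930' => [ "\x0E",     "\x0F"     ],
    'keis78'  => [ "\x0A\x42", "\x0A\x41" ],
    'keis83'  => [ "\x0A\x42", "\x0A\x41" ],
    'keis90'  => [ "\x0A\x42", "\x0A\x41" ],
    'jef'     => [ "\x28",     "\x29"     ],
    'jef9p'   => [ "\x38",     "\x29"     ],
    'jipsj'   => [ "\x1A\x70", "\x1A\x71" ],
    'jipse'   => [ "\x3F\x75", "\x3F\x76" ],
    'letsj'   => [ "\x93\x70", "\x93\xF1" ],
);

# Hypothetical helper: wrap one run of DBCS bytes in the shift codes of
# the given encoding.
sub frame_dbcs {
    my ($encoding, $dbcs_bytes) = @_;
    my $pair = $shift{$encoding} or die "no shift codes for $encoding";
    return $pair->[0] . $dbcs_bytes . $pair->[1];
}

# One cp00930 DBCS space ("\x40\x40") framed by SO and SI.
my $framed = frame_dbcs('cp00930', "\x40\x40");
print unpack('H*', $framed), "\n";
```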

OUTPUT DBCS/MBCS SPACE code

  The default space code is as follows.
  You can change the space code using the option 'SPACE' if you want.
  
  ---------------------------------------------------------------------------
  $OUTPUT_encoding
  mnemonic           default code        %option
  ---------------------------------------------------------------------------
  'cp932x'           "\x81\x40"          
  'cp932'            "\x81\x40"          
  'sjis2004'         "\x81\x40"          'SPACE' => "\x20\x20" for CP/M-86 compatible
  'cp00930'          "\x40\x40"          
  'keis78'           "\xA1\xA1"          
  'keis83'           "\xA1\xA1"          
  'keis90'           "\xA1\xA1"          
  'jef'              "\xA1\xA1"          'SPACE' => "\x40\x40" for 99FR-0012-2 and 99FR-0012-3 compatible
  'jef9p'            "\xA1\xA1"          'SPACE' => "\x40\x40" for 99FR-0012-2 and 99FR-0012-3 compatible
  'jipsj'            "\x21\x21"          
  'jipse'            "\x4F\x4F"          
  'letsj'            "\x20\x20"          'SPACE' => "\xA1\xA1" for EUC-JP like space
  'utf8'             "\xE3\x80\x80"      
  'utf8jp'           "\xF3\xB0\x84\x80"  
  ---------------------------------------------------------------------------

OUTPUT DBCS/MBCS GETA code

  If a character is not included in the $OUTPUT_encoding character set, the
  GETA code is output in place of the converted code.
  
  The default GETA codes are as follows.
  You can change the GETA code with the option 'GETA' if you want.
  
  Here "GETA" does not mean the footwear itself, but the "GETA-MARK".
  
  GETA are Japanese wooden shoes made for walking on paddy fields. Each
  GETA has two teeth, which bite the earth twice and leave a GETA-MARK on
  the ground. Thus, a GETA code is a double-byte code, or often a multibyte
  code.
  
  ---------------------------------------------------------------------------
  $OUTPUT_encoding
  mnemonic           default code        %option sample
  ---------------------------------------------------------------------------
  'cp932x'           "\x81\xAC"          'GETA' => "\x81\xA1"
  'cp932'            "\x81\xAC"          'GETA' => "\x81\x9C"
  'sjis2004'         "\x81\xAC"          'GETA' => "\x81\xFC"
  'cp00930'          "\x44\x7D"          
  'keis78'           "\xA2\xAE"          
  'keis83'           "\xA2\xAE"          
  'keis90'           "\xA2\xAE"          
  'jef'              "\xA2\xAE"          
  'jef9p'            "\xA2\xAE"          
  'jipsj'            "\x22\x2E"          
  'jipse'            "\x7F\x4B"          
  'letsj'            "\xA2\xAE"          
  'utf8'             "\xE3\x80\x93"      
  'utf8jp'           "\xF3\xB0\x85\xAB"  
  ---------------------------------------------------------------------------
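
  Because the GETA code stands in for characters that could not be
  converted, counting it in the output is one simple way to notice lossy
  conversions. The sketch below does this for 'utf8' output, using the
  default code from the table above. The function name count_geta is
  hypothetical, and a GETA-MARK that was already present in the input is
  counted as well, so the result is only an upper bound.

```perl
use strict;

# Default GETA code for 'utf8' output, taken from the table above.
my $GETA_UTF8 = "\xE3\x80\x93";

# Count GETA codes in a converted UTF-8 byte string. Byte-wise matching is
# safe here because UTF-8 sequences cannot match in the middle of another
# character.
sub count_geta {
    my ($bytes) = @_;
    my $count = 0;
    $count++ while $bytes =~ /\Q$GETA_UTF8\E/g;
    return $count;
}

my $converted = "A\xE3\x80\x93B\xE3\x80\x93";
print count_geta($converted), " GETA code(s) found\n";
```

  For the DBCS encodings a plain byte search is not enough, since a GETA
  byte pair can straddle two adjacent characters; the scan would have to
  advance character by character.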

RAISON D'ETRE

 This software has been developed to promote the use of JIS X 0213.
 
 The Jacode4e::RoundTrip module can round-trip convert JIS X 0213 characters
 among Japanese mainframes and enterprise servers, using the user-defined
 area of each encoding.
 
 The encodings that can be round-trip converted are cp932x, cp00930,
 keis78, keis83, keis90, jef, jef9p, jipsj, jipse, letsj, utf8, and utf8jp.
 
 This table shows how many code points each encoding is short of to support
 JIS X 0213, and the areas available to cover that shortage
 ---------------------------------------------------------------------
                        Jacode4e  short-  user-def.  unused    free
 mnemonic              supported     age       area    area    area
 ---------------------------------------------------------------------
 cp932x                   11,285       0         --      --      --
 cp00930                  11,257      28      1,880      --   1,880
 keis78, keis83, keis90    8,268   3,017      2,914     188   3,102
 jef, jef9p                8,814   2,471      3,102      --   3,102
 jipsj, jipse              8,637   2,648      3,948      --   3,948
 letsj                     9,876   1,409      2,632      --   2,632
 utf8                     11,220      65      6,400      --      65
 utf8jp                   11,285       0         --      --      --
 ---------------------------------------------------------------------
 
 cp00930 uses 28 code points from its user-defined area. Similarly, keis78,
 keis83, and keis90 use 3,017, jef and jef9p use 2,471, jipsj and jipse
 use 2,648, letsj uses 1,409, and utf8 uses 65. In the case of KEIS, the
 user-defined area alone is not large enough to support JIS X 0213, so I
 decided to use the unused area as well.
 
 As a result, you cannot use your own gaiji with the Jacode4e::RoundTrip
 module.
 
 The Jacode4e::RoundTrip module is not a substitute for Jacode4e, and the
 Jacode4e module is not a substitute for Jacode4e::RoundTrip.
 
 This software is useful when your JIS X 0213 data must be processed on
 another system and then imported back into your system.
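
 The figures in the shortage table can be cross-checked with a little
 arithmetic: in every row, "supported" plus "shortage" equals 11,285, the
 count in the cp932x and utf8jp rows, and only KEIS has a shortage larger
 than its user-defined area. The sketch below performs that check. The hash
 layout and the function name needs_unused_area are chosen for this
 illustration only.

```perl
use strict;

# "supported" count of the cp932x and utf8jp rows in the table above.
my $FULL = 11285;

# [ supported, shortage, user-defined area, unused area ] per table row.
my %area = (
    'cp00930' => [ 11257,   28, 1880,   0 ],
    'keis'    => [  8268, 3017, 2914, 188 ],
    'jef'     => [  8814, 2471, 3102,   0 ],
    'jips'    => [  8637, 2648, 3948,   0 ],
    'letsj'   => [  9876, 1409, 2632,   0 ],
    'utf8'    => [ 11220,   65, 6400,   0 ],
);

# True when the shortage does not fit into the user-defined area alone.
sub needs_unused_area {
    my ($name) = @_;
    my ($supported, $shortage, $user_defined) = @{ $area{$name} };
    return $shortage > $user_defined ? 1 : 0;
}

for my $name (sort keys %area) {
    my ($supported, $shortage) = @{ $area{$name} };
    die "row $name does not add up" unless $supported + $shortage == $FULL;
    printf "%-8s needs unused area: %s\n",
        $name, needs_unused_area($name) ? 'yes' : 'no';
}
```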

WHAT IS "CP932X"?

  • "cp932x" as mnemonic

  • CP932X is CP932

  • Pronounce [si: pi: nain thri: tu: kai] in English

  • Pronounce [shi: pi: kju: san' ni kai] in Japanese

  • [si: pi: nain thri: tu: iks] is reserved for Microsoft Corporation ;-P

  • Upward compatible with CP932

  • Supports JIS X 0213 character set

  • Uses the ghost character "\x9C\x5A" as the single shift code

  • Uses "\x9C\x5A\x9C\x5A" to represent a single "\x9C\x5A"

  • You can use the private use characters you made

  • You can keep using your operating system, network, and database

  • In most cases, application programs can be used as they are
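
  The single shift rule above can be sketched as a small tokenizer. This is
  an illustration under stated assumptions, not the module's own code: it
  assumes that "\x9C\x5A" is always followed by exactly one double-byte
  code, and it uses the usual CP932 lead byte ranges 0x81-0x9F and
  0xE0-0xFC. The function name split_cp932x is hypothetical.

```perl
use strict;

# Illustration only: split a CP932X byte string into characters.
# Alternatives are tried in this order:
#   1. single shift "\x9C\x5A" plus one double-byte code (4 bytes);
#      this also covers "\x9C\x5A\x9C\x5A", the literal "\x9C\x5A"
#   2. ordinary double-byte character (2 bytes)
#   3. any other single byte
sub split_cp932x {
    my ($bytes) = @_;
    my @chars = $bytes =~ /\G(
          \x9C\x5A [\x81-\x9F\xE0-\xFC] [\x00-\xFF]
        | [\x81-\x9F\xE0-\xFC] [\x00-\xFF]
        | [\x00-\xFF]
    )/gx;
    return @chars;
}

# "A", one double-byte character, the literal "\x9C\x5A", and one
# single-shifted character.
my $text  = "A" . "\x88\x9F" . "\x9C\x5A\x9C\x5A" . "\x9C\x5A\x87\x9F";
my @chars = split_cp932x($text);
print scalar(@chars), " characters\n";
```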

WHAT IS "UTF-8-SPUA-JP"?

  • "utf8jp" as mnemonic

  • UTF-8-SPUA-JP is UTF-8

  • Universal internal character encoding of Jacode4e and Jacode4e::RoundTrip

  • Implements the JIS X 0213 character set on Unicode Supplementary Private Use Area-A

  • Code points are ordered by JIS level, plane, row, cell

  • Uniform-length encoding

  • No grapheme clustering; each character has one unique code point
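
  The uniform length follows from UTF-8 itself: every code point in a
  supplementary plane, including Supplementary Private Use Area-A, is
  encoded in exactly four bytes. The sketch below encodes and decodes such
  code points by hand, using only pack and unpack so that it stays within
  the perl 5.00503 baseline. The function names are chosen for this
  illustration. Decoding the default utf8jp SPACE and GETA codes from the
  tables above gives U+F0100 and U+F016B.

```perl
use strict;

# Encode a supplementary-plane code point (U+10000..U+10FFFF) as the
# four-byte UTF-8 form.
sub utf8_encode_supplementary {
    my ($cp) = @_;
    die "not a supplementary-plane code point"
        if $cp < 0x10000 or $cp > 0x10FFFF;
    return pack('C4',
        0xF0 |  ($cp >> 18),
        0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >>  6) & 0x3F),
        0x80 |  ($cp        & 0x3F),
    );
}

# Decode one four-byte UTF-8 sequence back into its code point.
sub utf8_decode_4 {
    my ($bytes) = @_;
    my @b = unpack('C4', $bytes);
    return (($b[0] & 0x07) << 18)
         | (($b[1] & 0x3F) << 12)
         | (($b[2] & 0x3F) <<  6)
         |  ($b[3] & 0x3F);
}

printf "SPACE U+%05X\n", utf8_decode_4("\xF3\xB0\x84\x80");
printf "GETA  U+%05X\n", utf8_decode_4("\xF3\xB0\x85\xAB");
```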

DEPENDENCIES

This software requires perl version 5.00503 or later to run. (To all Perl4 users in the world, pardon me!)

SOFTWARE LIFE CYCLE

                                         Jacode.pm
                    jcode.pl  Encode.pm  jacode.pl  Jacode4e  Jacode4e::RoundTrip
  --------------------------------------------------------------------------------
  1993 Perl4.036       |                     |                                    
    :     :            :                     :                                    
  1999 Perl5.00503     |                     |         |               |          
  2000 Perl5.6         |                     |         |               |          
  2002 Perl5.8         |         Born        |         |               |          
  2007 Perl5.10        V          |          |         |               |          
  2010 Perl5.12       EOL         |         Born       |               |          
  2011 Perl5.14                   |          |         |               |          
  2012 Perl5.16                   |          |         |               |          
  2013 Perl5.18                   |          |         |               |          
  2014 Perl5.20                   |          |         |               |          
  2015 Perl5.22                   |          |         |               |          
  2016 Perl5.24                   |          |         |               |          
  2017 Perl5.26                   |          |         |               |          
  2018 Perl5.28                   |          |        Born            Born        
  2019 Perl5.30                   |          |         |               |          
  2020 Perl5.32                   :          :         :               :          
  2030 Perl5.52                   :          :         :               :          
  2040 Perl5.72                   :          :         :               :          
  2050 Perl5.92                   :          :         :               :          
  2060 Perl5.112                  :          :         :               :          
  2070 Perl5.132                  :          :         :               :          
  2080 Perl5.152                  :          :         :               :          
  2090 Perl5.172                  :          :         :               :          
  2100 Perl5.192                  :          :         :               :          
  2110 Perl5.212                  :          :         :               :          
  2120 Perl5.232                  :          :         :               :          
    :     :                       V          V         V               V          
  --------------------------------------------------------------------------------

SOFTWARE COVERAGE

When you lose your way, you can see this matrix and find your way.

  Skill/Use  Amateur    Semipro    Pro        Enterprise  Enterprise(round-trip)
  -------------------------------------------------------------------------------
  Expert     jacode.pl  Encode.pm  Encode.pm  Jacode4e    Jacode4e::RoundTrip
  -------------------------------------------------------------------------------
  Middle     jacode.pl  jacode.pl  Encode.pm  Jacode4e    Jacode4e::RoundTrip
  -------------------------------------------------------------------------------
  Beginner   jacode.pl  jacode.pl  jacode.pl  Jacode4e    Jacode4e::RoundTrip
  -------------------------------------------------------------------------------

Why was CP932X born?

In order to know why CP932X exists the way it is (or isn't), one must first know why CP932X was born.

  Q1) Is CCS of JIS X 0208 enough?
  A1) No. Often we require GAIJI.
  
  Q2) Is CCS of JIS X 0213 enough?
  A2) It's not perfect, but enough for many people.
  
  Q3) Is CES by UTF-8 good?
  A3) No. In Japanese information processing, it is still unstable and not yet popular.
  
  Q4) Is CES by Shift_JIS-2004 good?
  A4) No. Shift_JIS-2004 cannot support the very popular CP932 or your GAIJI. We need a realistic solution to real problems.
  
  Q5) Is an escape sequence a good idea to support CCS of JIS X 0213?
  A5) No. Programming with escape sequences is too hard.
  
  Q6) Which character is best as the single shift code to support CCS of JIS X 0213?
    -- The single shift code must be a DBCS code, because a DBCS field cannot store an SBCS code in some cases
    -- Moreover, all GAIJI code points must be yours
    -- The impact of this solution must be minimal
  A6) I selected 1-55-27 as the single shift code. It is a ghost character and is used by nobody.
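
The link between the row-cell number 1-55-27 and the byte pair "\x9C\x5A" is the standard Shift_JIS arithmetic for plane 1. The sketch below works it through; the function name kuten_to_sjis is chosen for this illustration. The same arithmetic maps 1-1-1 to "\x81\x40" and 1-2-14 to "\x81\xAC", the default cp932x SPACE and GETA codes.

```perl
use strict;

# Convert a plane-1 JIS row-cell (kuten) pair to Shift_JIS bytes.
sub kuten_to_sjis {
    my ($row, $cell) = @_;
    die "row and cell must be in 1..94"
        if $row < 1 or $row > 94 or $cell < 1 or $cell > 94;

    # Two JIS rows share one lead byte; rows above 62 jump to 0xE0 and up.
    my $lead = int(($row - 1) / 2) + ($row <= 62 ? 0x81 : 0xC1);

    my $trail;
    if ($row % 2) {                     # odd row: trail bytes 0x40..0x9E
        $trail = $cell + 0x3F;
        $trail++ if $trail >= 0x7F;     # 0x7F is skipped
    }
    else {                              # even row: trail bytes 0x9F..0xFC
        $trail = $cell + 0x9E;
    }
    return pack('C2', $lead, $trail);
}

# The ghost character 1-55-27 chosen as the single shift code.
print uc unpack('H*', kuten_to_sjis(55, 27)), "\n";   # prints 9C5A
```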

AUTHOR

INABA Hitoshi <ina@cpan.org> in a CPAN

This project was originated by INABA Hitoshi.

LICENSE AND COPYRIGHT

This software is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO

 CPGID 00290
 https://www-01.ibm.com/software/globalization/cdra/
 https://www-01.ibm.com/software/globalization/cp/cp00290.html
 ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00290.pdf
 ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00290.txt

 HiRDB Datareplicator Version 8 manuals, Hitachi, Ltd.
 http://itdoc.hitachi.co.jp/manuals/3020/3020636050/W3600001.HTM
 http://itdoc.hitachi.co.jp/manuals/3020/3020636050/W3600166.HTM
 http://itdoc.hitachi.co.jp/manuals/3020/30203J3820/ISUS0268.HTM
 http://itdoc.hitachi.co.jp/manuals/3000/30003D5820/CLNT0235.HTM

 Linkexpress, FUJITSU LIMITED
 http://software.fujitsu.com/jp/manual/manualfiles/M080093/J2X15930/03Z200/index.html
 http://software.fujitsu.com/jp/manual/manualfiles/M080093/J2X15930/03Z200/unyo05/unyo0413.html
 http://software.fujitsu.com/jp/manual/manualfiles/m130010/b1fw5992/01z200/b5992-c-00-00.html

 iDIVO Ver.1.4.0
 https://www.hulft.com/shukka/files/iDIVO/SP-DV1-CC-02-01.pdf

 cp932 to Unicode table
 ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
 https://support.microsoft.com/ja-jp/help/170559/prb-conversion-problem-between-shift-jis-and-unicode

 Shift_JIS-2004 to Unicode table
 http://x0213.org/codetable/sjis-0213-2004-std.txt

 IBM Japanese Graphic Character Set, Kanji DBCS Host and DBCS - PC
 https://www-01.ibm.com/software/globalization/cdra/
 ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00300.pdf

 IBM Kanji code list
 N:GC18-2040-3

 KEIS code book, Culti Co.,Ltd.
 http://www.culti.co.jp/2016/02/01/%e3%82%ab%e3%83%ab%e3%83%81%e7%99%ba%e8%a1%8c%e6%9b%b8%e7%b1%8d/

 JIS X 0208 (1990) to Unicode
 ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT

 Appendix B.2 Character code differences
 http://itdoc.hitachi.co.jp/manuals/3020/3020759580/G5950334.HTM

 Appendix E Handling of character codes in PDE - Form Designer (applies only to distributed type PDE)
 http://itdoc.hitachi.co.jp/manuals/3020/30203p0360/PDEF0203.HTM

 HITAC Character code table (KEIS83)
 Document number 8080-2-100-10

 JEF code book, Culti Co.,Ltd.
 http://www.culti.co.jp/2016/02/01/jef%e3%82%b3%e3%83%bc%e3%83%89%e3%83%96%e3%83%83%e3%82%af/

 Linkexpress operation manual J2X1-5930-03Z2(00) FUJITSU LIMITED
 http://software.fujitsu.com/jp/manual/manualfiles/M080093/J2X15930/03Z200/index.html
 http://software.fujitsu.com/jp/manual/manualfiles/M070086/J2X15930/01Z200/unyo05/unyo0416.html
 http://software.fujitsu.com/jp/manual/manualfiles/M070086/J2X15930/01Z200/unyo05/unyo0420.html
 http://software.fujitsu.com/jp/manual/manualfiles/M070086/J2X15930/01Z200/unyo05/unyo0421.html
 http://software.fujitsu.com/jp/manual/manualfiles/m120010/b1fw5691/05z200/index.html
 http://software.fujitsu.com/jp/manual/manualfiles/m120010/b1fw5691/05z200/b5691-g-00-00.html

 hidekatsu-izuno/jef4j
 https://github.com/hidekatsu-izuno/jef4j

 JHTc(JHT command edition)
 http://www.vector.co.jp/soft/winnt/util/se094205.html

 FACOM JEF Character code index dictionary
 Manual code 99FR-0012-3

 JIPS code book, Culti Co.,Ltd.
 http://www.culti.co.jp/2016/02/01/jips%e3%82%b3%e3%83%bc%e3%83%89%e3%83%96%e3%83%83%e3%82%af/

 NEC Corporation Standard character set dictionary <BASIC>
 ZBB10-3

 NEC Corporation Standard character set dictionary <EXTENSION>
 ZBB11-2

 ClearPath Enterprise Servers MultiLingual System Administration, Operations, and Programming Guide ClearPath MCP 15.0 April 2013 8600 0288-308
 https://public.support.unisys.com/aseries/docs/ClearPath-MCP-16.0/PDF/86000288-308.pdf

 Heterogeneous database cooperation among heterogeneous OS environments
 http://www.unisys.co.jp/tec_info/tr56/5605.htm

 UTF-8, a transformation format of ISO 10646
 https://www.rfc-editor.org/rfc/rfc3629.txt

 Kanji shift code
 https://ja.wikipedia.org/wiki/%E6%BC%A2%E5%AD%97%E3%82%B7%E3%83%95%E3%83%88%E3%82%B3%E3%83%BC%E3%83%89

 Very old fj.kanji discussion
 http://www.ie.u-ryukyu.ac.jp/~kono/fj/fj.kanji/index.html

 BackPAN
 http://backpan.perl.org/authors/id/I/IN/INA/

ACKNOWLEDGEMENTS

 I was able to make this software thanks to good luck. I thank all
 stakeholders.

 I received the character code tables of KEIS, JEF, and JIPS as electronic
 data from Culti Co.,Ltd. Moreover, Culti Co.,Ltd. has allowed me to use
 them to make open source software.

 I thank Culti Co.,Ltd. once again.

HELLO WORLD

 To support JIS X 0213:2004,
 
     Using ghost character 1-55-27(it's me!),
 
 Found by JIS X 0208:1997,
 
     Was born in JIS C 6226-1978.
 
 Hello world,
 
     What do we hack, today?
 
                -- 1-55-27, 2018-01-27