jacode - Perl library for Japanese character code conversion
    require 'jacode.pl';

    # note: You can use either package name, 'jcode' or 'jacode'

    jacode::convert(\$line, $OUTPUT_encoding [, $INPUT_encoding [, $option]])
    jacode::xxx2yyy(\$line [, $option])
    jacode::to($OUTPUT_encoding, $line [, $INPUT_encoding [, $option]])
    jacode::jis($line [, $INPUT_encoding [, $option]])
    jacode::euc($line [, $INPUT_encoding [, $option]])
    jacode::sjis($line [, $INPUT_encoding [, $option]])
    jacode::utf8($line [, $INPUT_encoding [, $option]])
    jacode::jis_inout($JIS_Kanji_IN, $ASCII_IN)
    jacode::get_inout($line)
    jacode::h2z_xxx(\$line)
    jacode::z2h_xxx(\$line)
    jacode::getcode(\$line)
    jacode::init()

    # Perl4 INTERFACE for jcode.pl users

    &jcode'getcode_utashiro_2000_09_29(*line)
    &jcode'getcode(*line)
    &jcode'convert(*line, $OUTPUT_encoding [, $INPUT_encoding [, $option]])
    &jcode'xxx2yyy(*line [, $option])
    &jcode'to($OUTPUT_encoding, $line [, $INPUT_encoding [, $option]])
    &jcode'jis($line [, $INPUT_encoding [, $option]])
    &jcode'euc($line [, $INPUT_encoding [, $option]])
    &jcode'sjis($line [, $INPUT_encoding [, $option]])
    &jcode'utf8($line [, $INPUT_encoding [, $option]])
    &jcode'jis_inout($JIS_Kanji_IN, $ASCII_IN)
    &jcode'get_inout($line)
    &jcode'cache()
    &jcode'nocache()
    &jcode'flushcache()
    &jcode'flush()
    &jcode'h2z_xxx(*line)
    &jcode'z2h_xxx(*line)
    &jcode'tr(*line, $from, $to [, $option])
    &jcode'trans($line, $from, $to [, $option])
    &jcode'init()

    $jcode'convf{'xxx', 'yyy'}
    $jcode'z2hf{'xxx'}
    $jcode'h2zf{'xxx'}

    # Perl5 INTERFACE for jcode.pl users

    jcode::getcode_utashiro_2000_09_29(\$line)
    jcode::getcode(\$line)
    jcode::convert(\$line, $OUTPUT_encoding [, $INPUT_encoding [, $option]])
    jcode::xxx2yyy(\$line [, $option])
    jcode::to($OUTPUT_encoding, $line [, $INPUT_encoding [, $option]])
    jcode::jis($line [, $INPUT_encoding [, $option]])
    jcode::euc($line [, $INPUT_encoding [, $option]])
    jcode::sjis($line [, $INPUT_encoding [, $option]])
    jcode::utf8($line [, $INPUT_encoding [, $option]])
    jcode::jis_inout($JIS_Kanji_IN, $ASCII_IN)
    jcode::get_inout($line)
    jcode::cache()
    jcode::nocache()
    jcode::flushcache()
    jcode::flush()
    jcode::h2z_xxx(\$line)
    jcode::z2h_xxx(\$line)
    jcode::tr(\$line, $from, $to [, $option])
    jcode::trans($line, $from, $to [, $option])
    jcode::init()

    &{$jcode::convf{'xxx', 'yyy'}}(\$line)
    &{$jcode::z2hf{'xxx'}}(\$line)
    &{$jcode::h2zf{'xxx'}}(\$line)
This software is upward compatible with jcode.pl and draws on both the stable jcode.pl library and the actively maintained Encode module.
'Ja' stands for 'Japanese', as in the ISO 639-1 language code, and is unrelated to the 'JA Group' organization.
Conversion from 'sjis' to 'utf8' uses the following table:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
Conversion from 'utf8' to 'sjis' uses CP932.TXT together with the table in the following article:
PRB: Conversion Problem Between Shift-JIS and Unicode
http://support.microsoft.com/kb/170559/en-us
What's this software good for ...
Upward compatible with jcode.pl
Upward compatible with the pkf command
Runs as both a Perl4 script and a Perl5 script
Powered by Encode::from_to (Yes, not only Japanese!)
Support HALFWIDTH KATAKANA
Support UTF-8
Hidden UTF8 flag
No object-oriented programming
Possible to reuse existing code and know-how
This software requires perl 4.036 or later.
jacode::convert converts the contents of $line to the Japanese encoding given in the second argument, $OUTPUT_encoding. $OUTPUT_encoding can be any of "jis", "sjis", "euc" or "utf8"; use "noconv" when you don't want any encoding conversion. When $INPUT_encoding is not supplied, the input encoding is recognized semi-automatically from $line itself. It is better to specify $INPUT_encoding, since jacode::getcode's guess is not always right. The xxx2yyy routines are more efficient when both encodings are known.

In scalar context it returns the encoding of the input string. In array context it returns a list holding a pointer to the conversion subroutine and the input encoding.

The Japanese character sets JIS X0201, X0208 and X0212, and ASCII, are supported. JIS X0212 characters cannot be represented in sjis or utf8, and they are replaced by the "geta" character when converted to sjis. JIS X0213 characters cannot be represented at all.

On perl 5.8.1 or later, jacode::convert also acts as a wrapper for Encode::from_to. When $OUTPUT_encoding or $INPUT_encoding is not one of "jis", "sjis", "euc" or "utf8", and the Encode module can be used, Encode::from_to( $line, $INPUT_encoding, $OUTPUT_encoding ) is executed instead of jacode::convert(\$line, $OUTPUT_encoding, $INPUT_encoding, $option). In this case the pointer to the conversion subroutine returned in array context has no effective value.

The fourth parameter, $option, is simply forwarded to the conversion routine. See the next paragraph for details.
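A minimal sketch of an in-place conversion with an explicit input encoding, assuming jacode.pl is installed somewhere in @INC. The sample bytes and variable names are illustrative only.

```perl
require 'jacode.pl';   # assumes jacode.pl can be found in @INC

# "Nihongo" (Japanese) in Shift_JIS, spelled as bytes so that the
# script itself stays ASCII-only.
my $line = "\x93\xfa\x96\x7b\x8c\xea";

# Convert in place. Naming the input encoding avoids relying on the
# guess made by jacode::getcode.
my $input_encoding = jacode::convert(\$line, 'utf8', 'sjis');

print "input encoding: $input_encoding\n";
print "utf8 bytes:     ", unpack('H*', $line), "\n";
```

In scalar context the return value is the input encoding, so it can be logged or checked after the call.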
The jacode::xxx2yyy routines convert the Japanese code from xxx to yyy. The strings xxx and yyy are any combination of "jis", "euc", "sjis" and "utf8". They return the *approximate* number of converted bytes, so a return value of 0 means the line was not converted at all. The optional parameter $option selects an additional conversion: "z" converts JIS X0201 KANA to JIS X0208 KANA, and "h" does the reverse.
jacode::to, jacode::jis, jacode::euc, jacode::sjis and jacode::utf8 are provided as an easy call/return-by-value interface. You can use these functions in an s///e operation or any other place for convenience.
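A short sketch of a direct xxx2yyy call with the "z" option, assuming jacode.pl is installed in @INC. The one-byte sample is halfwidth KATAKANA "A" in Shift_JIS and is chosen only for illustration.

```perl
require 'jacode.pl';   # assumes jacode.pl can be found in @INC

# Halfwidth KATAKANA "A" (JIS X0201 KANA) as a single Shift_JIS byte.
my $line = "\xb1";

# Both encodings are known, so call the direct routine. "z" also asks
# for JIS X0201 KANA to be widened to JIS X0208 KANA.
my $converted = jacode::sjis2euc(\$line, 'z');

print "approximate bytes converted: $converted\n";
print "euc bytes: ", unpack('H*', $line), "\n";
```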
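As a sketch of the by-value interface inside s///e, assuming jacode.pl is installed in @INC. The record layout ("name=" followed by a Shift_JIS value) is made up for this example.

```perl
require 'jacode.pl';   # assumes jacode.pl can be found in @INC

# Hypothetical record: an ASCII key followed by "Nihon" in Shift_JIS.
my $record = "name=\x93\xfa\x96\x7b";

# jacode::utf8 takes a string and returns the converted copy, so it can
# be used directly on the replacement side of s///e.
$record =~ s/^name=(.*)$/'name=' . jacode::utf8($1, 'sjis')/e;

print unpack('H*', $record), "\n";
```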
jacode::jis_inout sets or inquires the JIS Kanji start and ASCII start sequences. The defaults are "ESC-$-B" and "ESC-(-B". "ASCII start" is used instead of "JIS Kanji OUT". A sequence given in the one-character short form is expanded to the full sequence before it is set.

    -----------------------------------------------
    short  full sequence    means
    -----------------------------------------------
    @      ESC-$-@          JIS C 6226-1978
    B      ESC-$-B          JIS X 0208-1983
    &      ESC-&@-ESC-$-B   JIS X 0208-1990
    O      ESC-$-(-O        JIS X 0213:2000 plane1
    Q      ESC-$-(-Q        JIS X 0213:2004 plane1
    -----------------------------------------------

jacode::get_inout gets the JIS Kanji start and ASCII start sequences from $line.
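The short-form table above can be read as a simple lookup. The following sketch is an illustration only: %kanji_in_for and expand_kanji_in are hypothetical names and are not jacode.pl's own code.

```perl
# Hypothetical lookup table mirroring the short-form table above.
my $ESC = "\x1b";
my %kanji_in_for = (
    '@' => $ESC . '$@',                 # JIS C 6226-1978
    'B' => $ESC . '$B',                 # JIS X 0208-1983
    '&' => $ESC . '&@' . $ESC . '$B',   # JIS X 0208-1990
    'O' => $ESC . '$(O',                # JIS X 0213:2000 plane1
    'Q' => $ESC . '$(Q',                # JIS X 0213:2004 plane1
);

# Expand a short form. Anything else is assumed to be a full sequence
# already and is returned unchanged.
sub expand_kanji_in {
    my ($sequence) = @_;
    return exists $kanji_in_for{$sequence} ? $kanji_in_for{$sequence} : $sequence;
}

# Print each expansion with the escape byte made visible.
for my $short ('@', 'B', '&', 'O', 'Q') {
    (my $visible = expand_kanji_in($short)) =~ s/\x1b/ESC /g;
    print "$short => $visible\n";
}
```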
jacode::h2z_xxx is the JIS X0201 KANA (so-called Hankaku-KANA) to JIS X0208 KANA (Zenkaku-KANA) code conversion routine. The string xxx is any of "jis", "sjis", "euc" and "utf8". Because it is difficult to recognize the code set from a string of 1-byte KATAKANA, automatic code recognition is not supported.
jacode::z2h_xxx is the JIS X0208 to JIS X0201 KANA code conversion routine. The string xxx is any of "jis", "sjis", "euc" and "utf8".
jacode::getcode returns 'jis', 'sjis', 'euc', 'utf8' or undef according to the Japanese character code in $line. It returns 'binary' if the data contains non-character codes. When evaluated in array context, it returns a list containing two items. The first value is the number of characters that matched the expected code, and the second value is the code name. The number is useful if and only if it is not 0 and the code is undef; that case means 'euc' and 'sjis' could not be told apart because their evaluation scores were exactly the same. This interface is too tricky, though. Code detection between euc and sjis is very difficult, sometimes impossible, and can even lead to a wrong result when the data includes JIS X0201 KANA characters.
jacode::init initializes the variables used in this package. You don't have to call it when using jacode.pl through the `do' or `require' interface. Call it first if you have embedded jacode.pl at the end of your script.
getcode_utashiro_2000_09_29 is the original &getcode() of jcode.pl.
Usually, converted characters are cached in memory so that the same calculation does not have to be done many times. To disable this caching, call jacode::nocache(). Caching can be re-enabled with jacode::cache(), and the cache is flushed by calling jacode::flushcache(). The jacode::cache() and jacode::nocache() functions return the previous caching state. jacode::flush() is an alias of jacode::flushcache(), kept so that old documents remain valid.
jacode::tr emulates the tr operator for 2-byte codes. Only 'd' is interpreted as an option. A range operator such as `A-Z' is partially supported for 2-byte codes: the code must be JIS or EUC-JP, and the first byte must be the same in the first and last characters of the range. CAUTION: Handling of the range operator is a kind of trick and it is not perfect. So if you need to translate the `-' character itself, be sure to put it at the beginning or the end of the $from and $to strings.
jacode::trans is the same as jacode::tr, but it accepts a string and returns the string after translation.
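One way to handle the array-context return value of getcode is sketched below, assuming jacode.pl is installed in @INC. The helper describe() and its messages are hypothetical and exist only for this example.

```perl
require 'jacode.pl';   # assumes jacode.pl can be found in @INC

# Hypothetical helper: turn the two-item list into a readable verdict.
sub describe {
    my ($line) = @_;
    my ($matched, $encoding) = jacode::getcode(\$line);

    return "detected $encoding" if defined $encoding;

    # No code name: a non-zero count means euc and sjis scored the same.
    return $matched ? 'ambiguous between euc and sjis'
                    : 'no Japanese characters found';
}

print describe("plain ASCII"), "\n";
print describe("\x1b\x24\x42\x46\x7c\x4b\x5c\x1b\x28\x42"), "\n";   # "Nihon" in JIS
```

When the verdict is ambiguous, reading further lines before converting is the cautious choice.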
Each value of the associative array $jacode::convf{'xxx', 'yyy'} is a pointer to the subroutine jacode::xxx2yyy().
$jacode::z2hf{'xxx'} and $jacode::h2zf{'xxx'} are pointers to the corresponding functions, just like $jacode::convf.
The current jacode.pl is written in Perl 4, but it can be used from Perl 5 by passing `references'. A fully Perl 5 capable version is a future issue.
Since a lexical variable is not reachable through a typeglob, a *string style call doesn't work if the variable is declared with `my'. The same thing happens to the special variable $_ if perl is compiled with thread support. So using a reference is generally recommended to avoid mysterious errors.
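A small sketch of the reference style with a `my' variable, assuming jacode.pl is installed in @INC. The sample bytes are illustrative.

```perl
require 'jacode.pl';   # assumes jacode.pl can be found in @INC

my $line = "\xc6\xfc\xcb\xdc";   # "Nihon" in EUC-JP, held in a lexical

# A typeglob call, &jcode'convert(*line, 'sjis', 'euc'), would operate on
# the package variable $main::line, which is a different variable.
# Passing a reference reaches the lexical itself.
jacode::convert(\$line, 'sjis', 'euc');

print unpack('H*', $line), "\n";
```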
Convert SJIS to JIS and print each line with code name
    #require 'jcode.pl';
    require 'jacode.pl';
    while (defined($s = <>)) {
        $code = &jcode'convert(*s, 'jis', 'sjis');
        print $code, "\t", $s;
    }
Convert all lines to JIS according to the first recognized line
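    #require 'jcode.pl';
    require 'jacode.pl';
    while (defined($s = <>)) {
        # Lines without escape or 8-bit bytes need no conversion.
        print($s), next unless $s =~ /[\x1b\x80-\xff]/;
        # Remember the conversion routine chosen for the first recognized line.
        (*f, $INPUT_encoding) = &jcode'convert(*s, 'jis');
        print $s;
        defined(&f) || next;
        # Reuse that routine for the rest of the input.
        while (defined($s = <>)) { &f(*s); print $s; }
        last;
    }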
The safest way of JIS conversion
    #require 'jcode.pl';
    require 'jacode.pl';
    while (defined($s = <>)) {
        ($matched, $INPUT_encoding) = &jcode'getcode(*s);
        if (@buf == 0 && $matched == 0) {
            print $s;
            next;
        }
        push(@buf, $s);
        next unless $INPUT_encoding;
        while (defined($s = shift(@buf))) {
            &jcode'convert(*s, 'jis', $INPUT_encoding);
            print $s;
        }
        while (defined($s = <>)) {
            &jcode'convert(*s, 'jis', $INPUT_encoding);
            print $s;
        }
        last;
    }
    print @buf if @buf;
Convert SJIS to UTF-8 and print each line by perl 4.036 or later
    #require 'jcode.pl';
    require 'jacode.pl';
    while (defined($s = <>)) {
        &jcode'convert(*s, 'utf8', 'sjis');
        print $s;
    }
Convert SJIS to UTF16-BE and print each line by perl 5.8.1 or later
    require 'jacode.pl';
    use 5.8.1;
    while (defined($s = <>)) {
        jacode::convert(\$s, 'UTF16-BE', 'sjis');
        print $s;
    }
Convert SJIS to MIME-Header-ISO_2022_JP and print each line by perl 5.8.1 or later
    require 'jacode.pl';
    use 5.8.1;
    while (defined($s = <>)) {
        jacode::convert(\$s, 'MIME-Header-ISO_2022_JP', 'sjis');
        print $s;
    }
Traditional style of file I/O
    require 'jacode.pl';
    open(FILE,'input.txt') || die "Can't open input.txt: $!";
    while (<FILE>) {
        chomp;
        jacode::convert(\$_,'sjis','utf8');
        ...
    }
Minimalist style
    open(FILE,'perl jacode.pl -ws input.txt | ');
You must use the -Llatin switch when running under JPerl.
I have tested and verified this software to the best of my ability. However, software containing this much code is bound to contain some bugs. Thus, if you happen to find a bug that is in jacode.pl and not in your own program, please try to reduce it to a minimal test case and then report it to the author's address below. If you have an idea that could make this a more useful tool, please share it with everyone.
This project was originated by Kazumasa Utashiro <utashiro@iij.ad.jp>.
This software is free software;
Copyright (c) 2010, 2011, 2014, 2015, 2016, 2017, 2018 INABA Hitoshi <ina@cpan.org> in a CPAN
The latest version is available here:
http://search.cpan.org/dist/jacode/
*** ATTENTION ***
This software is not "jcode.pl". Do not redistribute this software renamed as "jcode.pl".
Moreover, this software IS NOT "jacode4e.pl". If you want "jacode4e.pl", search for it on CPAN.
Original version `jcode.pl' is ...
Copyright (c) 2002 Kazumasa Utashiro http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/
Copyright (c) 1995-2000 Kazumasa Utashiro <utashiro@iij.ad.jp> Internet Initiative Japan Inc. 3-13 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-0054, Japan
Copyright (c) 1992,1993,1994 Kazumasa Utashiro Software Research Associates, Inc.
Use and redistribution for ANY PURPOSE are granted as long as all copyright notices are retained. Redistribution with modification is allowed provided that you make your modified version obviously distinguishable from the original one. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The original version was developed under the name of srekcah@sra.co.jp in February 1992, and it was called kconv.pl at the beginning. This address was a pen name for a group of individuals and it is no longer valid.
ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
UNIX MAGAZINE
    1992 Apr
    Pages: 148
    T1008901040810 ZASSHI 08901-4
    http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml

Programming Perl, Second Edition
    By Larry Wall, Tom Christiansen, Randal L. Schwartz
    October 1996
    Pages: 670
    ISBN 10: 1-56592-149-6 | ISBN 13: 9781565921498
    http://shop.oreilly.com/product/9781565921498.do

Programming Perl, Third Edition
    By Larry Wall, Tom Christiansen, Jon Orwant
    Third Edition July 2000
    Pages: 1104
    ISBN 10: 0-596-00027-8 | ISBN 13: 9780596000271
    http://shop.oreilly.com/product/9780596000271.do

Programming Perl, 4th Edition
    By: Tom Christiansen, brian d foy, Larry Wall, Jon Orwant
    Publisher: O'Reilly Media
    Formats: Print, Ebook, Safari Books Online
    Print: January 2012
    Ebook: March 2012
    Pages: 1130
    Print ISBN: 978-0-596-00492-7 | ISBN 10: 0-596-00492-3
    Ebook ISBN: 978-1-4493-9890-3 | ISBN 10: 1-4493-9890-1
    http://shop.oreilly.com/product/9780596004927.do

Perl Cookbook, Second Edition
    By Tom Christiansen, Nathan Torkington
    Second Edition August 2003
    Pages: 964
    ISBN 10: 0-596-00313-7 | ISBN 13: 9780596003135
    http://shop.oreilly.com/product/9780596003135.do

Perl in a Nutshell, Second Edition
    By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
    Second Edition June 2002
    Pages: 760
    Series: In a Nutshell
    ISBN 10: 0-596-00241-6 | ISBN 13: 9780596002411
    http://shop.oreilly.com/product/9780596002411.do

Learning Perl on Win32 Systems
    By Randal L. Schwartz, Erik Olson, Tom Christiansen
    August 1997
    Pages: 306
    ISBN 10: 1-56592-324-3 | ISBN 13: 9781565923249
    http://shop.oreilly.com/product/9781565923249.do

Learning Perl, Fifth Edition
    By Randal L. Schwartz, Tom Phoenix, brian d foy
    June 2008
    Pages: 352
    Print ISBN: 978-0-596-52010-6 | ISBN 10: 0-596-52010-7
    Ebook ISBN: 978-0-596-10316-3 | ISBN 10: 0-596-10316-6
    http://shop.oreilly.com/product/9780596520113.do

Perl RESOURCE KIT UNIX EDITION
    Futato, Irving, Jepson, Patwardhan, Siever
    ISBN 10: 1-56592-370-7
    http://shop.oreilly.com/product/9781565923706.do

Understanding Japanese Information Processing
    By Ken Lunde
    O'Reilly Media
    September 1993
    Pages: 470
    ISBN: 978-1-56592-043-9 | ISBN 10: 1-56592-043-0
    http://shop.oreilly.com/product/9781565920439.do

CJKV Information Processing
    Chinese, Japanese, Korean & Vietnamese Computing
    By Ken Lunde
    O'Reilly Media
    Print: January 1999
    Ebook: June 2009
    Pages: 1128
    Print ISBN: 978-1-56592-224-2 | ISBN 10: 1-56592-224-7
    Ebook ISBN: 978-0-596-55969-4 | ISBN 10: 0-596-55969-0
    http://shop.oreilly.com/product/9781565922242.do

CJKV Information Processing, 2nd Edition
    By Ken Lunde
    O'Reilly Media
    Print: December 2008
    Ebook: June 2009
    Pages: 912
    Print ISBN: 978-0-596-51447-1 | ISBN 10: 0-596-51447-6
    Ebook ISBN: 978-0-596-15782-1 | ISBN 10: 0-596-15782-7
    http://shop.oreilly.com/product/9780596514471.do

Mastering Regular Expressions, Second Edition
    By Jeffrey E. F. Friedl
    Second Edition July 2002
    Pages: 484
    ISBN 10: 0-596-00289-0 | ISBN 13: 9780596002893
    http://shop.oreilly.com/product/9780596002893.do

Mastering Regular Expressions, Third Edition
    By Jeffrey E. F. Friedl
    Third Edition August 2006
    Pages: 542
    ISBN 10: 0-596-52812-4 | ISBN 13: 9780596528126
    http://shop.oreilly.com/product/9780596528126.do

Regular Expressions Cookbook
    By Jan Goyvaerts, Steven Levithan
    May 2009
    Pages: 512
    ISBN 10: 0-596-52068-9 | ISBN 13: 978-0-596-52068-7
    http://shop.oreilly.com/product/9780596520694.do

PERL PUROGURAMINGU
    Larry Wall, Randal L.Schwartz, Yoshiyuki Kondo
    December 1997
    ISBN 4-89052-384-7
    http://www.context.co.jp/~cond/books/old-books.html

JIS KANJI JITEN
    Kouji Shibano
    Pages: 1456
    ISBN 4-542-20129-5
    http://www.webstore.jsa.or.jp/lib/lib.asp?fn=/manual/mnl01_12.htm

UNIX MAGAZINE
    1993 Aug
    Pages: 172
    T1008901080816 ZASSHI 08901-8
    http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml

MacPerl Power and Ease
    By Vicki Brown, Chris Nandor
    April 1998
    Pages: 350
    ISBN 10: 1881957322 | ISBN 13: 978-1881957324
    http://www.amazon.com/Macperl-Power-Ease-Vicki-Brown/dp/1881957322

Other Tools
    http://search.cpan.org/dist/Char/
    http://search.cpan.org/dist/Char-Sjis/
    http://search.cpan.org/dist/Modern-Open/

BackPAN
    http://backpan.perl.org/authors/id/I/IN/INA/
This software was made with reference to the software and documents written by the following hackers and persons. I am thankful to all of them.

Larry Wall, Perl
    http://www.perl.org/

Kazumasa Utashiro, jcode.pl
    ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
    http://web.archive.org/web/20090608090304/http://srekcah.org/jcode/
    ftp://ftp.oreilly.co.jp/pcjp98/utashiro/
    http://mail.pm.org/pipermail/tokyo-pm/2002-March/001319.html
    https://twitter.com/uta46/status/11578906320

mikeneko creator club, Private manual of jcode.pl
    http://mikeneko.creator.club.ne.jp/~lab/kcode/jcode.html

gama, getcode.pl
    http://www2d.biglobe.ne.jp/~gama/cgi/jcode/jcode.htm

Gappai, jcodeg.diff
    http://www.vector.co.jp/soft/win95/prog/se347514.html

OHZAKI Hiroki, Perl memo
    http://www.din.or.jp/~ohzaki/perl.htm#JP_Code

NAKATA Yoshinori, Ad hoc patch for reducing warnings on h2z_euc
    http://white.niu.ne.jp/yapw/yapw.cgi/jcode.pl%A4%CE%A5%A8%A5%E9%A1%BC%CD%DE%C0%A9

Dan Kogai, Jcode module and Encode module
    http://search.cpan.org/dist/Jcode/
    http://search.cpan.org/dist/Encode/
    http://blog.livedoor.jp/dankogai/archives/50116398.html
    http://blog.livedoor.jp/dankogai/archives/51004472.html

Donzoko CGI+--, Jcode like Encode Wrapper
    http://www.donzoko.net/cgi/jencode/

Yusuke Kawasaki, Encode561 module
    http://www.kawa.net/works/perl/i18n-emoji/i18n-emoji.html#Encode561

Tokyo-pm archive
    http://mail.pm.org/pipermail/tokyo-pm/

utf8_possible_story, Perl de Nihongo Aruaru
    http://aizen.likk.jp/slide/utf8_possible_story/

To install jacode, copy and paste the appropriate command into your terminal.
cpanm
cpanm jacode
CPAN shell
    perl -MCPAN -e shell
    install jacode
For more information on module installation, please visit the detailed CPAN module installation guide.