NAME
jacode.pl - Perl library for Japanese character code conversion
SYNOPSIS
require 'jacode.pl';
# note: file name is 'jacode.pl', but package name is 'jcode'
# Perl4 interface:
&jcode'getcode(*line)
&jcode'convert(*line, $ocode [, $icode [, $option]])
&jcode'xxx2yyy(*line [, $option])
&jcode'to($ocode, $line [, $icode [, $option]])
&jcode'jis($line [, $icode [, $option]])
&jcode'euc($line [, $icode [, $option]])
&jcode'sjis($line [, $icode [, $option]])
&jcode'utf8($line [, $icode [, $option]])
&jcode'jis_inout($in, $out)
&jcode'get_inout($string)
&jcode'cache()
&jcode'nocache()
&jcode'flushcache()
&jcode'flush()
&jcode'h2z_xxx(*line)
&jcode'z2h_xxx(*line)
&jcode'tr(*line, $from, $to [, $option])
&jcode'trans($line, $from, $to [, $option])
&jcode'init()
$jcode'convf{'xxx', 'yyy'}
$jcode'z2hf{'xxx'}
$jcode'h2zf{'xxx'}
# Perl5 interface:
jcode::getcode(\$line)
jcode::convert(\$line, $ocode [, $icode [, $option]])
jcode::xxx2yyy(\$line [, $option])
jcode::to($ocode, $line [, $icode [, $option]])
jcode::jis($line [, $icode [, $option]])
jcode::euc($line [, $icode [, $option]])
jcode::sjis($line [, $icode [, $option]])
jcode::utf8($line [, $icode [, $option]])
jcode::jis_inout($in, $out)
jcode::get_inout($string)
jcode::cache()
jcode::nocache()
jcode::flushcache()
jcode::flush()
jcode::h2z_xxx(\$line)
jcode::z2h_xxx(\$line)
jcode::tr(\$line, $from, $to [, $option])
jcode::trans($line, $from, $to [, $option])
jcode::init()
&{$jcode::convf{'xxx', 'yyy'}}(\$line)
&{$jcode::z2hf{'xxx'}}(\$line)
&{$jcode::h2zf{'xxx'}}(\$line)
ABSTRACT
This software is upward compatible with jcode.pl. 'Ja' is the ISO 639-1 code for 'Japanese' and is unrelated to the 'JA Group Organization'.
Conversion from 'sjis' to 'utf8' uses the following table.
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
Conversion from 'utf8' to 'sjis' uses CP932.TXT together with the following table.
PRB: Conversion Problem Between Shift-JIS and Unicode
http://support.microsoft.com/kb/170559/en-us
What this software is good for:
Upward compatible with jcode.pl
Runs as a Perl4 script
Acts as a wrapper to Encode::from_to
Supports HALFWIDTH KATAKANA
Supports UTF-8
Hides the UTF8 flag
No object-oriented programming required
Lets you reuse existing code and know-how
DEPENDENCIES
This software requires perl 4.036 or later.
PERL4 INTERFACE
- &jcode'getcode(*line)
-
Returns 'jis', 'sjis', 'euc', 'utf8' or undef according to the Japanese character code in $line. Returns 'binary' if the data contains non-character codes. In array context, it returns a two-element list: the first value is the number of characters that matched the expected code, and the second value is the code name. The list form is useful only when the number is not 0 and the code is undef; that case means it could not tell 'euc' from 'sjis' because both scored exactly the same. This interface is admittedly tricky. Distinguishing euc from sjis is very difficult, sometimes impossible, and can even give a wrong result when the text includes JIS X0201 KANA characters.
- &jcode'convert(*line, $ocode [, $icode [, $option]])
-
Converts the contents of $line to the Japanese code given in the second argument, $ocode. $ocode can be any of "jis", "sjis", "euc" or "utf8"; use "noconv" when you do not want any code conversion. When $icode is not supplied, the input code is recognized automatically from the line itself. $icode can also be specified, but the xxx2yyy routines are more efficient when both codes are known. In scalar context it returns the code of the input string; in array context it returns a list of a pointer to the conversion subroutine and the input code. The Japanese character codes JIS X0201, X0208 and X0212, and ASCII, are supported. JIS X0212 characters cannot be represented in sjis or utf8, and they are replaced by the "geta" character when converted to sjis. JIS X0213 characters cannot be represented at all. On perl 5.8.1 or later, &jcode'convert acts as a wrapper to Encode::from_to: when $ocode or $icode is not one of "jis", "sjis", "euc" or "utf8", and the Encode module can be used, Encode::from_to( $line, $icode, $ocode ) is executed instead of &jcode'convert(*line, $ocode, $icode, $option). In this case, the pointer to the conversion subroutine returned in array context has no useful value. See &jcode'xxx2yyy below for the $option parameter.
- &jcode'xxx2yyy(*line [, $option])
-
Converts the Japanese code from xxx to yyy. The strings xxx and yyy are any combination of "jis", "euc", "sjis" and "utf8". These routines return the *approximate* number of converted bytes, so a return value of 0 means the line was not converted at all. The optional parameter $option selects an additional conversion: "z" converts JIS X0201 KANA to JIS X0208 KANA, and "h" does the reverse.
- $jcode'convf{'xxx', 'yyy'}
-
Each value of this associative array is a pointer to the subroutine jcode'xxx2yyy().
- &jcode'to($ocode, $line [, $icode [, $option]])
- &jcode'jis($line [, $icode [, $option]])
- &jcode'euc($line [, $icode [, $option]])
- &jcode'sjis($line [, $icode [, $option]])
- &jcode'utf8($line [, $icode [, $option]])
-
These functions provide an easy-to-use call/return-by-value interface. You can use them in an s///e operation or anywhere else for convenience.
- &jcode'jis_inout($in, $out)
-
Sets or queries the JIS start and end sequences. The defaults are "ESC-$-B" and "ESC-(-B". If you supply only one character for a sequence, "ESC-$" or "ESC-(" is prepended to it respectively. Actually, "ESC-(-B" is not a sequence that ends JIS code but a sequence that starts the ASCII code set, so the names `in' and `out' are somewhat misleading.
- &jcode'get_inout($string)
-
Gets the JIS start and end sequences from $string.
- &jcode'cache()
- &jcode'nocache()
- &jcode'flushcache()
- &jcode'flush()
-
Normally, converted characters are cached in memory so that the same calculation does not have to be repeated. To disable this caching, call &jcode'nocache(). Caching can be re-enabled with &jcode'cache(), and the cache is flushed by calling &jcode'flushcache(). The &cache() and &nocache() functions return the previous caching state. &jcode'flush() is an alias of &jcode'flushcache(), kept so that older documentation remains valid.
- &jcode'h2z_xxx(*line)
-
JIS X0201 KANA (so-called Hankaku-KANA) to JIS X0208 KANA (Zenkaku-KANA) code conversion routine. The string xxx is any of "jis", "sjis", "euc" and "utf8". Because it is difficult to recognize the code set from a string of 1-byte KATAKANA, automatic code recognition is not supported.
- &jcode'z2h_xxx(*line)
-
JIS X0208 KANA to JIS X0201 KANA code conversion routine. The string xxx is any of "jis", "sjis", "euc" and "utf8".
- $jcode'z2hf{'xxx'}
- $jcode'h2zf{'xxx'}
-
These are pointers to the corresponding functions, just like $jcode'convf.
- &jcode'tr(*line, $from, $to [, $option])
-
&jcode'tr emulates the tr operator for 2-byte code. Only 'd' is interpreted as an option. A range operator such as `A-Z' for 2-byte code is partially supported: the code must be JIS or EUC, and the first byte must be the same in the first and last characters of the range. CAUTION: range handling is something of a trick and is not perfect. If you need to translate the `-' character itself, be sure to put it at the beginning or the end of the $from and $to strings.
- &jcode'trans($line, $from, $to [, $option])
-
Same as &jcode'tr, but accepts a string and returns the string after translation.
- &jcode'init()
-
Initializes the variables used in this package. You do not have to call this when loading jacode.pl with `do' or `require'. Call it first if you have embedded jacode.pl at the end of your script.
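As a rough illustration of the default sequences described under &jcode'jis_inout, the following sketch encodes one hiragana character as ISO-2022-JP and prints the bytes in hex, so that "ESC-$-B" and "ESC-(-B" are visible around the 2-byte code. It uses the core Encode module rather than jacode.pl itself, so perl 5.8 or later is assumed, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+3042 HIRAGANA LETTER A is 0x2422 in JIS X0208.
my $jis = encode('iso-2022-jp', "\x{3042}");

my $kanji_in  = "\x1b\x24\x42";   # ESC $ B : switch to JIS X0208
my $kanji_out = "\x1b\x28\x42";   # ESC ( B : switch back to ASCII

# Print every byte in hex so that the escape sequences are visible.
print join(' ', map { sprintf '%02X', ord } split //, $jis), "\n";

print "starts with ESC-\$-B\n" if index($jis, $kanji_in) == 0;
print "ends with ESC-(-B\n"   if substr($jis, -3) eq $kanji_out;
```

The trailing sequence only switches back to ASCII, which is why the names `in' and `out' are described above as somewhat misleading.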
PERL5 INTERFACE
The current jacode.pl is written in Perl 4, but it can be used from Perl 5 by passing `references'. A fully Perl 5 capable version is left for the future.
Since a lexical variable cannot be reached through a typeglob, a *string style call does not work if the variable is declared with `my'. The same thing happens with the special variable $_ if perl is compiled with thread support. Using a reference is therefore generally recommended, to avoid mysterious errors.
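The difference between the two calling styles can be sketched without jacode.pl at all. The sub below is a hypothetical stand-in (upcase_in_place is not part of the library) written in the Perl 4 style: it aliases its argument through a typeglob. Assigning a scalar reference to a glob aliases only the scalar slot, so the same sub accepts either *name or \$name. The sketch assumes perl 5.6 or later because it uses `our'.

```perl
use strict;
use warnings;

our $line;    # package variable that the glob assignment below aliases

# Hypothetical Perl 4 style routine: modifies the caller's string in place.
sub upcase_in_place {
    local (*line) = @_;    # accepts a typeglob or a scalar reference
    $line = uc $line;
}

our $global = 'abc';
upcase_in_place(*global);      # typeglob call: fine for a package variable

my $lexical = 'abc';
# A `my' variable has no typeglob; *lexical would name the unrelated
# package variable $main::lexical, so a reference is passed instead.
upcase_in_place(\$lexical);

print "$global $lexical\n";
```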
- jcode::getcode(\$line)
- jcode::convert(\$line, $ocode [, $icode [, $option]])
- jcode::xxx2yyy(\$line [, $option])
- &{$jcode::convf{'xxx', 'yyy'}}(\$line)
- jcode::to($ocode, $line [, $icode [, $option]])
- jcode::jis($line [, $icode [, $option]])
- jcode::euc($line [, $icode [, $option]])
- jcode::sjis($line [, $icode [, $option]])
- jcode::utf8($line [, $icode [, $option]])
- jcode::jis_inout($in, $out)
- jcode::get_inout($string)
- jcode::cache()
- jcode::nocache()
- jcode::flushcache()
- jcode::flush()
- jcode::h2z_xxx(\$line)
- jcode::z2h_xxx(\$line)
- &{$jcode::z2hf{'xxx'}}(\$line)
- &{$jcode::h2zf{'xxx'}}(\$line)
- jcode::tr(\$line, $from, $to [, $option])
- jcode::trans($line, $from, $to [, $option])
- jcode::init()
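The &{$jcode::convf{'xxx', 'yyy'}}(\$line) form above looks up a subroutine pointer in a hash whose key is written as a two-element list. A minimal sketch of that calling pattern follows; %convf and the 'lower'/'upper' names are hypothetical stand-ins, not the library's table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %jcode::convf. A key written as a list is
# joined by Perl with $; into a single hash key.
my %convf;
$convf{'lower', 'upper'} = sub {
    my ($ref) = @_;
    $$ref = uc $$ref;        # modify the caller's string in place
    return length $$ref;     # return a count, as the conversion routines do
};

my $line = 'abc';
my $n = &{ $convf{'lower', 'upper'} }(\$line);

print "$line $n\n";
print "same key\n" if exists $convf{ join($;, 'lower', 'upper') };
```

Looking the pointer up once and calling it repeatedly avoids building the subroutine name for every line.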
SAMPLES
Convert any Kanji code to JIS and print each line with code name.
#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    $code = &jcode'convert(*s, 'jis');
    print $code, "\t", $s;
}
Convert all lines to JIS according to the first recognized line.
#require 'jcode.pl';
require 'jacode.pl';
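while (defined($s = <>)) {
    # pass through lines that contain no escape or 8-bit characters
    print($s), next unless $s =~ /[\033\200-\377]/;
    (*f, $icode) = &jcode'convert(*s, 'jis');
    print $s;
    defined(&f) || next;
    # reuse the conversion routine chosen for the first recognized line
    while (defined($s = <>)) { &f(*s); print $s; }
    last;
}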
The safest way of JIS conversion.
#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    ($matched, $icode) = &jcode'getcode(*s);
    if (@buf == 0 && $matched == 0) {
        print $s;
        next;
    }
    push(@buf, $s);
    next unless $icode;
    while (defined($s = shift(@buf))) {
        &jcode'convert(*s, 'jis', $icode);
        print $s;
    }
    while (defined($s = <>)) {
        &jcode'convert(*s, 'jis', $icode);
        print $s;
    }
    last;
}
print @buf if @buf;
Convert SJIS to UTF-8 and print each line with perl 4.036 or later.
#require 'jcode.pl';
require 'jacode.pl';
while (defined($s = <>)) {
    &jcode'convert(*s, 'utf8', 'sjis');
    print $s;
}
Convert SJIS to UTF16-BE and print each line with perl 5.8.1 or later.
require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
    jcode::convert(\$s, 'UTF16-BE', 'sjis');
    print $s;
}
Convert SJIS to MIME-Header-ISO_2022_JP and print each line with perl 5.8.1 or later.
require 'jacode.pl';
use 5.8.1;
while (defined($s = <>)) {
    jcode::convert(\$s, 'MIME-Header-ISO_2022_JP', 'sjis');
    print $s;
}
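The last two samples depend on the Encode::from_to fallback described under &jcode'convert. For comparison, a direct call to Encode::from_to can be sketched as below. This is not taken from jacode.pl: it assumes 'cp932' is an appropriate Encode name for what jacode.pl calls 'sjis', and how the library maps its own code names onto Encode names is not shown here.

```perl
use strict;
use warnings;
use Encode ();

# "\x82\xA0" is HIRAGANA LETTER A in Shift_JIS (CP932).
my $s = "\x82\xA0";

# from_to converts the octets in place and returns the new length in
# octets, or undef on failure.
my $len = Encode::from_to($s, 'cp932', 'UTF-16BE');

# Show the converted octets in hex together with their length.
printf "%s (%d octets)\n", unpack('H*', $s), $len;
```

Unlike jcode::convert, from_to takes the string itself rather than a reference, and the input encoding comes before the output encoding.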
BUGS AND LIMITATIONS
You must use the -Llatin switch when running under JPerl.
AUTHOR
This project was originated by INABA Hitoshi <ina@cpan.org>.
LICENSE AND COPYRIGHT
This software is free software;
Use and redistribution for ANY PURPOSE are granted as long as all copyright notices are retained. Redistribution with modification is allowed provided that you make your modified version obviously distinguishable from the original one. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
SEE ALSO
PERL PUROGURAMINGU
Larry Wall, Randal L.Schwartz, Yoshiyuki Kondo
December 1997
ISBN 4-89052-384-7
http://www.context.co.jp/~cond/books/old-books.html
Understanding Japanese Information Processing
By Ken Lunde
September 1993
Pages: 470
ISBN 10: 1-56592-043-0 | ISBN 13: 9781565920439
http://oreilly.com/catalog/9781565920439/
CJKV Information Processing
Chinese, Japanese, Korean & Vietnamese Computing
By Ken Lunde
First Edition January 1999
Pages: 1128
ISBN 10: 1-56592-224-7 | ISBN 13: 9781565922242
http://www.oreilly.com/catalog/cjkvinfo/index.html
ISBN 4-87311-108-0
http://www.oreilly.co.jp/books/4873111080/
JIS KANJI JITEN
Kouji Shibano
Pages: 1456
ISBN 4-542-20129-5
http://www.webstore.jsa.or.jp/lib/lib.asp?fn=/manual/mnl01_12.htm
Unicode NI YORU JIS X 0213 JISSOU NYUMON
Kenzaburo Tamaru
Pages: 200
ISBN 978-4-89100-608-2
http://ec.nikkeibp.co.jp/item/books/A04500.html
ACKNOWLEDGEMENTS
This software was written with reference to the software and documents produced by the following people. I am grateful to all of them.
Larry Wall, Perl
http://www.perl.org/
Kazumasa Utashiro, jcode.pl
ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
http://mail.pm.org/pipermail/tokyo-pm/2002-March/001319.html
gama, getcode.pl
http://www2d.biglobe.ne.jp/~gama/cgi/jcode/jcode.htm
Gappai, jcodeg.diff
http://www.vector.co.jp/soft/win95/prog/se347514.html
OHZAKI Hiroki, Perl memo
http://www.din.or.jp/~ohzaki/perl.htm#JP_Code
NAKATA Yoshinori, Ad hoc patch to reduce warnings in h2z_euc
http://white.niu.ne.jp/yapw/yapw.cgi/jcode.pl%A4%CE%A5%A8%A5%E9%A1%BC%CD%DE%C0%A9
Dan Kogai, Jcode module and Encode module
http://search.cpan.org/dist/Jcode/
http://search.cpan.org/dist/Encode/
http://blog.livedoor.jp/dankogai/archives/50116398.html
http://blog.livedoor.jp/dankogai/archives/51004472.html
Donzoko CGI+--, Jcode like Encode Wrapper
http://www.donzoko.net/cgi/jencode/
Yusuke Kawasaki, Encode561 module
http://www.kawa.net/works/perl/i18n-emoji/i18n-emoji.html#Encode561
Tokyo-pm archive
http://mail.pm.org/pipermail/tokyo-pm/