NAME

Lingua::JA::Moji - Handle many kinds of Japanese characters

SYNOPSIS

Convert various types of characters into one another.

    use Lingua::JA::Moji qw/kana2romaji romaji2kana/;
    use utf8;
    my $romaji = kana2romaji ('あいうえお');
    # $romaji is now 'aiueo'.
    my $kana = romaji2kana ($romaji);
    # $kana is now 'アイウエオ'.

EXPORT

This module does not export any functions except on request.

ENCODING

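All the functions in this module assume that their input consists of Perl Unicode (character) strings, Perl's so-called "utf8", and they return strings of the same kind. Japanese text written literally in source code therefore needs "use utf8", as in the examples in this document.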

FUNCTIONS

kana2romaji

    use utf8;
    my $romaji = kana2romaji ("うれしいこども");

    # $romaji = "uresiikodomo"

Convert kana to a romanized form.

An optional second argument, a hash reference, controls the style of conversion.

    use utf8;
    my $romaji = kana2romaji ("しんぶん", {style => "hepburn"});
    # $romaji = "shimbun"

The possible options are

style

The style of romanization. The default form of romanization is "Nippon-shiki". See http://www.sljfaq.org/afaq/nippon-shiki.html. The user can set the conversion style to "hepburn" or "passport" or "kunrei". See http://www.sljfaq.org/afaq/kana-roman.html.

use_m

If this is set to a true value, a syllabic "n" (ん) which comes before a "b" or "p" sound, such as the first "n" in "shinbun" (しんぶん, newspaper), is converted into "m" rather than "n".

ve_type

ve_type controls how long vowels are written. The default is to use circumflexes to represent long vowels. If you set "ve_type" => "macron", then it uses macrons (the Hepburn system). If you set "ve_type" => "passport", then it uses "oh" to write long "o" vowels. If you set "ve_type" => "none", then it does not use "h".
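To illustrate the "use_m" rule in isolation, here is a minimal sketch which applies the same substitution to a string that is already romanized. The helper n_to_m is hypothetical and is not part of this module. The module itself works from kana, and its implementation may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::JA::Moji: rewrite "n" as "m"
# when it comes directly before "b" or "p" in a romanized string.
sub n_to_m
{
    my ($romaji) = @_;
    $romaji =~ s/n(?=[bp])/m/g;
    return $romaji;
}

print n_to_m ('shinbun'), "\n";   # shimbun
print n_to_m ('sanpo'), "\n";     # sampo
print n_to_m ('kantan'), "\n";    # kantan (no "n" comes before "b" or "p")
```

The lookahead (?=[bp]) leaves the following consonant in place, so only the "n" itself is rewritten.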

romaji2hiragana

    my $hiragana = romaji2hiragana ('babubo');

Convert romanized Japanese into hiragana. This takes the same options as romaji2kana. It also switches on the "wapuro" option, which writes long vowels with a kana rather than a chouon (long vowel marker).

romaji_styles

    my @styles = romaji_styles ();
    # Returns a true value
    romaji_styles ("hepburn");
    # Returns the undefined value
    romaji_styles ("frogs");

Given an argument, return whether it is a legitimate style of romanization.

Without an argument, return a list of the possible styles. Each element of the list is a hash reference containing "abbrev", a short name for the style, and "full_name", the full name of the style.

romaji2kana

     my $kana = romaji2kana ('yamaguti');
     # $kana = 'ヤマグチ';

Convert romanized Japanese (romaji) to katakana. The conversion is highly liberal and will attempt to convert any romanization it sees into kana.

     my $kana = romaji2kana ($romaji, {wapuro => 1});

Use the option wapuro => 1 to convert long vowels into the equivalent kana rather than chouon.

To convert romanized Japanese into hiragana rather than katakana, use romaji2hiragana instead of this.

is_voiced

    if (is_voiced ('が')) {
         print "が is voiced.\n";
    }

Given a kana or romaji input, is_voiced returns a true value if the sound is a voiced sound like a, za, ga, etc. and the undefined value if not.

is_romaji

    # The following line returns "undef"
    is_romaji ("abcdefg");
    # The following line returns a defined value
    is_romaji ("atarimae");

Detect whether a string of alphabetical characters, which may also include characters with macrons or circumflexes, "looks like" romanized Japanese. If the test is successful, returns the romaji in a canonical form.

This works by converting the string to kana and seeing whether it converts cleanly or not.

hira2kata

    my $katakana = hira2kata ($hiragana);

hira2kata converts hiragana into katakana. If the input is a list, it converts each element of the list. If a list is wanted as the return value, it returns a list of the converted inputs; otherwise it returns a concatenation of the converted strings.

    my @katakana = hira2kata (@hiragana);

This does not convert chouon signs.
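The behaviour described above can be sketched in plain Perl as follows. This is only an illustration under stated assumptions: my_hira2kata is a hypothetical stand-in, not this module's code, and it assumes that "if required" refers to what Perl's wantarray reports about the calling context.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for hira2kata, not the module's own code.
sub my_hira2kata
{
    my (@input) = @_;
    for (@input) {
        # Shift hiragana (U+3041 to U+3096) onto katakana (U+30A1 to U+30F6).
        tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
    }
    # A list in list context, one concatenated string otherwise.
    return wantarray ? @input : join ('', @input);
}

my @list = my_hira2kata ('あい', 'うえ');
print "@list\n";                                 # アイ ウエ
my $joined = my_hira2kata ('あい', 'うえ');
print "$joined\n";                               # アイウエ
# The chouon (U+30FC) lies outside the translated range, so it is untouched.
print scalar (my_hira2kata ('らーめん')), "\n";  # ラーメン
```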

kata2hira

     my $hiragana = kata2hira ('カキクケコ');
     # $hiragana = 'かきくけこ';

kata2hira converts full-width katakana into hiragana. If the input is a list, it converts each element of the list. If a list is wanted as the return value, it returns a list of the converted inputs; otherwise it returns a concatenation of the converted strings.

    my @hiragana = kata2hira (@katakana);

This function does not convert chouon signs into long vowels. It also does not convert half-width katakana into hiragana.

kana2hw

     my $half_width = kana2hw ('あいウカキぎょう。');
     # $half_width = 'ｱｲｳｶｷｷﾞｮｳ｡'

kana2hw converts hiragana, katakana, and fullwidth Japanese punctuation to halfwidth katakana and halfwidth punctuation. Its function is similar to the Emacs command japanese-hankaku-region. For the opposite function, see hw2katakana.

hw2katakana

     my $full_width = hw2katakana ('ｱｲｳｶｷｷﾞｮｳ｡');
     # $full_width = 'アイウカキギョウ。';

hw2katakana converts halfwidth katakana and Japanese punctuation to fullwidth katakana and punctuation. Its function is similar to the Emacs command japanese-zenkaku-region. For the opposite function, see kana2hw.

InHankakuKatakana

    use Lingua::JA::Moji qw/InHankakuKatakana/;
    use utf8;
    if ('ｱ' =~ /\p{InHankakuKatakana}/) {
        print "ｱ is half-width katakana\n";
    }

InHankakuKatakana is a character class for use in regular expressions with \p which can validate halfwidth katakana.
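Perl lets a program define its own character class for \p by writing a subroutine whose name begins with "In" or "Is" and which returns a list of code point ranges. The following sketch shows that mechanism. The name InMyHalfwidthKatakana and the range U+FF66 to U+FF9F are assumptions made for this example, and the module's own definition of InHankakuKatakana may cover a different set of characters.

```perl
use strict;
use warnings;

# Hypothetical property, not the module's own definition. The range
# U+FF66 to U+FF9F (halfwidth katakana letters and sound marks) is an
# assumption made for this sketch.
sub InMyHalfwidthKatakana
{
    # Each line is "start<TAB>end" in hexadecimal.
    return "FF66\tFF9F\n";
}

# U+FF71 is halfwidth katakana "a"; U+30A2 is fullwidth katakana "a".
for my $char ("\x{FF71}", "\x{30A2}") {
    my $verdict = $char =~ /\p{InMyHalfwidthKatakana}/
        ? 'halfwidth' : 'not halfwidth';
    printf "U+%04X is %s\n", ord ($char), $verdict;
}
# Prints:
# U+FF71 is halfwidth
# U+30A2 is not halfwidth
```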

wide2ascii

     my $ascii = wide2ascii ('ａｂＣＥ０１９');
     # $ascii = 'abCE019'

Convert the "wide ASCII" used in Japan (fullwidth ASCII, 全角英数字) into usual ASCII symbols (半角英数字).

ascii2wide

Convert usual ASCII symbols (半角英数字) into the "wide ASCII" used in Japan (fullwidth ASCII, 全角英数字).
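The fullwidth forms of the printable ASCII characters occupy U+FF01 to U+FF5E in Unicode, in the same order as ASCII "!" to "~", so the two conversions can be sketched with tr. The helpers my_wide2ascii and my_ascii2wide are hypothetical and are not this module's code. In particular, this sketch does not handle the space character, and the module may treat it and other characters differently.

```perl
use strict;
use warnings;

# Hypothetical helpers, not the module's own code. They cover only
# U+FF01 to U+FF5E, which lines up with ASCII "!" to "~".
sub my_wide2ascii
{
    my ($text) = @_;
    $text =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $text;
}

sub my_ascii2wide
{
    my ($text) = @_;
    $text =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $text;
}

my $wide = my_ascii2wide ('abCE019');
# The first character is now fullwidth "a".
printf "U+%04X\n", ord ($wide);        # U+FF41
# Converting back gives the original string.
print my_wide2ascii ($wide), "\n";     # abCE019
```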

InWideAscii

    use Lingua::JA::Moji qw/InWideAscii/;
    use utf8;
    if ('Ａ' =~ /\p{InWideAscii}/) {
        print "Ａ is wide ascii\n";
    }

This is a character class for use with \p which matches "wide ASCII" characters (全角英数字).

kana2morse

Convert Japanese kana into Morse code.

is_kana

Returns a true value if its argument is a string of kana, or an undefined value if not.

is_hiragana

Returns a true value if its argument is a string of hiragana, or an undefined value if not.

kana2katakana

Convert either katakana or hiragana to katakana.

kana2braille

Converts kana into the equivalent Japanese braille (tenji) forms.

braille2kana

Converts Japanese braille (tenji) into the equivalent katakana.

kana2circled

kana2circled converts kana into the "circled katakana" of Unicode.

circled2kana

circled2kana converts the "circled katakana" of Unicode into the usual katakana.

normalize_romaji

normalize_romaji converts romanized Japanese to a canonical form, which is based on the Nippon-shiki romanization, but without representing long vowels using a circumflex.

AUTHOR

Ben Bullock, <bkb@cpan.org>

SUPPORT

Mailing list

I have set up a mailing list for this module and Convert::Moji at http://groups.google.com/group/perl-moji. If you have any questions about either of these modules, please ask on the mailing list rather than sending me email, because I would prefer that a record of the conversation can be kept for the future reference of other users.

Examples

For examples of this module in use, see my website http://www.lemoda.net/lingua-ja-moji/index.html. This page links to examples which I've set up on the web specifically to show this module in action.

Bugs

Please send bug reports to the Perl bug tracker at rt.cpan.org, or send them to the mailing list.

There are some known bugs or issues with romaji to kana conversion and vice-versa. I'm still working on these.

STATUS

This module is "alpha" (that is a computerese euphemism for "the module is badly-formed and unfinished") and the external interface is liable to change drastically in the future. If you have a request, please speak up.

Please also note that some of this documentation is not finished yet, and some of the functions documented here do not exist yet.

SEE ALSO

There are some other useful Perl modules already on CPAN as follows.

Japanese kana/romanization

Data::Validate::Japanese

This is where I got several of the ideas for this module from. It contains validators for kanji and kana.

Lingua::JA::Kana

This is where I got several of the ideas for this module from. It contains convertors for hiragana, katakana (fullwidth only), and romaji. The romaji conversion is less complete than this module but more compact and probably much faster, if you need high speed romanization.

Lingua::JA::Romanize::Japanese

Romanization of Japanese. The module also includes romanization of kanji via the kakasi kanji to romaji convertor, and other functions.

Lingua::JA::Romaji::Valid

Validate romanized Japanese.

Lingua::JA::Hepburn::Passport

Other

ACKNOWLEDGEMENTS

Thanks to Naoki Tomita for various kinds of assistance (see http://groups.google.com/group/perl-moji/browse_thread/thread/10a42c35f7c22ebc).

COPYRIGHT & LICENSE

Copyright 2008-2010 Ben Bullock, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.