Lingua::JA::Moji - Handle many kinds of Japanese characters
Convert various types of Japanese characters into one another.
use Lingua::JA::Moji qw/kana2romaji romaji2kana/;
use utf8;
my $romaji = kana2romaji ('あいうえお');
# $romaji is now 'aiueo'.
my $kana = romaji2kana ($romaji);
# $kana is now 'アイウエオ'.
use Lingua::JA::Moji 'kana2romaji';
$romaji = kana2romaji ("うれしいこども");
# Now $romaji = 'uresîkodomo'
Convert kana to a romanized form.
An optional second argument, a hash reference, controls the style of conversion.
use utf8;
$romaji = kana2romaji ("しんぶん", {style => "hepburn"});
# $romaji = "shimbun"
The possible options are
The style of romanization. The default form of romanization is "Nippon-shiki". See http://www.sljfaq.org/afaq/nippon-shiki.html. The user can set the conversion style to "hepburn" or "passport" or "kunrei". See http://www.sljfaq.org/afaq/kana-roman.html.
If this is set to any "true" value, a syllabic n (ん) which comes before a "b" or "p" sound, such as the first "n" in "shinbun" (しんぶん, newspaper), is converted into "m" rather than "n".
ve_type controls how long vowels are written. The default is to use circumflexes to represent long vowels. If you set "ve_type" => "macron", then it uses macrons (the Hepburn system). If you set "ve_type" => "passport", then it uses "oh" to write long "o" vowels. If you set "ve_type" => "none", then it does not use "h".
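The n-to-m rule and the switch from circumflexes to macrons described above can be sketched in plain Perl. This is an illustration only, not the module's implementation: the helper names `n_to_m` and `circumflex_to_macron` are hypothetical, and the input is assumed to be romaji that already marks long vowels with circumflexes.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: write a syllabic "n" as "m" before "b" or "p".
sub n_to_m
{
    my ($romaji) = @_;
    $romaji =~ s/n(?=[bp])/m/g;
    return $romaji;
}

# Hypothetical helper: replace circumflexed vowels with macron vowels.
sub circumflex_to_macron
{
    my ($romaji) = @_;
    $romaji =~ tr/âîûêô/āīūēō/;
    return $romaji;
}

binmode STDOUT, ':encoding(UTF-8)';
print n_to_m ('shinbun'), "\n";               # shimbun
print circumflex_to_macron ('tôkyô'), "\n";   # tōkyō
```

In the module itself these choices are made through the options hash passed to kana2romaji rather than by post-processing the output.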
use Lingua::JA::Moji 'romaji2hiragana';
$hiragana = romaji2hiragana ('babubo');
# Now $hiragana = 'ばぶぼ'
Convert romanized Japanese into hiragana. This takes the same options as romaji2kana. It also switches on the "wapuro" option, which writes long vowels with a kana rather than with a chouon (long vowel marker).
use Lingua::JA::Moji 'romaji_styles';
my @styles = romaji_styles ();
# Returns a true value
romaji_styles ("hepburn");
# Returns the undefined value
romaji_styles ("frogs");
Given an argument, return whether it is a legitimate style of romanization.
Without an argument, return a list of the possible styles. Each element of the list is a hash reference containing "abbrev", a short name for the style, and "full_name", the full name of the style.
use Lingua::JA::Moji 'romaji2kana';
$kana = romaji2kana ('yamaguti');
# Now $kana = 'ヤマグチ'
Convert romanized Japanese to kana. The conversion is highly liberal and will attempt to convert any romanization it sees into kana.
$kana = romaji2kana ($romaji, {wapuro => 1});
Use an option wapuro => 1 to convert long vowels into the equivalent kana rather than chouon.
Convert romanized Japanese (romaji) into katakana. If you want to convert romanized Japanese into hiragana, use romaji2hiragana instead of this.
use Lingua::JA::Moji 'is_voiced';
if (is_voiced ('が')) {
    print "が is voiced.\n";
}
Given a kana or romaji input, is_voiced returns a true value if the sound is a voiced sound like a, za, ga, etc. and the undefined value if not.
use Lingua::JA::Moji 'is_romaji';
# The following line returns "undef"
is_romaji ("abcdefg");
# The following line returns a defined value
is_romaji ("atarimae");
Detect whether a string of alphabetical characters, which may also include characters with macrons or circumflexes, "looks like" romanized Japanese. If the test is successful, returns the romaji in a canonical form.
This works by converting the string to kana and seeing whether it converts cleanly.
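The "convert and see whether anything is left over" idea can be sketched with a toy syllable table. Everything here is an assumption made for illustration: the table `%syllable` covers only a handful of syllables, the function name `to_kana_or_undef` is hypothetical, and the module's own conversion is far more complete.

```perl
use strict;
use warnings;
use utf8;

# Toy table, deliberately tiny. A real converter needs every syllable.
my %syllable = (
    a => 'ア', i => 'イ', u => 'ウ', e => 'エ', o => 'オ',
    ta => 'タ', ri => 'リ', ma => 'マ',
);

# Hypothetical helper: return the kana if the whole input converts,
# or the undefined value as soon as some part of it does not.
sub to_kana_or_undef
{
    my ($rest) = @_;
    my $kana = '';
    while (length $rest) {
        my $matched;
        # Try the longer chunk first.
        for my $len (2, 1) {
            next if $len > length $rest;
            my $chunk = substr ($rest, 0, $len);
            if (exists $syllable{$chunk}) {
                $kana .= $syllable{$chunk};
                $rest = substr ($rest, $len);
                $matched = 1;
                last;
            }
        }
        return undef unless $matched;
    }
    return $kana;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ('atarimae', 'abcdefg') {
    my $kana = to_kana_or_undef ($word);
    print "$word: ", (defined ($kana) ? $kana : 'not romaji'), "\n";
}
```

With this table, "atarimae" converts completely and "abcdefg" fails at the "b", which mirrors the two cases in the example above.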
use Lingua::JA::Moji 'normalize_romaji';
$normalized = normalize_romaji ('tsumuji');
normalize_romaji converts romanized Japanese to a canonical form, which is based on the Nippon-shiki romanization, but without representing long vowels using a circumflex. In the canonical form, sokuon (っ) characters are converted into the string "xtu".
If there is kana in the input string, this will also be converted to romaji.
use Lingua::JA::Moji 'hira2kata';
$katakana = hira2kata ($hiragana);
hira2kata converts hiragana into katakana. If the input is a list, it converts each element of the list, and if required, returns a list of the converted inputs, otherwise it returns a concatenation of the strings.
my @katakana = hira2kata (@hiragana);
This does not convert chouon signs.
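The list and scalar behaviour described above can be sketched with Perl's `wantarray` and a `tr///` over the hiragana range. This is a minimal sketch, not the module's code: `hira2kata_sketch` is a hypothetical name, and the ranges U+3041 to U+3096 (hiragana) and U+30A1 to U+30F6 (katakana) are taken from the Unicode code charts.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: map each hiragana to the katakana 0x60 code
# points above it. The chouon sign (U+30FC) lies outside both ranges,
# so it passes through unchanged.
sub hira2kata_sketch
{
    my @out = @_;
    tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/ for @out;
    # In list context return one string per input, otherwise join them.
    return wantarray ? @out : join ('', @out);
}

binmode STDOUT, ':encoding(UTF-8)';
my @list = hira2kata_sketch ('あい', 'うえ');
my $joined = hira2kata_sketch ('あい', 'うえ');
print join (' ', @list), "\n";   # アイ ウエ
print "$joined\n";               # アイウエ
```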
use Lingua::JA::Moji 'kata2hira';
$hiragana = kata2hira ('カキクケコ');
# Now $hiragana = 'かきくけこ'
kata2hira converts full-width katakana into hiragana. If the input is a list, it converts each element of the list, and if required, returns a list of the converted inputs, otherwise it returns a concatenation of the strings.
my @hiragana = kata2hira (@katakana);
This function does not convert chouon signs into long vowels. It also does not convert half-width katakana into hiragana.
use Lingua::JA::Moji 'InHankakuKatakana';
use utf8;
if ('ｱ' =~ /\p{InHankakuKatakana}/) {
    print "ｱ is half-width katakana\n";
}
InHankakuKatakana is a character class for use in regular expressions with \p. It matches halfwidth katakana.
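For readers unfamiliar with user-defined character properties, the following sketch shows how such a class can be declared in plain Perl. It is not the module's definition: the property name `InHalfwidthKatakanaSketch` is hypothetical, and the range U+FF66 to U+FF9F is an assumption based on the halfwidth katakana part of Unicode's "Halfwidth and Fullwidth Forms" block.

```perl
use strict;
use warnings;
use utf8;

# A user-defined property is a subroutine whose name starts with "In"
# or "Is" and which returns lines of hexadecimal code point ranges.
sub InHalfwidthKatakanaSketch
{
    return "FF66\tFF9F\n";
}

binmode STDOUT, ':encoding(UTF-8)';
for my $char ('ｱ', 'ア') {
    if ($char =~ /\p{InHalfwidthKatakanaSketch}/) {
        print "$char is halfwidth katakana\n";
    }
    else {
        print "$char is not halfwidth katakana\n";
    }
}
```

The first character in the loop is the halfwidth form (U+FF71) and the second is the fullwidth form (U+30A2), so only the first matches.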
use Lingua::JA::Moji 'kana2hw';
$half_width = kana2hw ('あいウカキぎょう。');
# Now $half_width = 'ｱｲｳｶｷｷﾞｮｳ｡'
kana2hw converts hiragana, katakana, and fullwidth Japanese punctuation to halfwidth katakana and halfwidth punctuation. Its function is similar to the Emacs command japanese-hankaku-region. For the opposite function, see hw2katakana.
use Lingua::JA::Moji 'hw2katakana';
$full_width = hw2katakana ('ｱｲｳｶｷｷﾞｮｳ｡');
# Now $full_width = 'アイウカキギョウ。'
hw2katakana converts halfwidth katakana and Japanese punctuation to fullwidth katakana and punctuation. Its function is similar to the Emacs command japanese-zenkaku-region. For the opposite function, see kana2hw.
use Lingua::JA::Moji 'InWideAscii';
use utf8;
if ('Ａ' =~ /\p{InWideAscii}/) {
    print "Ａ is wide ascii\n";
}
This is a character class for use with \p which matches "wide ASCII" (全角英数字) characters.
use Lingua::JA::Moji 'wide2ascii';
$ascii = wide2ascii ('ａｂＣＥ０１９');
# Now $ascii = 'abCE019'
Convert the "wide ASCII" used in Japan (fullwidth ASCII, 全角英数字) into usual ASCII symbols (半角英数字).
use Lingua::JA::Moji 'ascii2wide';
$wide = ascii2wide ('abCE019');
# Now $wide = 'ａｂＣＥ０１９'
Convert usual ASCII symbols (半角英数字) into the "wide ASCII" used in Japan (fullwidth ASCII, 全角英数字).
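The relationship between the two forms is a fixed offset: the fullwidth characters U+FF01 to U+FF5E correspond one-to-one to the printable ASCII characters U+0021 to U+007E. A minimal sketch of both directions follows, using hypothetical function names and making no claim about how the module implements them. The ideographic space (U+3000) is deliberately left out of the sketch.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: fullwidth forms to printable ASCII.
sub wide2ascii_sketch
{
    my ($text) = @_;
    $text =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $text;
}

# Hypothetical helper: printable ASCII to fullwidth forms.
sub ascii2wide_sketch
{
    my ($text) = @_;
    $text =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print wide2ascii_sketch ('ａｂＣＥ０１９'), "\n";   # abCE019
print ascii2wide_sketch ('abCE019'), "\n";         # ａｂＣＥ０１９
```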
use Lingua::JA::Moji 'is_kana';
This function returns a true value if its argument is a string of kana, or an undefined value if not.
use Lingua::JA::Moji 'is_hiragana';
use Lingua::JA::Moji 'kana2katakana';
Convert any of katakana, halfwidth katakana, circled katakana and hiragana to full width katakana.
use Lingua::JA::Moji 'kana2morse';
Convert Japanese kana into Morse code.
use Lingua::JA::Moji 'kana2braille';
Converts kana into the equivalent Japanese braille (tenji) forms.
use Lingua::JA::Moji 'braille2kana';
Converts Japanese braille (tenji) into the equivalent katakana.
use Lingua::JA::Moji 'kana2circled';
$circled = kana2circled ('あいうえお');
# Now $circled = '㋐㋑㋒㋓㋔'
This function converts kana into the "circled katakana" of Unicode, which have code points from U+32D0 to U+32FE. See also "circled2kana".
use Lingua::JA::Moji 'circled2kana';
$kana = circled2kana ('㋐㋑㋒㋓㋔');
# Now $kana = 'アイウエオ'
This function converts the "circled katakana" of Unicode into full-width katakana. See also "kana2circled".
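The 47 circled katakana run in gojuon order from ㋐ (U+32D0) to ㋾ (U+32FE), so the mapping can be sketched as a `tr///` against the same 47 plain katakana. The function names below are hypothetical and the sketch handles only fullwidth katakana, whereas the module's functions accept other kinds of kana as well.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: circled katakana to plain fullwidth katakana.
sub circled2kana_sketch
{
    my ($text) = @_;
    $text =~ tr/\x{32D0}-\x{32FE}/アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヰヱヲ/;
    return $text;
}

# Hypothetical helper: plain fullwidth katakana to circled katakana.
# Characters with no circled form, such as ガ or ン, are left alone.
sub kana2circled_sketch
{
    my ($text) = @_;
    $text =~ tr/アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヰヱヲ/\x{32D0}-\x{32FE}/;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print circled2kana_sketch ('㋐㋑㋒㋓㋔'), "\n";   # アイウエオ
print kana2circled_sketch ('アイウエオ'), "\n";   # ㋐㋑㋒㋓㋔
```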
use Lingua::JA::Moji 'new2old_kanji';
$old = new2old_kanji ('三国 連太郎');
# Now $old = '三國 連太郎'
Convert new-style (post-1949) kanji (Chinese characters) into old-style (pre-1949) kanji.
use Lingua::JA::Moji 'old2new_kanji';
$new = old2new_kanji ('櫻井');
# Now $new = '桜井'
Convert old-style (pre-1949) kanji (Chinese characters) into new-style (post-1949) kanji.
There is a mailing list for this module and Convert::Moji at http://groups.google.com/group/perl-moji.
For examples of this module in use, see http://www.lemoda.net/lingua-ja-moji/index.html.
There are some bugs with romaji to kana conversion and vice-versa.
Other Perl modules on CPAN include
This is where I got several of the ideas for this module from. It contains validators for kanji and kana.
This is where several of the ideas for this module came from. It contains convertors for hiragana, katakana (fullwidth only), and romaji. The romaji conversion is less complete than this module but more compact and probably much faster.
Romanization of Japanese. The module also includes romanization of kanji via the kakasi kanji to romaji convertor, and other functions.
Validate romanized Japanese.
This module exports its functions only on request. To export all the functions in the module,
use Lingua::JA::Moji ':all';
All the functions in this module assume the use of Unicode encoding. All input and output strings must be encoded using UTF-8.
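In practice, text read from a file or socket arrives as bytes, while the examples in this document use `use utf8` and string literals. One plausible reading, and it is an assumption rather than something stated here, is that the functions should be given decoded character strings. The sketch below shows the difference between UTF-8 bytes and characters using the core Encode module; no function from this module is called.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes for the single hiragana character "あ" (U+3042).
my $bytes = "\xE3\x81\x82";

# Decode bytes into a Perl character string before processing.
my $chars = decode ('UTF-8', $bytes);
printf "%d byte(s), %d character(s)\n", length ($bytes), length ($chars);
# 3 byte(s), 1 character(s)

# Encode back into UTF-8 bytes before writing the result out.
my $out = encode ('UTF-8', $chars);
print "round trip ok\n" if $out eq $bytes;
```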
Thanks to Naoki Tomita for various kinds of assistance (see http://groups.google.com/group/perl-moji/browse_thread/thread/10a42c35f7c22ebc).
Ben Bullock, <bkb@cpan.org>
Copyright 2008-2011 Ben Bullock, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Lingua::JA::Moji, copy and paste the appropriate command into your terminal.
cpanm
cpanm Lingua::JA::Moji
CPAN shell
perl -MCPAN -e shell
install Lingua::JA::Moji