Lingua::JA::Moji - comprehensive Japanese character conversion "文字ュール" (a pun on モジュール, "module")
Comprehensive conversion of Japanese writing
    use Lingua::JA::Moji qw/kana2romaji romaji2kana/;
    use utf8;
    my $romaji = kana2romaji ('あいうえお');
    # $romaji is now 'aiueo'.
    my $kana = romaji2kana ($romaji);
    # $kana is now 'アイウエオ'.
This module provides functions to convert different written forms of Japanese into one another.
All the functions in this module assume the use of Unicode encoding. All input and output strings must be encoded using UTF-8.
These functions convert Japanese letters to and from romanized forms.
    use Lingua::JA::Moji 'kana2romaji';
    $romaji = kana2romaji ("うれしいこども");
    # Now $romaji = 'uresîkodomo'
Convert kana to romaji.
Options are passed as a hash reference in the second argument to the function.
    use utf8;
    $romaji = kana2romaji ("しんぶん", {style => "hepburn"});
    # $romaji = "shimbun"
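The possible options are:
The type of romanization.
The default is Nippon-shiki (「つづり」 becomes "tuduri" and 「少女」 becomes "syôzyo").
Passport style (「伊藤」 becomes "itoh").
Kunrei-shiki (the romaji taught in the fourth year of elementary school).
Hepburn (「つづり」 becomes "tsuzuri" and 「少女」 becomes "shōjo").
If true, 「しんぶん」 becomes "shimbun".
How long vowels are represented.
Use a circumflex.
Use a macron.
「アー」, 「イー」, 「ウー」 and 「エー」 become "a", "i", "u" and "e", and 「オー」 becomes "oh".
「アー」, 「イー」, 「ウー」, 「エー」 and 「オー」 become "a", "i", "u", "e" and "o".
「アー」, 「イー」, 「ウー」, 「エー」 and 「オー」 become "a-", "i-", "u-", "e-" and "o-". Long vowels written out in kana are represented kana by kana, as in romaji keyboard input; for example, 「おう」 becomes "ou".
Word-processor (wapuro) romaji. No long vowel marks are used; for example, 「少女」 becomes "shoujo".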
    use Lingua::JA::Moji 'romaji2kana';
    $kana = romaji2kana ('yamaguti');
    # Now $kana = 'ヤマグチ'
Convert romanized Japanese to kana. The romanization is highly liberal and will attempt to convert any romanization it sees into kana. To convert romanized Japanese into hiragana, use "romaji2hiragana".
The second argument to the function contains options in the form of a hash reference:
    $kana = romaji2kana ($romaji, {wapuro => 1});
Use an option wapuro => 1 to convert long vowels into the equivalent kana rather than chouon.
    use Lingua::JA::Moji 'romaji2hiragana';
    $hiragana = romaji2hiragana ('babubo');
    # Now $hiragana = 'ばぶぼ'
Convert romanized Japanese into hiragana. This takes the same options as "romaji2kana". It also switches on the "wapuro" option, which writes long vowels with a kana rather than a chouon (long vowel marker).
    use Lingua::JA::Moji 'romaji_styles';
    my @styles = romaji_styles ();
    # Returns a true value
    romaji_styles ("hepburn");
    # Returns the undefined value
    romaji_styles ("frogs");
Given an argument, return whether it is a legitimate style of romanization.
Without an argument, return a list of possible styles, as an array of hash values, with each hash element containing "abbrev" as a short name and "full_name" for the full name of the style.
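The two calling forms might be combined as in the following minimal sketch, which checks a style name before passing it to kana2romaji and then lists the available styles. It assumes that each element returned by romaji_styles () is a hash reference with the keys "abbrev" and "full_name", as described above. The variable $wanted is hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Lingua::JA::Moji qw/romaji_styles kana2romaji/;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical style name, for example taken from user input.
my $wanted = 'hepburn';

if (romaji_styles ($wanted)) {
    # The name is recognized, so pass it on as the "style" option.
    print kana2romaji ('しんぶん', {style => $wanted}), "\n";
}
else {
    warn "Unknown romanization style '$wanted'\n";
}

# Assumption: each list element is a hash reference.
for my $style (romaji_styles ()) {
    print "$style->{abbrev}: $style->{full_name}\n";
}
```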
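    use Lingua::JA::Moji 'is_voiced';
    if (is_voiced ('が')) {
        print "が is voiced.\n";
    }
Returns true if the kana or romaji carries a dakuten (voiced sound mark) or handakuten (semi-voiced sound mark), and false if it does not.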
    use Lingua::JA::Moji 'is_romaji';
    # The following line returns "undef"
    is_romaji ("abcdefg");
    # The following line returns a defined value
    is_romaji ("atarimae");
Returns true if the string of alphabetic characters looks like romaji, and false if it does not.
    use Lingua::JA::Moji 'normalize_romaji';
    $normalized = normalize_romaji ('tsumuji');
normalize_romaji converts romanized Japanese to a canonical form, which is based on the Nippon-shiki romanization, but without representing long vowels using a circumflex. In the canonical form, sokuon (っ) characters are converted into the string "xtu".
If there is kana in the input string, this will also be converted to romaji.
normalize_romaji is for comparing two Japanese words which may be represented in different ways, for example in different romanization systems, to see if they refer to the same word despite the difference in writing. It does not provide a standardized or officially-sanctioned form of romanization.
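A comparison of this kind might look like the following sketch. The helper same_word is hypothetical and not part of the module. The sketch assumes that the Hepburn spelling "tsumuji", the Nippon-shiki spelling "tumuzi" and the kana つむじ all normalize to the same string, as the description above suggests.

```perl
use strict;
use warnings;
use utf8;
use Lingua::JA::Moji 'normalize_romaji';

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: true if two spellings normalize to the same string.
sub same_word
{
    my ($first, $second) = @_;
    return normalize_romaji ($first) eq normalize_romaji ($second);
}

# Pairs of spellings to compare; the last pair is two different words.
my @pairs = (
    ['tsumuji', 'tumuzi'],
    ['tsumuji', 'つむじ'],
    ['tsumuji', 'tsunami'],
);

for my $pair (@pairs) {
    my ($first, $second) = @$pair;
    my $verdict = same_word ($first, $second) ? 'same' : 'different';
    print "$first / $second: $verdict\n";
}
```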
    use Lingua::JA::Moji 'hira2kata';
    $katakana = hira2kata ('ひらがな');
    # Now $katakana = 'ヒラガナ'
Convert hiragana into katakana. The long vowel mark (chouon) is not converted.
    use Lingua::JA::Moji 'kata2hira';
    $hiragana = kata2hira ('カキクケコ');
    # Now $hiragana = 'かきくけこ'
Convert katakana into hiragana. The long vowel mark (chouon) is not converted.
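To illustrate why the long vowel mark passes through unchanged, here is a minimal core-Perl sketch of the same idea. It is not the module's own code: the functions hira2kata_sketch and kata2hira_sketch are hypothetical stand-ins built on tr, relying only on the fact that hiragana U+3041 to U+3093 and katakana U+30A1 to U+30F3 sit in the same order in Unicode. The chouon ー (U+30FC) lies outside both ranges, so it is left alone.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for hira2kata, not the module's implementation.
sub hira2kata_sketch
{
    my ($text) = @_;
    # Map hiragana ぁ..ん onto katakana ァ..ン, one for one.
    $text =~ tr/ぁ-ん/ァ-ン/;
    return $text;
}

# Hypothetical stand-in for kata2hira, not the module's implementation.
sub kata2hira_sketch
{
    my ($text) = @_;
    # Map katakana ァ..ン onto hiragana ぁ..ん, one for one.
    $text =~ tr/ァ-ン/ぁ-ん/;
    return $text;
}

print hira2kata_sketch ('ひらがな'), "\n";
# The chouon in コーヒー is outside the translated range and stays as it is.
print kata2hira_sketch ('コーヒー'), "\n";
```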
    use Lingua::JA::Moji 'InHankakuKatakana';
    use utf8;
    if ('ｱ' =~ /\p{InHankakuKatakana}/) {
        print "ｱ is half-width katakana\n";
    }
InHankakuKatakana is a character class for use in regular expressions with \p which can validate halfwidth katakana.
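For readers unfamiliar with how such a class plugs into \p, the following core-Perl sketch defines a similar user-defined property by hand. The name InSketchHalfwidthKana and the range U+FF66 to U+FF9F are assumptions made for illustration; the module's own definition may cover a different set of characters.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical property, not the module's own definition. Perl accepts a
# subroutine whose name begins with "In" or "Is" as a user-defined property
# and reads the code point range it returns, written as tab-separated
# hexadecimal numbers.
sub InSketchHalfwidthKana
{
    return "FF66\tFF9F\n";
}

# A half-width katakana, a full-width katakana and a hiragana.
for my $char ('ｱ', 'ア', 'あ') {
    my $verdict = $char =~ /\p{InSketchHalfwidthKana}/
        ? 'matches'
        : 'does not match';
    print "$char $verdict\n";
}
```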
    use Lingua::JA::Moji 'kana2hw';
    $half_width = kana2hw ('あいウカキぎょう。');
    # Now $half_width = 'ｱｲｳｶｷｷﾞｮｳ｡'
Convert any kana characters into half-width katakana.
    use Lingua::JA::Moji 'hw2katakana';
    $full_width = hw2katakana ('ｱｲｳｶｷｷﾞｮｳ｡');
    # Now $full_width = 'アイウカキギョウ。'
Convert half-width katakana into full-width katakana.
use Lingua::JA::Moji 'is_kana';
Returns true if the input consists only of kana, and false (undef) if it contains any character which is not kana.
use Lingua::JA::Moji 'is_hiragana';
Returns true if the input consists only of hiragana, and false (undef) if it contains any character which is not hiragana.
use Lingua::JA::Moji 'kana2katakana';
Convert any of katakana, halfwidth katakana, circled katakana and hiragana to full width katakana.
Japanese web pages often insist on "half-width alphanumerics" (半角英数字). With the functions below, there is no need for that.
    use Lingua::JA::Moji 'InWideAscii';
    use utf8;
    if ('Ａ' =~ /\p{InWideAscii}/) {
        print "Ａ is wide ascii\n";
    }
A character class for use in regular expressions which matches full-width alphanumerics.
    use Lingua::JA::Moji 'wide2ascii';
    $ascii = wide2ascii ('ａｂＣＥ０１９');
    # Now $ascii = 'abCE019'
Convert full-width alphanumerics into half-width alphanumerics (ASCII).
    use Lingua::JA::Moji 'ascii2wide';
    $wide = ascii2wide ('abCE019');
    # Now $wide = 'ａｂＣＥ０１９'
Convert half-width alphanumerics (ASCII) into full-width alphanumerics.
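The relationship between the two forms can be seen in a short core-Perl sketch. It is not the module's implementation: wide2ascii_sketch and ascii2wide_sketch are hypothetical functions which rely on the full-width forms U+FF01 to U+FF5E being laid out in the same order as ASCII U+0021 to U+007E. Exactly which characters the module itself converts is not stated here, so treat the ranges as an assumption.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for wide2ascii, not the module's implementation.
sub wide2ascii_sketch
{
    my ($text) = @_;
    # Full-width forms U+FF01..U+FF5E map onto ASCII U+0021..U+007E.
    $text =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $text;
}

# Hypothetical stand-in for ascii2wide, not the module's implementation.
sub ascii2wide_sketch
{
    my ($text) = @_;
    # ASCII U+0021..U+007E map onto full-width forms U+FF01..U+FF5E.
    $text =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $text;
}

print wide2ascii_sketch ('ａｂＣＥ０１９'), "\n";
print ascii2wide_sketch ('abCE019'), "\n";
```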
    use Lingua::JA::Moji 'kana2morse';
    $morse = kana2morse ('しょっちゅう');
    # Now $morse = '--.-. -- .--. ..-. -..-- ..-'
Convert Japanese kana into Morse code. Note that Japanese Morse code does not have any way of representing small kana characters, so converting to and then from Morse code will result in しょっちゅう becoming シヨツチユウ.
    use Lingua::JA::Moji 'morse2kana';
    $kana = morse2kana ('--.-. -- .--. ..-. -..-- ..-');
    # Now $kana = 'シヨツチユウ'
Convert Japanese Morse code into kana. Each Morse code element must be separated by whitespace from the next one.
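The round trip described above can be sketched as follows. The example uses only kana2morse and morse2kana as documented here, and splits the Morse string on whitespace to show the individual code elements.

```perl
use strict;
use warnings;
use utf8;
use Lingua::JA::Moji qw/kana2morse morse2kana/;

binmode STDOUT, ':encoding(UTF-8)';

my $original = 'しょっちゅう';
my $morse = kana2morse ($original);
my $back = morse2kana ($morse);

# The small kana are lost, so $back differs from $original.
print "$original -> $morse -> $back\n";

# Each code element is separated from the next by whitespace.
my @elements = split ' ', $morse;
print scalar (@elements), " code elements\n";
```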
This has not been extensively tested.
use Lingua::JA::Moji 'kana2braille';
Converts kana into the equivalent Japanese braille (tenji) forms.
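This has not been properly tested. Converting Japanese into braille requires wakachigaki (separating words with spaces), but this function does not do that.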
use Lingua::JA::Moji 'braille2kana';
Converts Japanese braille (tenji) into the equivalent katakana.
    use Lingua::JA::Moji 'kana2circled';
    $circled = kana2circled ('あいうえお');
    # Now $circled = '㋐㋑㋒㋓㋔'
Convert kana into circled katakana. There is no circled form of ン, so ン is left as it is. The circled katakana are at Unicode code points 32D0 to 32FE.
    use Lingua::JA::Moji 'circled2kana';
    $kana = circled2kana ('㋐㋑㋒㋓㋔');
    # Now $kana = 'アイウエオ'
This function converts the "circled katakana" of Unicode into full-width katakana. See also "kana2circled".
    use Lingua::JA::Moji 'new2old_kanji';
    $old = new2old_kanji ('三国 連太郎');
    # Now $old = '三國 連太郎'
Convert new-style kanji (shinjitai) into old-style kanji (kyūjitai).
The list of characters in this convertor may not contain every pair of old/new kanji.
It will not correctly convert 弁 since this has three different equivalents in the old system.
    use Lingua::JA::Moji 'old2new_kanji';
    $new = old2new_kanji ('櫻井');
    # Now $new = '桜井'
Convert old-style kanji (kyūjitai) into new-style kanji (shinjitai).
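Because the character list may be incomplete and some new-style characters such as 弁 have more than one old-style equivalent, it may be worth checking whether a string survives a round trip before relying on the conversion. The following is a minimal sketch; the helper round_trips is hypothetical, and no particular result is claimed for any of the sample strings.

```perl
use strict;
use warnings;
use utf8;
use Lingua::JA::Moji qw/new2old_kanji old2new_kanji/;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: true if converting to old-style kanji and back
# again returns the original string.
sub round_trips
{
    my ($text) = @_;
    return old2new_kanji (new2old_kanji ($text)) eq $text;
}

for my $text ('三国 連太郎', '桜井', '弁当') {
    my $old = new2old_kanji ($text);
    my $status = round_trips ($text) ? 'round trip ok' : 'round trip differs';
    print "$text -> $old ($status)\n";
}
```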
This is an experimental cyrillization of kana based on the information in a Wikipedia article, http://en.wikipedia.org/wiki/Cyrillization_of_Japanese. The module author does not know anything about cyrillization of kana, so any assistance in correcting this is very welcome.
    use Lingua::JA::Moji 'kana2cyrillic';
    $cyril = kana2cyrillic ('シンブン');
    # Now $cyril = 'симбун'
    use Lingua::JA::Moji 'cyrillic2katakana';
    $kana = cyrillic2katakana ('симбун');
    # Now $kana = 'シンブン'
    use Lingua::JA::Moji 'kana2hangul';
    $hangul = kana2hangul ('すごわざ');
    # Now $hangul = '스고와자'
This is based on a list found on the internet at http://kajiritate-no-hangul.com/kana.html. There is currently no proof of correctness.
There is a mailing list for this module and Convert::Moji at http://groups.google.com/group/perl-moji.
There are some bugs with romaji to kana conversion and vice-versa.
Other Perl modules on CPAN include
This is where I got several of the ideas for this module from. It contains validators for kanji and kana.
This is where several of the ideas for this module came from. It contains convertors for hiragana, half width and full width katakana, and romaji. The romaji conversion is less complete than this module but more compact and probably much faster.
Romanization of Japanese. The module also includes romanization of kanji via the kakasi kanji to romaji convertor, and other functions.
Validate romanized Japanese.
This module is described in Naoki Tomita's book 「Perl CPANモジュールガイド」 (Perl CPAN Module Guide; ISBN 978-4862671080, WEB+DB PRESS plus, published April 2011).
This module exports its functions only on request. To export all the functions in the module,
use Lingua::JA::Moji ':all';
Ben Bullock, <bkb@cpan.org>
Copyright 2008-2011 Ben Bullock, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Thanks to Naoki Tomita for various kinds of assistance (see http://groups.google.com/group/perl-moji/browse_thread/thread/10a42c35f7c22ebc).
To install Lingua::JA::Moji, copy and paste the appropriate command into your terminal.
cpanm
    cpanm Lingua::JA::Moji
CPAN shell
    perl -MCPAN -e shell
    install Lingua::JA::Moji