NAME

Lingua::JA::Name::Splitter - split a Japanese name into given and family

SYNOPSIS

    use utf8;
    use Lingua::JA::Name::Splitter 'split_kanji_name';
    # Encode output as UTF-8 to avoid "Wide character" warnings.
    binmode STDOUT, ':encoding(UTF-8)';
    my ($family, $given) = split_kanji_name ('風太郎');
    print ("$family $given\n");

produces output

    風 太郎

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents Lingua::JA::Name::Splitter version 0.12 corresponding to git commit 1b29819c126aab5760768591bf2a3f1fca61f520 released on Thu Jul 13 17:04:07 2023 +0900.

This module is based on the "Enamdict" data released on 2023-07-08.

DESCRIPTION

This module attempts to split the names of Japanese people into given and family names.

FUNCTIONS

kkname

    my $iskk = kkname ($kanjiname);

This returns a true value if $kanjiname appears to be a kanji/kana name, and false if not.
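
As a rough illustration of what "appears to be a kanji/kana name" can mean, the following sketch tests a string against Unicode script properties. The helper name looks_like_kanji_kana_name is hypothetical and is not part of this module, and the real kkname may apply different or stricter rules.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper, not part of Lingua::JA::Name::Splitter.
# Returns 1 if $name consists only of kanji, hiragana, katakana and
# the long vowel mark, and 0 otherwise.
sub looks_like_kanji_kana_name
{
    my ($name) = @_;
    return 0 unless defined $name && length $name;
    return $name =~ /^[\p{Han}\p{Hiragana}\p{Katakana}ー]+$/ ? 1 : 0;
}
```

Under these assumptions a string such as '渡辺純子' passes the test, while a romanized name or a name containing a space does not.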

split_kanji_name

    my ($family, $given) = split_kanji_name ('渡辺純子');

Native Japanese writing does not use spaces, so a full name appears as an unbroken string of characters. This function guesses where the family name ends and the given name begins. The guess comes from a simple algorithm, so the function is suited to those who need to process large numbers of names quickly. Its output is not reliable and must be checked by a human.

The heuristics used are as follows. The first character is assumed to belong to the family name, and the last character to the given name. When the name has more than two characters, hiragana are assumed to be part of the given name. Kanji are weighted by their distance from the beginning of the name. A dictionary giving the probability that a kanji occurs in a family name or in a given name is also used to weight some characters. The name is then split at the first character which seems more likely to be part of the given name.
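
The heuristics above can be sketched in plain Perl as follows. This is only an illustration and not the module's implementation: the function name sketch_split_kanji_name, the probability table %given_prob, its values, the equal weighting of position and dictionary, and the 0.5 threshold are all assumptions made for the example.

```perl
use strict;
use warnings;
use utf8;

# Toy probabilities that a kanji belongs to a given name. These values
# are invented for illustration and are not taken from Enamdict.
my %given_prob = (
    '渡' => 0.05, '辺' => 0.05, '純' => 0.90,
    '子' => 0.95, '太' => 0.80, '郎' => 0.95,
);

# Hypothetical sketch of the splitting heuristics.
sub sketch_split_kanji_name
{
    my ($name) = @_;
    my @chars = split //, $name;
    my $n = scalar @chars;
    die "Need at least two characters" if $n < 2;
    # The first character is taken as family and the last as given,
    # so by default the split falls before the last character.
    my $split = $n - 1;
    for my $i (1 .. $n - 2) {
        my $c = $chars[$i];
        my $score;
        if ($c =~ /\p{Hiragana}/) {
            # Hiragana are assumed to belong to the given name.
            $score = 1;
        }
        else {
            # Characters further from the start score higher.
            my $position = $i / ($n - 1);
            # Unknown kanji get a neutral dictionary weight.
            my $dict = $given_prob{$c} // 0.5;
            $score = ($position + $dict) / 2;
        }
        # Split at the first character that looks more "given".
        if ($score > 0.5) {
            $split = $i;
            last;
        }
    }
    my $family = join '', @chars[0 .. $split - 1];
    my $given = join '', @chars[$split .. $n - 1];
    return ($family, $given);
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ('渡辺純子', '風太郎') {
    my ($family, $given) = sketch_split_kanji_name ($name);
    print "$family $given\n";
}
```

With the toy table above, this prints "渡辺 純子" and "風 太郎". A real dictionary changes the scores, so the split point for other names depends entirely on the data used.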

split_romaji_name

    my ($first, $last) = split_romaji_name ($name);

Given a string containing the name of a Japanese person in romanized form, this guesses which part is the first (given) name and which part is the last (family) name, using the spaces, capitalization and commas in the name.

Japanese people write their names in a variety of romanized formats, such as "KATSU, Shintaro", "Shintaro Katsu", "KATSU Shintaro", or even "ShintaroKATSU". This function is intended as a "rock breaker" for processing a large number of Japanese names in romanized form. Its output needs to be checked by a human.

    use Lingua::JA::Name::Splitter 'split_romaji_name';
    for my $name ('KATSU, Shintaro', 'Risa Yoshiki') {
        my ($first, $last) = split_romaji_name ($name);
        print "$first $last\n";
    }

produces output

    Shintaro Katsu
    Risa Yoshiki

(This example is included as katsu-yoshiki.pl in the distribution.)
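
To show how spaces, capitalization and commas can each serve as a cue, here is a minimal sketch covering the four formats mentioned above. It is not the module's implementation: the function name sketch_split_romaji_name and the order in which the rules are tried are assumptions, and real input contains many cases that this does not handle.

```perl
use strict;
use warnings;

# Hypothetical sketch, not part of Lingua::JA::Name::Splitter.
# Returns (first, last) with normalized capitalization, or an empty
# list if the name matches none of the patterns.
sub sketch_split_romaji_name
{
    my ($name) = @_;
    $name =~ s/^\s+|\s+$//g;
    my ($first, $last);
    if ($name =~ /^([^,]+?)\s*,\s*(.+)$/) {
        # "KATSU, Shintaro": the part before the comma is the family name.
        ($last, $first) = ($1, $2);
    }
    elsif ($name =~ /^([A-Z][a-z]+)([A-Z]{2,})$/) {
        # "ShintaroKATSU": given name run together with an
        # all-capitals family name.
        ($first, $last) = ($1, $2);
    }
    elsif ($name =~ /^(\S+)\s+(\S+)$/) {
        my ($x, $y) = ($1, $2);
        if ($x =~ /^[A-Z]{2,}$/ && $y !~ /^[A-Z]{2,}$/) {
            # "KATSU Shintaro": the all-capitals word is the family name.
            ($last, $first) = ($x, $y);
        }
        else {
            # "Shintaro Katsu": assume the given name comes first.
            ($first, $last) = ($x, $y);
        }
    }
    else {
        return;
    }
    return map { ucfirst (lc ($_)) } ($first, $last);
}

for my $name ('KATSU, Shintaro', 'KATSU Shintaro', 'ShintaroKATSU') {
    my ($first, $last) = sketch_split_romaji_name ($name);
    print "$first $last\n";
}
```

Each of the three inputs prints "Shintaro Katsu". Names with middle parts, hyphens or ambiguous capitalization fall outside this sketch, which is one reason the output of any such heuristic needs to be checked by a human.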

DEPENDENCIES

Carp: Carp is used to report errors.
Lingua::JA::Moji: Lingua::JA::Moji is used to process Japanese characters and detect romanized Japanese.

EXPORTS

Nothing is exported by default. "split_kanji_name" and "split_romaji_name" are exported on demand. An export tag :all exports both functions.

    use Lingua::JA::Name::Splitter ':all';

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.
