NAME

Convert::Moji - Convert between alphabets

SYNOPSIS

    # Examples of rot13 transformers:
    use Convert::Moji;
    # Using a table
    my %rot13;
    @rot13{('a'..'z')} = ('n'..'z','a'..'m');
    my $rot13 = Convert::Moji->new (["table", \%rot13]);
    # Using tr
    my $rot13_1 = Convert::Moji->new (["tr", "a-z", "n-za-m"]);
    # Using a callback
    sub rot_13_sub { tr/a-z/n-za-m/; return $_ }
    my $rot13_2 = Convert::Moji->new (["code", \&rot_13_sub]);
    # Then to do the actual conversion
    my $out = $rot13->convert ("secret");
    # You can also go backwards with
    my $inverted = $rot13->invert ("frperg");
    print "$out\n$inverted\n";
    
    

produces output

    frperg
    secret

(This example is included as rot13.pl in the distribution.)

VERSION

This documents Convert::Moji version 0.11 corresponding to git commit 7bf3dfd9543df5abd6e592144cae7075b7c27d3f released on Sat Mar 13 17:13:45 2021 +0900.

DESCRIPTION

Convert::Moji objects convert between different alphabets. For example, a Convert::Moji object can convert between Greek letters and the English alphabet, or convert between phonetic symbols in Unicode and a representation of them in ASCII.

This started as a helper module for Lingua::JA::Moji, where it is used to convert between the various Japanese writing systems. It was split out of that module to serve as a general-purpose converter for any alphabet.

METHODS

new

    my $convert = Convert::Moji->new (["table", $mytable]);

Create the object. The arguments are a list of array references, one for each conversion.

Conversions can be chained together:

    my $does_something = Convert::Moji->new (["table", $mytable],
                                             ["tr", $left, $right]);

Each array reference must have one of the following keywords as its first element.

table

After this comes one more argument, a reference to the hash containing the table. For example

    use Convert::Moji;
    my %crazyhash = ("a" => "apple", "b" => "banana");
    my $conv = Convert::Moji->new (["table", \%crazyhash]);
    my $out = $conv->convert ("a b c");
    my $back = $conv->invert ($out);
    print "$out, $back\n";

produces output

    apple banana c, a b c

(This example is included as crazyhash.pl in the distribution.)

The hash keys and values can be any length.

file

After this comes one more argument, the name of a file whose contents are converted into a hash table. The file must contain space-separated pairs, one pair per line, with no comments or blank lines. If the file does not exist or cannot be opened, the module prints an error message and returns the undefined value.

code

After this comes one or two references to subroutines. The first subroutine is the conversion and the second one is the inversion routine. If you omit the second routine, it is equivalent to specifying "oneway".

tr

After this come two arguments, the left and right hand sides of a "tr" expression, for example

     Convert::Moji->new (["tr", "A-Z", "a-z"])

converts upper case to lower case. On conversion the "tr" is applied as written; on inversion it is applied the other way round.
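
The following plain-Perl sketch shows one way to read "the other way round". It assumes that inverting a "tr" amounts to swapping its two sides. The helper name apply_tr is hypothetical and the code does not use Convert::Moji or claim to reproduce its internals.

```perl
use strict;
use warnings;

# Hypothetical helper: apply tr/$from/$to/ to a copy of $text.
# tr does not interpolate variables, so the operator is built with a
# string eval. Never do this with untrusted $from or $to.
sub apply_tr
{
    my ($text, $from, $to) = @_;
    eval "\$text =~ tr/$from/$to/; 1" or die $@;
    return $text;
}

my ($left, $right) = ('A-Z', 'a-z');
# Forward direction: left-hand side to right-hand side.
my $out = apply_tr ('Moji', $left, $right);
# Backward direction: the two sides are swapped.
my $back = apply_tr ($out, $right, $left);
print "$out\n$back\n";
```

This prints "moji" and then "MOJI". The round trip does not restore "Moji", because lower-casing discards information. That is the kind of conversion for which the "oneway" keyword exists.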

Conversions, via "convert", are performed in the order of the arguments to "new". Inversions are performed in the reverse order, skipping any operations which cannot be inverted.

Uninvertible operations

If your conversion doesn't actually go backwards, you can tell the module when you create the object using a keyword "oneway":

    my $uninvertible = Convert::Moji->new (["oneway", "table", $mytable]);

Then the method $uninvertible->invert doesn't do anything. You can also selectively choose which operations of a list are invertible and which aren't, so that only the invertible ones do something.
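
The ordering rule and the skipping of one-way steps can be pictured with a plain-Perl sketch. It does not use Convert::Moji. The stage layout and the names run_forward and run_backward are assumptions made for this illustration, not the module's internals.

```perl
use strict;
use warnings;

# rot13 is its own inverse, so it serves as both directions of a stage.
sub rot13
{
    my ($s) = @_;
    $s =~ tr/a-z/n-za-m/;
    return $s;
}

my @stages = (
    # One-way stage: lower-casing loses information, so it has no
    # backward step.
    { forward => sub { return lc $_[0] } },
    # Invertible stage.
    { forward => \&rot13, backward => \&rot13 },
);

# Apply every stage in the order given.
sub run_forward
{
    my ($text) = @_;
    for my $stage (@stages) {
        $text = $stage->{forward}->($text);
    }
    return $text;
}

# Apply the stages in reverse order, skipping the one-way ones.
sub run_backward
{
    my ($text) = @_;
    for my $stage (reverse @stages) {
        next unless $stage->{backward};
        $text = $stage->{backward}->($text);
    }
    return $text;
}

my $out = run_forward ("Secret");
my $back = run_backward ($out);
print "$out\n$back\n";
```

This prints "frperg" and then "secret". The capital letter is not restored, since the lower-casing stage is skipped on the way back.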

Load from a file

To load a character conversion table from a file, use

    Convert::Moji->new (["file", $filename]);

In this case, the file needs to contain a space-separated list of items to be converted one into the other, such as

    alpha α
    beta β
    gamma γ

The file reading cannot handle comments or blank lines in the file. Examples of use of this format are "kana2hw" in Lingua::JA::Moji, "circled2kanji" in Lingua::JA::Moji, and "bracketed2kanji" in Lingua::JA::Moji.
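
As an illustration of the file format, here is a minimal plain-Perl sketch which turns such pairs into a hash reference. The helper name read_pairs is hypothetical and this is not the module's own reader. The sample uses ASCII values and an in-memory file handle so that it runs without touching the disk.

```perl
use strict;
use warnings;

# Hypothetical helper: read "left right" pairs, one per line, from a
# file handle. Blank or one-word lines are rejected, in keeping with
# the restriction described above.
sub read_pairs
{
    my ($fh) = @_;
    my %table;
    while (my $line = <$fh>) {
        chomp $line;
        my ($left, $right) = split ' ', $line;
        die "Bad line '$line'" unless defined $left && defined $right;
        $table{$left} = $right;
    }
    return \%table;
}

my $text = "alpha a\nbeta b\ngamma g\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $table = read_pairs ($fh);
close $fh;
print join (', ', map { "$_ => $table->{$_}" } sort keys %$table), "\n";
```

This prints "alpha => a, beta => b, gamma => g". A hash reference built this way has the same shape as the one that the "table" keyword expects.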

convert

Once the object is built, use its "convert" method to convert text. The method takes one argument, a scalar string, and converts it by the rules specified with "new".

Characters which it cannot convert are passed through unchanged.

invert

This inverts the input.

This takes up to two arguments. The first is the string to be inverted back through the conversion process. The optional second argument says how to resolve an ambiguous inversion, and takes one of the following values:

first

If the inversion is ambiguous, it picks the first one it finds.

random

If the inversion is ambiguous, it picks one at random.

all

In this case you get an array reference back containing either strings where the inversion was unambiguous, or array references to arrays containing all possible strings.

all_joined

Like "all", but you get a scalar with all the options in square brackets instead of lots of array references.

The second argument is only implemented for hash table based conversions, and is very likely to be buggy even then.

FUNCTIONS

These are helper functions for the module.

length_one

    # Returns false:
    length_one ('x', 'y', 'monkey');
    # Returns true:    
    length_one ('x', 'y', 'm');

Returns true if every element of the array has a length equal to one, and false if any of them does not have length one. The "make_regex" function uses this to decide whether to use a [abc] or a (a|b|c) style regex.
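
The choice between the two regex styles can be sketched in plain Perl as follows. The names all_length_one and class_or_alternation are hypothetical stand-ins written for this illustration, not the module's code, and the ordering of the alternatives is left aside here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented behaviour of length_one:
# true only if every string is exactly one character long.
sub all_length_one
{
    for my $string (@_) {
        return 0 if length ($string) != 1;
    }
    return 1;
}

# Use a character class when every string is a single character, and
# a non-capturing alternation otherwise.
sub class_or_alternation
{
    my @quoted = map { quotemeta } @_;
    if (all_length_one (@_)) {
        return '[' . join ('', @quoted) . ']';
    }
    return '(?:' . join ('|', @quoted) . ')';
}

print class_or_alternation ('x', 'y', 'm'), "\n";
print class_or_alternation ('x', 'y', 'monkey'), "\n";
```

This prints "[xym]" and then "(?:x|y|monkey)".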

make_regex

    my $regex = make_regex (qw/a b c de fgh/);

    # $regex = "fgh|de|a|b|c";

Given a list of strings, this makes a regular expression which matches any of the strings in the list, longest match first. Each of the elements of the list is quoted using quotemeta. The regular expression does not contain capturing parentheses.
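
To see why the longest strings go first and why each string is quoted, consider this plain-Perl sketch. The helper longest_first is hypothetical and only approximates the behaviour described above; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: quote each string and join them, longest first.
# Ties in length are broken alphabetically to keep the result stable.
sub longest_first
{
    my @sorted = sort { length ($b) <=> length ($a) || $a cmp $b } @_;
    return join ('|', map { quotemeta } @sorted);
}

my @words = ('a', 'ab', 'a.c');
# Alternatives in the order given, for comparison.
my $naive = join ('|', map { quotemeta } @words);
my $sorted = longest_first (@words);

# Perl tries alternatives from left to right at each position.
my ($naive_match) = 'abc' =~ /($naive)/;
my ($sorted_match) = 'abc' =~ /($sorted)/;
print "$naive matches '$naive_match'\n";
print "$sorted matches '$sorted_match'\n";
```

The first regex stops at "a", while the sorted one finds "ab". The quoted "a\.c" does not match "abc", because the dot is treated as a literal character.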

To convert everything in string $x from the keys of %foo2bar to its values,

    use Convert::Moji 'make_regex';
    my $x = 'mad, bad, and dangerous to know';
    my %foo2bar = (mad => 'max', dangerous => 'trombone');
    my $regex = make_regex (keys %foo2bar);
    $x =~ s/($regex)/$foo2bar{$1}/g;
    print "$x\n";

produces output

    max, bad, and trombone to know

(This example is included as trombone.pl in the distribution.)

For another example, see the "joke" program at "english" in Data::Kanji::Kanjidic.

unambiguous

    my $invertible = unambiguous (\%table);

Returns true if all of the values in %table are distinct, and false if any two of the values in %table are the same. This is used by "invert" to decide whether a table can be reversed.

    use utf8;
    use FindBin '$Bin';
    use Convert::Moji 'unambiguous';
    my %ambig = (
        a => 'b',
        c => 'b',
    );
    my %unambig = (
        a => 'b',
        c => 'd',
    );
    for my $thing (\%ambig, \%unambig) {
        if (unambiguous ($thing)) {
            print "un";
        }
        print "ambiguous\n";
    }
    

produces output

    ambiguous
    unambiguous

(This example is included as unambiguous.pl in the distribution.)
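
The reason distinct values matter can be shown with a plain-Perl sketch that swaps keys and values. The function values_distinct is a hypothetical stand-in written for this illustration, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical check: true if no value occurs twice in the table.
sub values_distinct
{
    my ($table) = @_;
    my %seen;
    for my $value (values %$table) {
        return 0 if $seen{$value}++;
    }
    return 1;
}

my %unambig = (a => 'b', c => 'd');
my %ambig = (a => 'b', c => 'b');

# Distinct values: swapping keys and values loses nothing.
my %inverse = reverse %unambig;
print scalar (keys %inverse), " entries after reversing the first table\n";

# Repeated values: either 'a' or 'c' is overwritten in the reversed hash.
my %lossy = reverse %ambig;
print scalar (keys %lossy), " entry after reversing the second table\n";
```

The first table keeps both of its entries when reversed. The second keeps only one, so the original key cannot be recovered without choosing between the candidates.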

SEE ALSO

Lingua::JA::Moji

Uses this module.

Lingua::KO::Munja

Uses this module.

"list2re" in Data::Munge

This is similar to "make_regex" in this module.

Lingua::Translit

Transliterates text between writing systems

Match a dictionary against a string

A list of various other CPAN modules for matching a dictionary of words against strings.

EXPORTS

The functions "make_regex", "length_one" and "unambiguous" are exported on demand. There are no export tags.

DEPENDENCIES

Carp

Functions carp and croak are used to report errors.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2008-2021 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.