=head1 NAME

Unicode::MapUTF8 - Conversions to and from arbitrary character sets and UTF8

=head1 SYNOPSIS

 use Unicode::MapUTF8 qw(to_utf8 from_utf8
                         utf8_supported_charset utf8_charset_alias);

 # Convert a string in 'ISO-8859-1' to 'UTF8'
 my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

 # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
 my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

 # List available character set encodings
 my @character_sets = utf8_supported_charset;

 # Add a character set alias
 utf8_charset_alias({ 'ms-japanese' => 'sjis' });

 # Convert between two arbitrary (but largely compatible) charset encodings
 # (SJIS to EUC-JP)
 my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
 my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

 # Verify that a specific character set is supported
 if (utf8_supported_charset('ISO-8859-1')) {
     # Yes
 }

=head1 DESCRIPTION

Unicode::MapUTF8 provides an adapter layer over the core routines of several
existing Unicode modules for converting to and from UTF8 and other encodings.
It gives those modules a single common interface, so you do not have to know
the underlying implementations to perform simple conversions between UTF8 and
other character set encodings. To that end, it wraps the Unicode::String,
Unicode::Map8, Unicode::Map and Jcode modules in a simple, standardized API.

It also provides general character set conversion based on UTF8: you can
convert between any two compatible and supported character sets by chaining
two conversions, first to UTF8 and then from UTF8 to the target encoding.
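
The two-step chaining can be wrapped in a small helper. The following is a
minimal sketch, not part of this module: the name C<convert_charset> is
hypothetical, and the sample bytes assume that the 'sjis' and 'euc-jp'
encodings are available in your installation.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

    # Hypothetical helper (not provided by Unicode::MapUTF8): convert
    # $string from charset $from to charset $to by way of UTF8.
    sub convert_charset {
        my ($string, $from, $to) = @_;
        for my $charset ($from, $to) {
            die "Unsupported charset: $charset\n"
                unless utf8_supported_charset($charset);
        }
        my $utf8 = to_utf8({ -string => $string, -charset => $from });
        return from_utf8({ -string => $utf8, -charset => $to });
    }

    # Two kanji encoded as Shift_JIS bytes, converted to EUC-JP.
    my $sjis_string   = "\x93\xFA\x96\x7B";
    my $euc_jp_string = convert_charset($sjis_string, 'sjis', 'euc-jp');

The conversion is only as faithful as the overlap between the two character
sets, which is why the charsets need to be largely compatible.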

As with most things Perlish, it handles many more characters per second if you
give it a few big chunks of text to chew on instead of lots of small ones.
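
As an illustration of that advice, the sketch below joins several lines into
one string and converts them with a single call instead of calling C<to_utf8>
once per line. The sample data and variable names are assumptions made for
this example.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(to_utf8);

    # Sample ISO-8859-1 input (0xE8 is e-grave in that encoding).
    my @lines = ("premi\xE8re ligne\n", "deuxi\xE8me ligne\n");

    # One call for the whole text rather than one call per line.
    my $utf8 = to_utf8({
        -string  => join('', @lines),
        -charset => 'ISO-8859-1',
    });

    # If per-line output is still needed, split after converting.
    my @utf8_lines = split /^/, $utf8;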

By design, it can be easily extended to encompass any new charset encoding
conversion modules that arrive on the scene.

This module is intended to provide good Unicode support to versions of Perl
prior to 5.8. If you are using Perl 5.8.0 or later, you probably want to be
using the Encode module instead. This module B<does> work with Perl 5.8,
but Encode is the preferred method in that environment.
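
For comparison, on Perl 5.8.0 or later the same kind of conversion can be
sketched with the core Encode module. The sample string and variable names
are illustrative assumptions.

    use strict;
    use warnings;
    use Encode qw(decode encode);

    # 'cafe' with e-acute, as ISO-8859-1 bytes.
    my $latin1_bytes = "caf\xE9";

    # Decode the bytes into Perl characters, then encode them as UTF-8.
    my $characters = decode('ISO-8859-1', $latin1_bytes);
    my $utf8_bytes = encode('UTF-8', $characters);

    # And back again: UTF-8 bytes to ISO-8859-1 bytes.
    my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));

Unlike C<to_utf8> and C<from_utf8>, Encode works through an intermediate
string of Perl characters rather than directly from bytes to bytes.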

=head1 CHANGES

1.14 2020.09.27   Fixing POD breakage in EUC-JP version of POD

1.13 2020.09.27   Fixing MANIFEST.SKIP error

1.12 2020.09.27   Build tool updates. Maintainer updates. POD error fixes.
                  Relicensed under MIT license.

1.11 2005.10.10   Documentation changes. Addition of Build.PL support.
                  Added various build tests, LICENSE, Artistic_License.txt,
                  GPL_License.txt. Split documentation into separate
                  .pod file. Added Japanese translation of POD.

1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
                  Problem and fix found by Masahiro HONMA
                  <masahiro.honma@tsutaya.co.jp>.

                  Similar bugs in conversions of shift_jis and euc-jp
                  to UTF-8 corrected as well.

1.09 2001.08.22 - Fixed multiple typo occurrences of 'uft'
                  where 'utf' was meant in code. Problem affected
                  utf16 and utf7 encodings. Problem found
                  by devon smith <devon@taller.PSCL.cwru.edu>

1.08 2000.11.06 Added 'utf8_charset_alias' function to allow for runtime
                setting of character set aliases. Added several alternate
                names for 'sjis' (shiftjis, shift-jis, shift_jis, s-jis,
                and s_jis). 

                Corrected 'croak' messages for 'from_utf8' functions to
                appropriate function name.

                Corrected fatal problem in jcode-unicode internals. Problem 
                and fix found by Brian Wisti <wbrian2@uswest.net>.

1.07 2000.11.01 Added 'croak' to use Carp declaration to fix error
                messages. Problem and fix found by <wbrian2@uswest.net>.

1.06 2000.10.30 Fix to handle change in stringification of overloaded
                objects between Perl 5.005 and 5.6.  
                Problem noticed by Brian Wisti <wbrian2@uswest.net>.

1.05 2000.10.23 Error in conversions from UTF8 to multibyte encodings corrected

1.04 2000.10.23 Additional diagnostic error messages added for
                internal errors

1.03 2000.10.22 Bug fix for load time Unicode::Map encoding
                detection

1.02 2000.10.22 Bug fix to 'from_utf8' method and load time
                detection of Unicode::Map8 supported character
                set encodings

1.01 2000.10.02 Initial public release

=head1 FUNCTIONS

=over 4

=item utf8_charset_alias({ $alias => $charset });

Used for runtime assignment of character set aliases.

Called with no parameters, returns a reference to a hash of the defined
aliases and the character sets they map to.

Example:

    my $aliases     = utf8_charset_alias;
    my @alias_names = keys %$aliases;

If called with ONE parameter, returns the name of the 'real' charset
if the alias is defined. Returns undef if it is not found in the aliases.

Example:

    if (! utf8_charset_alias('VISCII')) {
        # No alias for this
    }

If called with a list of 'alias' => 'charset' pairs, defines those aliases for use.

Example:

    utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

Note: It will croak if a passed pair does not map to a character set
in the predefined set of character encodings. Aliasing one alias to
another alias is NOT allowed.

Multiple character set aliases can be set with a single call.

To clear an alias, pass a character set mapping of undef.

Example:

    utf8_charset_alias({ 'japanese' => undef });

While an alias is set, the 'utf8_supported_charset' function
will return the alias as if it were a predefined charset.

Overriding a base defined character encoding with an alias
will generate a warning message to STDERR.
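
Putting these calls together, the life cycle of an alias might look like the
following sketch. The alias name 'x-my-sjis' is hypothetical, and the example
assumes that the 'sjis' encoding is available in your installation.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(utf8_charset_alias utf8_supported_charset);

    # Define the alias.
    utf8_charset_alias({ 'x-my-sjis' => 'sjis' });

    # Look up the character set it maps to.
    my $real_charset = utf8_charset_alias('x-my-sjis');

    # While set, the alias is reported as a supported charset.
    my $supported_while_set = utf8_supported_charset('x-my-sjis');

    # Clear the alias again.
    utf8_charset_alias({ 'x-my-sjis' => undef });
    my $after_clear = utf8_charset_alias('x-my-sjis');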

=back

=over 4

=item utf8_supported_charset($charset_name);


Returns true if the named charset is supported (including
user defined aliases).

Returns false if it is not.

Example:

    if (! utf8_supported_charset('VISCII')) {
        # No support yet
    }

If called in a list context with no parameters, it will return
a list of all supported character set names (including user
defined aliases).

Example:

    my @charsets = utf8_supported_charset;
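
A common use is to check a charset name that arrives at run time before
attempting a conversion. This is a minimal sketch; the variable names and the
fallback behaviour are assumptions made for illustration.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(to_utf8 utf8_supported_charset);

    my $charset = 'ISO-8859-1';    # e.g. taken from a header or a config file
    my $input   = "na\xEFve";      # ISO-8859-1 bytes

    my $utf8;
    if (utf8_supported_charset($charset)) {
        $utf8 = to_utf8({ -string => $input, -charset => $charset });
    }
    else {
        warn "Charset '$charset' is not supported; input left unconverted\n";
    }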

=back

=over 4

=item to_utf8({ -string => $string, -charset => $source_charset });

Returns the string converted to UTF8 from the specified source charset.
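
The sketch below converts a short ISO-8859-1 string and prints the resulting
bytes in hexadecimal for inspection. Wrapping the call in C<eval> assumes that
a failed conversion is reported with C<croak>, as the C<croak> messages
mentioned in the CHANGES section suggest; treat that as an assumption rather
than a documented guarantee.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(to_utf8);

    my $utf8 = eval {
        to_utf8({ -string => "gr\xFC\xDFe", -charset => 'ISO-8859-1' });
    };

    if (defined $utf8) {
        # Dump each byte of the result as two hex digits.
        print join(' ', map { sprintf '%02X', ord($_) } split //, $utf8), "\n";
    }
    else {
        warn "Conversion failed: $@";
    }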

=back

=over 4

=item from_utf8({ -string => $string, -charset => $target_charset});

Returns the string converted from UTF8 to the specified target charset.
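
A simple way to see C<from_utf8> at work is a round trip through UTF8 and
back. The sample string and variable names below are illustrative
assumptions.

    use strict;
    use warnings;
    use Unicode::MapUTF8 qw(to_utf8 from_utf8);

    my $original = "fa\xE7ade";    # ISO-8859-1 bytes
    my $utf8     = to_utf8({ -string => $original, -charset => 'ISO-8859-1' });
    my $restored = from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });

    print "round trip ok\n" if $restored eq $original;

Text containing characters that the target charset cannot represent would not
be expected to survive such a round trip unchanged.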

=back

=head1 VERSION

1.14 2020.09.27

=head1 TODO

Regression tests for Jcode, 2-byte encodings and encoding aliases

=head1 SEE ALSO

L<Unicode::String> L<Unicode::Map8> L<Unicode::Map> L<Jcode> L<Encode>

=head1 COPYRIGHT

Copyright 2000-2020, Jerilyn Franz. All rights reserved.

=head1 AUTHOR

Jerilyn Franz <cpan@jerilyn.info>

=head1 LICENSE


MIT License

Copyright (c) 2020 Jerilyn Franz

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

=cut