
NAME

Search::Tools::Transliterate - transliterations of UTF-8 chars

SYNOPSIS

 use Search::Tools::Transliterate;
 
 my $tr = Search::Tools::Transliterate->new();
 
 print $tr->convert( 'some string of utf8 chars' );

DESCRIPTION

Search::Tools::Transliterate transliterates UTF-8 characters to single-byte equivalents. It is based on the transmap project by Markus Kuhn http://www.cl.cam.ac.uk/~mgk25/.

METHODS

new

Creates and returns a new instance.

is_valid_utf8( text )

Returns true if text is a valid sequence of UTF-8 bytes. It does not check if the internal Perl utf8 flag is set or not.
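To illustrate what a byte-level validity check means, here is a minimal sketch that uses only the core Encode module. The helper name looks_like_valid_utf8 is hypothetical and this is not necessarily how is_valid_utf8() is implemented; it only shows the idea of asking whether a byte string decodes cleanly as strict UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Search::Tools::Transliterate.
# Returns 1 if $bytes decodes cleanly as strict UTF-8, 0 otherwise.
sub looks_like_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $ok = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ); 1 };
    return $ok ? 1 : 0;
}

# "\xc3\xa9" is the two-byte UTF-8 encoding of e-acute.
print looks_like_valid_utf8("caf\xc3\xa9") ? "valid\n" : "invalid\n";

# A lone "\xe9" is Latin-1, not a complete UTF-8 sequence.
print looks_like_valid_utf8("caf\xe9") ? "valid\n" : "invalid\n";
```

The second string is the typical failure case: Latin-1 input mistaken for UTF-8.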

is_ascii( text )

If text contains no bytes above 127, then returns true (1). Otherwise, returns false (0). Used by convert() internally to check text prior to transliterating.
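The contract described above can be sketched with a single regular expression. The helper name has_only_ascii is hypothetical, and the module's own is_ascii() may be written differently; this only mirrors the documented return values.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools::Transliterate.
# Returns 1 if no character or byte in $text is above 127, else 0.
sub has_only_ascii {
    my ($text) = @_;
    return $text =~ /[^\x00-\x7f]/ ? 0 : 1;
}

print has_only_ascii("plain text"), "\n";
print has_only_ascii("caf\xc3\xa9"), "\n";
```

A check like this lets a caller skip transliteration entirely for text that is already pure ASCII.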

convert( text )

Returns text with every multi-byte UTF-8 character replaced by its single-byte transliteration, as defined in %Map. Croaks if text is not valid UTF-8, so if in doubt, check first with is_valid_utf8().

VARIABLES

The %Map package variable holds all the character mappings. You can alter it to taste:

 use Search::Tools::Transliterate;
 my $tr = Search::Tools::Transliterate->new;
 $Search::Tools::Transliterate::Map{mychar} = 'my transliteration';

BUGS

You might consider the whole attempt as a bug. It's really an attempt to accommodate applications that don't support Unicode. Perhaps we shouldn't even try. But for things like curly quotes and other 'smart' punctuation, it's often helpful to render the UTF-8 character as something rather than just letting a character without a direct translation slip into the ether.

That said, if a character has no mapping (and there are plenty that do not), a single space will be used.
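The fallback rule can be illustrated with a toy lookup. Everything in this sketch is an assumption for demonstration: %demo_map is a hypothetical three-entry table (the real %Map is far larger), demo_convert is not the module's convert(), and for simplicity it works on already-decoded Perl characters rather than UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical three-entry map, for illustration only.
my %demo_map = (
    "\x{201C}" => '"',    # left curly double quote
    "\x{201D}" => '"',    # right curly double quote
    "\x{00E9}" => 'e',    # e-acute
);

# Hypothetical helper: ASCII passes through, mapped characters are
# replaced, and anything without a mapping becomes a single space.
sub demo_convert {
    my ($chars) = @_;
    my $out = '';
    for my $c ( split //, $chars ) {
        if    ( ord($c) < 128 )        { $out .= $c }
        elsif ( exists $demo_map{$c} ) { $out .= $demo_map{$c} }
        else                           { $out .= ' ' }
    }
    return $out;
}

# U+2603 (snowman) has no entry in %demo_map, so it becomes a space.
print demo_convert("\x{201C}caf\x{00E9}\x{201D}\x{2603}ok"), "\n";
# prints: "cafe" ok
```

If a silent space is not what you want for a particular character, adding an entry for it to %Map (as shown under VARIABLES) is the documented way to change the result.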

AUTHOR

Peter Karman perl@peknet.com

Thanks to Atomic Learning www.atomiclearning.com for sponsoring the development of this module.

COPYRIGHT

Copyright 2006 by Peter Karman. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Search::Tools, Unicode::Map, Encode, Test::utf8