NAME

Unicode::Diacritic::Strip - strip diacritics from Unicode text

SYNOPSIS

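    use strict;
    use warnings;
    use utf8;
    use Unicode::Diacritic::Strip ':all';
    # Print UTF-8 so the kana in the output don't trigger "Wide character" warnings.
    binmode STDOUT, ':encoding(UTF-8)';
    my $in = 'àÀâÂäçéÉèÈêÊëîïôùÙûüÜがぎぐげご';
    print strip_diacritics ($in), "\n";
    print fast_strip ($in), "\n";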

produces output

    aAaAaceEeEeEeiiouUuuUかきくけこ
    aAaAaceEeEeEeiiouUuuUがぎぐげご

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents Unicode::Diacritic::Strip version 0.07 corresponding to git commit 4f186a937a7fd63c755264637aaba25d6eab6dfb released on Tue Feb 28 14:44:38 2017 +0900.

DESCRIPTION

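This module offers two ways to remove diacritics from Unicode text. The first, "strip_diacritics", uses the Unicode decompositions to break each character down into a base character plus combining marks, and then drops the marks. The second, "fast_strip", uses tr with a big list of alphabetical characters with and without diacritics; characters that are not in the list are left unchanged, which is why the kana in the SYNOPSIS output keep their voicing marks.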
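To see what "break the characters down" means, the short script below prints the canonical decomposition (NFD) of a few characters. It is an illustration only, not part of this module: it assumes the core module Unicode::Normalize, and the helper decomposition_points is a hypothetical name made up for this example.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: return the code points of a character's canonical
# decomposition (NFD), formatted as U+XXXX. Combining marks show up as
# extra code points after the base character.
sub decomposition_points {
    my ($char) = @_;
    return map { sprintf 'U+%04X', ord($_) } split //, NFD($char);
}

for my $char (split //, 'éçがø') {
    print "$char => ", join(' ', decomposition_points($char)), "\n";
}
```

Here é splits into e (U+0065) plus a combining acute accent (U+0301), and が splits into か (U+304B) plus a combining voiced sound mark (U+3099). The letter ø has no canonical decomposition, so it comes back as a single code point.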

FUNCTIONS

strip_diacritics

    my $stripped = strip_diacritics ($text);

Return $text with its diacritics removed. Which marks count as diacritics is determined by the character decompositions in the Unicode Character Database; see Unicode::UCD.
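The general decompose-and-strip technique can be sketched with the core module Unicode::Normalize. This is a minimal sketch of the idea, not the code of "strip_diacritics" itself; the name strip_marks_sketch is hypothetical, and treating "diacritic" as "nonspacing mark (\p{Mn})" is an assumption of the sketch.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFC NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (assumption: diacritics = nonspacing marks, \p{Mn}).
sub strip_marks_sketch {
    my ($text) = @_;
    # Split each character into base character + combining marks.
    my $decomposed = NFD($text);
    # Delete the nonspacing combining marks.
    $decomposed =~ s/\p{Mn}//g;
    # Recompose what is left, so that e.g. Hangul syllables, which NFD
    # splits into conjoining jamo, are put back together.
    return NFC($decomposed);
}

print strip_marks_sketch('àÀçéがぎ Bjørn Łódź'), "\n";
```

Letters such as ø and Ł have no canonical decomposition, so this sketch leaves them as they are, whereas "fast_strip" converts them, as the ask.pl example shows.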

fast_strip

    my $stripped = fast_strip ($text);

Rapidly convert accented alphabetical characters in $text to their nearest plain ASCII equivalents. This is just a big list of characters and a tr to zap them into ASCII; characters that are not in the list are passed through unchanged.

    use strict;
    use warnings;
    use utf8;
    use Unicode::Diacritic::Strip 'fast_strip';
    my $unicode = 'Bjørn Łódź';
    print fast_strip ($unicode), "\n";

produces output

    Bjorn Lodz

(This example is included as ask.pl in the distribution.)
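The tr approach can be illustrated with a deliberately tiny table. This is a sketch only: the real module's list is much larger and is not reproduced here, and the name tr_strip_sketch is hypothetical.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper with a tiny subset of a tr/// table. Each character
# in the first list is replaced by the character at the same position in
# the second list; anything not listed passes through unchanged.
sub tr_strip_sketch {
    my ($text) = @_;
    $text =~ tr/àáâäçèéêëíîïñóöøúüŁłśźżÀÇÉÖØÜ/aaaaceeeeiiinooouuLlszzACEOOU/;
    return $text;
}

print tr_strip_sketch('Bjørn Łódź がぎ'), "\n";
```

Because the lists of a tr/// are not interpolated, Perl builds the translation table once at compile time, so each call is a single pass over the string. The trade-off is coverage: only listed characters are converted, which is why the kana above, like those in the SYNOPSIS, come through untouched.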

SEE ALSO

CPAN modules

Text::Undiacritic

EXPORTS

Nothing is exported by default. The functions "strip_diacritics" and "fast_strip" are exported on demand. The tag :all exports all of the module's functions.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2012-2017 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.