25 May 2012 20:30:05 UTC
- Distribution: Unicode-Casing
- Module version: 0.12
- Dependencies
- B::Hooks::OP::Check
- B::Hooks::OP::PPAddr
- Test::More
- XSLoader
- and possibly others
NAME
Unicode::Casing - Perl extension to override system case changing functions
SYNOPSIS
    use Unicode::Casing
                uc => \&my_uc, lc => \&my_lc,
                ucfirst => \&my_ucfirst, lcfirst => \&my_lcfirst,
                fc => \&my_fc;
    no Unicode::Casing;

    package foo::bar;
    use Unicode::Casing -load;
    sub import {
        Unicode::Casing->import(
            uc      => \&_uc,
            lc      => \&_lc,
            ucfirst => \&_ucfirst,
            lcfirst => \&_lcfirst,
            fc      => \&_fc,
        );
    }
    sub unimport {
        Unicode::Casing->unimport;
    }
DESCRIPTION
This module allows overriding the system-defined character case changing operations. Any time something in its lexical scope would ordinarily call lc(), lcfirst(), uc(), ucfirst(), or fc(), the corresponding user-specified function will instead be called. This applies to direct calls (even those prefaced by CORE::), and indirect calls via the \L, \l, \U, \u, and \F escapes in double-quoted strings and regular expressions.
Each function is passed a string whose case is to be changed, and should return the case-changed version of that string. Within the function's dynamic scope, references to the operation it is overriding use the non-overridden version. For example:
    sub my_uc {
        my $string = shift;
        print "Debugging information\n";
        return uc($string);
    }

    use Unicode::Casing uc => \&my_uc;

    uc($foo);
gives the standard upper-casing behavior, but prints "Debugging information" first. This also applies to the escapes. Using, for example, \U inside the override function for uc() will call the non-overridden uc(). Since this applies across the dynamic scope, if my_uc calls function a which calls b which calls c which calls uc, that uc is the non-overridden version. Otherwise there would be the possibility of infinite recursion. And, it fits with the typical use of these functions, which is to use the standard case change except for a few select characters, as shown in the example below.
It is an error to not specify at least one override in the "use" statement. Ones not specified use the standard operation. It is also an error to specify more than one override for the same function.
use re 'eval' is not needed to have the inline case-changing sequences work in regular expressions.
Here's an example of a real-life application, for Turkish, that shows context-sensitive case-changing. (Because of bugs in earlier Perls, version v5.12 is required for this example to work properly.)
    sub turkish_lc($) {
        my $string = shift;

        # Unless an I is before a dot_above, it turns into a dotless i (the
        # dot above being attached to the I, without an intervening other
        # Above mark; an intervening non-mark (ccc=0) would mean that the
        # dot above would be attached to that character and not the I)
        $string =~ s/I (?! [^\p{ccc=0}\p{ccc=Above}]* \x{0307} )/\x{131}/gx;

        # But when the I is followed by a dot_above, remove the dot_above so
        # the end result will be i.
        $string =~ s/I ([^\p{ccc=0}\p{ccc=Above}]* ) \x{0307}/i$1/gx;

        $string =~ s/\x{130}/i/g;

        return lc($string);
    }
A potential problem with context-dependent case changing is that the routine may be passed insufficient context, especially with the in-line escapes like \L.
90turkish.t, which comes with the distribution, includes a full implementation of all the Turkish casing rules.
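The insufficient-context problem can be sketched by installing the turkish_lc function from above and comparing a direct call with a \L escape that covers only part of the text. The helper code_points and the variable names are assumptions for illustration only. The expected results in the comments follow from the substitutions in turkish_lc.

```perl
use 5.012;    # the text above notes that v5.12 is required
use strict;
use warnings;

# turkish_lc, copied from the example above.
sub turkish_lc($) {
    my $string = shift;
    $string =~ s/I (?! [^\p{ccc=0}\p{ccc=Above}]* \x{0307} )/\x{131}/gx;
    $string =~ s/I ([^\p{ccc=0}\p{ccc=Above}]* ) \x{0307}/i$1/gx;
    $string =~ s/\x{130}/i/g;
    return lc($string);
}

# Hypothetical helper: render a string as a list of code points.
sub code_points { join ' ', map { sprintf 'U+%04X', ord } split //, shift }

my $cap = "I";
my $dot = "\x{0307}";    # COMBINING DOT ABOVE
my ($plain, $dotted, $split);
{
    use Unicode::Casing lc => \&turkish_lc;

    $plain  = lc($cap);           # bare I becomes dotless i
    $dotted = lc($cap . $dot);    # I + dot above becomes plain i
    $split  = "\L$cap\E$dot";     # \L sees only "I", not the dot after it
}

say code_points($plain);     # U+0131
say code_points($dotted);    # U+0069
say code_points($split);     # U+0131 U+0307
```

In the last case the override is handed only the "I", so it cannot see the following dot above and produces a dotless i with the dot still attached, which differs from the result of lower-casing the whole string at once.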
Note that there are problems with the standard case changing operation for characters whose code points are between 128 and 255. To get the correct Unicode behavior, the strings must be encoded in utf8 (which the override functions can force) or calls to the operations must be within the scope of use feature 'unicode_strings' (which is available starting in Perl version 5.12).
Also, note that fc() and \F are available only in Perls starting with version v5.15.8. Trying to override them on earlier versions will result in a fatal error.
Note that there can be problems installing this (at least on Windows) if using an old version of ExtUtils::Depends. To get around this, follow these steps:
upgrade ExtUtils::Depends
force install B::Hooks::OP::Check
force install B::Hooks::OP::PPAddr
See http://perlmonks.org/?node_id=797851.
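Returning to the earlier note about code points between 128 and 255, here is one possible sketch of an override that forces the utf8 encoding before deferring to the standard operation. The name upgrading_uc is hypothetical; utf8::upgrade is a core Perl function.

```perl
use strict;
use warnings;

# Hypothetical override (illustration only): upgrade the string to Perl's
# internal utf8 encoding so the standard uc() applies Unicode rules to
# code points between 128 and 255.
sub upgrading_uc {
    my $string = shift;
    utf8::upgrade($string);
    return uc($string);    # the standard uc(), now seeing a utf8 string
}

my $e_acute = "\x{E9}";    # LATIN SMALL LETTER E WITH ACUTE
my $upper;
{
    use Unicode::Casing uc => \&upgrading_uc;
    $upper = uc($e_acute);
}

printf "U+%04X\n", ord $upper;    # U+00C9
```

Placing the calls within the scope of use feature 'unicode_strings' is the alternative mentioned above; the upgrade approach simply keeps the fix inside the override itself.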
BUGS
This module doesn't play well when there are other attempts to override the functions, such as use subs qw(uc lc ...); or *CORE::GLOBAL::uc = sub { .... };. Which thing gets called depends on the ordering of the calls, and scoping rules break down.
AUTHOR
Karl Williamson, <khw@cpan.org>, with advice and guidance from various Perl 5 porters, including Paul Evans, Burak Gürsoy, Florian Ragwitz, Ricardo Signes, and Matt S. Trout.
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Karl Williamson
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.