
NAME

Unicode::Casing - Perl extension to override system case changing functions

SYNOPSIS

use Unicode::Casing
          uc => \&my_uc, lc => \&my_lc,
          ucfirst => \&my_ucfirst, lcfirst => \&my_lcfirst,
          fc => \&my_fc;
no Unicode::Casing;

package foo::bar;
  use Unicode::Casing -load;
  sub import {
      Unicode::Casing->import(
          uc      => \&_uc,
          lc      => \&_lc,
          ucfirst => \&_ucfirst,
          lcfirst => \&_lcfirst,
          fc => \&_fc,
      );
  }
  sub unimport {
      Unicode::Casing->unimport;
  }

DESCRIPTION

This module allows overriding the system-defined character case changing operations. Any time something in its lexical scope would ordinarily call lc(), lcfirst(), uc(), ucfirst(), or fc(), the corresponding user-specified function will instead be called. This applies to direct calls (even those prefaced by CORE::), and indirect calls via the \L, \l, \U, \u, and \F escapes in double-quoted strings and regular expressions.

Each function is passed a string whose case is to be changed, and should return the case-changed version of that string. Within the function's dynamic scope, references to the operation it is overriding use the non-overridden version. For example:

sub my_uc {
   my $string = shift;
   print "Debugging information\n";
   return uc($string);
}
use Unicode::Casing uc => \&my_uc;
uc($foo);

gives the standard upper-casing behavior, but prints "Debugging information" first. This also applies to the escapes: using, for example, \U inside the override function for uc() calls the non-overridden uc(). Because this applies across the entire dynamic scope, if my_uc calls a function a, which calls b, which calls c, which calls uc, that uc is the non-overridden version. Otherwise infinite recursion would be possible. This also fits the typical use of these functions, which is to apply the standard case change except for a few select characters, as shown in the example below.

It is an error to not specify at least one override in the "use" statement. Operations for which no override is specified use the standard behavior. It is also an error to specify more than one override for the same function.

use re 'eval' is not needed to have the inline case-changing sequences work in regular expressions.

Here's an example of a real-life application, for Turkish, that shows context-sensitive case-changing. (Because of bugs in earlier Perls, version v5.12 is required for this example to work properly.)

sub turkish_lc($) {
   my $string = shift;

   # Unless an I is before a dot_above, it turns into a dotless i (the
   # dot above being attached to the I, without an intervening other
   # Above mark; an intervening non-mark (ccc=0) would mean that the
   # dot above would be attached to that character and not the I)
   $string =~ s/I (?! [^\p{ccc=0}\p{ccc=Above}]* \x{0307} )/\x{131}/gx;

   # But when the I is followed by a dot_above, remove the dot_above so
   # the end result will be i.
   $string =~ s/I ([^\p{ccc=0}\p{ccc=Above}]* ) \x{0307}/i$1/gx;

   # LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130) becomes a plain i.
   $string =~ s/\x{130}/i/g;

   return lc($string);
}

A potential problem with context-dependent case changing is that the routine may be passed insufficient context, especially with the in-line escapes like \L.
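The context problem can be sketched without the module itself by calling the function directly. The block below repeats the turkish_lc logic from the example above so that it runs on its own; passing only "I" and appending the combining dot afterwards is an assumed stand-in for what a construct such as "\LI\E\x{307}" would hand to the override.

```perl
use strict;
use warnings;

# Same logic as the turkish_lc example above, repeated so this runs alone.
sub turkish_lc {
    my $string = shift;
    $string =~ s/I (?! [^\p{ccc=0}\p{ccc=Above}]* \x{0307} )/\x{131}/gx;
    $string =~ s/I ([^\p{ccc=0}\p{ccc=Above}]* ) \x{0307}/i$1/gx;
    $string =~ s/\x{130}/i/g;
    return lc($string);
}

# Render a string as a list of code points, avoiding wide-character output.
sub code_points {
    return join ' ', map { sprintf 'U+%04X', ord } split //, shift;
}

# Full context: I followed by COMBINING DOT ABOVE becomes a plain "i".
my $whole = turkish_lc("I\x{307}");

# Insufficient context: the function sees only "I", so it produces a
# dotless i, and the dot that follows ends up attached to that instead.
my $split = turkish_lc("I") . "\x{307}";

print "whole: ", code_points($whole), "\n";   # U+0069
print "split: ", code_points($split), "\n";   # U+0131 U+0307
```

The two results differ even though the input characters are the same, which is why an override that depends on neighboring characters should be given the whole string where possible.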

90turkish.t, which comes with the distribution, includes a full implementation of all the Turkish casing rules.

Note that there are problems with the standard case changing operation for characters whose code points are between 128 and 255. To get the correct Unicode behavior, the strings must be encoded in utf8 (which the override functions can force) or calls to the operations must be within the scope of use feature 'unicode_strings' (which is available starting in Perl version 5.12).
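A minimal sketch of the two remedies, using plain uc() with no overrides involved. It assumes an ASCII platform, no locale in effect, and uses U+00E9 (e with acute) as the sample character; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $e_acute = "\xE9";        # code point 233, in the 128..255 range
utf8::downgrade($e_acute);   # make sure it is held as a native byte string

# Outside unicode_strings, uc() leaves the character unchanged.
my $native = uc($e_acute);

# Remedy 1: perform the operation within the scope of unicode_strings.
my $with_feature;
{
    use feature 'unicode_strings';
    $with_feature = uc($e_acute);
}

# Remedy 2: force the string into Perl's internal utf8 representation.
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
my $with_upgrade = uc($upgraded);

printf "native:       %02X\n", ord $native;         # E9
printf "with feature: %02X\n", ord $with_feature;   # C9
printf "with upgrade: %02X\n", ord $with_upgrade;   # C9
```

An override function could apply either remedy itself before delegating to the standard operation.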

Also, note that fc() and \F are available only in Perls starting with version v5.15.8. Trying to override them on earlier versions will result in a fatal error.
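One way to avoid that fatal error in code that must run on older Perls is to add the fc override only when the running version supports it. The sketch below only builds the override list; my_uc, my_fc, and casing_overrides are hypothetical names, and my_fc has a placeholder body rather than a real case fold.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for real override functions.
sub my_uc { return uc($_[0]) }
sub my_fc { return lc($_[0]) }   # placeholder; a real one would case-fold

# Build the override list, adding fc only where this Perl supports it.
sub casing_overrides {
    my %overrides = ( uc => \&my_uc );
    # $] is the numeric Perl version; v5.15.8 is written 5.015008.
    $overrides{fc} = \&my_fc if $] >= 5.015008;
    return %overrides;
}

my %overrides = casing_overrides();
print join(', ', sort keys %overrides), "\n";
```

The resulting list could then be passed to Unicode::Casing->import from a wrapper package's import method, in the manner shown in the SYNOPSIS.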

Note that there can be problems installing this module (at least on Windows) if using an old version of ExtUtils::Depends. To get around this, follow these steps:

  1. upgrade ExtUtils::Depends

  2. force install B::Hooks::OP::Check

  3. force install B::Hooks::OP::PPAddr

See http://perlmonks.org/?node_id=797851.

BUGS

This module doesn't play well when there are other attempts to override the functions, such as use subs qw(uc lc ...); or *CORE::GLOBAL::uc = sub { .... };. Which override gets called depends on the ordering of the calls, and scoping rules break down.

AUTHOR

Karl Williamson, <khw@cpan.org>, with advice and guidance from various Perl 5 porters, including Paul Evans, Burak Gürsoy, Florian Ragwitz, Ricardo Signes, and Matt S. Trout.

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Karl Williamson

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.