Unicode::Overload - Perl source filter to implement Unicode operations


  use charnames ':full';
  use POSIX ();
  use Unicode::Overload (
    "\N{UNION}" => infix =>
      sub { my %a = map{$_=>1}@{$_[0]};
            my %b = map{$_=>1}@{$_[1]};
            my %u = (%a,%b); # merge both key sets
            return keys %u; },
    "\N{SUPERSCRIPT TWO}" => postfix => sub { $_[0] ** 2 },
    "\N{NOT SIGN}" => prefix => sub { !$_[0] },
    [ "\N{LEFT FLOOR}", "\N{RIGHT FLOOR}" ] => outfix =>
      sub { POSIX::floor($_[0]) },
  );

  @union = (@a \N{UNION} @b); # Parentheses REQUIRED
  die "Pythagoras was WRONG!" # Same here
    unless sqrt((3)\N{SUPERSCRIPT TWO} + (4)\N{SUPERSCRIPT TWO}) == 5;
  $b = \N{NOT SIGN}($b); # Required here too
  die "Fell through floor" # Balanced characters form their own parentheses
    unless \N{LEFT FLOOR}-3.2\N{RIGHT FLOOR} == -4;


Allows you to declare your own Unicode operators and have them behave as prefix (like sigma or integral), postfix (like superscripted 2), infix (like union), or outfix (like the floor operator, with the 'L'-like and 'J'-like brackets).

To keep this document friendly to people without UTF-8 terminals, the \N{} syntax for Unicode characters will be used throughout, but please note that the \N{} characters can be replaced with the actual UTF-8 characters anywhere.

Also, please note that since Perl 5 doesn't support the notion of arbitrary operators, this module cheats and uses source filters to do its job. As such, every "operator" except outfix must have its arguments enclosed in parentheses. This limitation will be lifted when a better way to do this is found.

Also, note that since these aren't "real" operators there is no way (at the moment) to specify precedence. All Unicode "operators" have the precedence (such as it is) of function calls, as they all get transformed into function calls inline before interpreting.
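To make the "transformed into function calls" idea concrete, here is a minimal sketch of that kind of rewrite for a single postfix operator. It is not the module's actual implementation: rewrite_postfix and the target function name square are hypothetical, and the rewrite runs on an ordinary string rather than inside a real source filter.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper (not part of Unicode::Overload): turn "(expr)OP"
# into "FUNCTION(expr)" for one postfix operator character.
sub rewrite_postfix {
    my ($source, $operator, $function) = @_;
    # A parenthesized group with no nested parentheses, optional
    # whitespace, then the operator character itself.
    $source =~ s/\(([^()]*)\)\s*\Q$operator\E/$function($1)/g;
    return $source;
}

my $sq        = "\N{SUPERSCRIPT TWO}";
my $line      = "sqrt((3)$sq + (4)$sq)";
my $rewritten = rewrite_postfix($line, $sq, 'square');
# $rewritten is now the plain-Perl text: sqrt(square(3) + square(4))
```

Because the result is an ordinary function call, it binds the way any function call does, which is why no separate precedence can be given to the operator.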

In addition, due to a weird Unicode-related bug, only one character per operator is currently permitted. Despite behaving correctly elsewhere, substr() thinks that one character equals one byte inside Unicode::Overload.
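The character/byte distinction behind this limitation can be seen in plain Perl. The sketch below only illustrates the general difference between a character string and its UTF-8 encoding; it does not reproduce the bug inside the module.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode);

my $union = "\N{UNION}";              # one character, U+222A
my $bytes = encode('UTF-8', $union);  # the same character as UTF-8 bytes

my $char_count = length $union;       # 1: counted in characters
my $byte_count = length $bytes;       # 3: counted in bytes
```

Code that indexes into the byte form with substr() sees three positions where the character form has one.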

This module defines four basic types of operators: prefix, postfix, infix, and outfix. Prefix and infix should be familiar to most users of perl: prefix operators are basically function calls without the parens, and infix operators are the familiar + and its relatives.

The best analogy for postfix operators is probably the algebraic notation for squares. $a**2 is perl's notation, ($a)\N{SUPERSCRIPT TWO} is the Unicode::Overload equivalent, looking much closer to a mathematical expression, with the '2' in its proper position.

Outfix is the last operator type, and a little odd. Outfix can best be thought of as user-definable brackets. One of the more common uses for this notation again comes from mathematics, in the guise of the floor operator. The floor brackets look like square brackets with the top bar missing, and they effectively return POSIX::floor() of their contents.

Since outfix operators define their own brackets, extra parentheses are not needed on this type of operator.
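For reference, this is what the floor brackets compute, written as a direct POSIX::floor() call in plain Perl without the module. The contrast with int() is included because the two differ for negative numbers.

```perl
use strict;
use warnings;
use POSIX qw(floor);

my $floor_negative = floor(-3.2);  # -4: rounds toward negative infinity
my $floor_positive = floor(3.7);   #  3
my $int_negative   = int(-3.2);    # -3: int() truncates toward zero
```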

A quick summary follows:


Prefix: the operator goes directly before the parentheses containing its operands. Whitespace is allowed between the operator and the opening parenthesis. This acts like a function call.

Sample: \N{NOT SIGN}($b)


Postfix: the operator goes directly after the parentheses containing its operands. Whitespace is allowed between the closing parenthesis and the operator. This doesn't have a good Perl equivalent, but there are many equivalents in algebra, probably the most common being the square:

Sample: ($a+$b)\N{SUPERSCRIPT TWO}


Infix: the operator goes somewhere inside the parentheses, between its two operands. Whitespace is allowed between either parenthesis and the operator.

Sample: ($a \N{ELEMENT OF} @list)


Outfix: the operators surround their arguments and are translated into parentheses. As such, whitespace is allowed anywhere inside the operator pair. There is no requirement that the operators be visually symmetrical, although it helps.

Sample: $c=\N{LEFT FLOOR}$a+$b\N{RIGHT FLOOR}

The requirements for parentheses will be removed as soon as I can figure out how to make these operators behave closer to perl builtins. Nesting is perfectly legal, but multiple infix operators can't coexist within one set of parentheses.
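Since each set of parentheses may hold only one infix operator, chaining an operator means nesting, along the lines of ((@a \N{UNION} @b) \N{UNION} @c). In function-call terms that corresponds to nested calls. The sketch below shows the plain-Perl shape of such a chain; the union() helper is hypothetical, and passing the operands as array references is an assumption rather than documented module behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an infix union operator's handler:
# takes two array references, returns a reference to the sorted union.
sub union {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$left, @$right;
    return [ sort keys %seen ];
}

# One call per parenthesized group, nested from the inside out.
my $all = union(union([1, 2], [2, 3]), [3, 4]);
# @$all now holds 1, 2, 3, 4
```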





Jeffrey Goff


Copyright (C) 2003 by Jeffrey Goff

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.1 or, at your option, any later version of Perl 5 you may have available.