Char::Eutf2 - Run-time routines for Char/


  use Char::Eutf2;


  # "no Char::Eutf2;" not supported


This module provides the run-time routines of the Char/ Because the Char/ automatically uses this module, you need not use it directly.


Patches and problem reports to the author are welcome.


This Char::Eutf2 module first appeared in ActivePerl Build 522 Built under MSWin32 Compiled at Nov 2 1999 09:52:28


INABA Hitoshi <>

This project was originated by INABA Hitoshi. For any questions, use <> so we can share this file.


This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Split string
  @split = Char::Eutf2::split(/pattern/,$string,$limit);
  @split = Char::Eutf2::split(/pattern/,$string);
  @split = Char::Eutf2::split(/pattern/);
  @split = Char::Eutf2::split('',$string,$limit);
  @split = Char::Eutf2::split('',$string);
  @split = Char::Eutf2::split('');
  @split = Char::Eutf2::split();
  @split = Char::Eutf2::split;

  Scans a UTF-2 $string for delimiters that match pattern and splits the UTF-2
  $string into a list of substrings, returning the resulting list value in list
  context, or the count of substrings in scalar context. The delimiters are
  determined by repeated pattern matching, using the regular expression given in
  pattern, so the delimiters may be of any size and need not be the same UTF-2
  $string on every match. If the pattern doesn't match at all, Char::Eutf2::split returns
  the original UTF-2 $string as a single substring. If it matches once, you get
  two substrings, and so on.
  If $limit is specified and is not negative, the function splits into no more than
  that many fields. If $limit is negative, it is treated as if an arbitrarily large
  $limit has been specified. If $limit is omitted, trailing null fields are stripped
  from the result (which potential users of pop would do well to remember).
  If UTF-2 $string is omitted, the function splits the $_ UTF-2 string.
  If pattern is also omitted, the function splits on whitespace, /\s+/, after
  skipping any leading whitespace.
  If the pattern contains parentheses, then the substring matched by each pair of
  parentheses is included in the resulting list, interspersed with the fields that
  are ordinarily returned.
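
  The limit and parentheses rules above can be tried with a minimal sketch. It
  uses the core split on ASCII data so that it runs without this module, on the
  assumption that Char::Eutf2::split follows the same rules while treating each
  UTF-2 character as a unit. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Core split on ASCII data; Char::Eutf2::split is assumed to behave the same
# way for UTF-2 strings.
my $record = "a,b,c,,,";

my @default  = split /,/, $record;        # no limit: trailing empty fields stripped
my @limited  = split /,/, $record, 2;     # at most two fields
my @negative = split /,/, $record, -1;    # negative limit: trailing empty fields kept
my @captured = split /(,)/, "a,b";        # parentheses: delimiters are returned too

print scalar(@default),  "\n";            # 3
print "$limited[1]\n";                    # b,c,,,
print scalar(@negative), "\n";            # 6
print join("|", @captured), "\n";         # a|,|b
```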
Transliteration
  $tr = Char::Eutf2::tr($variable,$bind_operator,$searchlist,$replacementlist,$modifier);
  $tr = Char::Eutf2::tr($variable,$bind_operator,$searchlist,$replacementlist);

  This function scans a UTF-2 string character by character and replaces all
  occurrences of the characters found in $searchlist with the corresponding character
  in $replacementlist. It returns the number of characters replaced or deleted.
  If no UTF-2 string is specified via =~ operator, the $_ variable is translated.
  The $modifier values are:

  Modifier   Meaning
  c          Complement $searchlist
  d          Delete found but unreplaced characters
  s          Squash duplicate replaced characters
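
  The effect of each modifier can be seen in a short sketch. It uses the core
  tr/// operator on ASCII data, assuming that Char::Eutf2::tr applies the same
  modifier semantics to UTF-2 characters. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "hello world";

# No modifier: returns the number of matching characters.
my $count = ($text =~ tr/o//);                  # 2, $text is unchanged

# d: delete characters found in the search list that have no replacement.
(my $no_vowels = $text) =~ tr/aeiou//d;         # "hll wrld"

# s: squash runs of characters that were replaced by the same character.
(my $squashed = "aabbccdd") =~ tr/a-c/x/s;      # "xdd"

# c: complement the search list (here: everything that is not a-z).
(my $letters = $text) =~ tr/a-z/_/c;            # "hello_world"

print "$count $no_vowels $squashed $letters\n";
```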
Chop string
  $chop = Char::Eutf2::chop(@list);
  $chop = Char::Eutf2::chop();
  $chop = Char::Eutf2::chop;

  Chops off the last character of a UTF-2 string contained in the variable (or
  UTF-2 strings in each element of a @list) and returns the character chopped.
  The Char::Eutf2::chop operator is used primarily to remove the newline from the end of
  an input record, and is more efficient than s/\n$//. If no argument is given, the
  function chops the $_ variable.
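
  To illustrate why a character-aware chop differs from the core chop on a
  multi-octet string, here is a minimal sketch. The helper chop_utf2_char is
  hypothetical and is not the module's implementation; it only shows the idea
  of removing one whole encoded character instead of one octet.

```perl
use strict;
use warnings;

# chop_utf2_char is a hypothetical helper, not part of Char::Eutf2.
# It removes and returns the last encoded character of a byte string.
sub chop_utf2_char {
    my ($ref) = @_;
    # One character = one lead octet plus any continuation octets (0x80-0xBF).
    if ($$ref =~ s/([\x00-\x7F\xC0-\xFF][\x80-\xBF]*)\z//) {
        return $1;
    }
    return '';
}

my $line = "ab\xE3\x81\x82";            # "ab" followed by one 3-octet character
my $last = chop_utf2_char(\$line);      # removes all three octets

my $bytes = "ab\xE3\x81\x82";
my $octet = chop($bytes);               # core chop removes only the final octet

print length($last), " ", length($line), " ", length($bytes), "\n";   # 3 2 4
```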
Index string
  $pos = Char::Eutf2::index($string,$substr,$position);
  $pos = Char::Eutf2::index($string,$substr);

  Returns the position of the first occurrence of $substr in UTF-2 $string.
  $position, if specified, is the position at which to start looking in the UTF-2
  $string. Positions are integer numbers based at 0. If the substring is not found,
  the Char::Eutf2::index function returns -1.
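
  A common use of the $position argument is to find every occurrence in turn.
  The sketch below uses the core index on ASCII data, assuming that
  Char::Eutf2::index accepts the same arguments. How positions are counted for
  multi-octet strings is not stated here, so the example avoids them.

```perl
use strict;
use warnings;

my $string = "abcabcabc";
my $substr = "bc";

my @positions;
my $pos = index($string, $substr);                 # first occurrence
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($string, $substr, $pos + 1);      # resume just after the last hit
}

my $missing = index($string, "xyz");               # not found

print "@positions $missing\n";                     # 1 4 7 -1
```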
Reverse index string
  $pos = Char::Eutf2::rindex($string,$substr,$position);
  $pos = Char::Eutf2::rindex($string,$substr);

  Works just like Char::Eutf2::index except that it returns the position of the last
  occurrence of $substr in UTF-2 $string (a reverse index). The function returns
  -1 if not found. $position, if specified, is the rightmost position that may be
  returned, i.e., how far in the UTF-2 string the function can search.
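
  The role of $position as an upper bound can be shown with the core rindex on
  ASCII data. This is a sketch under the assumption that Char::Eutf2::rindex
  takes the same arguments; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "abcabcabc";

my $last    = rindex($string, "bc");       # rightmost occurrence
my $bounded = rindex($string, "bc", 6);    # rightmost occurrence starting at or before 6
my $missing = rindex($string, "zz");       # not found

print "$last $bounded $missing\n";         # 7 4 -1
```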
Make capture number
  $capturenumber = Char::Eutf2::capture($string);

  This function is for internal use by m/ /, s/ / /, split / / and qr/ /.
Make character
  $chr = Char::Eutf2::chr($code);
  $chr = Char::Eutf2::chr_;

  This function returns the character represented by that $code in the character
  set. For example, Char::Eutf2::chr(65) is "A" in either ASCII or UTF-2, and
  Char::Eutf2::chr(0xE38182) is a UTF-2 HIRAGANA LETTER A. For the reverse of Char::Eutf2::chr,
  use Char::UTF2::ord.
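
  One way to picture how a numeric code becomes a multi-octet character is
  sketched below. The helper code_to_octets is hypothetical and is not taken
  from the module; it assumes the code is read as a sequence of octets from
  most to least significant. The octets E3 81 82 are the UTF-2 encoding of
  HIRAGANA LETTER A.

```perl
use strict;
use warnings;

# code_to_octets is a hypothetical helper, not the module's code.
# It emits the octets of $code from most to least significant.
sub code_to_octets {
    my ($code) = @_;
    return "\x00" if $code == 0;
    my @octets;
    while ($code > 0) {
        unshift @octets, chr($code % 0x100);
        $code = int($code / 0x100);
    }
    return join '', @octets;
}

my $ascii = code_to_octets(65);            # single octet "A"
my $kana  = code_to_octets(0xE38182);      # three octets

print $ascii, " ", unpack("H*", $kana), "\n";   # A e38182
```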
Filename expansion (globbing)
  @glob = Char::Eutf2::glob($string);
  @glob = Char::Eutf2::glob_;

  Performs filename expansion (DOS-like globbing) on $string, returning the next
  successive name on each call. If $string is omitted, $_ is globbed instead.
  This function also works when the pathname ends with chr(0x5C) on MSWin32.

  For example, '..\\l*b\\file/*glob.p?' on MSWin32 or UNIX will work as
  expected (in that it will find something like '..\lib\File/'
  Note that all path components are
  case-insensitive, and that backslashes and forward slashes are both accepted,
  and preserved. You may have to double the backslashes if you are putting them in
  literally, due to double-quotish parsing of the pattern by perl.
  A tilde ("~") expands to the current user's home directory.

  Spaces in the argument delimit distinct patterns, so glob('*.exe *.dll') globs
  all filenames that end in .exe or .dll. If you want to put literal spaces
  in the glob pattern, you can enclose them in double quotes,
  e.g. glob('c:/"Program Files"/*/*.dll').
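
  The rule that spaces separate patterns can be checked with a small sketch. It
  uses the core glob in a temporary directory, on the assumption that
  Char::Eutf2::glob treats a space-separated argument the same way. The file
  names are made up for the example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $old = getcwd();
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to $dir: $!";

# Create three empty files with made-up names.
for my $name (qw(a.exe b.dll c.txt)) {
    open(my $fh, ">", $name) or die "Can't create $name: $!";
    close($fh) or die "Can't close $name: $!";
}

# Two patterns in one argument, separated by a space.
my @found = sort glob('*.exe *.dll');
print "@found\n";                          # a.exe b.dll

chdir $old or die "Can't chdir back to $old: $!";
```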
Binary mode (Perl5.6 emulation on perl5.005)
  Char::Eutf2::binmode(FILEHANDLE, $disciplines);
  Char::Eutf2::binmode($filehandle, $disciplines);

  * two arguments

  If you are using perl5.005 other than MacPerl, the Char::UTF2 software emulates perl5.6's
  binmode function. Only the main points are given here. See also perlfunc/binmode for details.

  This function arranges for the FILEHANDLE to have the semantics specified by the
  $disciplines argument. If $disciplines is omitted, ':raw' semantics are applied
  to the filehandle. If FILEHANDLE is an expression, the value is taken as the
  name of the filehandle or a reference to a filehandle, as appropriate.
  The binmode function should be called after the open but before any I/O is done
  on the filehandle. The only way to reset the mode on a filehandle is to reopen
  the file, since the various disciplines may have treasured up various bits and
  pieces of data in various buffers.

  The ":raw" discipline tells Perl to keep its cotton-pickin' hands off the data.
  For more on how disciplines work, see the open function.
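
  The call order described above (open, then binmode, then I/O) looks like this
  in a minimal sketch. It uses the core two-argument binmode of perl5.6 or
  later, which is the behavior this function is said to emulate; the temporary
  file is only there to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);

binmode($fh, ':raw');                  # set the discipline before any I/O
print {$fh} "a\r\nb\n";                # written octet for octet, no newline translation
close($fh) or die "Can't close $path: $!";

my $size = -s $path;
print "$size\n";                       # 5
```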
Open file (Perl5.6 emulation on perl5.005)
  $rc = Char::Eutf2::open(FILEHANDLE, $mode, $expr);
  $rc = Char::Eutf2::open(FILEHANDLE, $expr);
  $rc = Char::Eutf2::open(FILEHANDLE);
  $rc = Char::Eutf2::open(my $filehandle, $mode, $expr);
  $rc = Char::Eutf2::open(my $filehandle, $expr);
  $rc = Char::Eutf2::open(my $filehandle);

  * autovivification filehandle
  * three arguments

  If you are using perl5.005, the Char::UTF2 software emulates perl5.6's open function.
  Only the main points are given here. See also perlfunc/open for details.

  The FILEHANDLE argument is often just a simple identifier
  (normally uppercase), but it may also be an expression whose value provides a
  reference to the actual filehandle. (The reference may be either a symbolic
  reference to the filehandle name or a hard reference to any object that can be
  interpreted as a filehandle.) This is called an indirect filehandle, and any
  function that takes a FILEHANDLE as its first argument can handle indirect
  filehandles as well as direct ones. But open is special in that if you supply
  it with an undefined variable for the indirect filehandle, Perl will automatically
  define that variable for you, that is, autovivifying it to contain a proper
  filehandle reference.

  {
      my $fh;                               # (uninitialized)
      Char::Eutf2::open($fh, ">logfile")    # $fh is autovivified
          or die "Can't create logfile: $!";
      ...                                   # do stuff with $fh
  }                                         # $fh closed here

  The my $fh declaration can be readably incorporated into the open:

  Char::Eutf2::open my $fh, ">logfile" or die ...

  The > symbol you've been seeing in front of the filename is an example of a mode.
  Historically, the two-argument form of open came first. The recent addition of
  the three-argument form lets you separate the mode from the filename, which has
  the advantage of avoiding any possible confusion between the two. In the following
  example, we know that the user is not trying to open a filename that happens to
  start with ">". We can be sure that they're specifying a $mode of ">", which opens
  the file named in $expr for writing, creating the file if it doesn't exist and
  truncating the file down to nothing if it already exists:

  Char::Eutf2::open(LOG, ">", "logfile") or die "Can't create logfile: $!";

  With the one- or two-argument form of open, you have to be careful when you use
  a string variable as a filename, since the variable may contain arbitrarily
  weird characters (particularly when the filename has been supplied by arbitrarily
  weird characters on the Internet). If you're not careful, parts of the filename
  might get interpreted as a $mode string, ignorable whitespace, a dup specification,
  or a minus.
  Here's one historically interesting way to insulate yourself:

  $path =~ s#^([ ])#./$1#;
  Char::Eutf2::open (FH, "< $path\0") or die "can't open $path: $!";

  But that's still broken in several ways. Instead, just use the three-argument
  form of open to open any arbitrary filename cleanly and without any (extra)
  security risks:

  Char::Eutf2::open(FH, "<", $path) or die "can't open $path: $!";
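
  The difference between the two forms can be observed with a file name that
  begins with a space. The sketch below uses the core open of perl5.6 or later,
  which this function is said to emulate; the file name " notes.txt" is a
  made-up example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $old = getcwd();
my $dir = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to $dir: $!";

my $path = " notes.txt";               # made-up name with a leading space

# Three-argument form: $path is used exactly as given.
open(my $out, ">", $path) or die "Can't create '$path': $!";
print {$out} "hello\n";
close($out) or die "Can't close '$path': $!";

open(my $in, "<", $path) or die "Can't open '$path': $!";
my $line = <$in>;
close($in);

# Two-argument form: whitespace around the name is discarded, so this
# looks for "notes.txt", which does not exist.
my $two_arg_ok = open(my $fh2, "< $path") ? 1 : 0;
close($fh2) if $two_arg_ok;

print "two-argument open succeeded: $two_arg_ok\n";

chdir $old or die "Can't chdir back to $old: $!";
```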

  As of the 5.6 release of Perl, you can specify binary mode in the open function
  without a separate call to binmode. As part of the $mode
  argument (but only in the three-argument form), you may specify various input
  and output disciplines.
  To do the equivalent of a binmode, use the three argument form of open and stuff
  a discipline of :raw in after the other $mode characters:

  Char::Eutf2::open(FH, "<:raw", $path) or die "can't open $path: $!";

  Table 1. I/O Disciplines
  Discipline      Meaning
  :raw            Binary mode; do no processing
  :crlf           Text mode; Intuit newlines
                  (DOS-like system only)
  :encoding(...)  Legacy encoding

  You'll be able to stack disciplines that make sense to stack, so, for instance,
  you could say:

  Char::Eutf2::open(FH, "<:crlf:encoding(Char::UTF2)", $path) or die "can't open $path: $!";
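
  The :raw and :crlf disciplines from Table 1 can be compared side by side. The
  sketch below relies on the core three-argument open with PerlIO layers
  (perl5.8 or later), which accepts :crlf on every platform; whether the
  emulation does the same outside DOS-like systems is not assumed here. The
  helper slurp_with is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with DOS-style line endings, octet for octet.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "a\r\nb\r\n";
close($tmp) or die "Can't close $path: $!";

# slurp_with is a hypothetical helper: read the whole file using $mode.
sub slurp_with {
    my ($mode, $file) = @_;
    open(my $fh, $mode, $file) or die "Can't open $file: $!";
    local $/;                          # slurp mode
    my $data = <$fh>;
    close($fh);
    return $data;
}

my $raw  = slurp_with("<:raw",  $path);    # CR LF pairs are kept
my $crlf = slurp_with("<:crlf", $path);    # CR LF pairs become "\n"

print length($raw), " ", length($crlf), "\n";   # 6 4
```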