utf8::all - turn on Unicode - all of it
version 0.024
    use utf8::all;                      # Turn on UTF-8, all of it.

    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)
The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions.
utf8::all goes further:
charnames are imported so \N{...} sequences can be used to compile Unicode characters based on names.
On Perl v5.11.0 or higher, the use feature 'unicode_strings' is enabled.
use feature fc and use feature unicode_eval are enabled on Perl 5.16.0 and higher.
Filehandles are opened with UTF-8 encoding turned on by default (including STDIN, STDOUT, and STDERR when utf8::all is used from the main package), meaning that they automatically convert UTF-8 octets to characters and vice versa. If you don't want UTF-8 for a particular filehandle, you'll have to call binmode $filehandle on it.
@ARGV gets converted from UTF-8 octets to Unicode characters (when utf8::all is used from the main package). This is similar to the behaviour of the -CA perl command-line switch (see perlrun).
readdir, readlink, readpipe (including the qx// and backtick operators), and glob (including the <> operator) now all work with and return Unicode characters instead of (UTF-8) octets (again only when utf8::all is used from the main package).
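As a rough illustration of the lexical features listed above, the following sketch enables a comparable subset using core pragmas only (utf8, charnames, and feature) rather than utf8::all itself. It is an approximation for demonstration, not the module's implementation. The sub name lexical_demo is hypothetical, and fc requires Perl 5.16 or higher.

```perl
use strict;
use warnings;
use utf8;                             # source code is UTF-8
use charnames qw(:full);              # named characters via \N{...}
use feature qw(unicode_strings fc);   # fc needs Perl 5.16 or higher

# A named character compiles to the same code point as the literal.
my $named   = "\N{LATIN SMALL LETTER O WITH STROKE}";
my $literal = 'ø';

# fc applies full Unicode case folding, so these compare equal.
my $same_folded = fc('STRASSE') eq fc('straße') ? 1 : 0;

# Hypothetical helper: returns (named eq literal, character count, fold match).
sub lexical_demo {
    return ($named eq $literal ? 1 : 0, length('føø bār'), $same_folded);
}

print join(', ', lexical_demo()), "\n";
```

With utf8::all loaded, the three explicit pragma lines would presumably be unnecessary, since the list above says it enables them for you.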
The pragma is lexically-scoped, so you can do the following if you had some reason to:
    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n"; # 7
        print $out $utf8_str;         # out as utf8
    }
    open my $in, '<', 'outfile';      # in as raw
    my $text = do { local $/; <$in>};
    print length $text, "\n";         # 10, not 7!
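To show where the 7 and the 10 come from, here is a minimal sketch of the same round trip using explicit I/O layers from core Perl instead of the pragma. It assumes that the layer utf8::all applies behaves like :encoding(UTF-8); the temporary file and variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close: $!";

my $str = 'føø bār';

# Write characters through an encoding layer: they land on disk as UTF-8 octets.
open my $out, '>:encoding(UTF-8)', $path or die "open for write: $!";
print {$out} $str;
close $out or die "close: $!";

# Read the file back raw: one string element per octet.
open my $raw, '<:raw', $path or die "open raw: $!";
my $octets = do { local $/; <$raw> };
close $raw or die "close: $!";

# Read it back through the decoding layer: one string element per character.
open my $in, '<:encoding(UTF-8)', $path or die "open decoded: $!";
my $chars = do { local $/; <$in> };
close $in or die "close: $!";

printf "%d characters, %d octets\n", length($chars), length($octets);
```

The three non-ASCII letters each take two octets in UTF-8, which accounts for the difference of three.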
Instead of lexical scoping, you can also use no utf8::all to turn off the effects.
Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file handles is always global and cannot be undone!
As described above, the default behaviour of utf8::all is to convert @ARGV and to open the STDIN, STDOUT, and STDERR file handles with UTF-8 encoding, and override the readlink and readdir functions and glob operators when utf8::all is used from the main package.
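Conceptually, the @ARGV conversion amounts to decoding each argument from UTF-8 octets into a character string. The sketch below does this by hand with the core Encode module; it is an illustration of the idea, not utf8::all's actual code, and the helper name decode_args is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a list of UTF-8 octet strings into
# character strings, dying on malformed input.
sub decode_args {
    my @octets = @_;
    return map {
        decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
    } @octets;
}

# 'føø' as it would arrive from the shell: 5 UTF-8 octets.
my @raw     = ("f\xC3\xB8\xC3\xB8");
my @decoded = decode_args(@raw);

printf "%d octets -> %d characters\n", length($raw[0]), length($decoded[0]);
```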
If you want to disable these features even when utf8::all is used from the main package, add the option NO-GLOBAL (or LEXICAL-ONLY) to the use line. E.g.:
    use utf8::all 'NO-GLOBAL';
If on the other hand you want to enable these global effects even when utf8::all was used from another package than main, use the option GLOBAL on the use line:
    use utf8::all 'GLOBAL';
utf8::all treats invalid code points (i.e., UTF-8 that does not map to a valid Unicode "character") as a fatal error.
For glob, readdir, and readlink, you can change this behaviour by setting the variable $utf8::all::UTF8_CHECK.
By default utf8::all marks decoding errors as fatal (default value for this setting is Encode::FB_CROAK). If you want, you can change this by setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports the encoding errors as warnings, and Encode::FB_DEFAULT will completely ignore them. Please see Encode for details. Note: Encode::LEAVE_SRC is always enforced.
Important: $utf8::all::UTF8_CHECK only controls the handling of decoding errors in glob, readdir, and readlink.
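The difference between the check values can be seen by calling Encode directly. The following is a minimal sketch that uses core Encode only and does not involve utf8::all or $utf8::all::UTF8_CHECK; the sample string and variable names are illustrative, and Encode::FB_WARN is left out for brevity.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The octet \xFF can never appear in valid UTF-8.
my $bad = "abc\xFFdef";

# Encode::FB_CROAK: decoding dies on the malformed octet.
my $croaked = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
} ? 0 : 1;

# Encode::FB_DEFAULT: the malformed octet is silently replaced by U+FFFD.
my $lenient = decode('UTF-8', $bad, Encode::FB_DEFAULT() | Encode::LEAVE_SRC());
my $has_replacement = index($lenient, "\x{FFFD}") >= 0 ? 1 : 0;

print "FB_CROAK died: $croaked\n";
print "FB_DEFAULT substituted U+FFFD: $has_replacement\n";
```

Encode::LEAVE_SRC is combined with each value here so that $bad is not modified by the decode call, mirroring the note above that it is always enforced.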
If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 and GH #7.
Please report any bugs or feature requests on the bugtracker website.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8. The readlink and readdir functions and glob operators will therefore not be replaced on these systems.
File::Find::utf8 for fully utf-8 aware File::Find functions.
Cwd::utf8 for fully utf-8 aware Cwd functions.
Michael Schwern <mschwern@cpan.org>
Mike Doherty <doherty@cpan.org>
Hayo Baan <info@hayobaan.com>
This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install utf8::all, copy and paste the appropriate command into your terminal.
cpanm
cpanm utf8::all
CPAN shell
    perl -MCPAN -e shell
    install utf8::all
For more information on module installation, please visit the detailed CPAN module installation guide.