05 Jan 2018 08:21:37 UTC
- Distribution: utf8-all
- Module version: 0.024
- License: perl_5
- Perl: v5.10.0
NAME
utf8::all - turn on Unicode - all of it
VERSION
version 0.024
SYNOPSIS
    use utf8::all;                      # Turn on UTF-8, all of it.

    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)
DESCRIPTION
The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. This also means that you can now use literal Unicode characters as part of strings, variable names, and regular expressions.

utf8::all goes further:

- charnames are imported so \N{...} sequences can be used to compile Unicode characters based on names.
- On Perl v5.11.0 or higher, use feature 'unicode_strings' is enabled.
- use feature 'fc' and use feature 'unicode_eval' are enabled on Perl 5.16.0 and higher.
- Filehandles are opened with UTF-8 encoding turned on by default (including STDIN, STDOUT, and STDERR when utf8::all is used from the main package). This means that they automatically convert UTF-8 octets to characters and vice versa. If you don't want UTF-8 for a particular filehandle, you'll have to set binmode $filehandle.
- @ARGV gets converted from UTF-8 octets to Unicode characters (when utf8::all is used from the main package). This is similar to the behaviour of the -CA perl command-line switch (see perlrun).
- readdir, readlink, readpipe (including the qx// and backtick operators), and glob (including the <> operator) now all work with and return Unicode characters instead of (UTF-8) octets (again only when utf8::all is used from the main package).
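To make the first few items in the list concrete, here is a minimal sketch that switches on the same pragmas by hand, using only core Perl. It assumes Perl 5.16 or later (for fc) and a source file saved as UTF-8. It illustrates the effect of the individual pragmas and is not a reproduction of how utf8::all enables them internally.

```perl
use strict;
use warnings;
use utf8;                             # source text may contain literal UTF-8
use charnames ':full';                # \N{...} lookups by full character name
use feature qw(unicode_strings fc);   # fc needs Perl 5.16+

# The same character written by name and as a literal
my $by_name = "\N{LATIN SMALL LETTER SHARP S}";
my $literal = 'ß';
my $names_match = $by_name eq $literal ? 1 : 0;

# fc performs full Unicode case folding, so the sharp s folds to "ss"
my $fold_match = fc("Stra${by_name}e") eq fc('STRASSE') ? 1 : 0;

print "names match: $names_match, fold match: $fold_match\n";
```

With utf8::all loaded, none of these use lines would need to be written out individually.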
Lexical Scope
The pragma is lexically-scoped, so you can do the following if you had some reason to:
    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n";    # 7
        print $out $utf8_str;            # out as utf8
    }
    open my $in, '<', 'outfile';         # in as raw
    my $text = do { local $/; <$in> };
    print length $text, "\n";            # 10, not 7!
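The difference between 7 and 10 is the difference between characters and UTF-8 octets. The following sketch reproduces it without utf8::all, using explicit I/O layers and an in-memory filehandle in place of 'outfile'. The variable names are illustrative only, and the source file is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;

my $str    = 'føø bār';
my $buffer = '';

# Write through an encoding layer: characters become UTF-8 octets
open my $out, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$out} $str;
close $out;

# Read back with no layer: one "character" per octet
open my $raw, '<', \$buffer or die "open: $!";
my $octets = do { local $/; <$raw> };
close $raw;

# Read back through the encoding layer: octets become characters again
open my $in, '<:encoding(UTF-8)', \$buffer or die "open: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "%d characters, %d octets\n", length($chars), length($octets);
# prints "7 characters, 10 octets"
```

Each of ø, ø, and ā takes two octets in UTF-8, which accounts for the three extra octets.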
Instead of lexical scoping, you can also use no utf8::all to turn off the effects.
Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file handles is always global and cannot be undone!
Enabling/Disabling Global Features
As described above, the default behaviour of utf8::all is to convert @ARGV and to open the STDIN, STDOUT, and STDERR file handles with UTF-8 encoding, and override the readlink and readdir functions and glob operators when utf8::all is used from the main package.
If you want to disable these features even when utf8::all is used from the main package, add the option NO-GLOBAL (or LEXICAL-ONLY) to the use line. E.g.:

    use utf8::all 'NO-GLOBAL';

If on the other hand you want to enable these global effects even when utf8::all was used from another package than main, use the option GLOBAL on the use line:

    use utf8::all 'GLOBAL';
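For code that runs with NO-GLOBAL, or outside main without GLOBAL, arguments arrive as raw octets. The sketch below shows what converting them by hand could look like with the core Encode module. The @raw_args list is a made-up stand-in for @ARGV, and this is an illustration rather than the module's own implementation.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw arguments as the operating system would hand them over:
# "føø" as five UTF-8 octets, plus a plain ASCII argument.
my @raw_args = ("f\xC3\xB8\xC3\xB8", 'plain');

# Decode a copy; LEAVE_SRC keeps the original octets untouched,
# FB_CROAK makes malformed input a fatal error.
my @args = map {
    decode('UTF-8', $_, Encode::FB_CROAK | Encode::LEAVE_SRC)
} @raw_args;

printf "raw length %d, decoded length %d\n",
    length($raw_args[0]), length($args[0]);
# prints "raw length 5, decoded length 3"
```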
UTF-8 Errors
utf8::all treats invalid code points (i.e., UTF-8 that does not map to a valid Unicode "character") as a fatal error.
For glob, readdir, and readlink, one can change this behaviour by setting the attribute "$utf8::all::UTF8_CHECK".
ATTRIBUTES
$utf8::all::UTF8_CHECK
By default utf8::all marks decoding errors as fatal (default value for this setting is Encode::FB_CROAK). If you want, you can change this by setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports the encoding errors as warnings, and Encode::FB_DEFAULT will completely ignore them. Please see Encode for details. Note: Encode::LEAVE_SRC is always enforced.
Important: Only controls the handling of decoding errors in glob, readdir, and readlink.
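The three check values are easiest to compare by passing them straight to Encode::decode. The sketch below does that with a deliberately malformed string. It exercises Encode itself, not utf8::all, on the assumption that $utf8::all::UTF8_CHECK is handed to Encode in the same way. The sample input is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "naive" with a Latin-1 i-diaeresis: \xEF followed by "v" is not valid UTF-8
my $bad = "na\xEFve";

# FB_CROAK (the default described above): decoding dies
my $croaked = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
} ? 0 : 1;

# FB_WARN: a warning is issued instead of dying
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    decode('UTF-8', $bad, Encode::FB_WARN | Encode::LEAVE_SRC);
}

# FB_DEFAULT: no error is raised; the bad octet becomes U+FFFD
my $lenient = decode('UTF-8', $bad, Encode::FB_DEFAULT | Encode::LEAVE_SRC);

printf "croaked: %d, warnings: %d, lenient length: %d\n",
    $croaked, scalar(@warnings), length($lenient);
```

Setting the attribute itself would then be a plain assignment such as $utf8::all::UTF8_CHECK = Encode::FB_WARN; placed before the affected glob, readdir, or readlink calls.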
INTERACTION WITH AUTODIE
If you use autodie, which is a great idea, you need to use at least version 2.12, released on June 26, 2012. Otherwise, autodie obliterates the IO layers set by the open pragma. See RT #54777 and GH #7.
BUGS
Please report any bugs or feature requests on the bugtracker website.
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
COMPATIBILITY
The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8. The readlink and readdir functions and glob operators will therefore not be replaced on these systems.
SEE ALSO
File::Find::utf8 for fully utf-8 aware File::Find functions.
Cwd::utf8 for fully utf-8 aware Cwd functions.
AUTHORS
Michael Schwern <mschwern@cpan.org>
Mike Doherty <doherty@cpan.org>
Hayo Baan <info@hayobaan.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
Module Install Instructions
To install utf8::all, copy and paste the appropriate command into your terminal.
cpanm:

    cpanm utf8::all

CPAN shell:

    perl -MCPAN -e shell
    install utf8::all

For more information on module installation, please visit the detailed CPAN module installation guide.