Hash::Normalize - Automatically normalize Unicode hash keys.
Version 0.01
    use Hash::Normalize qw<normalize>;

    normalize my %hash, 'NFC';

    $hash{café} = 'coffee'; # NFD, "cafe\x{301}"

    print $hash{café};      # NFD, "cafe\x{301}"
    # 'coffee' is printed

    print $hash{café};      # NFC, "caf\x{e9}"
    # 'coffee' is also printed
This module provides a utility routine that augments a given Perl hash table so that its keys are automatically normalized following one of the Unicode normalization schemes. All subsequent operations on this hash then give the same result regardless of how the key used for the operation is normalized.
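To see why this matters, the sketch below uses only the core Unicode::Normalize module to show how a plain hash treats the two spellings of the same word as different keys, and what normalizing by hand on every access looks like. It illustrates the problem the module addresses; it is not how Hash::Normalize is implemented, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw<NFC>;

# Two spellings of the same word: precomposed (NFC) and decomposed (NFD).
my $nfc = "caf\x{e9}";      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $nfd = "cafe\x{301}";    # "e" followed by U+0301 COMBINING ACUTE ACCENT

my %plain;
$plain{$nfd} = 'coffee';

# A plain hash compares keys code point by code point, so the NFC
# spelling does not find the value stored under the NFD spelling.
my $found_raw = exists $plain{$nfc} ? 1 : 0;

# Normalizing by hand on every store and fetch makes both spellings agree.
my %manual;
$manual{ NFC($nfd) } = 'coffee';
my $found_manual = exists $manual{ NFC($nfc) } ? 1 : 0;

print "plain lookup: $found_raw, manual NFC lookup: $found_manual\n";
```

A normalized hash removes the need to remember the NFC() call at every access site.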
Since this module does not use the tie mechanism, normalized hashes are indistinguishable from regular hashes as far as Perl is concerned, but this module also provides "get_normalization" to identify them if necessary.
normalize
    normalize %hash;
    normalize %hash, $mode;
Applies the Unicode normalization scheme $mode to %hash. $mode defaults to 'NFC' if omitted, and should match /^(?:(?:nf)?k?|fc)[cd]$/i otherwise.
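As a quick way to read that pattern, the following sketch tests a few candidate strings against it. The helper looks_like_mode is hypothetical and exists only for this illustration; the sketch shows which spellings the pattern accepts, not how the module treats each accepted spelling internally.

```perl
use strict;
use warnings;

# The pattern quoted above, copied verbatim.
my $mode_re = qr/^(?:(?:nf)?k?|fc)[cd]$/i;

# Hypothetical helper for illustration only; not part of Hash::Normalize.
sub looks_like_mode {
    my ($mode) = @_;
    return $mode =~ $mode_re ? 1 : 0;
}

# Long forms, short forms and the FCC/FCD forms match, in any letter case;
# anything else is rejected.
for my $candidate (qw<NFC nfd KC NFKD d FCC fcd NFX utf8>) {
    printf "%-5s %s\n", $candidate,
        looks_like_mode($candidate) ? 'accepted' : 'rejected';
}
```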
normalize will first try to forcefully normalize the existing keys in %hash to the new mode, but it will throw an exception if there are distinct keys that have the same normalization. All the keys subsequently used for fetches, stores, existence checks, deletes and list assignments are then first passed through the corresponding normalization procedure. keys %hash will also return the list of normalized keys.
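The collision case can be checked ahead of time on a plain hash. The sketch below groups keys by their NFC form using the core Unicode::Normalize module and reports groups with more than one distinct key. The helper nfc_collisions is hypothetical and is not part of Hash::Normalize; it only approximates the condition under which normalize is documented to throw.

```perl
use strict;
use warnings;
use Unicode::Normalize qw<NFC>;

# Hypothetical helper: group the keys of a hash by their NFC form and
# return only the groups holding more than one distinct key.
sub nfc_collisions {
    my ($hash) = @_;
    my %by_form;
    push @{ $by_form{ NFC($_) } }, $_ for keys %$hash;
    return grep { @$_ > 1 } values %by_form;
}

my %plain = (
    "caf\x{e9}"   => 'stored under the NFC spelling',
    "cafe\x{301}" => 'stored under the NFD spelling',
    "tea"         => 'no accent, no ambiguity',
);

my @collisions = nfc_collisions(\%plain);
printf "%d group(s) of keys would collide under NFC\n", scalar @collisions;
```

Resolving such groups first (for example by merging or renaming entries) should let the existing keys be normalized without conflict.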
get_normalization
    my $mode = get_normalization %hash;
    normalize %hash, $mode;
Returns the current Unicode normalization scheme in use for %hash, or undef if it is a plain hash.
Stashes (Perl symbol tables) are implemented as plain hashes, therefore one can use normalize %Pkg:: on them to make sure that Unicode symbol lookups are made regardless of normalization.
    package Foo;

    BEGIN {
        require Hash::Normalize;
        # Enforce NFC normalization
        Hash::Normalize::normalize(%Foo::, 'NFC')
    }

    sub café { # NFD, "cafe\x{301}"
        return 'coffee'
    }

    sub coffee_nfc {
        café() # NFC, "caf\x{e9}"
    }

    sub coffee_nfd {
        café() # NFD, "cafe\x{301}"
    }

    # Both coffee_nfc() and coffee_nfd() return 'coffee'
Using a normalized hash is slightly slower than a plain hash, due to the normalization procedure and the overhead of magic.
If a hash is initialized from a normalized hash by list assignment (%new = %normalized), then the normalization scheme will not be carried over to the new hash, although its keys will initially be normalized like the ones from the original hash.
The functions "normalize" and "get_normalization" are only exported on request by specifying their names in the module import list.
perl 5.10.
Carp, Exporter (core since perl 5).
Unicode::Normalize (core since perl 5.8).
Variable::Magic 0.51.
Vincent Pit, <perl at profvince.com>, http://www.profvince.com.
You can contact me by mail or on irc.perl.org (vincent).
Please report any bugs or feature requests to bug-hash-normalize at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Hash-Normalize. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Hash::Normalize
Copyright 2017 Vincent Pit, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Hash::Normalize, copy and paste the appropriate command into your terminal.
cpanm
cpanm Hash::Normalize
CPAN shell
    perl -MCPAN -e shell
    install Hash::Normalize
For more information on module installation, please visit the detailed CPAN module installation guide.