Unicode::ICU::Collator - wrapper around ICU collation services
    use Unicode::ICU::Collator;

    my $coll = Unicode::ICU::Collator->new($locale);

    # name of the locale actually selected
    print $coll->getLocale;

    # sort according to locale
    my @sorted = $coll->sort(@unsorted);

    # comparisons
    my @sorted = sort { $coll->cmp($a->name, $b->name) } @unsorted;

    # build sort keys
    my @sorted = map $_->[1],
      sort { $a->[0] cmp $b->[0] }
        map [ $coll->getSortKey($_->name), $_ ], @unsorted;

    # get the display name of a collation locale
    print Unicode::ICU::Collator->getDisplayName("de__phonebook", "en");
    # German (PHONEBOOK)
    print Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");
    # Deutsch (PHONEBOOK)
Unicode::ICU::Collator is a thin (and currently incomplete) wrapper around ICU's collation functions.
Create a new collation object for the specified locale.
    my $coll = Unicode::ICU::Collator->new("en");
    my $coll_de = Unicode::ICU::Collator->new("de_phonebook");
Return a list of the available collation locale names.
my @locales = Unicode::ICU::Collator->available;
Return a descriptive name of the locale $locale for display in locale $display_locale.
    # probably "English"
    my $en_en = Unicode::ICU::Collator->getDisplayName("en", "en");
    # "German"
    my $de_en = Unicode::ICU::Collator->getDisplayName("de", "en");
    # "Deutsch"
    my $de_de = Unicode::ICU::Collator->getDisplayName("de", "de");
    # "Deutsch (PHONEBOOK)"
    my $deph_de = Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");
Compare two strings per the collation selected, returning -1, 0, or 1 as per perl's cmp.
    my $cmp = $coll->cmp($str1, $str2);
    my @sorted = sort { $coll->cmp($a, $b) } @unsorted;
Compare the strings lexically within the collation, returning true or false.
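The true/false comparisons can be understood in terms of cmp. The following is only a sketch of that relationship: it uses nothing but new and cmp, assumes the module and ICU are installed and that the "en" locale is available, and the strings and variable names are purely illustrative.

```perl
use strict;
use warnings;
use Unicode::ICU::Collator;

my $coll = Unicode::ICU::Collator->new("en");

# Illustrative inputs, not taken from the module's documentation.
my ($x, $y) = ("apple", "banana");

# cmp returns -1, 0 or 1, so each boolean test is a check on its sign.
my $order    = $coll->cmp($x, $y);
my $is_equal = $order == 0;    # true when the strings collate as equal
my $is_less  = $order < 0;     # true when $x collates before $y

print "equal: ", ($is_equal ? "yes" : "no"), "\n";
print "less:  ", ($is_less  ? "yes" : "no"), "\n";
```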
Returns a binary string suitable for use with perl's built-in string comparison operators such as cmp, for comparing the source strings.
    my @sorted = map $_->[1],
      sort { $a->[0] cmp $b->[0] }
        map [ $coll->getSortKey($_->name), $_ ], @unsorted;
Return the contents of @list (which can be any list, not just an array) sorted per the collation.
Currently this is a simple Perl wrapper around getSortKey(), but that may change.
my @sorted = $coll->sort(@unsorted);
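To illustrate what a wrapper around getSortKey() might look like, here is a minimal sketch. It is not the module's actual implementation: sort_by_key and the My::CaseFoldKey class are hypothetical names, and the stand-in class produces a lower-cased key where a real collator would return a binary ICU sort key. The example therefore runs without ICU.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collator: any object with a getSortKey
# method will do. A real collator would return a binary ICU sort key.
package My::CaseFoldKey {
    sub new        { return bless {}, shift }
    sub getSortKey { my ($self, $str) = @_; return lc $str }
}

# Compute each key once, sort on the keys with the built-in cmp,
# then unwrap the original values.
sub sort_by_key {
    my ($coll, @list) = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ $coll->getSortKey($_), $_ ] } @list;
}

my @sorted = sort_by_key(My::CaseFoldKey->new, qw(pear Apple banana));
print "@sorted\n";    # Apple banana pear
```

Computing the key once per element matters because key generation is usually much more expensive than the byte-wise comparison of two finished keys.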
Return either the locale actually used as the source of the collation or the most specific locale name known to ICU, depending on $type.
$type is one of the following constants, as exported by the :locale export tag:
ULOC_ACTUAL_LOCALE - the actual locale being used. For example, if you supply "en_US" to new, this will probably return "en". This is the default if $type is not provided.
ULOC_VALID_LOCALE - the most specific locale supported by ICU.
    my $name = $coll->getLocale();

    use Unicode::ICU::Collator ':locale';
    my $name = $coll->getLocale(ULOC_VALID_LOCALE());
Previously you could supply ULOC_REQUESTED_LOCALE to get the locale name supplied to new(), but this was deprecated in ICU and current versions of ICU return an error, so I've removed it.
Set an attribute for the collation.
Constants for $attr and $value are exported by the :attributes tag.
See the description of the UColAttribute type in the ICU documentation for details.
$coll->setAttribute(UCOL_NUMERIC_COLLATION(), UCOL_ON());
Return the value of a collation attribute.
my $value = $coll->getAttribute(UCOL_NUMERIC_COLLATION());
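As a slightly fuller illustration of setting and reading back an attribute, the sketch below turns on numeric collation and sorts a few names containing digits. It assumes the module and ICU are installed and that the "en" locale is available; the file names are made up for the example. With numeric collation on, "file2" is expected to sort before "file10".

```perl
use strict;
use warnings;
use Unicode::ICU::Collator ':attributes';

my $coll = Unicode::ICU::Collator->new("en");

# Ask ICU to compare runs of digits by numeric value.
$coll->setAttribute(UCOL_NUMERIC_COLLATION(), UCOL_ON());

# Read the attribute back to confirm the setting took effect.
my $numeric_on =
    $coll->getAttribute(UCOL_NUMERIC_COLLATION()) == UCOL_ON();
print "numeric collation: ", ($numeric_on ? "on" : "off"), "\n";

# Illustrative data only.
my @sorted = $coll->sort(qw(file10 file2 file1));
print "@sorted\n";
```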
Retrieve the collation rules used by this collator.
Note: this is typically a long string for UCOL_FULL_RULES, and probably isn't very useful.
Values for $type are:
UCOL_FULL_RULES - the full set of rules for the collation. This is the default.
UCOL_TAILORING_ONLY - only the rule tailoring.
Return version information for the collator as a dotted decimal string.
Return the UCA version information for a collator.
Unicode::ICU::Collator is licensed under the same terms as Perl itself.
http://site.icu-project.org/
http://userguide.icu-project.org/collation
http://icu-project.org/apiref/icu4c/ucol_8h.html
Unicode::Collate
Tony Cook <tonyc@cpan.org>
To install Unicode::ICU::Collator, copy and paste the appropriate command into your terminal.
cpanm
cpanm Unicode::ICU::Collator
CPAN shell
perl -MCPAN -e shell
install Unicode::ICU::Collator