
NAME

Unicode::ICU::Collator - wrapper around ICU collation services

SYNOPSIS

  use Unicode::ICU::Collator;
  my $coll = Unicode::ICU::Collator->new($locale);

  # name of the locale actually selected
  print $coll->getLocale;

  # sort according to locale
  my @sorted = $coll->sort(@unsorted);

  # comparisons
  my @sorted = sort {
    $coll->cmp($a->name, $b->name)
  } @unsorted;

  # build sort keys
  my @sorted = map $_->[1],
    sort { $a->[0] cmp $b->[0] }
      map [ $coll->getSortKey($_->name), $_ ], @unsorted;

  # get the display name of a collation locale
  print Unicode::ICU::Collator->getDisplayName("de__phonebook", "en");
  # German (PHONEBOOK)
  print Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");
  # Deutsch (PHONEBOOK)

DESCRIPTION

Unicode::ICU::Collator is a thin (and currently incomplete) wrapper around ICU's collation functions.

CLASS METHODS

new($locale)

Create a new collation object for the specified locale.

  my $coll = Unicode::ICU::Collator->new("en");
  my $coll_de = Unicode::ICU::Collator->new("de__phonebook");

available()

Return a list of the available collation locale names.

  my @locales = Unicode::ICU::Collator->available;

getDisplayName($locale, $display_locale)

Return a descriptive name of the locale $locale for display in locale $display_locale.

  # probably "English"
  my $en_en = Unicode::ICU::Collator->getDisplayName("en", "en");
  # "German"
  my $de_en = Unicode::ICU::Collator->getDisplayName("de", "en");
  # "Deutsch"
  my $de_de = Unicode::ICU::Collator->getDisplayName("de", "de");
  # "Deutsch (PHONEBOOK)"
  my $deph_de = Unicode::ICU::Collator->getDisplayName("de__phonebook", "de");

INSTANCE METHODS

cmp($str1, $str2)

Compare two strings according to the selected collation, returning -1, 0 or 1, like perl's cmp operator.

  my $cmp = $coll->cmp($str1, $str2);
  my @sorted = sort { $coll->cmp($a, $b) } @unsorted;

eq($str1, $str2)
ne($str1, $str2)
lt($str1, $str2)
gt($str1, $str2)
le($str1, $str2)
ge($str1, $str2)

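Compare $str1 with $str2 according to the collation, returning true if the relation named by the method holds (for example, lt() returns true if $str1 sorts before $str2) and false otherwise.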
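
The six predicates should agree with cmp(): for instance, lt($str1, $str2) is true exactly when cmp($str1, $str2) returns -1. The sketch below spells out that assumed relationship. CaseBlindCollator is a hypothetical stand-in, not part of Unicode::ICU::Collator; its cmp() simply compares lower-cased strings so the example runs without ICU, whereas a real collator applies the locale's collation rules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collator, NOT Unicode::ICU::Collator.
# Its cmp() ignores case and otherwise compares strings with perl's cmp.
package CaseBlindCollator;

sub new { return bless {}, shift }

sub cmp {
    my ($self, $str1, $str2) = @_;
    return lc($str1) cmp lc($str2);
}

# Each predicate is a test on the result of cmp(); this is the relationship
# the documented eq/ne/lt/gt/le/ge methods are assumed to follow.
sub eq { my $s = shift; return $s->cmp(@_) == 0 }
sub ne { my $s = shift; return $s->cmp(@_) != 0 }
sub lt { my $s = shift; return $s->cmp(@_) <  0 }
sub gt { my $s = shift; return $s->cmp(@_) >  0 }
sub le { my $s = shift; return $s->cmp(@_) <= 0 }
sub ge { my $s = shift; return $s->cmp(@_) >= 0 }

package main;

my $coll = CaseBlindCollator->new;
print "apple sorts before Banana\n" if $coll->lt("apple", "Banana");
print "ABC equals abc\n"            if $coll->eq("ABC", "abc");
```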

getSortKey($str)

Return a binary sort key for $str. Comparing two sort keys with perl's built-in string comparison operators such as cmp orders them the same way the collator orders the source strings.

  my @sorted = map $_->[1],
    sort { $a->[0] cmp $b->[0] }
      map [ $coll->getSortKey($_->name), $_ ], @unsorted;

sort(@list)

Return the contents of @list (which can be any list, not just an array) sorted per the collation.

Currently this is a simple Perl wrapper around getSortKey(), but that may change.

  my @sorted = $coll->sort(@unsorted);
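
Since sort() is described as a Perl wrapper around getSortKey(), here is a minimal sketch of how such a wrapper can be written with a Schwartzian transform; it is not the module's actual source. sort_by_collation() and the stand-in class LowerCaseKeys (whose getSortKey() just returns its argument lower-cased) are hypothetical, used so the example runs without ICU. With a real Unicode::ICU::Collator object the keys would be ICU binary sort keys.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a collator: getSortKey() returns a string whose
# cmp ordering is the desired ordering (here, case-insensitive).
package LowerCaseKeys;
sub new        { return bless {}, shift }
sub getSortKey { my ($self, $str) = @_; return lc $str }

package main;

# Hypothetical helper: sort @list using sort keys obtained from $coll.
sub sort_by_collation {
    my ($coll, @list) = @_;
    return map  { $_->[1] }                              # 3. keep the original values
           sort { $a->[0] cmp $b->[0] }                  # 2. compare the keys byte-wise
           map  { [ $coll->getSortKey($_), $_ ] } @list; # 1. pair each key with its value
}

my @sorted = sort_by_collation(LowerCaseKeys->new, qw(pear Apple banana));
print "@sorted\n";    # Apple banana pear
```

The design point is that the collation work happens once per element, when its key is built; each of the comparisons made by sort is then a plain string cmp on the keys.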

getLocale()
getLocale($type)

Return either the locale used as the source of the collation or the most specific locale supported by ICU, depending on $type.

$type is one of the following constants, as exported by the :locale export tag:

  • ULOC_ACTUAL_LOCALE - the actual locale being used, e.g. if you supply "en_US" to new(), this will probably return "en". This is the default if $type is not provided.

  • ULOC_VALID_LOCALE - the most specific locale supported by ICU.

  use Unicode::ICU::Collator ':locale';
  my $actual = $coll->getLocale();
  my $valid  = $coll->getLocale(ULOC_VALID_LOCALE());

Previously you could supply ULOC_REQUESTED_LOCALE to get the locale name supplied to new(), but this was deprecated in ICU and current versions of ICU return an error, so I've removed it.

setAttribute($attr, $value)

Set an attribute for the collation.

Constants for $attr and $value are exported by the :attributes tag.

See the UColAttribute type in the ICU API documentation for details of the available attributes and values.

  $coll->setAttribute(UCOL_NUMERIC_COLLATION(), UCOL_ON());

getAttribute($attr)

Return the value of a collation attribute.

  my $value = $coll->getAttribute(UCOL_NUMERIC_COLLATION());

getRules()
getRules($type)

Retrieve the collation rules used by this collator.

Note: this is typically a long string for UCOL_FULL_RULES, and probably isn't very useful.

Values for $type are:

  • UCOL_FULL_RULES - the full set of rules for the collation. This is the default.

  • UCOL_TAILORING_ONLY - only the rule tailoring.

getVersion()

Return version information for the collator as a dotted decimal string.
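
Because the version is a dotted decimal string, ordering two versions with a plain string cmp can go wrong (as strings, "10.0" sorts before "9.0"). A minimal sketch of a numeric, component-wise comparison follows; version_cmp() is a hypothetical helper, not part of this module, and the version strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two dotted decimal version strings
# component by component, numerically. Missing components count as 0,
# so "1.2" and "1.2.0.0" compare equal. Returns -1, 0 or 1.
sub version_cmp {
    my ($v1, $v2) = @_;
    my @p1 = split /\./, $v1;
    my @p2 = split /\./, $v2;
    my $n  = @p1 > @p2 ? scalar @p1 : scalar @p2;
    for my $i (0 .. $n - 1) {
        my $c = ($p1[$i] // 0) <=> ($p2[$i] // 0);
        return $c if $c;
    }
    return 0;
}

print version_cmp("10.0", "9.0"), "\n";       # 1
print version_cmp("1.2", "1.2.0.0"), "\n";    # 0
```

If all you need is to detect a change, for instance to decide whether stored sort keys must be regenerated (ICU sort keys are only comparable when produced by the same collator version), a plain string ne on the two version strings is enough.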

getUCAVersion()

Return the Unicode Collation Algorithm (UCA) version information for the collator.

LICENSE

Unicode::ICU::Collator is licensed under the same terms as Perl itself.

SEE ALSO

http://site.icu-project.org/

http://userguide.icu-project.org/collation

http://icu-project.org/apiref/icu4c/ucol_8h.html

Unicode::Collate

AUTHOR

Tony Cook <tonyc@cpan.org>