NAME

Locale::CLDR - A Module to create locale objects with localisation data from the CLDR

VERSION

Version 0.25.1

SYNOPSIS

This module provides a locale object you can use to localise your output. The localisation data comes from the Unicode Common Locale Data Repository. Most of this code can be used with Perl version 5.10 or above. There are a few parts of the code that require version 5.18 or above.

USAGE

 my $locale = Locale::CLDR->new('en_GB');

or

 my $locale = Locale::CLDR->new(language_id => 'en', territory_id => 'gb');
 

A full locale identifier is

language_script_territory_variant_u_extension name_extension value

 my $locale = Locale::CLDR->new('en_latn_GB_SCOUSE_u_nu_traditional');
 

or

 my $locale = Locale::CLDR->new(language_id => 'en', script_id => 'latn', territory_id => 'gb', variant_id => 'SCOUSE', extensions => { nu => 'traditional' } );
 

ATTRIBUTES

These can be passed into the constructor and all are optional.

language_id

A valid language or language alias id, such as en

script_id

A valid script id, such as latn or Ctcl. The code will pick a likely script depending on the given language if none is provided.

territory_id

A valid territory id or territory alias such as GB

variant_id

A valid variant id. The code currently ignores this

extensions

A hashref of extension names and values. You can use this to override the locale's number formatting and calendar by passing in the Unicode extension names or aliases as keys and the extension value as the hash value.

Currently supported extensions are

nu
numbers

The number type can be one of

arab

Arabic-Indic Digits

arabext

Extended Arabic-Indic Digits

armn

Armenian Numerals

armnlow

Armenian Lowercase Numerals

bali

Balinese Digits

beng

Bengali Digits

brah

Brahmi Digits

cakm

Chakma Digits

cham

Cham Digits

deva

Devanagari Digits

ethi

Ethiopic Numerals

finance

Financial Numerals

fullwide

Full Width Digits

geor

Georgian Numerals

grek

Greek Numerals

greklow

Greek Lowercase Numerals

gujr

Gujarati Digits

guru

Gurmukhi Digits

hanidays

Chinese Calendar Day-of-Month Numerals

hanidec

Chinese Decimal Numerals

hans

Simplified Chinese Numerals

hansfin

Simplified Chinese Financial Numerals

hant

Traditional Chinese Numerals

hantfin

Traditional Chinese Financial Numerals

hebr

Hebrew Numerals

java

Javanese Digits

jpan

Japanese Numerals

jpanfin

Japanese Financial Numerals

kali

Kayah Li Digits

khmr

Khmer Digits

knda

Kannada Digits

lana

Tai Tham Hora Digits

lanatham

Tai Tham Tham Digits

laoo

Lao Digits

latn

Western Digits

lepc

Lepcha Digits

limb

Limbu Digits

mlym

Malayalam Digits

mong

Mongolian Digits

mtei

Meetei Mayek Digits

mymr

Myanmar Digits

mymrshan

Myanmar Shan Digits

native

Native Digits

nkoo

N'Ko Digits

olck

Ol Chiki Digits

orya

Oriya Digits

osma

Osmanya Digits

roman

Roman Numerals

romanlow

Roman Lowercase Numerals

saur

Saurashtra Digits

shrd

Sharada Digits

sora

Sora Sompeng Digits

sund

Sundanese Digits

takr

Takri Digits

talu

New Tai Lue Digits

taml

Traditional Tamil Numerals

tamldec

Tamil Digits

telu

Telugu Digits

thai

Thai Digits

tibt

Tibetan Digits

traditional

Traditional Numerals

vaii

Vai Digits

Note that the code currently only handles digits, i.e. numbering systems with characters corresponding to 0-9. If you use a numeral-based number type (one without such digits), it will fall back to latn. Later versions are planned to handle numerals correctly.

ca
calendar

You can use this to override a locale's default calendar. Valid values are

buddhist

Buddhist Calendar

chinese

Chinese Calendar

coptic

Coptic Calendar

dangi

Dangi Calendar

ethiopic

Ethiopic Calendar

ethiopic-amete-alem

Ethiopic Amete Alem Calendar

gregorian

Gregorian Calendar

hebrew

Hebrew Calendar

indian

Indian National Calendar

islamic

Islamic Calendar

islamic-civil

Islamic Calendar (tabular, civil epoch)

islamic-rgsa

Islamic Calendar (Saudi Arabia, sighting)

islamic-tbla

Islamic Calendar (tabular, astronomical epoch)

islamic-umalqura

Islamic Calendar (Umm al-Qura)

iso8601

ISO-8601 Calendar

japanese

Japanese Calendar

persian

Persian Calendar

roc

Minguo Calendar
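
As a minimal sketch of the extensions attribute described above, the following passes both a numbering system and a calendar override to the constructor. The choice of en/gb with the arab and islamic values is only an illustration; any key and value from the lists above should be usable in the same way.

 use strict;
 use warnings;
 use Locale::CLDR;

 # Override the default number formatting and calendar via Unicode extensions
 my $locale = Locale::CLDR->new(
     language_id  => 'en',
     territory_id => 'gb',
     extensions   => { nu => 'arab', ca => 'islamic' },
 );

 # A locale object stringifies to its locale identifier
 print "Locale id: $locale\n";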

Methods

The following methods can be called on the locale object

id()

The locale identifier. This is what you get if you attempt to stringify a locale object.

likely_language()

Given a locale with no language passed in, or with the explicit language code of und, this method attempts to use the script and territory data to guess the locale's language.

likely_script()

Given a locale with no script passed in, this method attempts to use the language and territory data to guess the locale's script.

likely_territory()

Given a locale with no territory passed in, this method attempts to use the language and script data to guess the locale's territory.
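
A short sketch of these methods in use, assuming a locale built from a language and territory only so that the script has to be guessed. No output is shown because the exact return values depend on the installed CLDR data.

 use strict;
 use warnings;
 use Locale::CLDR;

 # No script_id is given, so the script must be inferred
 my $locale = Locale::CLDR->new(language_id => 'en', territory_id => 'gb');

 print "id:            ", $locale->id, "\n";
 print "likely script: ", $locale->likely_script, "\n";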

Meta Data

The following methods return, in English, the names of the various ids passed into the locale's constructor. For example, if you passed language_id => 'fr' to the constructor you would get back French for the language.

name

The locale's name. This is usually built up out of the language, script, territory and variant of the locale.

language

The name of the locale's language.

script

The name of the locale's script.

territory

The name of the locale's territory.

variant

The name of the locale's variant.

Native Meta Data

Like Meta Data above, this provides the names of the various ids passed into the locale's constructor. However, in this case the names are formatted to match the locale. For example, if you passed language_id => 'fr' to the constructor you would get back français for the language.

native_name

The locale's name. This is usually built up out of the language, script, territory and variant of the locale. Returned in the locale's language and script.

native_language

The name of the locale's language in the locale's language and script.

native_script

The name of the locale's script in the locale's language and script.

native_territory

The name of the locale's territory in the locale's language and script.

native_variant

The name of the locale's variant in the locale's language and script.
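
The difference between the two groups of methods can be sketched as follows. This assumes the French locale data (believed to be packaged separately as Locale::CLDR::Locales::Fr) is installed; the comments repeat the values given in the text above rather than verified output.

 use strict;
 use warnings;
 use Locale::CLDR;

 # Native names may contain non-ASCII characters
 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new(language_id => 'fr', territory_id => 'fr');

 print "English: ", $locale->language,        "\n"; # French, per the text above
 print "Native:  ", $locale->native_language, "\n"; # français, per the text above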

Calendars

The Calendar data is built to hook into DateTime::Locale so that all Locale::CLDR objects can be used as replacements for DateTime::Locale's locale data. To use, say, the French data do

 use feature 'say';
 use DateTime;
 use Locale::CLDR;

 my $french_locale = Locale::CLDR->new('fr_FR');
 my $french_dt = DateTime->now(locale => $french_locale);
 say "French month : ", $french_dt->month_name; # prints out the current month in French

month_format_wide
month_format_abbreviated
month_format_narrow
month_stand_alone_wide
month_stand_alone_abbreviated
month_stand_alone_narrow

All the above return an arrayref of month names in the requested style.

day_format_wide
day_format_abbreviated
day_format_narrow
day_stand_alone_wide
day_stand_alone_abbreviated
day_stand_alone_narrow

All the above return an array ref of day names in the requested style.

quarter_format_wide
quarter_format_abbreviated
quarter_format_narrow
quarter_stand_alone_wide
quarter_stand_alone_abbreviated
quarter_stand_alone_narrow

All the above return an arrayref of quarter names in the requested style.

am_pm_wide
am_pm_abbreviated
am_pm_narrow

All the above return the date period name for AM and PM in the requested style

era_wide
era_abbreviated
era_narrow

All the above return an array ref of era names. Note that these return only the first two eras, which is what you normally want for BC and AD etc., but this won't work correctly for Japanese calendars.

The next set of methods are not used by DateTime::Locale, but CLDR provides the data and you might want it.

am_pm_format_wide
am_pm_format_abbreviated
am_pm_format_narrow
am_pm_stand_alone_wide
am_pm_stand_alone_abbreviated
am_pm_stand_alone_narrow

All the above return a hashref keyed on date period with the value being the value for that date period

era_format_wide
era_format_abbreviated
era_format_narrow
era_stand_alone_wide
era_stand_alone_abbreviated
era_stand_alone_narrow

All the above return an array ref with all the era data for the locale formatted to the requested width

date_format_full
date_format_long
date_format_medium
date_format_short
time_format_full
time_format_long
time_format_medium
time_format_short
datetime_format_full
datetime_format_long
datetime_format_medium
datetime_format_short

All the above return the CLDR date format pattern for the given element and width

prefers_24_hour_time()

Returns a boolean value, true if the locale has a preference for 24 hour time over 12 hour

first_day_of_week()

Returns the numeric representation of the first day of the week, with 0 = Saturday.
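
The calendar methods can also be called directly, without DateTime. The sketch below uses only methods listed in this section and relies on the documented return types (an array ref for the month names, a pattern string for the date format).

 use strict;
 use warnings;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # month_format_wide returns an array ref of month names
 my $months = $locale->month_format_wide;
 print join(', ', @$months), "\n";

 # The CLDR pattern for a long date, and the 12/24 hour preference
 print "Long date pattern: ", $locale->date_format_long, "\n";
 print "Clock: ", ($locale->prefers_24_hour_time ? '24 hour' : '12 hour'), "\n";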

Names

These methods allow you to pass in a locale, either by id or as a Locale::CLDR object, and return a name formatted in the locale of $self. If you don't pass in a locale then it will use $self.

locale_name($name)

Returns the given locale name in the current locale's format. The name can be a locale id or a locale object, or it can be omitted. If a name is not passed in then the name of the current locale is returned.

language_name($language)

Returns the language name in the current locale's format. The name can be a locale language id or a locale object, or it can be omitted. If a name is not passed in then the language name of the current locale is returned.

all_languages()

Returns a hash ref keyed on language id of all the languages the system knows about. The values are the language names for the corresponding ids.

script_name($script)

Returns the script name in the current locale's format. The script can be a locale script id or a locale object, or it can be omitted. If a script is not passed in then the script name of the current locale is returned.

all_scripts()

Returns a hash ref keyed on script id of all the scripts the system knows about. The values are the script names for the corresponding ids.

territory_name($territory)

Returns the territory name in the current locale's format. The territory can be a locale territory id or a locale object, or it can be omitted. If a territory is not passed in then the territory name of the current locale is returned.

all_territories()

Returns a hash ref keyed on territory id of all the territories the system knows about. The values are the territory names for the corresponding ids.

variant_name($variant)

Returns the variant name in the current locale's format. The variant can be a locale variant id or a locale object, or it can be omitted. If a variant is not passed in then the variant name of the current locale is returned.

key_name($key)

Returns the key name in the current locale's format. The key must be a locale key id as a string.

type_name($key, $type)

Returns the type name in the current locale's format. The key and type must be a locale key id and type id as a string.

measurement_system_name($measurement_system)

Returns the measurement system name in the current locale's format. The measurement system must be a measurement system id as a string.

transform_name($name)

Returns the transform (transliteration) name in the current locale's format. The transform must be a transform id as a string.

code_pattern($type, $locale)

This method formats a language, script or territory name, given as $type, from $locale in the way expected by the current locale. If $locale is not passed in or is undef() the method uses the current locale.

text_orientation($type)

Gets the text orientation for the locale. Type must be one of lines or characters.
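
A brief sketch of the name methods, called on an en_GB locale so that the names come back in English. The ids fr and FR are illustrative; the upper-case territory id is an assumption about the form the method accepts.

 use strict;
 use warnings;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # Names of other languages and territories, formatted for this locale
 print "Language:  ", $locale->language_name('fr'),  "\n";
 print "Territory: ", $locale->territory_name('FR'), "\n";

 # With no argument the current locale's own name is returned
 print "This locale: ", $locale->locale_name, "\n";

 # all_languages returns a hash ref of id => name
 my $languages = $locale->all_languages;
 printf "%d languages known\n", scalar keys %$languages;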

Segmentation

This group of methods allows you to split a string in various ways. Note that you need Perl 5.18 or above for this.

split_grapheme_clusters($string)

Splits a string on grapheme clusters using the locale's segmentation rules. Returns a list of grapheme clusters.

split_words($string)

Splits a string on word boundaries using the locale's segmentation rules. Returns a list of words.

split_sentences($string)

Splits a string at all points where a sentence could end, using the locale's segmentation rules. Returns a list in which the end of each element is a point where a sentence could end.

split_lines($string)

Splits a string at all points where a line could end, using the locale's segmentation rules. Returns a list in which the end of each element is a point where a line could end.
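
A minimal sketch of the segmentation methods, assuming Perl 5.18 or above. The sample text is arbitrary, and the exact segments returned depend on the locale's rules, so no output is shown.

 use v5.18;
 use warnings;
 use Locale::CLDR;

 my $locale = Locale::CLDR->new('en_GB');
 my $text   = 'The quick brown fox. It jumped over the lazy dog.';

 # Each method returns a list of segments
 my @words     = $locale->split_words($text);
 my @sentences = $locale->split_sentences($text);

 say "word segment:     [$_]" for @words;
 say "sentence segment: [$_]" for @sentences;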

Characters

is_exemplar_character( $type, $character)
is_exemplar_character($character)

Tests if the given character is used in the locale. There are three possible types: main, auxiliary and punctuation. If no type is given, main is assumed. Unless the index type is given, you need Perl version 5.18 or above to use this method.

index_characters()

Returns an array ref of characters normally used when creating an index.

Truncation

These methods format a string to show where part of the string has been removed

truncated_beginning($string)

Adds the locale specific marking to show that the string has been truncated at the beginning.

truncated_between($string, $string)

Adds the locale specific marking to show that something has been truncated between the two strings. Returns a string comprising the concatenation of the first string, the mark and the second string.

truncated_end($string)

Adds the locale specific marking to show that the string has been truncated at the end.

truncated_word_beginning($string)

Adds the locale specific marking to show that the string has been truncated at the beginning. This should be used in preference to truncated_beginning when the truncation occurs on a word boundary.

truncated_word_between($string, $string)

Adds the locale specific marking to show that something has been truncated between the two strings. Returns a string comprising the concatenation of the first string, the mark and the second string. This should be used in preference to truncated_between when the truncation occurs on a word boundary.

truncated_word_end($string)

Adds the locale specific marking to show that the string has been truncated at the end. This should be used in preference to truncated_end when the truncation occurs on a word boundary.
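
The truncation methods can be sketched as below. The strings are placeholders, and the marks that are added depend on the locale data, so no output is shown.

 use strict;
 use warnings;
 use Locale::CLDR;

 # The truncation mark may be a non-ASCII character
 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 print $locale->truncated_end('Once upon a time'), "\n";
 print $locale->truncated_beginning('happily ever after'), "\n";
 print $locale->truncated_between('Once upon', 'ever after'), "\n";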

Quoting

quote($string)

Adds the locale's primary quotation marks to the ends of the string. Also scans the string for paired primary and auxiliary quotation marks and flips them.

For example, passing z “abc” z to this method for the en_GB locale gives “z ‘abc’ z”.
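
The example from the text can be written out as a small script. The use utf8 pragma is needed because the source contains literal curly quotes; the result described in the comment is the one stated above, not independently verified output.

 use strict;
 use warnings;
 use utf8;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # The outer marks are added and the inner pair is flipped,
 # giving “z ‘abc’ z” according to the text above
 my $quoted = $locale->quote('z “abc” z');
 print "$quoted\n";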

Miscellaneous

more_information()

The more information string is one that can be displayed in an interface to indicate that more information is available.

measurement()

Returns the measurement type for the locale

paper()

Returns the paper type for the locale

Units

all_units()

Returns a list of all the unit identifiers for the locale

unit($number, $unit, $width)

Returns the localised string for the given number and unit formatted for the required width. The number must be a plain number, not an already localised version of it. The returned string will be in the locale's format, including the number.

duration_unit($format, @data)

This method formats a duration. The format must be one of hm, hms or ms corresponding to hour minute, hour minute second and minute second respectively. The data must correspond to the given format.

Yes or No?

is_yes($string)

Returns true if the passed in string matches the locale's idea of a string designating yes. Note that, under POSIX rules, if the locale's word for yes does not start with y (U+0079), a single 'y' will still be accepted as yes. The string is matched case-insensitively.

is_no($string)

Returns true if the passed in string matches the locale's idea of a string designating no. Note that, under POSIX rules, if the locale's word for no does not start with n (U+006E), a single 'n' will still be accepted as no. The string is matched case-insensitively.
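
As a small sketch, these two methods can be used to classify an answer typed by a user. The sample answers are arbitrary.

 use strict;
 use warnings;
 use Locale::CLDR;

 my $locale = Locale::CLDR->new('en_GB');

 for my $answer (qw(yes Y no maybe)) {
     my $meaning = $locale->is_yes($answer) ? 'yes'
                 : $locale->is_no($answer)  ? 'no'
                 :                            'unrecognised';
     print "$answer => $meaning\n";
 }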

Transliteration

This method requires Perl version 5.18 or above.

transform(from => $from, to => $to, variant => $variant, text => $text)

This method returns the string in text transliterated from the script from to the script to, using the variant variant. If from is not given then the current locale's script is used. If text is not given then it defaults to an empty string. The variant is optional.

Lists

list(@data)

Returns data as a string formatted by the locale's idea of producing a list of elements. What is returned can be affected by the locale and the number of items in data. Note that data can contain 0 or more items.

plural($number)

This method takes a number and uses the locale's pluralisation rules to calculate the type of pluralisation required for units, currencies and other data that changes depending on the plural state of the number.

plural_range($start, $end)

This method returns the plural type for the range $start to $end. $start and $end can either be numbers or one of the plural types zero, one, two, few, many or other.

get_day_period($time)

This method will calculate the correct period for a given time and return the period name in the locale's language and script.
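
A short sketch of list() and plural(), using arbitrary sample data. The plural type printed for each number comes from the locale's rules, so no output is shown.

 use strict;
 use warnings;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # Join items in the way the locale expects
 print $locale->list(qw(apples pears plums)), "\n";

 # Look up the plural type for a few numbers
 for my $number (0, 1, 2, 5) {
     print "$number => ", $locale->plural($number), "\n";
 }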

format_for

TODO fix this

Valid codes

valid_languages()

This method returns a list containing all the valid language codes

valid_scripts()

This method returns a list containing all the valid script codes

valid_territories()

This method returns a list containing all the valid territory codes

valid_variants()

This method returns a list containing all the valid variant codes

key_aliases()

This method returns a hash that maps valid keys to their valid aliases

key_names()

This method returns a hash that maps valid key aliases to their valid keys

valid_keys()

This method returns a hash of valid keys and the valid type codes you can have with each key

language_aliases()

This method returns a hash that maps valid language codes to their valid aliases

territory_aliases()

This method returns a hash that maps valid territory codes to their valid aliases

variant_aliases()

This method returns a hash that maps valid variant codes to their valid aliases

Information about weeks

week_data_min_days($territory_id)

This method takes an optional territory id and returns the minimum number of days a week must have to count as the starting week of the new year. It uses the current locale's territory if no territory id is passed in.

week_data_first_day($territory_id)

This method takes an optional territory id and returns the three letter code of the first day of the week for that territory. If no territory id is passed in then it uses the current locale's territory.

week_data_weekend_start($territory_id)

This method takes an optional territory id and returns the three letter code of the first day of the weekend for that territory. If no territory id is passed in then it uses the current locale's territory.

week_data_weekend_end($territory_id)

This method takes an optional territory id and returns the three letter code of the last day of the weekend for that territory. If no territory id is passed in then it uses the current locale's territory.
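
The week methods can be sketched as follows, calling each one without a territory id so that the current locale's territory is used.

 use strict;
 use warnings;
 use Locale::CLDR;

 my $locale = Locale::CLDR->new('en_GB');

 print "First day of week: ", $locale->week_data_first_day,     "\n";
 print "Weekend starts:    ", $locale->week_data_weekend_start, "\n";
 print "Weekend ends:      ", $locale->week_data_weekend_end,   "\n";
 print "Min days in week:  ", $locale->week_data_min_days,      "\n";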

Territory Containment

territory_contains()

This method returns a hash ref keyed on territory id. The value is an array ref. Each element of the array ref is the territory id of a territory immediately contained in the territory used as the key.

territory_contained_by()

This method returns a hash ref keyed on territory id. The value of the hash is the territory id of the immediately containing territory.

Numbering Systems

numbering_system()

This method returns a hash ref keyed on numbering system id which, for a given locale, can be got by calling the default_numbering_system() method. The values of the hash are a two element hash ref the keys being type and data. If the type is numeric then the data is an array ref of characters. The position in the array matches the numeric value of the character. If the type is algorithmic then data is the name of the algorithm used to display numbers in that format.
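
The structure described above can be explored with a sketch like the following, which looks up the locale's default numbering system and handles both the numeric and the algorithmic type.

 use strict;
 use warnings;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 my $id     = $locale->default_numbering_system;
 my $system = $locale->numbering_system->{$id};

 if ($system->{type} eq 'numeric') {
     # data is an array ref; the index is the numeric value of the digit
     print "$id digits: ", join(' ', @{ $system->{data} }), "\n";
 }
 else {
     # data names the algorithm used to display numbers
     print "$id uses algorithm $system->{data}\n";
 }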

Number Formatting

format_number($number, $format, $currency, $for_cash)

This method formats the number $number using the format $format. If the format contains the currency symbol ¤ then the currency symbol for the currency code in $currency will be used. If $currency is undef() then the default currency code for the locale will be used.

Note that currency codes are based on territory so if you do not pass in a currency and your locale did not get passed a territory in the constructor you are going to end up with the likely sub tag's idea of the currency. This functionality may be removed or at least changed to emit a warning in future releases.

$for_cash is only used during currency formatting. If true then cash rounding will be used otherwise financial rounding will be used.
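
A minimal sketch of format_number(), first with a plain pattern and then with a pattern containing the currency symbol placeholder. The patterns and the GBP currency code are illustrative choices, and no output is shown because it depends on the locale data.

 use strict;
 use warnings;
 use utf8;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # A plain number with grouping and two decimal places
 print $locale->format_number(1234567.891, '#,##0.00'), "\n";

 # The ¤ placeholder is replaced by the symbol for the given currency code
 print $locale->format_number(9.99, '¤#,##0.00', 'GBP'), "\n";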

add_currency_symbol($format, $symbol)

This method returns the format with the currency symbol $symbol correctly inserted into the format

parse_number_format($format, $currency, $currency_data, $for_cash)

This method parses a CLDR format string into a hash ref containing data used to format a number. If a currency is being formatted then $currency contains the currency code, $currency_data is a hashref containing the currency rounding information and $for_cash is a flag to signal cash or financial rounding.

This should probably be a private function.

round($number, $increment, $decimal_digits)

This method returns $number rounded to the nearest $increment with $decimal_digits digits after the decimal point

get_formatted_number($number, $format, $currency_data, $for_cash)

This method takes the $format produced by parse_number_format() and uses it to format $number. It returns a string containing the formatted number. If a currency is being formatted then $currency_data is a hashref containing the currency rounding information and $for_cash is a flag to signal cash or financial rounding.

get_digits()

This method returns an array containing the digits used by the locale. The order of the array is the order of the digits. If the locale's numbering system is algorithmic it will return [0,1,2,3,4,5,6,7,8,9].

default_numbering_system()

This method returns the numbering system id for the locale.

Measurement Information

measurement_system()

This method returns a hash ref keyed on territory, the value being the measurement system id for the territory. If the territory you are interested in is not listed use the territory_contained_by() method until you find an entry.

paper_size()

This method returns a hash ref keyed on territory, the value being the paper size used in that territory. If the territory you are interested in is not listed use the territory_contained_by() method until you find an entry.
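
The lookup suggested above, walking up through territory_contained_by() until an entry is found, might be sketched as follows. The starting territory GB and the assumption that both hashes are keyed on upper-case territory ids are illustrative, not confirmed by this document.

 use strict;
 use warnings;
 use Locale::CLDR;

 my $locale  = Locale::CLDR->new('en_GB');
 my $systems = $locale->measurement_system;
 my $parents = $locale->territory_contained_by;

 # Assumption: the hashes are keyed on upper-case territory ids such as GB
 my $territory = 'GB';
 my %seen;
 while (defined $territory && !exists $systems->{$territory}) {
     last if $seen{$territory}++;          # guard against cycles
     $territory = $parents->{$territory};  # step up to the containing territory
 }

 if (defined $territory && exists $systems->{$territory}) {
     print "Measurement system ($territory): $systems->{$territory}\n";
 }
 else {
     print "No measurement system entry found\n";
 }

The same loop works for paper_size() by swapping the hash that is searched.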

Likely Tags

likely_subtags()

A full locale tag requires, as a minimum, a language, script and territory code. However for some locales it is possible to infer the missing element if the other two are given, e.g. given en_GB you can infer the script will be latn. It is also possible to fill in the missing elements of a locale with sensible defaults given sufficient knowledge of the layout of the CLDR data and usage patterns of locales around the world.

This function returns a hash ref keyed on partial locale id's with the value being the locale id for the most likely language, script and territory code for the key.

Currency Information

currency_fractions()

This method returns a hash ref keyed on currency id. The value is a hash ref containing four keys. The keys are

digits

The number of decimal digits normally formatted.

rounding

The rounding increment, in units of 10^-digits.

cashdigits

The number of decimal digits to be used when formatting quantities used in cash transactions (as opposed to a quantity that would appear in a more formal setting, such as on a bank statement).

cashrounding

The cash rounding increment, in units of 10^-cashdigits.

default_currency($territory_id)

This method returns the default currency id for the territory id. If no territory id is given then the current locale's territory is used.

currency_symbol($currency_id)

This method returns the currency symbol for the given currency id in the current locale. If no currency id is given it uses the locale's default currency.
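
The currency methods can be combined as in the sketch below. Not every currency necessarily has its own entry in currency_fractions(), so the code checks for one before using it.

 use strict;
 use warnings;
 use Locale::CLDR;

 binmode STDOUT, ':encoding(UTF-8)';

 my $locale = Locale::CLDR->new('en_GB');

 # Default currency for the locale's territory, and its symbol
 my $currency = $locale->default_currency;
 print "Currency: $currency\n";
 print "Symbol:   ", $locale->currency_symbol($currency), "\n";

 # Rounding data, if this currency has a specific entry
 my $fractions = $locale->currency_fractions;
 if (exists $fractions->{$currency}) {
     print "Decimal digits: $fractions->{$currency}{digits}\n";
 }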

Calendar Information

calendar_preferences()

This method returns a hash ref keyed on territory id. The values are array refs containing the preferred calendar id's in order of preference.

default_calendar($territory)

This method returns the default calendar id for the given territory. If no territory id is given, it uses the territory of the current locale.

AUTHOR

John Imrie, <john dot imrie1 at gmail dot com>

BUGS

Please report any bugs or feature requests to bug-locale-cldr at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Locale-CLDR. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Locale::CLDR

ACKNOWLEDGEMENTS

Everyone at the Unicode Consortium for providing the data.

Karl Williamson for his tireless work on Unicode in the Perl regex engine.

Andrew Rodland for his Unicode::CaseFold module that I pinched the fc() code from for early versions of Perl that don't have this function.

COPYRIGHT & LICENSE

Copyright 2009-2014 John Imrie. Backwards compatible Case Folding Copyright Andrew Rodland ARODLAND@cpan.org

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.