Locale::CLDR - A Module to create locale objects with localisation data from the CLDR
Version 0.44.1
This module provides a locale object you can use to localise your output. The localisation data comes from the Unicode Common Locale Data Repository. Most of this code can be used with Perl version 5.10.1 or above. There are a few parts of the code that require version 5.18 or above.
my $locale = Locale::CLDR->new('en_US');
or
my $locale = Locale::CLDR->new(language_id => 'en', region_id => 'us');
A full locale identifier has the form
language_script_region_variant_u_extension name_extension value
my $locale = Locale::CLDR->new('en_latn_US_SCOUSE_u_nu_traditional');
my $locale = Locale::CLDR->new(language_id => 'en', script_id => 'latn', region_id => 'US', variant => 'SCOUSE', extensions => { nu => 'traditional' } );
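As a rough illustration of how the parts of such an identifier line up with the named constructor arguments, here is a minimal sketch in plain Perl. The split_locale_id helper is hypothetical and is not part of Locale::CLDR; it assumes all four leading parts are present and does no validation or alias handling.

```perl
use strict;
use warnings;

# Hypothetical helper: split an identifier of the form
# language_script_region_variant_u_name_value into named parts.
# Simplification: the four leading parts must all be present.
sub split_locale_id {
    my ($id) = @_;
    my ($main, $ext) = split /_u_/i, $id, 2;
    my ($language, $script, $region, $variant) = split /_/, $main;

    my %extensions;
    if (defined $ext) {
        my @pairs = split /_/, $ext;
        # Extensions come in name/value pairs, e.g. nu => traditional.
        while (@pairs >= 2) {
            my ($name, $value) = splice @pairs, 0, 2;
            $extensions{$name} = $value;
        }
    }

    return {
        language_id => $language,
        script_id   => $script,
        region_id   => $region,
        variant     => $variant,
        extensions  => \%extensions,
    };
}

my $parts = split_locale_id('en_latn_US_SCOUSE_u_nu_traditional');
print join(', ', map { "$_=$parts->{$_}" } qw(language_id script_id region_id variant)), "\n";
print "nu=$parts->{extensions}{nu}\n";
```

The returned hash ref uses the same key names as the named-argument form of the constructor shown above, which makes the correspondence between the two calling styles easy to see.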
The following arguments can be passed into the constructor; all are optional.
A valid language or language alias id, such as en
A valid script id, such as latn or Ctcl. The code will pick a likely script for the given language if none is provided.
A valid region id or region alias such as GB
A valid variant id. The code currently ignores this.
A hashref of extension names and values. You can use this to override the locale's number formatting and calendar by passing in the Unicode extension names or aliases as keys and the extension value as the hash value.
Currently supported extensions are
You can use this to override a locale's default calendar. Valid values are
Thai Buddhist calendar
Traditional Chinese calendar
Coptic calendar
Traditional Korean calendar
Ethiopic calendar, Amete Alem (epoch approx. 5493 B.C.E.)
Ethiopic calendar, Amete Mihret (epoch approx. 8 C.E.)
Gregorian calendar
Hebrew Calendar
Indian National Calendar
Islamic Calendar
Islamic Calendar (tabular, civil epoch)
Islamic Calendar (Saudi Arabia, sighting)
Islamic Calendar (tabular, astronomical epoch)
Islamic Calendar (Umm al-Qura)
ISO-8601 Calendar
Japanese Calendar
Persian Calendar
Minguo Calendar
This overrides the default currency format. It can be set to one of standard or account.
The default collation order. Two collation orders are universal
The standard collation order for the locale
A collation type just used for comparing two strings to see if they match
There are other collation keywords but they are dependent on the locale being used; see Unicode Collation Identifier
This extension overrides the default currency symbol for the locale. Its value is any valid currency identifier.
Dictionary break script exclusions: specifies scripts to be excluded from dictionary-based text break (for words and lines).
Emoji presentation style, can be one of
Use an emoji presentation for emoji characters if possible.
Use a text presentation for emoji characters if possible.
Use the default presentation for emoji characters as specified in UTR #51 Section 4, Presentation Style.
This extension overrides the first day of the week. It can be set to one of
A Unicode Hour Cycle Identifier defines the preferred time cycle. Can be one of
Hour system using 1–12; corresponds to 'h' in patterns
Hour system using 0–23; corresponds to 'H' in patterns
Hour system using 0–11; corresponds to 'K' in patterns
Hour system using 1–24; corresponds to 'k' in patterns
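The four hour systems differ only in how a clock hour is numbered. The sketch below is a hypothetical helper, not part of this module, that converts an hour in the range 0 to 23 into the number each system would show, keyed on the pattern letters listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a clock hour in the range 0..23 to the
# number shown under each hour system, keyed by its pattern letter.
sub hour_for_pattern {
    my ($hour, $letter) = @_;
    my $twelve = $hour % 12;
    return $twelve || 12 if $letter eq 'h';   # 1-12: 0 is shown as 12
    return $hour         if $letter eq 'H';   # 0-23: unchanged
    return $twelve       if $letter eq 'K';   # 0-11
    return $hour || 24   if $letter eq 'k';   # 1-24: 0 is shown as 24
    die "Unknown hour pattern letter '$letter'\n";
}

# Show midnight and one o'clock in the afternoon under each system.
for my $letter (qw(h H K k)) {
    printf "%s: %2d %2d\n", $letter,
        hour_for_pattern(0, $letter), hour_for_pattern(13, $letter);
}
```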
A Unicode Line Break Style Identifier defines a preferred line break style corresponding to the CSS level 3 line-break option. Can be one of
CSS level 3 line-break=strict, e.g. treat CJ as NS
CSS level 3 line-break=normal, e.g. treat CJ as ID, break before hyphens for ja,zh
CSS level 3 line-break=loose
A Unicode Line Break Word Identifier defines preferred line break word handling behavior corresponding to the CSS level 3 word-break option. Can be one of
CSS level 3 word-break=normal, normal script/language behavior for midword breaks
CSS level 3 word-break=break-all, allow midword breaks unless forbidden by lb setting
CSS level 3 word-break=keep-all, prohibit midword breaks except for dictionary breaks
Prioritize keeping natural phrases (of multiple words) together when breaking, used in short text like title and headline
Measurement system. Can be one of
Metric System
US System of measurement: feet, pints, etc.; pints are 16oz
UK System of measurement: feet, pints, etc.; pints are 20oz
The number type can be one of
Arabic-Indic Digits
Extended Arabic-Indic Digits
Armenian Numerals
Armenian Lowercase Numerals
Balinese Digits
Bengali Digits
Brahmi Digits
Chakma Digits
Cham Digits
Devanagari Digits
Ethiopic Numerals
Financial Numerals
Full Width Digits
Georgian Numerals
Greek Numerals
Greek Lowercase Numerals
Gujarati Digits
Gurmukhi Digits
Chinese Calendar Day-of-Month Numerals
Chinese Decimal Numerals
Simplified Chinese Numerals
Simplified Chinese Financial Numerals
Traditional Chinese Numerals
Traditional Chinese Financial Numerals
Hebrew Numerals
Javanese Digits
Japanese Numerals
Japanese Financial Numerals
Kayah Li Digits
Khmer Digits
Kannada Digits
Tai Tham Hora Digits
Tai Tham Tham Digits
Lao Digits
Western Digits
Lepcha Digits
Limbu Digits
Malayalam Digits
Mongolian Digits
Meetei Mayek Digits
Myanmar Digits
Myanmar Shan Digits
Native Digits
N'Ko Digits
Ol Chiki Digits
Oriya Digits
Osmanya Digits
Roman Numerals
Roman Lowercase Numerals
Saurashtra Digits
Sharada Digits
Sora Sompeng Digits
Sundanese Digits
Takri Digits
New Tai Lue Digits
Traditional Tamil Numerals
Tamil Digits
Telugu Digits
Thai Digits
Tibetan Digits
Traditional Numerals
Vai Digits
Region Override
Regional Subdivision
Sentence break suppressions. Can be one of
Don’t use sentence break suppressions data (the default).
Use sentence break suppressions data of type "standard"
Time zone
Common variant type
The following methods can be called on the locale object
Returns an array ref containing the sorted list of installed locale identifiers
The locale identifier. This is what you get if you attempt to stringify a locale object.
True if a region id was passed into the constructor
True if a script id was passed into the constructor
True if a variant id was passed into the constructor
Given a locale with no language passed in or with the explicit language code of und, this method attempts to use the script and region data to guess the locale's language.
Given a locale with no script passed in this method attempts to use the language and region data to guess the locale's script.
Given a locale with no region passed in this method attempts to use the language and script data to guess the locale's region.
The following methods return, in English, the names of the various ids passed into the locale's constructor. For example, if you passed language => 'fr' to the constructor you would get back French for the language.
The locale's name. This is usually built up out of the language, script, region and variant of the locale
The name of the locale's language
The name of the locale's script
The name of the locale's region
The name of the locale's variant
Like the Meta Data methods above, these provide the names of the various ids passed into the locale's constructor. However, in this case the names are formatted to match the locale. For example, if you passed language => 'fr' to the constructor you would get back français for the language.
The locale's name. This is usually built up out of the language, script, region and variant of the locale. Returned in the locale's language and script
The name of the locale's language in the locale's language and script.
The name of the locale's script in the locale's language and script.
The name of the locale's region in the locale's language and script.
The name of the locale's variant in the locale's language and script.
The Calendar data is built to hook into DateTime::Locale so that all Locale::CLDR objects can be used as replacements for DateTime::Locale's locale data. To use, say, the French data do
my $french_locale = Locale::CLDR->new('fr_FR');
my $french_dt = DateTime->now(locale => $french_locale);
say "French month : ", $french_dt->month_name; # prints out the current month in French
All the above return an arrayref of month names in the requested style.
All the above return an array ref of day names in the requested style.
All the above return an arrayref of quarter names in the requested style.
All the above return the date period name for AM and PM in the requested style
All the above return an array ref of era names. Note that these return the first two eras which is what you normally want for BC and AD etc. but won't work correctly for Japanese calendars.
The next set of methods is not used by DateTime::Locale, but CLDR provides the data and you might want it.
All the above return a hashref keyed on date period with the value being the value for that date period
All the above return an array ref with all the era data for the locale formatted to the requested width
All the above return the CLDR date format pattern for the given element and width
Returns a boolean value, true if the locale has a preference for 24 hour time over 12 hour
Returns the numeric representation of the first day of the week, with 0 = Saturday
This method will calculate the correct period for a given time and return the period name in the locale's language and script
This method takes a CLDR date time format and returns the localised version of the format.
These methods allow you to pass in a locale, either by id or as a Locale::CLDR object, and return a name formatted in the locale of $self. If you don't pass in a locale then it will use $self.
Returns the given locale name in the current locale's format. The name can be a locale id or a locale object or non existent. If a name is not passed in then the name of the current locale is returned.
Returns the language name in the current locale's format. The name can be a locale language id or a locale object or non existent. If a name is not passed in then the language name of the current locale is returned.
Returns a hash ref keyed on language id of all the languages the system knows about. The values are the language names for the corresponding id's
Returns the script name in the current locale's format. The script can be a locale script id or a locale object or non existent. If a script is not passed in then the script name of the current locale is returned.
Returns a hash ref keyed on script id of all the scripts the system knows about. The values are the script names for the corresponding id's
Returns the region name in the current locale's format. The region can be a locale region id or a locale object or non existent. If a region is not passed in then the region name of the current locale is returned.
Returns a hash ref keyed on region id of all the regions the system knows about. The values are the region names for the corresponding ids
Returns the variant name in the current locale's format. The variant can be a locale variant id or a locale object or non existent. If a variant is not passed in then the variant name of the current locale is returned.
Returns the key name in the current locale's format. The key must be a locale key id as a string
Returns the type name in the current locale's format. The key and type must be a locale key id and type id as a string
Returns the measurement system name in the current locale's format. The measurement system must be a measurement system id as a string
Returns the transform (transliteration) name in the current locale's format. The transform must be a transform id as a string
This method formats a language, script or region name, given as $type from $locale in a way expected by the current locale. If $locale is not passed in or is undef() the method uses the current locale.
Gets the text orientation for the locale. Type must be one of lines or characters.
This group of methods allows you to split a string in various ways. Note that you need Perl 5.18 or above for this.
Splits a string on grapheme clusters using the locale's segmentation rules. Returns a list of grapheme clusters.
Splits a string on word boundaries using the locale's segmentation rules. Returns a list of words.
Splits a string at all points where a sentence could end, using the locale's segmentation rules. Returns a list in which the end of each element is a point where a sentence could end.
Splits a string at all points where a line could end, using the locale's segmentation rules. Returns a list in which the end of each element is a point where a line could end.
Tests if the given character is used in the locale. There are four possible types: main, auxiliary, punctuation and index. If no type is given main is assumed. Unless the index type is given you will need Perl version 5.18 or above to use this method.
Returns an array ref of characters normally used when creating an index and ordered appropriately.
These methods format a string to show where part of the string has been removed
Adds the locale specific marking to show that the string has been truncated at the beginning.
Adds the locale specific marking to show that something has been truncated between the two strings. Returns a string comprising the concatenation of the first string, the mark and the second string.
Adds the locale specific marking to show that the string has been truncated at the end.
Adds the locale specific marking to show that the string has been truncated at the beginning. This should be used in preference to truncated_beginning when the truncation occurs on a word boundary.
Adds the locale specific marking to show that something has been truncated between the two strings. Returns a string comprising the concatenation of the first string, the mark and the second string. This should be used in preference to truncated_between when the truncation occurs on a word boundary.
Adds the locale specific marking to show that the string has been truncated at the end. This should be used in preference to truncated_end when the truncation occurs on a word boundary.
Adds the locale's primary quotation marks to the ends of the string. Also scans the string for paired primary and auxiliary quotation marks and flips them.
E.g. passing z “abc” z to this method for the en_GB locale gives “z ‘abc’ z”.
The more information string is one that can be displayed in an interface to indicate that more information is available.
Returns the measurement type for the locale
Returns the paper type for the locale
Returns a list of all the unit identifiers for the locale
Returns the localised string for the given number and unit formatted for the required width. The number must not be the localized version of the number. The returned string will be in the locale's format, including the number.
This method returns the localised name of the unit
This method formats a duration. The format must be one of hm, hms or ms corresponding to hour minute, hour minute second and minute second respectively. The data must correspond to the given format.
Returns true if the passed in string matches the locale's idea of a string designating yes. Note that under POSIX rules, unless the locale's word for yes starts with y (U+0079), a single 'y' will also be accepted as yes. The string is matched case-insensitively.
Returns true if the passed in string matches the locale's idea of a string designating no. Note that under POSIX rules, unless the locale's word for no starts with n (U+006E), a single 'n' will also be accepted as no. The string is matched case-insensitively.
This method requires Perl version 5.18 or above and the optional Bundle::CLDR::Transformations to be installed.
This method takes the parameters text, from, to and variant, and returns text transliterated from the script from to the script to, using the given variant. If from is not given then the current locale's script is used. If text is not given then it defaults to an empty string. The variant is optional.
Returns data as a string formatted according to the locale's idea of a list of elements. What is returned can be affected by the locale and the number of items in data. Note that data can contain 0 or more items.
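To make the dependence on the number of items concrete, the following sketch joins a list using four patterns, in the manner of CLDR list patterns. It is plain Perl and not the module's implementation: the format_list and fill_pattern helpers are hypothetical, and the English-style patterns are sample data typed in for the example.

```perl
use strict;
use warnings;

# Sample English-style patterns; a real locale supplies its own.
my %list_patterns = (
    2      => '{0} and {1}',
    start  => '{0}, {1}',
    middle => '{0}, {1}',
    end    => '{0}, and {1}',
);

# Substitute {0} and {1} in a pattern with the given two strings.
sub fill_pattern {
    my ($pattern, @args) = @_;
    $pattern =~ s/\{(\d)\}/$args[$1]/g;
    return $pattern;
}

# Hypothetical helper: zero items give an empty string, one item is
# returned as is, two items use the "2" pattern, and longer lists are
# built from the end using the end, middle and start patterns.
sub format_list {
    my ($patterns, @data) = @_;
    return ''       unless @data;
    return $data[0] if @data == 1;
    return fill_pattern($patterns->{2}, @data) if @data == 2;

    my $result = fill_pattern($patterns->{end}, $data[-2], $data[-1]);
    for my $i (reverse 1 .. $#data - 2) {
        $result = fill_pattern($patterns->{middle}, $data[$i], $result);
    }
    return fill_pattern($patterns->{start}, $data[0], $result);
}

print format_list(\%list_patterns, qw(apples pears plums)), "\n";
```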
This method takes a number and uses the locale's pluralisation rules to calculate the type of pluralisation required for units, currencies and other data that changes depending on the plural state of the number
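As an illustration of what a pluralisation rule decides, the sketch below hard-codes the English cardinal rule, which only distinguishes the categories one and other. The english_plural_category helper is hypothetical and covers English only; other locales use more of the categories and different rules, which is what the locale's own data provides.

```perl
use strict;
use warnings;

# Hypothetical helper covering English only: the category is "one"
# for the integer 1 written without decimals, otherwise "other".
sub english_plural_category {
    my ($number) = @_;
    return "$number" eq '1' ? 'one' : 'other';
}

# Pick the matching form of a word from a table keyed on category.
my %coin_form = (one => 'coin', other => 'coins');
for my $count (0, 1, 2) {
    printf "%d %s\n", $count, $coin_form{ english_plural_category($count) };
}
```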
This method returns the plural type for the range $start to $end. $start and $end can each be either a number or one of the plural types zero, one, two, few, many or other.
This method returns a list containing all the valid language codes
This method returns a list containing all the valid script codes
This method returns a list containing all the valid region codes
This method returns a list containing all the valid variant codes
This method returns a hash that maps valid keys to their valid aliases
This method returns a hash that maps valid key aliases to their valid keys
This method returns a hash of valid keys and the valid type codes you can have with each key
This method returns a hash that maps valid language codes to their valid aliases
This method returns a hash that maps valid region codes to their valid aliases
This method returns a hash that maps valid variant codes to their valid aliases
There are no standard codes for the days of the week, so CLDR uses the following three letter codes to represent unlocalised days
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
This method takes an optional region id and returns the minimum number of days a week must have to count as the starting week of the new year. It uses the current locale's region if no region id is passed in.
This method takes an optional region id and returns the three letter code of the first day of the week for that region. If no region id is passed in then it uses the current locale's region.
This method takes an optional region id and returns the three letter code of the first day of the weekend for that region. If no region id is passed in then it uses the current locale's region.
This method takes an optional region id and returns the three letter code of the last day of the weekend for that region. If no region id is passed in then it uses the current locale's region.
The Chinese lunar calendar can insert a leap month after nearly any month of its year; when this happens, the month takes the name of the preceding month plus a special marker. The Hindu lunar calendars can insert a leap month before any one or two months of the year; when this happens, not only does the leap month take the name of the following month plus a special marker, the following month also takes a special marker. Moreover, in the Hindu calendar sometimes a month is skipped, in which case the preceding month takes a special marker plus the names of both months. The monthPatterns() method returns an array ref of month names with the marker added.
This method returns an arrayref containing the cyclic names for the locale's default calendar using the given context, width and type.
Context can currently only be format
Width is one of abbreviated, narrow or wide
Type is one of dayParts, days, months, solarTerms, years or zodiacs
This method returns a hash ref keyed on region id. The value is an array ref. Each element of the array ref is a region id of a region immediately contained in the region used as the key
This method returns a hash ref keyed on region id. The value of the hash is the region id of the immediately containing region.
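The two hash refs described above are two views of the same containment data. The sketch below, in plain Perl with a small hand-written sample, shows how the child-to-parent view can be derived from the parent-to-children view. The invert_containment helper is hypothetical, and it assumes each region has a single immediate parent.

```perl
use strict;
use warnings;

# Abbreviated sample of "region contains" data; not the full CLDR set.
my %region_contains_sample = (
    '001' => ['150'],
    '150' => ['154'],
    '154' => ['GB', 'IE'],
);

# Hypothetical helper: build the child-to-parent view from the
# parent-to-children view. Assumes each region has a single parent.
sub invert_containment {
    my ($contains) = @_;
    my %contained_by;
    for my $parent (sort keys %$contains) {
        $contained_by{$_} = $parent for @{ $contains->{$parent} };
    }
    return \%contained_by;
}

my $contained_by = invert_containment(\%region_contains_sample);
print "$_ is contained by $contained_by->{$_}\n" for sort keys %$contained_by;
```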
This method returns a hash ref keyed on numbering system id which, for a given locale, can be obtained by calling the default_numbering_system() method. The values of the hash are two-element hash refs, the keys being type and data. If the type is numeric then the data is an array ref of characters; the position in the array matches the numeric value of the character. If the type is algorithmic then data is the name of the algorithm used to display numbers in that format.
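For a numeric system, the array of characters can be used as a direct lookup table. The following sketch is plain Perl with one hand-built entry shaped like the description above; the to_digits helper is hypothetical and the entry is sample data, not something read from the module.

```perl
use strict;
use warnings;

# Illustrative entry shaped like the description above: a hash ref
# with "type" and "data" keys. Arabic-Indic digits are U+0660..U+0669.
my %numbering_system_sample = (
    arab => { type => 'numeric', data => [ map { chr($_) } 0x0660 .. 0x0669 ] },
);

# Hypothetical helper: swap ASCII digits for the system's characters.
sub to_digits {
    my ($number, $system) = @_;
    # Algorithmic systems need their named algorithm, so leave as is.
    return "$number" unless $system->{type} eq 'numeric';
    my $digits = $system->{data};
    # Replace each ASCII digit with the character at that index.
    (my $out = "$number") =~ s/([0-9])/$digits->[$1]/g;
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_digits(2024, $numbering_system_sample{arab}), "\n";
```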
This method formats the number $number using the format $format. If the format contains the currency symbol ¤ then the currency symbol for the currency code in $currency will be used. If $currency is undef() then the default currency code for the locale will be used.
Note that currency codes are based on region so if you do not pass in a currency and your locale did not get passed a region in the constructor you are going to end up with the likely sub tag's idea of the currency. This functionality may be removed or at least changed to emit a warning in future releases.
$for_cash is only used during currency formatting. If true then cash rounding will be used otherwise financial rounding will be used.
This function also handles rule based number formatting. If $format is string equivalent to one of the current locale's public rule based number formats then $number will be formatted according to that rule.
This method formats the number $number using the default currency and currency format for the locale. If $for_cash is a true value then cash rounding will be used otherwise financial rounding will be used.
This method returns the format with the currency symbol $symbol correctly inserted into the format
This method parses a CLDR numeric format string into a hash ref containing data used to format a number. If a currency is being formatted then $currency contains the currency code, $currency_data is a hashref containing the currency rounding information and $for_cash is a flag to signal cash or financial rounding.
This should probably be a private function.
This method returns $number rounded to the nearest $increment with $decimal_digits digits after the decimal point
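Rounding to an increment can be sketched in a few lines of plain Perl. The round_to_increment helper below is hypothetical and is not the module's code; it rounds halves upwards and returns a string, both of which are assumptions made for the example, and it is subject to the usual floating point limits.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: snap $number to the nearest multiple of
# $increment (halves round up), then fix the number of digits after
# the decimal point.
sub round_to_increment {
    my ($number, $increment, $decimal_digits) = @_;
    my $rounded = floor($number / $increment + 0.5) * $increment;
    return sprintf '%.*f', $decimal_digits, $rounded;
}

# Round to the nearest 0.05, shown with two decimal digits.
print round_to_increment(12.37, 0.05, 2), "\n";
```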
This method takes the $format produced by parse_number_format() and uses it to parse $number. It returns a string containing the parsed number. If a currency is being formatted then $currency_data is a hashref containing the currency rounding information and $for_cash is a flag to signal cash or financial rounding.
This method returns an array containing the digits used by the locale. The order of the array is the order of the digits. If the locale's numbering system is algorithmic it will return [0,1,2,3,4,5,6,7,8,9].
This method returns the numbering system id for the locale.
This method returns the locale's currency format. This can be used by the number formatting code to correctly format the locale's currency
This method returns the format string for the currencies for the locale
There are two types of formatting, standard and accounting. You can pass standard or accounting as the parameter to the method to pick one of these, or it will use the locale's default.
This method returns a hash ref keyed on region, the value being the measurement system id for the region. If the region you are interested in is not listed use the region_contained_by() method until you find an entry.
This method returns a hash ref keyed on region, the value being the paper size used in that region. If the region you are interested in is not listed use the region_contained_by() method until you find an entry.
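The advice to keep following the containing region until an entry is found can be sketched as a small loop. This is plain Perl over hand-written sample hashes; the value_for_region helper and the sample data are hypothetical, standing in for the hash refs the methods above return.

```perl
use strict;
use warnings;

# Sample data only: a few containment links and a sparse table of
# per-region values, such as measurement system ids.
my %parent_of_sample = (
    GB    => '154',
    FR    => '155',
    '154' => '150',
    '155' => '150',
    '150' => '001',
);
my %system_for_sample = ('001' => 'metric', US => 'US');

# Hypothetical helper: look the region up and, while there is no
# entry, step out to the immediately containing region and retry.
sub value_for_region {
    my ($region, $values, $parents) = @_;
    while (defined $region) {
        return $values->{$region} if exists $values->{$region};
        $region = $parents->{$region};
    }
    return;    # no entry anywhere up the chain
}

print 'FR uses ', value_for_region('FR', \%system_for_sample, \%parent_of_sample), "\n";
```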
A full locale tag requires, as a minimum, a language, script and region code. However for some locales it is possible to infer the missing element if the other two are given, e.g. given en_GB you can infer the script will be latn. It is also possible to fill in the missing elements of a locale with sensible defaults given sufficient knowledge of the layout of the CLDR data and usage patterns of locales around the world.
This function returns a hash ref keyed on partial locale id's with the value being the locale id for the most likely language, script and region code for the key.
This method returns a Locale::CLDR object with any missing elements from the language, script or region, filled in with data from the likely_subtags hash
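The filling-in step can be pictured as a lookup that tries the most specific partial id first. The sketch below is plain Perl, not the module's implementation: the add_likely_subtags helper is hypothetical, the table is a tiny hand-written sample, and the key format (parts joined with underscores, und for an unknown language) is an assumption.

```perl
use strict;
use warnings;

# Tiny sample table; the real data has far more entries.
my %likely_sample = (
    en       => 'en_Latn_US',
    zh_TW    => 'zh_Hant_TW',
    und_Cyrl => 'ru_Cyrl_RU',
);

# Hypothetical helper: return language_script_region with any missing
# part taken from the most specific matching table entry.
sub add_likely_subtags {
    my ($table, $language, $script, $region) = @_;
    $language //= 'und';

    my @keys;
    push @keys, "${language}_${script}_${region}" if defined $script && defined $region;
    push @keys, "${language}_${region}"           if defined $region;
    push @keys, "${language}_${script}"           if defined $script;
    push @keys, $language;

    for my $key (@keys) {
        next unless exists $table->{$key};
        my ($l, $s, $r) = split /_/, $table->{$key};
        # Parts the caller supplied always win over the table's guess.
        return join '_', ($language eq 'und' ? $l : $language), $script // $s, $region // $r;
    }
    return;    # nothing in the table matched
}

print add_likely_subtags(\%likely_sample, 'en', undef, 'GB'), "\n";
```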
This method returns a hash ref keyed on currency id. The value is a hash ref containing four keys. The keys are
The number of decimal digits normally formatted.
The rounding increment, in units of 10^-digits.
The number of decimal digits to be used when formatting quantities used in cash transactions (as opposed to a quantity that would appear in a more formal setting, such as on a bank statement).
The cash rounding increment, in units of 10^-cashdigits.
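Because the increments are expressed in units of the smallest displayed digit, turning them into a decimal amount takes one multiplication. The sketch below is plain Perl; the key names digits, rounding, cashdigits and cashrounding, the sample values, and the treatment of an increment of 0 are all assumptions made for the example.

```perl
use strict;
use warnings;

# Sample entry only; the key names and values are assumptions.
my %currency_fractions_sample = (
    CHF => { digits => 2, rounding => 0, cashdigits => 2, cashrounding => 5 },
);

# Hypothetical helper: the rounding increment as a decimal amount.
sub rounding_increment {
    my ($fractions, $for_cash) = @_;
    my ($digits, $rounding) = $for_cash
        ? @{$fractions}{qw(cashdigits cashrounding)}
        : @{$fractions}{qw(digits rounding)};
    # Assumption: an increment of 0 means "no special increment", so
    # fall back to one unit of the last displayed digit.
    $rounding ||= 1;
    return $rounding * 10 ** -$digits;
}

printf "cash: %.2f\n",      rounding_increment($currency_fractions_sample{CHF}, 1);
printf "financial: %.2f\n", rounding_increment($currency_fractions_sample{CHF}, 0);
```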
This method returns the default currency id for the region id. If no region id is given then the current locale's is used
This method returns the currency symbol for the given currency id in the current locale. If no currency id is given it uses the locale's default currency
This method returns a hash ref keyed on region id. The values are array refs containing the preferred calendar id's in order of preference.
This method returns the default calendar id for the given region. If no region id is given it uses the region of the current locale.
Locale::CLDR has a Locale::Maketext-like system called LocaleText.
The Lexicon stores the items that will be localized by the localetext method. You can manipulate it by the following methods
This method empties the lexicon
This method adds data to the locale's lexicon.
$identifier is the string passed to localetext() to get the localised version of the text. Each identifier is unique.
$localized_text is the value that is used to create the current locale's version of the string. It uses Locale::Maketext bracket formatting syntax with some additional methods and some changes to how numerate() works. See below.
Multiple entries can be added by one call to add_to_lexicon()
$identifier is the string passed to localetext() to get the localised version of the text. Each identifier is unique and must be different from the identifiers given to add_to_lexicon().
$pluralform is one of the CLDR's plural forms; these are zero, one, two, few, many and other.
The Maketext emulation uses the same bracket and escape mechanism as Locale::Maketext, i.e. ~ is used to turn a [ from a metacharacter into a normal one, and you need to double up the ~ if you want it to appear in your output. This allows you to embed into your output constructs that will change depending on the locale.
Due to the way macro expansion works in localetext, any element of the [ ... ] construct except the first may be substituted by a _1 marker.
localetext() will replace [numf,_1] with the correctly formatted version of the number you passed in as the first parameter after the identifier.
This will substitute the correct plural form of the coins text into the string.
This will substitute the correctly gendered spellout rule for the number given in _1.
This method looks up the identifier in the current locale's lexicon and then formats the returned text for the current locale. The identifier is the same as the identifier passed into the add_to_lexicon() method. The parameters are the values required by the [ ... ] expansions in the localised text.
This method returns a Locale::CLDR::Collator object. This is still in development. Future releases will try and match the API from Unicode::Collate as much as possible and add tailoring for locales.
Other locales can be found on CPAN. You can install Language packs from the Locale::CLDR::Locales::* packages. You can install language packs for a given region by looking for a Bundle::Locale::CLDR::* package.
John Imrie, <JGNI at cpan dot org>
Please report any bugs or feature requests to bug-locale-cldr at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Locale-CLDR. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Locale::CLDR
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Locale-CLDR
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Locale-CLDR
CPAN Ratings
http://cpanratings.perl.org/d/Locale-CLDR
Search CPAN
http://search.cpan.org/dist/Locale-CLDR/
Everyone at the Unicode Consortium for providing the data.
Karl Williams for his tireless work on Unicode in the Perl regex engine.
Copyright 2009-2024 John Imrie and others.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.