NAME

CGI::Lingua - Create a multilingual web page

VERSION

Version 0.80

SYNOPSIS

CGI::Lingua is a powerful module for multilingual web applications offering extensive language/country detection strategies.

No longer does your website need to be in English only. CGI::Lingua provides a simple basis for deciding in which language to display a website. The website tells CGI::Lingua which languages it supports. Based on that list, CGI::Lingua tells the application which language the user would like to use.

    use CGI::Lingua;
    # ...
    my $l = CGI::Lingua->new(['en', 'fr', 'en-gb', 'en-us']);
    my $language = $l->language();
    if ($language eq 'English') {
        print '<P>Hello</P>';
    } elsif ($language eq 'French') {
        print '<P>Bonjour</P>';
    } else {    # $language eq 'Unknown'
        my $rl = $l->requested_language();
        print "<P>Sorry, for now this page is not available in $rl.</P>";
    }
    my $c = $l->country() // '';    # country() may return undef
    if ($c eq 'us') {
        # print contact details in the US
    } elsif ($c eq 'ca') {
        # print contact details in Canada
    } else {
        # print worldwide contact details
    }

    # ...

    use CHI;
    use CGI::Lingua;
    # ...
    my $cache = CHI->new(driver => 'File', root_dir => '/tmp/cache', namespace => 'CGI::Lingua-countries');
    $l = CGI::Lingua->new({ supported => ['en', 'fr'], cache => $cache });

SUBROUTINES/METHODS

new

Creates a CGI::Lingua object.

API SPECIFICATION

Input:
  supported  => ArrayRef[Str] | Str   # required; RFC-1766 language codes
  cache      => Object                # optional; CHI-compatible (get/set)
  config_file => Str                  # optional; YAML/XML/INI config path
  logger     => Object                # optional; must implement warn/info/error
  info       => Object                # optional; CGI::Info-compatible
  data       => Any                   # optional; forwarded to I18N::AcceptLanguage
  dont_use_ip => Bool                 # optional; disable IP-based fallback
  syslog     => Bool | HashRef        # optional; Sys::Syslog integration
  debug      => Bool                  # optional; enable debug logging

Returns: CGI::Lingua blessed hashref, or a clone when called on an object.

MESSAGES

"You must give a list of supported languages"  - no 'supported' key provided
"List of supported languages must be an array ref" - supported is wrong ref type
"Supported languages must be the short code"  - string too short or too long
"Logger must be a blessed object with warn/info/error methods" - bad logger arg

PSEUDOCODE

1. Normalise args via Params::Get and Object::Configure
2. Validate logger (must be blessed with warn/info/error) if provided
3. Validate supported (required, string or arrayref)
4. If cache and REMOTE_ADDR set, attempt to thaw a previously stored state
5. Bless and return fresh object with sentinel flags set to GEO_UNKNOWN
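
The logger check in step 2 can be satisfied by any blessed object that responds to warn, info and error. The following is a minimal sketch: the My::Logger class is hypothetical and only collects messages in memory, and the constructor call is attempted only when CGI::Lingua is actually installed.

```perl
use strict;
use warnings;

# Hypothetical in-memory logger: a blessed object with warn/info/error
# methods, which is what the logger check described above asks for.
package My::Logger;

sub new { my $class = shift; return bless { messages => [] }, $class }

# Record one message under the given level
sub _log {
    my ($self, $level, @args) = @_;
    my $text = join '', map { defined $_ ? $_ : '' } @args;
    push @{ $self->{messages} }, { level => $level, message => $text };
    return;
}

sub warn  { my $self = shift; return $self->_log('warn',  @_) }
sub info  { my $self = shift; return $self->_log('info',  @_) }
sub error { my $self = shift; return $self->_log('error', @_) }

sub messages { my $self = shift; return @{ $self->{messages} } }

package main;

my $logger = My::Logger->new();
$logger->info('logger ready');

# Only try the constructor when the module is available
if (eval { require CGI::Lingua; 1 }) {
    my $lingua = eval {
        CGI::Lingua->new({ supported => ['en', 'fr'], logger => $logger });
    };
    print "constructor failed: $@\n" unless $lingua;
}
```

Passing a plain code ref or filename instead of such an object would trigger the "Logger must be a blessed object" message listed above.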

language

Tells the CGI application in what language to display its messages. The language is the natural name e.g. 'English' or 'Japanese'.

Sublanguages are handled sensibly, so that if a client requests U.S. English on a site that only serves British English, language() will return 'English'.

If none of the requested languages is included within the supported lists, language() returns 'Unknown'.

API SPECIFICATION

Input:  none beyond $self
Returns: Str - human-readable language name, or 'Unknown'
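
The sublanguage fallback described above can be illustrated with a small stand-alone sketch. It is not the module's own matching code: pick_language() and the %NAME table are hypothetical, and the real module takes its language names from locale data rather than a hard-coded hash.

```perl
use strict;
use warnings;

# Illustrative name table (an assumption made for this example)
my %NAME = (en => 'English', fr => 'French', ja => 'Japanese');

# Hypothetical helper: return the display name for the first requested tag
# whose primary language is supported, or 'Unknown'.
sub pick_language {
    my ($requested, $supported) = @_;

    # Index the supported list by primary language ('en-gb' -> 'en')
    my %primary;
    $primary{ (split /-/, lc $_)[0] } = 1 for @{$supported};

    for my $tag (@{$requested}) {
        my ($lang) = split /-/, lc $tag;
        return $NAME{$lang} // 'Unknown' if $primary{$lang};
    }
    return 'Unknown';
}

# U.S. English requested, only British English served
print pick_language(['en-us'], ['en-gb', 'fr']), "\n";   # English
# German requested, not served at all
print pick_language(['de'],    ['en-gb', 'fr']), "\n";   # Unknown
```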

preferred_language

Same as language().

name

Synonym for language, for compatibility with Locale::Object::Language.

sublanguage

Tells the CGI what variant to use e.g. 'United Kingdom', or undef if it can't be determined.

API SPECIFICATION

Input:  none beyond $self
Returns: Str | undef

language_code_alpha2

Gives the two-character representation of the supported language, e.g. 'en' when you've asked for en-gb.

If none of the requested languages is included within the supported lists, language_code_alpha2() returns undef.

API SPECIFICATION

Input:  none beyond $self
Returns: Str (2 chars) | undef

code_alpha2

Synonym for language_code_alpha2, kept for historical reasons.

sublanguage_code_alpha2

Gives the two-character representation of the supported sublanguage, e.g. 'gb' when you've asked for en-gb, or undef if there is none.

API SPECIFICATION

Input:  none beyond $self
Returns: Str (2 chars) | undef
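
To make the relationship between language_code_alpha2() and sublanguage_code_alpha2() concrete, here is a minimal sketch that splits a tag into the two parts. The split_language_tag() helper is hypothetical and is not part of CGI::Lingua.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tag such as 'en-gb' into a two-letter
# language code and, when present, a two-letter region code.
sub split_language_tag {
    my ($tag) = @_;
    return unless defined $tag;

    if (lc($tag) =~ /^([a-z]{2})(?:-([a-z0-9]+))?$/) {
        my ($lang, $region) = ($1, $2);

        # Only two-letter regions count; 'es-419' yields no region
        undef $region unless defined $region && $region =~ /^[a-z]{2}$/;
        return ($lang, $region);
    }
    return;    # not a two-letter language tag
}

my ($lang, $region) = split_language_tag('en-GB');
printf "%s %s\n", $lang, $region // '(none)';   # en gb
```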

requested_language

Gives a human-readable rendition of the language the user asked for, whether or not it is supported.

The sublanguage, if any, is given in parentheses, e.g. "English (United Kingdom)".

API SPECIFICATION

Input:  none beyond $self
Returns: Str - e.g. "English (United Kingdom)" or "Unknown"

country

Returns the two-character country code of the remote end in lowercase.

If IP::Country, Geo::IPfree or Geo::IP is installed, CGI::Lingua will make use of it; otherwise it will do a Whois lookup. If you do not have any of those modules installed, I recommend you use the caching capability of CGI::Lingua.

API SPECIFICATION

Input:  none beyond $self
Returns: Str (2 lowercase chars) | undef
  'Unknown' is only returned in the Baidu-EU special case via _handle_eu_country.

MESSAGES

"GEOIP_COUNTRY_CODE contains an invalid country code; ignoring"
"HTTP_CF_IPCOUNTRY contains an invalid country code; ignoring"
"X.X.X.X isn't a valid IP address"
"Can't determine country from LAN connection X"
"Can't determine country from loopback connection X"
"cache contains a numeric country: N"
"IP matches to a numeric country"

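The first two messages above indicate that country codes supplied by the front end in GEOIP_COUNTRY_CODE or HTTP_CF_IPCOUNTRY are validated before use. A minimal sketch of that kind of check follows. The country_from_env() helper is hypothetical, and both the order in which the variables are consulted and the exact validation rule are assumptions rather than the module's documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: return a lowercase two-letter country code taken
# from a front-end supplied environment variable, or undef if none is
# usable. The lookup order and the pattern are assumptions.
sub country_from_env {
    my ($env) = @_;

    for my $key (qw(GEOIP_COUNTRY_CODE HTTP_CF_IPCOUNTRY)) {
        my $value = $env->{$key};
        next unless defined $value && length $value;

        return lc $value if $value =~ /\A[A-Za-z]{2}\z/;
        warn "$key contains an invalid country code; ignoring\n";
    }
    return;
}

my $country = country_from_env({ HTTP_CF_IPCOUNTRY => 'GB' });
print defined $country ? $country : 'undef', "\n";   # gb
```

In a real CGI script the argument would be \%ENV; a plain hash is used here so that the example runs outside a web server.
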
locale

HTTP doesn't have a way of transmitting a browser's localisation information which would be useful for default currency, date formatting, etc.

This method attempts to detect the information, but it is a best guess and is not 100% reliable. But it's better than nothing ;-)

Returns a Locale::Object::Country object, or undef if the country cannot be determined.

API SPECIFICATION

Input:  none beyond $self
Returns: Locale::Object::Country | undef

time_zone

Returns the timezone of the web client.

If Geo::IP is installed, CGI::Lingua will make use of that; otherwise it will use ip-api.com.

API SPECIFICATION

Input:  none beyond $self
Returns: Str (IANA timezone name) | undef

MESSAGES

"Couldn't determine the timezone"
"LWP::Simple::WithCache and LWP::Simple are both absent; cannot contact ip-api.com"
  Returns undef rather than croaking; install either LWP variant to enable ip-api lookups.
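
For readers curious what an ip-api.com lookup involves, here is a rough sketch using only core modules. It is not how time_zone() is implemented: both helper names are hypothetical, and HTTP::Tiny is used here in place of the LWP::Simple variants the module looks for. The network function is defined but not called, so the example runs offline against a sample reply.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Hypothetical helper: pull the timezone out of an ip-api.com JSON reply.
# Returns undef for malformed JSON or an unsuccessful lookup.
sub timezone_from_reply {
    my ($json) = @_;
    my $data = eval { decode_json($json) };
    return undef unless ref($data) eq 'HASH';
    return undef unless ($data->{status} // '') eq 'success';
    return $data->{timezone};
}

# Hypothetical helper: query ip-api.com for one address; undef on failure.
sub lookup_time_zone {
    my ($ip) = @_;
    my $http = HTTP::Tiny->new(timeout => 5);
    my $response = $http->get("http://ip-api.com/json/$ip");
    return undef unless $response->{success};
    return timezone_from_reply($response->{content});
}

# Offline demonstration with a reply in the shape the service returns
my $sample = '{"status":"success","countryCode":"GB","timezone":"Europe/London"}';
print timezone_from_reply($sample), "\n";   # Europe/London
```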

LIMITATIONS

  • Accept-Language left-to-right scan ignores q-values

    The second and third passes in _accept_language_match() scan the header left-to-right and ignore quality (q=0.x) values. A header such as de;q=0.9,en;q=0.1 on a site that only supports en would currently fail to fall back to English. Where possible, rely on the I18N::AcceptLanguage pass alone.

  • Logger must be a blessed object

    The logger parameter is documented as accepting a code ref, array ref, or filename, but the current implementation calls $logger->$level() and will die on non-blessed values. Wrap alternative logger types in a Log::Abstraction instance before passing them to new().

  • es-419 sublanguage returns undef

    Language tags whose region subtag is a three-digit UN M.49 code, such as es-419 (Latin American Spanish), do not resolve to a sublanguage() value because '419' is not an ISO 3166-1 country code. This is a known limitation of the Locale::Object layer.

  • Whois lookups are slow and unreliable

    Without IP::Country, Geo::IP, or Geo::IPfree installed, country() falls back to Whois queries against live RIPE/ARIN/IANA servers. These can time out under load. Install at least one local geo-database module and enable the CHI cache to avoid this.

  • Sub::Private not yet enforced

    The _* private methods are currently accessible from outside the package. Sub::Private should be added to enforce encapsulation once white-box tests are updated to call only the public API.

  • IPv4-mapped IPv6 addresses are normalised to IPv4

    REMOTE_ADDR values in the form ::ffff:a.b.c.d (RFC 4291 section 2.5.5) are silently rewritten to the embedded a.b.c.d IPv4 address before any geo-lookup. This is correct for country detection purposes but means the raw address string is not preserved in cache keys or log messages.

  • EU country code is irresolvable (with one exception)

    IP addresses that Whois reports as country EU are mapped to 'Unknown' unless they fall within Baidu's known subnet (RT-86809). There is no ISO 3166-1 country code for the European Union.
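
Two of the limitations above, q-value ordering and IPv4-mapped IPv6 addresses, can be illustrated in plain Perl. This is a sketch of a possible application-side workaround, not the module's code: tags_by_quality() and normalise_ip() are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an Accept-Language header into language tags
# ordered by descending q-value (ties keep their header order).
sub tags_by_quality {
    my ($header) = @_;
    $header //= '';
    $header =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my @entries;
    my $position = 0;
    for my $item (split /\s*,\s*/, $header) {
        my ($tag, @params) = split /\s*;\s*/, $item;
        next unless defined $tag && length $tag;

        my $q = 1;    # a missing q parameter means q=1
        for my $param (@params) {
            $q = $1 if $param =~ /^q=([01](?:\.\d{1,3})?)$/i;
        }
        next if $q == 0;    # q=0 marks the language as not acceptable
        push @entries, [lc($tag), $q, $position++];
    }

    return map { $_->[0] }
           sort { $b->[1] <=> $a->[1] || $a->[2] <=> $b->[2] } @entries;
}

# Hypothetical helper: reduce an IPv4-mapped IPv6 address to plain IPv4
sub normalise_ip {
    my ($ip) = @_;
    return $ip =~ /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/i ? $1 : $ip;
}

print join(',', tags_by_quality('en;q=0.1, de;q=0.9, fr')), "\n";   # fr,de,en
print normalise_ip('::ffff:192.0.2.1'), "\n";                       # 192.0.2.1
```

An application that needs strict q-value handling could pass its own ordered list through a check against its supported languages before deciding what to serve.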

AUTHOR

Nigel Horne, <njh at nigelhorne.com>

BUGS

Please report any bugs or feature requests to the author.

If HTTP_ACCEPT_LANGUAGE contains a sub-tag with a 3-digit UN M.49 region code (e.g. es-419 for Latin American Spanish), sublanguage() returns undef because such region codes are not ISO 3166-1 country codes.

Please report any bugs or feature requests to bug-cgi-lingua at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=CGI-Lingua. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

Uses I18N::AcceptLanguage to find the highest priority accepted language. This means that if you support only languages the client lists at a lower priority, they may be missed.

SEE ALSO

SUPPORT

This module is provided as-is without any warranty.

You can find documentation for this module with the perldoc command.

    perldoc CGI::Lingua

FORMAL SPECIFICATION

new

new : Class × Params → CGI::Lingua
∀ p : Params • p.supported ≠ ∅ ⟹ result.language ∈ {name(l) | l ∈ p.supported} ∪ {'Unknown'}

language

language : CGI::Lingua → Str
result ∈ {name(l) | l ∈ supported} ∪ {'Unknown'}

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2010-2026 Nigel Horne.

This program is released under the following licence: GPL2