
NAME

Locale::Maketext::Gettext - bring Maketext and gettext together

SYNOPSIS

In your localization class:

  package MyPackage::L10N;
  use strict;
  use warnings;
  use base qw(Locale::Maketext::Gettext);
  1;

In your application:

  use strict;
  use warnings;
  use MyPackage::L10N;
  my $LH = MyPackage::L10N->get_handle or die "What language?";
  $LH->bindtextdomain("mypackage", "/home/user/locale");
  $LH->textdomain("mypackage");
  print $LH->maketext("Hello, world!!"), "\n";

DESCRIPTION

Locale::Maketext::Gettext brings GNU gettext and Maketext together. It is a subclass of Locale::Maketext(3) that follows the way GNU gettext works: its lexicons come from compiled MO files, selected by text domain. It works seamlessly from both the GNU gettext and the Maketext points of view.

You start as in a usual GNU gettext localization project: work on PO files with the help of translators, reviewers and Emacs. Turn them into MO files with msgfmt. Copy them into the appropriate locale directory, such as /usr/share/locale/de/LC_MESSAGES/myapp.mo.

Then, build your Maketext localization class, with your base class changed from Locale::Maketext(3) to Locale::Maketext::Gettext. That's all. ^_*'
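The MO files from the first step live at a path of the form LOCALEDIR/LANGUAGE/LC_MESSAGES/DOMAIN.mo, as in the /usr/share/locale/de/LC_MESSAGES/myapp.mo example. Here is a minimal sketch of building that path with the core File::Spec module. The helper name mo_path is hypothetical and is not part of this module's API. How a Maketext language tag maps to the LANGUAGE directory name is not shown and is left as an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of Locale::Maketext::Gettext): build the
# conventional gettext location LOCALEDIR/LANGUAGE/LC_MESSAGES/DOMAIN.mo
sub mo_path {
    my ($localedir, $language, $domain) = @_;
    return File::Spec->catfile($localedir, $language, 'LC_MESSAGES', "$domain.mo");
}

my $path = mo_path('/usr/share/locale', 'de', 'myapp');
print "$path\n";    # /usr/share/locale/de/LC_MESSAGES/myapp.mo on Unix
```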

METHODS

$LH->bindtextdomain(DOMAIN, LOCALEDIR)

Register a text domain with a locale directory. This is only a registration; nothing else happens here. No check is made whether LOCALEDIR exists, or whether DOMAIN really exists in LOCALEDIR. Returns LOCALEDIR itself. If LOCALEDIR is omitted, the registered locale directory of DOMAIN is returned. If DOMAIN is not registered yet, returns undef. This method always succeeds.
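Because bindtextdomain only records a mapping, its behavior fits in a hash. The following sketch reproduces the documented return values: LOCALEDIR when setting, the registered directory when getting, and undef for an unknown DOMAIN. The function bind_text_domain and the hash are hypothetical stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for bindtextdomain's bookkeeping: a plain hash
# from DOMAIN to LOCALEDIR. No filesystem checks are made.
my %locale_dir_of;

sub bind_text_domain {
    my ($domain, $localedir) = @_;
    # Getter form: registered directory, or undef if DOMAIN is unknown.
    return $locale_dir_of{$domain} unless defined $localedir;
    # Setter form: only record the directory, then return it.
    $locale_dir_of{$domain} = $localedir;
    return $localedir;
}

bind_text_domain('mypackage', '/home/user/locale');
my $dir     = bind_text_domain('mypackage');      # '/home/user/locale'
my $unknown = bind_text_domain('otherdomain');    # undef
```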

$LH->textdomain(DOMAIN)

Set the current text domain. It reads the corresponding MO file and replaces the %Lexicon with the new lexicon. If anything goes wrong (for example, the MO file is not found or is unreadable, or an NFS mount is disconnected), it returns immediately and your lexicon becomes empty. Returns DOMAIN itself. If DOMAIN is omitted, the current text domain is returned. If no text domain has been set yet, returns undef. This method always succeeds.
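The following minimal sketch shows the documented textdomain semantics. The handle is a plain hash, and a hypothetical reader callback stands in for real MO parsing. The lexicon is replaced rather than merged, becomes empty on any error, and the call returns DOMAIN, or the current domain when called without one. None of these names come from the module itself.

```perl
use strict;
use warnings;

# Hypothetical sketch of textdomain's documented behavior. $handle is a
# plain hash; $read_lexicon is an assumed callback that returns a hashref
# of msgid => msgstr for DOMAIN, or dies if the MO file cannot be read.
sub set_text_domain {
    my ($handle, $domain, $read_lexicon) = @_;
    return $handle->{domain} unless defined $domain;    # getter form
    $handle->{domain} = $domain;
    my $lexicon = eval { $read_lexicon->($domain) };
    # Replace, never merge: on any failure the lexicon is simply empty.
    $handle->{lexicon} = ref $lexicon eq 'HASH' ? { %$lexicon } : {};
    return $domain;
}

my %handle = (lexicon => { Stale => 'from the previous domain' });

set_text_domain(\%handle, 'mypackage', sub { return { Hello => 'Hallo' } });
print join(',', sort keys %{ $handle{lexicon} }), "\n";    # Hello

set_text_domain(\%handle, 'missing', sub { die "MO file not found\n" });
print scalar(keys %{ $handle{lexicon} }), "\n";            # 0
```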

$LH->language_tag

Retrieve the current language tag. This is the same method as in Locale::Maketext(3). It is read-only.

$LH->encoding(ENCODING)

Set or retrieve the output encoding. The default is UTF-8 for the whole bunch of languages I do not know. :p You should check this and set it according to the current language tag and your requirements. You can get the current language tag with the $LH->language_tag method above.

WARNING: If you set this to an incorrect encoding, maketext may die on characters that are illegal in that encoding. For example, try encoding Chinese text into US-ASCII. You can trap this failure in an eval {}, or alternatively set the encoding to UTF-8 and post-process the returned UTF-8 text yourself.
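To make the warning concrete, here is a sketch that uses only the core Encode module rather than maketext itself. It encodes Chinese text into US-ASCII with FB_CROAK, traps the failure in eval {}, and falls back to UTF-8. It assumes that maketext fails in the same way an FB_CROAK encode does.

```perl
use strict;
use warnings;
use Encode ();

my $text = "\x{4e2d}\x{6587}";    # two CJK characters as Perl text

# FB_CROAK makes encode() die on unmappable characters; LEAVE_SRC keeps
# $text intact. Trap the failure and fall back to UTF-8.
my $bytes = eval {
    Encode::encode('US-ASCII', $text, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
if (!defined $bytes) {
    print "US-ASCII failed, falling back to UTF-8\n";
    $bytes = Encode::encode('UTF-8', $text);
}
print length($bytes), " bytes\n";    # 6 bytes
```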

$text = $LH->maketext($key, @param...)

The same method as in Locale::Maketext(3), with a wrapper that returns the text string encoded according to the current encoding.
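The wrapper idea can be sketched with only the core Locale::Maketext and Encode modules. The class names My::L10N and My::L10N::de are hypothetical, and this is not Locale::Maketext::Gettext's actual code. It shows one possible shape for a wrapper: an encoding accessor that defaults to UTF-8, and a maketext override that encodes the result of the parent method. Because the encoding is read on every call, changing it at run time takes effect immediately.

```perl
use strict;
use warnings;

package My::L10N;    # hypothetical base class
use base 'Locale::Maketext';
use Encode ();

# Get or set the output encoding; UTF-8 unless told otherwise.
sub encoding {
    my $self = shift;
    $self->{encoding} = shift if @_;
    return defined $self->{encoding} ? $self->{encoding} : 'UTF-8';
}

# Wrap maketext so callers receive bytes in the current encoding.
sub maketext {
    my $self = shift;
    my $text = $self->SUPER::maketext(@_);
    return Encode::encode($self->encoding, $text, Encode::FB_CROAK);
}

package My::L10N::de;    # hypothetical German subclass
our @ISA = ('My::L10N');
our %Lexicon = ("Hello, [_1]!" => "Gr\x{fc}\x{df} dich, [_1]!");

package main;

my $lh = My::L10N::de->new;
my $utf8 = $lh->maketext("Hello, [_1]!", 'Welt');      # UTF-8 bytes
$lh->encoding('ISO-8859-1');
my $latin1 = $lh->maketext("Hello, [_1]!", 'Welt');    # Latin-1 bytes
printf "%d vs %d bytes\n", length $utf8, length $latin1;    # 18 vs 16 bytes
```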

FUNCTIONS

%Lexicon = Locale::Maketext::Gettext::readmo($MOfile);

Read and parse the MO file and return the %Lexicon. This subroutine is called by the textdomain method to retrieve the current %Lexicon. The result is cached to avoid reading and parsing the same file again and again, especially under mod_perl, where textdomain may ask for the %Lexicon on every connection. This is exactly how GNU gettext works, and I'm not planning to change it. If you DO need to re-read a modified MO file, clear the hash %Locale::Maketext::Gettext::Lexicons.

readmo() recognizes the MO file format revision number, and refuses to parse unrecognized MO file formats. Currently there is only one MO file format: revision 0.

readmo() is not automatically exported.
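Here is a minimal sketch of what a reader like readmo() has to do, following the MO layout described in the GNU gettext manual. The header holds the magic number 0x950412de, which also reveals the byte order, followed by the revision, the number of strings, and the offsets of the original-string and translation tables. Each table entry is a (length, offset) pair. The function name read_mo and its cache are hypothetical. The sketch returns raw bytes without any charset decoding and does not treat plural forms or contexts specially, so it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical cache keyed by file path, standing in for the module's
# own %Locale::Maketext::Gettext::Lexicons.
my %mo_cache;

# Minimal MO reader sketch. Returns msgid => msgstr pairs as raw bytes,
# or an empty list if the file is unreadable or not a revision-0 MO file.
# Plural forms and message contexts are not handled specially.
sub read_mo {
    my ($file) = @_;
    return %{ $mo_cache{$file} } if $mo_cache{$file};

    open my $fh, '<:raw', $file or return;
    my $data = do { local $/; <$fh> };
    close $fh;
    return if !defined $data || length($data) < 28;

    # The magic number 0x950412de reveals the file's byte order.
    my $magic = unpack 'V', $data;
    my $fmt;
    if    ($magic == 0x950412de) { $fmt = 'V' }    # little-endian
    elsif ($magic == 0xde120495) { $fmt = 'N' }    # big-endian
    else                         { return }

    # Header: magic, revision, string count, then the offsets of the
    # original-string table and the translation table.
    my ($revision, $count, $orig_at, $trans_at) = unpack "x4 ${fmt}4", $data;
    return if $revision != 0;

    my %lexicon;
    for my $i (0 .. $count - 1) {
        # Each table entry is a (length, offset) pair of 32-bit integers.
        my ($olen, $ooff) = unpack "${fmt}2", substr($data, $orig_at + 8 * $i, 8);
        my ($tlen, $toff) = unpack "${fmt}2", substr($data, $trans_at + 8 * $i, 8);
        $lexicon{ substr($data, $ooff, $olen) } = substr($data, $toff, $tlen);
    }
    $mo_cache{$file} = \%lexicon;
    return %lexicon;
}
```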

NOTES

WARNING: Don't try to put any lexicon in your language subclass. When the textdomain method is called, the current lexicon is replaced, not appended to. This is to accommodate the way textdomain works: messages from the previous text domain should not stay in the current text domain.

The idea of Locale::Maketext::Gettext came from Locale::Maketext::Lexicon(3), a great work by autrijus. But it is simply not finished yet and not practically usable, so I decided to write a replacement.

The step of calling msgunfmt is removed. The gettext MO file format is officially documented, so I decided to parse it myself. It is not hard. It avoids the overhead of spawning a subshell. It also benefits from the fact that reading and parsing binary MO files is much faster than parsing PO text files, since no regular expressions are involved. And after all, msgunfmt is not portable to non-GNU systems.

Locale::Maketext::Gettext also solves the problem that Locale::Maketext(3) cannot handle encodings. When %Lexicon is read from MO files by readmo(), the encoding tagged in the gettext MO file is used to decode the text into Perl's internal encoding. Then, when a text is extracted by maketext, it is encoded with the current encoding value. The encoding can be changed at run time, so you can run a daemon that outputs different encodings according to the language settings of individual users, without having to restart the application. This is an improvement over Locale::Maketext(3), and is essential to daemons and mod_perl applications.
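The decode step can be sketched with the core Encode module. The charset is read from the Content-Type line in the MO header entry, which is the translation of the empty msgid, and every string is decoded with it. The resulting Perl text can then be encoded per user at output time. The function name decode_lexicon and the UTF-8 fallback are assumptions made for this sketch. It is not the module's own code.

```perl
use strict;
use warnings;
use Encode ();

# Sketch: decode a raw lexicon (msgid => msgstr bytes, as an MO reader
# would return it) into Perl text, using the charset declared in the
# header entry (the empty msgid). UTF-8 is an assumed fallback.
sub decode_lexicon {
    my (%raw) = @_;
    my $charset = 'UTF-8';
    if (defined $raw{''} && $raw{''} =~ /^Content-Type:.*\bcharset=([\w.:-]+)/mi) {
        $charset = $1;
    }
    $charset = 'UTF-8' unless Encode::find_encoding($charset);
    my %decoded;
    while (my ($id, $str) = each %raw) {
        $decoded{ Encode::decode($charset, $id) } = Encode::decode($charset, $str);
    }
    return %decoded;
}

my %lexicon = decode_lexicon(
    ''      => "Content-Type: text/plain; charset=ISO-8859-1\n",
    'Hello' => "Gr\xfc\xdf Gott",
);

# Perl text can then be re-encoded for each user at run time.
my $for_utf8_user   = Encode::encode('UTF-8',      $lexicon{Hello});
my $for_latin1_user = Encode::encode('ISO-8859-1', $lexicon{Hello});
printf "%d vs %d bytes\n", length $for_utf8_user, length $for_latin1_user;    # 11 vs 9 bytes
```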

Another benefit of this encode/decode step is as follows. In some multibyte encodings, like Big5, the bytes of the Maketext magic characters [ and ] can appear as part of multibyte characters. This raises the "Unterminated bracket group" error in Locale::Maketext, even though the text is perfectly natural to native speakers. Inserting an escape character before such a magic character isn't right either, since that breaks the multibyte character into halves, and the text becomes unreadable to translators and reviewers. A decode wrapper solves this problem: so far, Perl's internal encoding is Maketext-safe.
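Here is the problem in concrete terms. Big5 trail bytes range over 0x40-0x7E and 0xA1-0xFE, so a trail byte can be 0x5B ("[") or 0x5D ("]"). This sketch takes the byte pair 0xA4 0x5D from the Big5 common-character range as an example of a single character whose second byte is "]". Using the core Encode module, it shows that the raw bytes contain a bracket while the decoded Perl string does not.

```perl
use strict;
use warnings;
use Encode ();

# One Big5 character whose second (trail) byte happens to be 0x5D, "]".
my $big5_bytes = "\xA4\x5D";
my $chars      = Encode::decode('big5', $big5_bytes);

printf "raw bytes contain a bracket: %s\n",
    $big5_bytes =~ /[\[\]]/ ? 'yes' : 'no';    # yes
printf "decoded text contains a bracket: %s\n",
    $chars =~ /[\[\]]/ ? 'yes' : 'no';         # no
printf "decoded length: %d character(s)\n", length $chars;    # 1
```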

BUGS

The default encodings of all the languages

The default encoding for all languages should not be UTF-8; that is only a last resort. I try to determine the default encoding of every language I know, including zh-tw, zh-cn, zh-hk, zh-sg, zh, ja, ko, en-us, and en. Please tell me the proper default encoding of your language so that I can add it to this list. Thank you. ^_*'

I would be most grateful if someone could solve this in another module, like I18N::LangTags (or could show me a table so that I can create something like I18N::LangTags::Encodings). Deciding the default encoding should not be the job of Locale::Maketext::Gettext.

The reason for a "default encoding" is clear: GNU gettext never fails. It works even if you set your locale to zh_TW rather than zh_TW.Big5. That's the right thing to do.
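In the same spirit, a default-encoding lookup should never fail either. The following minimal sketch is one way to do it, assuming a table keyed by lowercase language tag. It tries the full tag, then shorter prefixes, and finally falls back to UTF-8. Both table entries and the function name are illustrative assumptions, not the module's actual defaults.

```perl
use strict;
use warnings;

# Illustrative table only: these entries are assumptions for the sketch,
# not Locale::Maketext::Gettext's actual list of default encodings.
my %default_encoding = (
    'zh-tw' => 'Big5',
    'zh-cn' => 'GB2312',
);

# Try the full tag, then progressively shorter prefixes ("zh-hk" -> "zh"),
# and never fail: fall back to UTF-8.
sub default_encoding_for {
    my ($tag) = @_;
    $tag = lc $tag;
    while (length $tag) {
        return $default_encoding{$tag} if exists $default_encoding{$tag};
        last unless $tag =~ s/-[^-]*\z//;
    }
    return 'UTF-8';
}

for my $tag (qw(zh-TW zh-HK fr)) {
    printf "%-6s %s\n", $tag, default_encoding_for($tag);
}
```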

Error handler when encoding fails

There should be an option to decide what to do when encoding fails. The only two reasonable choices to me are FB_CROAK and FB_HTMLCREF. I chose FB_CROAK, since FB_HTMLCREF is certainly not a good choice as a default behavior. But FB_CROAK is not a good choice either. I should implement some way to set this at run time. It's not hard; I just haven't done it yet.
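A run-time switch between the two policies could look like the following sketch. It uses the real Encode check constants FB_CROAK and FB_HTMLCREF. The policy names and the encode_with_policy function are hypothetical, and no such option exists in the module yet.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical run-time switch between the two policies discussed above.
my %check_for = (
    croak    => Encode::FB_CROAK,       # die on unmappable characters
    htmlcref => Encode::FB_HTMLCREF,    # substitute &#NNN; references
);

sub encode_with_policy {
    my ($encoding, $text, $policy) = @_;
    my $check = $check_for{$policy}
        or die "unknown policy '$policy'\n";
    # LEAVE_SRC keeps Encode from modifying the caller's string.
    return Encode::encode($encoding, $text, $check | Encode::LEAVE_SRC);
}

my $text = "caf\x{e9}";
my $html = encode_with_policy('US-ASCII', $text, 'htmlcref');
print "$html\n";                                       # caf&#233;
my $ok = eval { encode_with_policy('US-ASCII', $text, 'croak'); 1 };
print $ok ? "encoded\n" : "croaked\n";                 # croaked
```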

Another thing may be necessary, too: a method to check whether the current encoding works, that is, to encode/decode every text string with the current encoding and see if everyone smiles. This should only be used during development, for example, to check whether the PO/MO files returned by translators and reviewers contain any characters that are illegal in the current encoding.
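The following sketch shows a development-time check of this kind. It tries to encode every msgstr of an already decoded lexicon with FB_CROAK and reports the msgids that fail. The function name unencodable_entries is hypothetical, and the check covers only the encode direction.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical development-time check: try to encode every msgstr in a
# decoded lexicon and report the msgids whose text cannot be represented.
sub unencodable_entries {
    my ($lexicon, $encoding) = @_;
    my @bad;
    for my $msgid (sort keys %$lexicon) {
        my $ok = eval {
            Encode::encode($encoding, $lexicon->{$msgid},
                           Encode::FB_CROAK | Encode::LEAVE_SRC);
            1;
        };
        push @bad, $msgid unless $ok;
    }
    return @bad;
}

my %lexicon = (
    Hello => "Gr\x{fc}\x{df} Gott",    # representable in ISO-8859-1
    World => "\x{4e16}\x{754c}",       # CJK, not representable
);
print join(', ', unencodable_entries(\%lexicon, 'ISO-8859-1')), "\n";    # World
```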

A method to clear the current MO text cache

Is this necessary? I don't know. It is not hard, after all. It might be necessary for a mod_perl application that needs to update translations without restarting Apache.

SEE ALSO

Locale::Maketext(3), Locale::Maketext::TPJ13(3), Locale::Maketext::Lexicon(3), Encode(3), bindtextdomain(3), textdomain(3). Also, please refer to the official GNU gettext manual at http://www.gnu.org/manual/gettext/.

AUTHOR

imacat <imacat@mail.imacat.idv.tw>

COPYRIGHT

Copyright (c) 2003 imacat. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.