bernard - alphabet remix
Thomas Thurman <thomas@thurman.org.uk>
bernard <source> -o <target>
bernard takes files written in the conventional alphabet and returns them written in some other alphabet.
At present, only the Shavian alphabet is supported.
Select output file. If this is not specified, the output is written to the standard output.
Select the target alphabet. Use the ISO 15924 code. This is not case-sensitive. The only arguments currently accepted are "Shaw", which represents the Shavian alphabet, and "Latn", which causes no transformation to the input text.
Specifies the alphabet of the source document. The default is "Latn". The source alphabet is not detected automatically, because the use-cases are so different. The value is not case-sensitive, and the only two values allowed are "Latn" and "Shaw". Selecting "Shaw" will allow you to transliterate a document in Shavian into another alphabet, for example Deseret, once one is supported.
If "Shaw" is selected, this has the additional effect of causing every stanza in a .po file to be transliterated, not only the fuzzy and empty ones. It also disables the --in-place switch.
Selecting the same source and target alphabet is a valid choice, but means that there will be no change between input and output.
It is currently an error to select "Shaw" as the source alphabet and "Latn" as the target alphabet. In other words, you can't yet undo a transliteration into Shavian. This may be added one day.
This entire option is not yet implemented.
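The rules for alphabet codes described above can be sketched as follows. This is only an illustration in Python, not bernard's own code (bernard is a Perl program); the names normalise_code and check_pair are hypothetical.

```python
# Illustrative sketch only: bernard is written in Perl, and these
# function names are hypothetical, not part of its interface.

# ISO 15924 codes the documentation says are accepted, keyed by lower case.
SUPPORTED = {"latn": "Latn", "shaw": "Shaw"}


def normalise_code(code):
    """Return the canonical spelling of a supported code, ignoring case."""
    try:
        return SUPPORTED[code.lower()]
    except KeyError:
        raise ValueError("unsupported alphabet code: %r" % code)


def check_pair(source, target):
    """Validate a source/target pair and return it in canonical form."""
    src = normalise_code(source)
    tgt = normalise_code(target)
    if src == "Shaw" and tgt == "Latn":
        # Documented as an error: a transliteration cannot yet be undone.
        raise ValueError("Shaw to Latn is not supported")
    return src, tgt
```

Here check_pair("latn", "SHAW") returns ("Latn", "Shaw"), while check_pair("Shaw", "Latn") raises an error. A pair with the same source and target is accepted, matching the note that this is valid but changes nothing.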
This switch only makes sense with gettext .po files. It means that the msgids in the file are not English strings, but identifiers, and that the English strings are in the .po file whose name is supplied. This is often found in Nokia catalogues.
This is not yet implemented.
Runs the resulting file through "msgfmt -c" to check its validity.
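A check of this kind might look like the sketch below. It is an assumption about how such a check could be wired up, not a description of what bernard does internally: the function names are hypothetical, and discarding the compiled catalogue with -o is a choice made here for illustration. It requires GNU gettext to be installed.

```python
import os
import subprocess


def msgfmt_check_command(po_path):
    """Build the msgfmt invocation; the compiled output is discarded."""
    # -c enables msgfmt's validity checks; -o sends the compiled
    # catalogue to the null device so that no .mo file is left behind.
    return ["msgfmt", "-c", "-o", os.devnull, po_path]


def po_file_is_valid(po_path):
    """Return True if msgfmt exits successfully for the given file."""
    result = subprocess.run(msgfmt_check_command(po_path))
    return result.returncode == 0
```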
This writes the output file over the top of the input file.
This switch is only useful with gettext .po files. It is disabled for other filetypes because it would be dangerous: you would lose the original text.
This replaces Shavian letters with their traditional ASCII equivalents. It is disabled for other alphabets. Latin-alphabet letters discovered in the text will be retained, which will cause obvious difficulties if the output would ordinarily contain Latin-alphabet letters.
This is not currently implemented.
The inverse operation is obtained by using -m unarmour.
This is a nasty hack. It shifts the letters of the output alphabet down so that they begin at codepoint 128. This is needed because of shortcomings in the UTF-8 decoding of some programs, and is useful when you cannot use -a because you need to include characters from both alphabets. You will, of course, need a special font with the relevant glyphs at these non-standard positions.
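The shift can be pictured with the sketch below. It assumes a straight offset, so that the first Shavian letter (U+10450) lands on codepoint 128 and the rest follow in order; the documentation does not state the exact mapping, so treat this as a guess at the idea rather than bernard's behaviour. The function name is hypothetical.

```python
# The Shavian block in Unicode runs from U+10450 to U+1047F.
SHAVIAN_FIRST = 0x10450
SHAVIAN_LAST = 0x1047F
SHIFT_BASE = 128  # assumed landing point for the first Shavian letter


def shift_down(text):
    """Move Shavian letters to codepoints 128 and up; leave the rest alone."""
    shifted = []
    for char in text:
        codepoint = ord(char)
        if SHAVIAN_FIRST <= codepoint <= SHAVIAN_LAST:
            shifted.append(chr(SHIFT_BASE + codepoint - SHAVIAN_FIRST))
        else:
            shifted.append(char)
    return "".join(shifted)
```

Latin letters pass through untouched, which is why this approach still works when the text mixes both alphabets.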
Transliterates the given expression. The result is output before that of any file.
Checks to see whether there's an updated version of the Shavian set used for transliteration, and downloads it if there is.
Selects an alternative mode of operation. The default is single, which behaves as described above. Other values have other effects, described in "Magic modes", below.
George Bernard Shaw believed that apostrophes, which he called "uncouth bacilli", were redundant. In honour of this opinion, the -p option strips apostrophes from the transliterated output where they occur within words. The rare apostrophes at the beginnings or endings of words (as in 'tis) will not be stripped, in case you use them for quotation marks.
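The rule described here, dropping an apostrophe only when it has a letter on both sides, can be sketched with a regular expression. This is an illustration in Python, not bernard's own code; the function name is hypothetical, and only the ASCII apostrophe is handled.

```python
import re

# Matches an apostrophe only when a word character sits on both sides,
# so apostrophes at the start or end of a word are left alone.
INNER_APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")


def strip_inner_apostrophes(text):
    """Remove apostrophes that occur within words."""
    return INNER_APOSTROPHE.sub("", text)
```

With this rule, "don't" becomes "dont", while 'tis and a quoted 'word' are left unchanged.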
This allows you to define the Shavian spelling of a word temporarily. Its argument is the Latin-alphabet spelling, followed by an equals sign, followed by the Shavian spelling. In case you cannot type Shavian letters, you may use the standard ASCII-armouring. For example, to cause the word "of" to be written out in full, rather than as a single-letter abbreviation, use -Dof=ov.
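The shape of the argument can be illustrated by the sketch below, which splits it at the first equals sign. It is a Python illustration with a hypothetical function name, not bernard's own parsing code, and it does not attempt to decode ASCII-armoured spellings, since the mapping is not given here.

```python
def parse_definition(argument):
    """Split 'latin=shavian' into its two spellings."""
    latin, separator, shavian = argument.partition("=")
    if not separator or not latin or not shavian:
        raise ValueError("expected LATIN=SHAVIAN, got %r" % argument)
    return latin, shavian
```

For the example above, parse_definition("of=ov") returns ("of", "ov").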
These are selected using the -m or --magic switch.
This is the default, and behaves as described above.
In this mode, the sole non-option argument should be the name of a Shavian .po file. The master template for that package will be downloaded and merged with the .po file, the transliterations will be updated, and the result will then be run through msgfmt -c to check it.
Alternatively, the non-option argument may be the name of a directory. Each subdirectory of this directory should contain a GNOME package, which contains a file po/en@shaw.po. Each of these files will be acted on as described in the previous paragraph.
This undoes the effect of the -a or --armour switch. The single non-option argument is a file, which is output verbatim except that characters from the Latin alphabet will be replaced with their corresponding values in the old Shavian-to-Latin mapping.
Probably many.
Code to update the Shavian transliteration of Firefox exists, but has not yet been merged into bernard. It will be merged at some point.
It will also be possible later to translate Qt's .ts files.
Code to handle .srt subtitle files exists, but has not yet been merged.
It doesn't handle any alphabets other than Shavian and the conventional alphabet. At least Deseret will be added.
There are several other planned features which are as yet unimplemented.
This Perl module is copyright (C) Thomas Thurman, 2010. This is free software, and can be used/modified under the same terms as Perl itself.