Lingua::RU::Charset - Perl extension for detecting and converting various Russian character sets: KOI8-R, Windows-1251, CP866, ISO-8859-5, X-Mac-Cyrillic, Russian text in English letters, the Russian part of Unicode and UTF-8. This module can be especially useful on computers with broken Cyrillic locales (such as foreign web hosts).


  use Lingua::RU::Charset qw (:CHARSET);
  use Lingua::RU::Charset qw (:CONVERT);
  use Lingua::RU::Charset qw (:CONVERT :CHARCASE);
  use Lingua::RU::Charset qw (any2koi koi2lc koi2uc);


More documentation and examples coming soon...


Unfortunately, I don't have time to implement the Unicode and UTF-8 subroutines. I am sure, however, that such functions would be useful for Perl scripts exchanging Russian data with Java servlets, so you are welcome to submit some code!
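As a possible starting point for such a contribution, here is a minimal sketch of KOI8-R to UTF-8 conversion built on Perl's core Encode module. The names koi2utf8 and utf82koi are hypothetical: they only imitate the naming of the existing exports and are not part of Lingua::RU::Charset.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers, NOT part of Lingua::RU::Charset.
# Both take and return byte strings.

# Convert KOI8-R bytes to UTF-8 bytes.
sub koi2utf8 {
    my ($bytes) = @_;
    my $chars = decode('koi8-r', $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);         # characters -> UTF-8 bytes
}

# Convert UTF-8 bytes back to KOI8-R bytes.
sub utf82koi {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return encode('koi8-r', $chars);
}

# The Russian word for "hello", written as raw KOI8-R bytes.
my $koi  = "\xF0\xD2\xC9\xD7\xC5\xD4";
my $utf8 = koi2utf8($koi);

printf "%d KOI8-R bytes became %d UTF-8 bytes\n", length($koi), length($utf8);
print "Round trip ", (utf82koi($utf8) eq $koi ? "succeeded" : "failed"), "\n";
```

Encode also knows the other charsets listed above (for example cp1251, cp866 and iso-8859-5), so the same two-step decode and encode pattern should carry over to them.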


Alex Farber, <>


The article "The Cyrillic Charset Soup" by Roman Czyborra lists various Cyrillic charsets. The Russian texts used for counting the frequencies of letter pairs were taken from "The Eugene Peskin's Electronic Library". Please also consider visiting my home page, where I collect links to articles and news about Perl, Python, JavaScript, databases etc.