Alvis::Encoding - Perl extension for guessing and checking the encoding of documents.
    use Alvis::Encoding;
    use Alvis::Document::Type;

    # Create a new instance
    my $e=Alvis::Encoding->new();
    if (!defined($e))
    {
        die "Instantiating Alvis::Encoding failed.";
    }

    # Check that a (decimal) character code is legal UTF-8
    my $code=55;
    if (!$e->code_is_utf8($code))
    {
        # The message will contain the position and the offending character's code
        die $e->errmsg();
    }

    # Check that a text is legal UTF-8
    my $text;
    if (!$e->is_utf8($text))
    {
        # The message will contain the position and the offending character's code
        die $e->errmsg();
    }

    # If you need to obtain the position (1..) and the offending character,
    # pass a placeholder in a hash ref argument:
    my %err=();
    if (!$e->is_utf8($text,\%err))
    {
        my $position=$err{pos};
        my $code=$err{code};
        # ...
    }

    #
    # Guess the encoding of a document given a guess for its type
    #
    my $type_guesser=Alvis::Document::Type->new();
    my ($doc_type,$doc_sub_type)=$type_guesser->guess($text);
    my $doc_encoding=$e->guess($text,$doc_type,$doc_sub_type);
    if (!defined($doc_encoding))
    {
        die('Cannot guess. ' . $e->errmsg());
    }

    #
    # Try converting a document to UTF-8 with only its type known
    # (reuses $doc_type and $doc_sub_type guessed above)
    #
    my $doc_in_utf8=$e->try_to_convert_to_utf8($text,$doc_type,$doc_sub_type);
    if (!defined($doc_in_utf8))
    {
        die('Cannot convert. ' . $e->errmsg());
    }

    # Try to guess what was meant
    my @possibilities=$e->guess_typo_fixes('uft-8');
A collection of methods for guessing, confirming and fixing the encoding of a document.
Options:
    defaultDocType      default type for a document. Default: text.
    defaultDocSubType   default sub type for a document. Default: html.
    defaultEncoding     default encoding for a document. Default: iso-8859-1.
Returns 1 if the (decimal) character code is legal UTF-8.
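As a rough illustration of what such a check can involve, the sketch below tests whether a decimal code is a Unicode scalar value, that is, a code point that well-formed UTF-8 can encode. The helper name is_encodable_code is hypothetical, and whether Alvis::Encoding applies exactly these rules (it may, for instance, reject further ranges) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Alvis::Encoding: true if $code is a
# Unicode scalar value, i.e. something well-formed UTF-8 can encode.
sub is_encodable_code {
    my ($code) = @_;
    return 0 unless defined $code && $code =~ /\A[0-9]+\z/;  # decimal integer only
    return 0 if $code > 0x10FFFF;                      # beyond the Unicode range
    return 0 if $code >= 0xD800 && $code <= 0xDFFF;    # UTF-16 surrogates
    return 1;
}

for my $code (55, 0xD800, 0x110000) {
    printf "%d: %s\n", $code, is_encodable_code($code) ? 'ok' : 'not encodable';
}
```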
Returns 1 if all of the characters of $text are legal UTF-8. Otherwise, returns 0 and sets an error message specifying the 1-based position of the first illegal character code. To obtain the position and the offending code, pass a hash ref ($err_hash_ref); they are stored in $err_hash_ref->{pos} and $err_hash_ref->{code}.
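One way to locate the first offending byte with the core Encode module is sketched below. It is not the module's implementation: the helper name first_utf8_error is hypothetical, and it reports a byte position, whereas the position reported by Alvis::Encoding may be counted differently.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not the module's implementation. Takes a byte string
# and returns (1) if it is well-formed UTF-8, otherwise (0, $pos, $code)
# with the 1-based byte position and value of the first offending byte.
sub first_utf8_error {
    my ($bytes) = @_;
    my $rest = $bytes;    # decode() with FB_QUIET consumes its argument
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET());
    return (1) if !length $rest;    # everything decoded cleanly
    my $pos = length($bytes) - length($rest) + 1;
    return (0, $pos, ord(substr($rest, 0, 1)));
}

my ($ok, $pos, $code) = first_utf8_error("abc\xFFdef");
print "illegal byte $code at position $pos\n" unless $ok;
```

With FB_QUIET, decoding stops at the first malformed sequence and leaves the unprocessed remainder in the argument, which is what makes the position recoverable from the two lengths.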
Guess the encoding of a document given a guess for its type (and subtype).
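To show how a type and subtype can steer such a guess, here is a minimal sketch that looks for a declared charset in HTML or XML and otherwise falls back to a default. The heuristic, the helper name guess_declared_encoding and the literal type values 'text', 'html' and 'xml' are all assumptions for illustration; the algorithm Alvis::Encoding actually uses is not described here.

```perl
use strict;
use warnings;

# Hypothetical heuristic, not the module's algorithm: look for a declared
# charset in HTML or XML, and fall back to a default when none is found.
sub guess_declared_encoding {
    my ($text, $type, $sub_type, $default) = @_;
    $default = 'iso-8859-1' unless defined $default;

    if ($type eq 'text' && $sub_type eq 'html'
        && $text =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        return lc $1;    # <meta charset="..."> or http-equiv Content-Type
    }
    if ($sub_type eq 'xml'
        && $text =~ /\A\s*<\?xml[^>]*encoding\s*=\s*["']([\w.:-]+)["']/i) {
        return lc $1;    # encoding pseudo-attribute of the XML declaration
    }
    return $default;
}

my $html = '<html><head><meta charset="UTF-8"></head><body></body></html>';
print guess_declared_encoding($html, 'text', 'html'), "\n";
```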
First tries to guess the encoding of the document from a guess at its type and subtype, then tries to convert the document to $target_encoding.
Tries to convert $text from $source_encoding to $target_encoding.
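A conversion of this kind can be sketched with the core Encode module as a decode followed by an encode. The helper name convert_bytes is hypothetical and the strict failure behaviour shown (returning undef on any malformed or unmappable input) is an assumption, not a description of how Alvis::Encoding handles errors.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not the module's implementation: re-encode a byte
# string, returning undef (with a message in $@) if either step fails.
sub convert_bytes {
    my ($bytes, $source_encoding, $target_encoding) = @_;
    my $out = eval {
        # FB_CROAK makes Encode die on malformed or unmappable input
        my $chars = Encode::decode($source_encoding, $bytes, Encode::FB_CROAK());
        Encode::encode($target_encoding, $chars, Encode::FB_CROAK());
    };
    return $out;
}

my $latin1 = "caf\xE9";    # "café" as ISO-8859-1 bytes
my $utf8   = convert_bytes($latin1, 'iso-8859-1', 'UTF-8');
if (defined $utf8) {
    # Show the resulting bytes in hex
    print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
}
```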
Returns a set of guesses for the intended encoding when an encoding name contains typos.
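A common way to produce such guesses is to rank known encoding names by edit distance to the misspelled one. The sketch below does this with a plain Levenshtein distance; the helper names, the candidate list and the threshold of 2 are assumptions for illustration, and the method Alvis::Encoding actually uses may differ.

```perl
use strict;
use warnings;

# Levenshtein distance between two strings (insert, delete, substitute).
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                      # match or substitution
            $best = $prev[$j] + 1    if $prev[$j] + 1 < $best;     # deletion
            $best = $cur[$j - 1] + 1 if $cur[$j - 1] + 1 < $best;  # insertion
            push @cur, $best;
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: known names within $max_dist edits of $name, closest first.
sub typo_candidates {
    my ($name, $known, $max_dist) = @_;
    $max_dist = 2 unless defined $max_dist;
    my %dist = map { $_ => edit_distance(lc $name, lc $_) } @$known;
    return sort { $dist{$a} <=> $dist{$b} || $a cmp $b }
           grep { $dist{$_} <= $max_dist } @$known;
}

my @known = qw(utf-8 utf-16 iso-8859-1 iso-8859-15 windows-1252 us-ascii);
print join(', ', typo_candidates('uft-8', \@known)), "\n";
```

A swapped pair of letters counts as two substitutions under this distance, which is why the threshold is set to 2 rather than 1.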
Returns the stack of error messages, if any; otherwise returns an empty string.
Alvis::Document::Type
Kimmo Valtonen, <kimmo.valtonen@hiit.fi>
Copyright (C) 2006 by Kimmo Valtonen
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.