Alvis::Canonical - Perl extension for converting documents in various formats into the Alvis canonical format for documents
  use Alvis::Canonical;

  # Create a new instance, specify the conversion of both numeric and
  # symbolic character entities to Unicode characters
  my $C=Alvis::Canonical->new(convertCharEnts=>1,
                              convertNumEnts=>1);
  if (!defined($C))
  {
      die("Unable to instantiate Alvis::Canonical.");
  }

  # Convert an HTML document text in UTF-8 to the canonical format.
  # Specify that you want the title and baseURL as well, if any can be
  # determined.
  my ($txt,$header)=$C->HTML($html,
                             {title=>1,
                              baseURL=>1});
  if (!defined($txt))
  {
      die $C->errmsg();
  }
Assumes the input is in UTF-8 and does NOT contain '\0's (or rather that they carry no meaning and are removable).
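The input assumption above can be checked before conversion. The following is a minimal sketch, assuming the HTML arrives as a byte string and that no sourceEncoding option is being used. The helper name prepare_html_bytes is hypothetical and not part of Alvis::Canonical; it relies only on the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper (NOT part of Alvis::Canonical): strips NUL bytes and
# returns the cleaned byte string, or undef if it is not well-formed UTF-8.
sub prepare_html_bytes {
    my ($bytes) = @_;

    # NUL bytes carry no meaning for the conversion, so remove them.
    (my $clean = $bytes) =~ tr/\x00//d;

    # decode() with FB_CROAK dies on malformed UTF-8. It is used here only
    # as a validity check. A copy is passed in because decode() may modify
    # its input when a CHECK value is given.
    my $copy = $clean;
    my $ok = eval { decode('UTF-8', $copy, FB_CROAK); 1 };

    return $ok ? $clean : undef;
}

# Example input: "café" encoded as UTF-8 bytes, with a stray NUL byte.
my $html  = "<p>caf\xC3\xA9\x00</p>";
my $clean = prepare_html_bytes($html);
die "Input is not valid UTF-8" unless defined $clean;
```

The cleaned string could then be passed to HTML() as in the synopsis. Whether the module expects bytes or an already decoded character string is not stated here, so treat the byte-string handling as an assumption.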
Available options:
warnings
  Issue warnings about badly faulty original HTML where we have to resort to a heuristic solution. Puts a warning to STDERR documenting the error and the solution. Default: no.

convertCharEnts
  Convert HTML symbolic character entities to UTF-8 characters? Default: yes.

convertNumEnts
  Convert HTML numerical character entities to UTF-8 characters? Default: yes.

sourceEncoding
  The encoding of the source documents. Default: undef, which means it is guessed.

  my $C=Alvis::Canonical->new(convertCharEnts=>1,
                              convertNumEnts=>1);
  if (!defined($C))
  {
      die("Unable to instantiate Alvis::Canonical.");
  }
Converts dirty HTML to a valid Alvis canonicalDocument. $options is a mechanism for returning the title and base URL of the document. If their extraction is desired, set fields 'title' and 'baseURL' to a defined value. If you know the encoding of the source document, set option 'sourceEncoding', e.g.
my ($txt,$header)=$C->HTML($html, {title=>1, baseURL=>1, sourceEncoding=>'iso-8859-2'});
Returns a stack of error messages, if any. Empty string otherwise.
Alvis::Convert
Kimmo Valtonen, <kimmo.valtonen@hiit.fi>
Copyright (C) 2006 by Kimmo Valtonen
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.