html2xml.pl - script for generating formatted XML from HTML
html2xml.pl <filename>
cat <filename> | html2xml.pl
This script cleans HTML documents so that their data can be stored in a native XML database. The generated XML elements are: <div> <table> <p> <row> <cell> <url> <list> <item> <br/>
A <div> corresponds to either the BODY element or a DIV element of the HTML document.
Like any script, it is not perfect, so I would be pleased if you emailed me any bugs you find.
This script extracts the "useful" data from an HTML document and saves it in an XML document whose elements are: <div> <table> <p> <row> <cell> <url> <list> <item> <br/>
A <div> holds the BODY element or the DIV elements of the HTML document.
This script is not perfect, so you may well come across a malfunction. I would be glad if you let me know about it by email.
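To make the description above more concrete, here is a minimal sketch of the kind of tag renaming it describes, written with HTML::TreeBuilder. It is not the code of html2xml.pl. The documentation only states that BODY and DIV become <div>; every other pair in the %MAP table (for example tr to <row>, a to <url>), the helper names xml_escape, to_xml and html_to_xml, and the choice to drop all attributes are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical tag mapping. Only body/div => div is stated in the
# documentation; every other pair is an assumption for this sketch.
my %MAP = (
    body  => 'div',   div => 'div',
    table => 'table', tr  => 'row',
    td    => 'cell',  th  => 'cell',
    p     => 'p',     a   => 'url',
    ul    => 'list',  ol  => 'list',
    li    => 'item',
);

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Serialise a node recursively: mapped tags are renamed, unmapped tags
# are dropped while their children are kept, and text is escaped.
# Attributes are discarded in this sketch.
sub to_xml {
    my ($node) = @_;
    return xml_escape($node) unless ref $node;    # plain text node
    return '<br/>' if $node->tag eq 'br';
    my $inner = join '', map { to_xml($_) } $node->content_list;
    my $name  = $MAP{ $node->tag };
    return defined $name ? "<$name>$inner</$name>" : $inner;
}

# Parse an HTML string and return the XML built from its BODY element.
sub html_to_xml {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->look_down( _tag => 'body' );
    my $xml  = $body ? to_xml($body) : '';
    $tree->delete;    # free the parse tree
    return $xml;
}

print html_to_xml(
    '<html><body><p>Hi<br>there</p><ul><li>one</li></ul></body></html>'
), "\n";
```

Keeping the mapping in a single hash makes it easy to compare against the actual behaviour of html2xml.pl and to correct any pair that turns out to differ.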
Required modules: HTML::TreeBuilder and Encode (included in Perl 5.8).
Supported operating systems (OSNAMES): any
Francois Colombier <francois.colombier@free.fr>
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Web
To install html2xml.pl, copy and paste the appropriate command into your terminal.
cpanm
cpanm html2xml.pl
CPAN shell
perl -MCPAN -e shell
install html2xml.pl
For more information on module installation, please visit the detailed CPAN module installation guide.