HTML::ToDocBook - Converts an XHTML file into DocBook.
This describes version 0.03 of HTML::ToDocBook.
    use HTML::ToDocBook;

    my $obj = HTML::ToDocBook->new(%args);

    $obj->convert(infile=>$filename);

    # convert HTML file
    $obj->convert(infile=>$filename, html=>1);
This module converts an XHTML file into DocBook format using both heuristics and XSLT processing. By default, it expects the input file to be valid XHTML. It does not correct malformed markup itself; other programs, such as HTML Tidy (http://tidy.sourceforge.net/), can do that for you.
Note also that the conversion is very simple: it does not handle elements such as <div> or <span>, whose meaning it has no way of guessing. (However, if such an element has a class name that matches a DocBook tag, it is turned into that tag.) It does not merge multiple XHTML files into a single document; instead, each XHTML file is converted into a <chapter>, with each header becoming a section (sect1 to sect5). The content of the <title> element is used as the chapter title.
There are likely to be validity errors, depending on how good the original HTML was. Expect broken links, <xref> elements that should be <link>s, and overuse of <emphasis> and <emphasis role="bold">.
    my $conv = HTML::ToDocBook->new();

    # or, with a replacement stylesheet:
    my $conv = HTML::ToDocBook->new(stylesheet=>$stylesheet);
Arguments:
stylesheet: A replacement XSLT stylesheet to use for conversions instead of the built-in one. This can be either a file name or a string containing the entire stylesheet.
$obj->convert(infile=>$filename, html=>1);
Arguments:
infile: The name of the file to convert.
html: If true, parse the input as HTML rather than XML.
The following internal methods are not guaranteed to be stable.
    my $str = $obj->insert_sections($string);
This inserts <div class="sectN"> tags to enclose the sections at every header level. The XSLT stylesheet then picks these up and converts them into section tags.
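To illustrate the idea behind insert_sections, here is a minimal, self-contained sketch in plain Perl. It is not the module's actual implementation: the helper name wrap_sections, the regex-based header matching, and the mapping of <h1>..<h5> to sect1..sect5 are all assumptions made for illustration. Each header first closes any open sections at the same or a deeper level, then opens a new <div class="sectN">, so deeper headers end up nested inside shallower ones.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT HTML::ToDocBook's own insert_sections().
# Assumes <h1>..<h5> map to sect1..sect5 and that the input is
# well-formed, with no header tags hidden inside comments or CDATA.
sub wrap_sections {
    my ($html) = @_;
    my @open;       # stack of currently open section levels
    my $out = '';
    my $pos = 0;    # offset of input already copied to $out

    while ($html =~ /<h([1-5])\b/gi) {
        my $level = $1;
        my $start = $-[0];    # offset where this header tag begins

        # copy everything between the previous header and this one
        $out .= substr($html, $pos, $start - $pos);

        # close any open sections at the same or a deeper level
        while (@open && $open[-1] >= $level) {
            $out .= '</div>';
            pop @open;
        }

        # open a new section that starts with this header
        $out .= qq{<div class="sect$level">};
        push @open, $level;
        $pos = $start;
    }

    $out .= substr($html, $pos);         # trailing content
    $out .= '</div>' x scalar(@open);    # close whatever is still open
    return $out;
}

print wrap_sections('<h1>A</h1><p>x</p><h2>B</h2><p>y</p><h1>C</h1>'), "\n";
```

Running this prints `<div class="sect1"><h1>A</h1><p>x</p><div class="sect2"><h2>B</h2><p>y</p></div></div><div class="sect1"><h1>C</h1></div>`. A regex scan like this keeps the sketch short but is fragile on messy input; the module itself lists HTML::SimpleParse among its dependencies and may rely on a parser instead.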
Requires: Cwd, File::Basename, File::Spec, XML::LibXML, XML::LibXSLT, HTML::SimpleParse, Test::More
To install this module, run the following commands:
    perl Build.PL
    ./Build
    ./Build test
    ./Build install
Or, if you're on a platform (like DOS or Windows) that doesn't like the "./" notation, you can do this:
    perl Build.PL
    perl Build
    perl Build test
    perl Build install
To install somewhere other than the default location, such as a directory under your home directory like "/home/fred/perl", run

    perl Build.PL --install_base /home/fred/perl

as the first step instead.
This will install the files underneath /home/fred/perl.
You will then need to make sure that you alter the PERL5LIB variable to find the modules, and the PATH variable to find the script.
Therefore you will need to change:

your PATH, to include /home/fred/perl/script (where the script will be):

    export PATH=/home/fred/perl/script:${PATH}

the PERL5LIB variable, to add /home/fred/perl/lib:

    export PERL5LIB=/home/fred/perl/lib:${PERL5LIB}
See also perl(1).
Please report any bugs or feature requests to the author.
Kathryn Andersen (RUBYKAT) perlkat AT katspace dot com http://www.katspace.org/tools
XSLT stylesheet based on the one at http://wiki.docbook.org/topic/Html2DocBook by Jeff Beal
Copyright (c) 2006 by Kathryn Andersen
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.