HTML-Tree - overview of HTML::TreeBuilder et al
    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($filename);

    # Then do something with the tree, using HTML::Element
    # methods -- for example, $tree->dump

    # Then:
    $tree->delete;
HTML-Tree is a suite of Perl modules for making parse trees out of HTML source. It consists mainly of two modules, whose documentation you should refer to: HTML::TreeBuilder and HTML::Element.
HTML::TreeBuilder is the module that builds the parse trees. (It uses HTML::Parser to do the work of breaking the HTML up into tokens.)
The tree that TreeBuilder builds for you is made up of objects of the class HTML::Element.
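As a rough illustration of how the two modules fit together, the sketch below parses a small HTML string with HTML::TreeBuilder and then queries the resulting HTML::Element objects. The sample markup and the variable names ($html, $title_text, @hrefs) are made up for this example; see the documentation for HTML::TreeBuilder and HTML::Element for the authoritative descriptions of these methods.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; any HTML source would do.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>See <a href="http://example.com/">this link</a>.</p></body></html>';

# Build the parse tree from a string instead of a file.
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is finished

# Every node in the tree is an HTML::Element object.
# In scalar context, look_down returns the first matching element.
my $title      = $tree->look_down(_tag => 'title');
my $title_text = $title ? $title->as_text : '';

# In list context, look_down returns all matching elements.
my @hrefs = map { $_->attr('href') } $tree->look_down(_tag => 'a');

print "Title: $title_text\n";
print "Link: $_\n" for @hrefs;

# Release the tree when finished with it.
$tree->delete;
```

Here parse and eof stand in for the parse_file call used in the synopsis, so the example needs no external file.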
If you find that you do not properly understand the documentation for HTML::TreeBuilder and HTML::Element, it may be because you are unfamiliar with tree-shaped data structures, or with object-oriented modules in general. I have written some articles for The Perl Journal (www.tpj.com) that seek to provide that background: my article "Scanning HTML" in TPJ19; my article "Trees" in TPJ18; and my article "A User's View of Object-Oriented Modules" in TPJ17. The full text of those articles will likely appear in a later version of this HTML-Tree module distribution.
HTML::TreeBuilder, HTML::Element, HTML::Tagset, HTML::Parser
HTML::DOMbo
Copyright 1995-1998 Gisle Aas; copyright 1999-2001 Sean M. Burke.
The whole HTML-Tree distribution, of which this file is a part, is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Original HTML-Tree author Gisle Aas <gisle@aas.no>; current maintainer Sean M. Burke, <sburke@cpan.org>