parse_html - Parse HTML text
parse_htmlfile - Parse HTML text from file
    use HTML::Parse;
    $h = parse_htmlfile("test.html");
    print $h->dump;
    $h = parse_html("<p>Some more <i>italic</i> text", $h);
    $h->delete;

    print parse_htmlfile("index.html")->as_HTML;  # tidy up markup in a file
This module provides functions to parse HTML documents. The result of the parsing is an HTML syntax tree with HTML::Element objects as nodes. See HTML::Element for details of the methods available to access the syntax tree.
The parser currently understands HTML 2.0 markup, tables, and some Netscape extensions.
Entities in all text content and attribute values will be expanded by the parser.
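The parser is able to parse HTML text incrementally. The document can be given to parse_html() in arbitrary pieces, passing the tree returned by the previous call as the second argument. The resulting tree should be the same as if the whole document had been parsed in one call.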
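As an illustration of incremental parsing, the following is a minimal sketch that feeds one small document to parse_html() in three pieces. The chunk boundaries and the variable names (@chunks, $tree, $html) are made up for this example; only parse_html(), as_HTML and delete come from the synopsis above.

```perl
use strict;
use warnings;
use HTML::Parse;

# Hypothetical input: one document split at arbitrary points,
# including in the middle of a word and between tags.
my @chunks = ('<p>Some <i>ital', 'ic</i> te', 'xt</p>');

# The first call creates the tree; each later call extends it.
my $tree = parse_html(shift @chunks);
$tree = parse_html($_, $tree) for @chunks;

# Serialize the accumulated tree, then free it explicitly.
my $html = $tree->as_HTML;
print $html, "\n";
$tree->delete;
```

The same loop could read fixed-size blocks from a file or socket instead of a literal list.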
The following variables control how parsing takes place:
$HTML::Parse::IMPLICIT_TAGS
Setting this variable to true instructs the parser to try to deduce implicit elements and implicit end tags. If this variable is false, you get a parse tree that just reflects the text as it stands, which might be useful for quick and dirty parsing. Default is true.
Implicit elements have the implicit() attribute set.
$HTML::Parse::IGNORE_UNKNOWN
This variable controls whether unknown tags should be represented as elements in the parse tree. Default is true.
$HTML::Parse::IGNORE_TEXT
Do not represent the text content of elements. This saves space if all you want is to examine the structure of the document. Default is false.
$HTML::Parse::WARN
Call warn() with an appropriate message for syntax errors. Default is false.
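A short sketch of how these settings might be changed before parsing. It assumes the package variables are named $HTML::Parse::IGNORE_TEXT and $HTML::Parse::WARN, which is how the module is believed to expose them; check these names against your installed version. The input string and the $h and $html variables are illustrative only.

```perl
use strict;
use warnings;
use HTML::Parse;

# Assumed variable names (see the note above).
$HTML::Parse::IGNORE_TEXT = 1;   # keep only the element structure
$HTML::Parse::WARN        = 1;   # report syntax errors via warn()

# The settings are read when the parse tree is created.
my $h = parse_html('<h1>Title</h1><p>Body <b>text</b>');

my $html = $h->as_HTML;   # serialized tree, without text content
print $h->dump;           # indented outline of the element structure
$h->delete;
```

Because these are global variables, they affect every later call to parse_html() and parse_htmlfile() in the same program; wrapping the assignments in a block with local is one way to limit their scope.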
The parser does not correctly parse tag attributes whose value contains the ">" character:
<img src="..." alt="4.4 > V">
If you want to free the memory associated with the HTML parse tree, you have to delete it explicitly. The reason is that Perl currently has no proper garbage collector, but depends on reference counts in the objects. This scheme fails because the parse tree contains circular references (parents have references to their children and children have a reference to their parent).
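The effect of circular references on reference counting can be shown without the parser at all. The sketch below uses a hypothetical Node class as a stand-in for a parse-tree node (it is not HTML::Element) and counts DESTROY calls to show which nodes Perl actually frees.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parse-tree node; not HTML::Element itself.
package Node;

our $destroyed = 0;    # number of nodes Perl has actually freed

sub new {
    my ($class, $tag) = @_;
    return bless { tag => $tag, parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;              # child refers to parent
    push @{ $self->{children} }, $child;   # parent refers to child: a cycle
    return $child;
}

# Break the cycles explicitly, depth first.
sub delete {
    my ($self) = @_;
    $_->delete for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
    return;
}

sub DESTROY { $destroyed++ }

package main;

sub build_tree {
    my $root = Node->new('html');
    $root->add_child(Node->new('body'))->add_child(Node->new('p'));
    return $root;
}

# Case 1: only drop the last outside reference to the tree.
{
    my $tree = build_tree();
}
my $freed_without_delete = $Node::destroyed;

# Case 2: break the cycles before the tree goes out of scope.
{
    my $tree = build_tree();
    $tree->delete;
}
my $freed_with_delete = $Node::destroyed - $freed_without_delete;

print "freed without delete: $freed_without_delete\n";
print "freed with delete:    $freed_with_delete\n";
```

In the first case no node is freed, because every node is still referenced by its parent or child; in the second case all three nodes are freed. This is why a parse tree needs an explicit $h->delete.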
HTML::Element, HTML::Entities
Copyright 1995,1996 Gisle Aas. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Gisle Aas <aas@sn.no>