HTML::ListScraper::Interactive - formatting data from HTML::ListScraper
format_tags
Formats a tag sequence to emphasize its tree-like structure. Takes 2 or 3 parameters: an HTML::ListScraper object, an array reference containing HTML::ListScraper::Tag objects, and an optional hash with formatting options. format_tags returns an array (an array reference if called in scalar context) with formatted tag names and text.
The formatting options are:
attr - Include the href attribute in the output.
text - Include the plain text in the output.
index - Include tag positions in the output.
The returned values are basically XHTML lines: opening tags, text with quoted entities and closing tags. Tags are enclosed in angle brackets. The returned values don't necessarily form a valid XML fragment, though, because the input tags need not form a tree.
When index is set, tag values start with the tag's index, followed by a tab. Next, spaces show indentation. An opening tag not identified as missing a closing tag increases indentation by 2 spaces; a closing tag decreases it back. An opening tag with a missing closing tag is output with '/' appended to its name. For the rules of associating opening and closing tags, see HTML::ListScraper::shapeless.
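The indentation rules described above can be sketched in a few lines of Perl. This is only an illustration, not the module's implementation: the sub indent_tags and the hash keys name, closing and unclosed are hypothetical stand-ins for HTML::ListScraper::Tag objects, and the sketch assumes that a closing tag is printed at the same depth as its opening tag.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the indentation logic described above.
# Each tag is a hash ref: name => tag name, closing => true for a closing
# tag, unclosed => true for an opening tag identified as missing its close.
sub indent_tags {
    my @tags  = @_;
    my $depth = 0;
    my @lines;
    for my $tag (@tags) {
        if ($tag->{closing}) {
            # ASSUMPTION: the closing tag lines up with its opening tag
            $depth -= 2 if $depth >= 2;
            push @lines, (' ' x $depth) . "</$tag->{name}>\n";
        }
        elsif ($tag->{unclosed}) {
            # no matching close: append '/' and leave the depth alone
            push @lines, (' ' x $depth) . "<$tag->{name}/>\n";
        }
        else {
            push @lines, (' ' x $depth) . "<$tag->{name}>\n";
            $depth += 2;
        }
    }
    return @lines;
}

my @lines = indent_tags(
    { name => 'ul' },
    { name => 'li' },
    { name => 'br', unclosed => 1 },
    { name => 'li', closing => 1 },
    { name => 'ul', closing => 1 },
);
print @lines;
```

Every line ends with a newline, as the formatted values do, so the result can be printed directly.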
When attr is set, links are formatted without whitespace and enclosed in double quotes. Double quotes in links are escaped, but no other characters are (which can also make the result invalid HTML). When text is set, the output text has normalized whitespace; nodes containing only whitespace are dropped. Gaps between adjacent tag positions are displayed as an empty line. All values end with a newline.
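As a rough illustration of the link and text handling described above, the following Perl sketch uses two hypothetical helpers, quote_link and normalize_text, which are not part of the module. It assumes that double quotes are escaped with a backslash and that normalizing whitespace means collapsing runs to a single space and trimming the ends; the documentation specifies neither detail.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of HTML::ListScraper::Interactive.

# Remove all whitespace from a link, escape double quotes, wrap in quotes.
# ASSUMPTION: quotes are escaped with a backslash; the module's actual
# escape style is not documented here.
sub quote_link {
    my ($href) = @_;
    $href =~ s/\s+//g;
    $href =~ s/"/\\"/g;
    return qq("$href");
}

# Collapse whitespace runs to one space and trim both ends.
# Returns undef for whitespace-only text so the caller can drop the node.
sub normalize_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return length($text) ? $text : undef;
}

print quote_link(qq(/a b?q="x"\n)), "\n";
print normalize_text("  two\n   words \t"), "\n";
print defined(normalize_text(" \n\t")) ? "kept" : "dropped", "\n";
```

Because only double quotes are escaped, characters such as '<' or '&' pass through unchanged, which is how the quoted link can end up as invalid HTML.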
canonicalize_tags
Undoes the formatting done by format_tags. Takes a list of lines, such as those output by format_tags when called without any formatting options, and converts them to a list of tag names. Note that canonicalize_tags doesn't handle attributes, text lines, or index numbers.
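The reverse direction can be sketched as follows. This is a hypothetical re-implementation for illustration, not the module's code: the sub strip_formatting is made up, and it assumes that closing tags keep their leading '/' in the resulting name while the trailing '/' marking an opening tag without a close is dropped. Lines that do not look like a bare tag are skipped here, whereas the real function may treat them differently.

```perl
use strict;
use warnings;

# Hypothetical re-implementation, for illustration only.
# ASSUMPTIONS: closing tags keep their leading '/', and the trailing '/'
# that marks an opening tag without a close is dropped.
sub strip_formatting {
    my @names;
    for my $line (@_) {
        # leading indentation, '<', name, optional '/' marker, '>'
        if ($line =~ m{^\s*<(/?[^<>/\s]+)/?>\s*$}) {
            push @names, $1;
        }
    }
    return @names;
}

my @names = strip_formatting(
    "<ul>\n", "  <li>\n", "    <br/>\n", "  </li>\n", "</ul>\n",
);
print join(' ', @names), "\n";
```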
To install HTML::ListScraper, copy and paste the appropriate command into your terminal.
cpanm
cpanm HTML::ListScraper
CPAN shell
perl -MCPAN -e shell
install HTML::ListScraper
For more information on module installation, please visit the detailed CPAN module installation guide.