NAME

untemplate - analyze several HTML documents based on the same template

VERSION

version 0.005

SYNOPSIS

    untemplate [options] HTML1 HTML2 [HTML3] [...]

DESCRIPTION

Takes multiple HTML documents generated from the same template and attempts to extract only the data that was inserted into the original template.

Accepts URLs if AnyEvent::Net::Curl::Queued is installed.

OPTIONS

--help

Show this help message.

--[no]color

Enable syntax highlighting for XPath. By default, it is enabled automatically on interactive terminals.

--[no]strict

Strict mode disables grouping by id, class or name attributes. The grouping is enabled by default.

--unmangle=regex

Specify regex(es) to unmangle id/class attributes. Some CMSes (e.g. WordPress) insert unique identifiers into HTML elements, like:

    <body class="post-id-12345">

This tends to break HTML tree analysis. To fix the above case, use --unmangle 'post-id-\d+'. Multiple unmanglers are accepted (--unmangle a --unmangle b).
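To see why such identifiers get in the way, it can help to try the pattern outside untemplate first. The sketch below is only an illustration: the two sample lines are hypothetical, modelled on the WordPress case above, and the unmangle helper is a stand-in that deletes whatever the pattern matches. Whether untemplate itself deletes or otherwise normalizes the match is an assumption, not something this document states.

```shell
# Hypothetical <body> tags from two pages built on the same template.
page1='<body class="post-id-12345">'
page2='<body class="post-id-67890">'

# Stand-in helper (not part of untemplate): delete everything the
# pattern 'post-id-\d+' matches, using perl as a regex filter.
unmangle() { perl -pe 's/post-id-\d+//g'; }

clean1=$(printf '%s\n' "$page1" | unmangle)
clean2=$(printf '%s\n' "$page2" | unmangle)

# With the unique identifiers gone, the two tags should be identical.
printf '%s\n%s\n' "$clean1" "$clean2"
```

If the two printed lines still differ, the pattern is probably too narrow for the pages in question and needs widening before it is passed to --unmangle.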

EXAMPLES

    untemplate --color 'http://bash.org/?1839' 'http://bash.org/?2486' | less -R

CAVEATS

If the HTML documents are not based on the same template, the results will be empty.

Unfortunately, using any kind of document identifier as part of an element's class/id (a common practice in WordPress themes) is enough to make documents count as "not the same template".

AUTHOR

Stanislaw Pusep <stas@sysd.org>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
