WWW::Mechanize::Plugin::DOM - HTML Document Object Model plugin for Mech
0.005 (alpha)
    use WWW::Mechanize;
    my $m = new WWW::Mechanize;

    $m->use_plugin('DOM',
        script_handlers => {
            default => \&script_handler,
            qr/(?:^|\/)(?:x-)?javascript/ => \&script_handler,
        },
        event_attr_handlers => {
            default => \&event_attr_handler,
            qr/(?:^|\/)(?:x-)?javascript/ => \&event_attr_handler,
        },
    );

    sub script_handler {
        my($mech, $dom_tree, $code, $url, $line, $is_inline) = @_;
        # ... code to run the script ...
    }

    sub event_attr_handler {
        my($mech, $elem, $event_name, $code, $url, $line) = @_;
        # ... code that returns a coderef ...
    }

    $m->plugin('DOM')->tree;   # DOM tree for the current page
    $m->plugin('DOM')->window; # Window object
This is a plugin for WWW::Mechanize that provides support for the HTML Document Object Model. This is a part of the WWW::Mechanize::Plugin::JavaScript distribution, but it can be used on its own.
To enable this plugin, use Mech's use_plugin method, as shown in the synopsis.
To access the DOM tree, use $mech->plugin('DOM')->tree, which returns an HTML::DOM object.
You may provide a subroutine that runs an inline script like this:
    $mech->use_plugin('DOM',
        script_handlers => {
            qr/.../ => sub { ... },
            qr/.../ => sub { ... },
            # etc
        }
    );
And a subroutine for turning HTML event attributes into subroutines, like this:
    $mech->use_plugin('DOM',
        event_attr_handlers => {
            qr/.../ => sub { ... },
            qr/.../ => sub { ... },
            # etc
        }
    );
In both cases, each key (shown as qr/.../) should be either a regular expression that matches the scripting language to which the handler applies, or the string 'default'. The scripting language is identified by a MIME type or, if a script element has no type attribute, by the contents of its language attribute. The 'default' handler is used if there is no handler for the scripting language in question, or if there is no Content-Script-Type header and, for script_handlers, the script element has neither a 'type' nor a 'language' attribute.
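The lookup rule described above can be illustrated with a small stand-alone sketch. This is not the plugin's actual implementation: the pick_handler function and the handler bodies are hypothetical, and the sketch simply assumes that each non-'default' key is tried as a pattern against the language string, with 'default' as the fallback. Note that a qr// object used as a hash key is stored as a string, so it is re-used as a pattern here.

```perl
use strict;
use warnings;

# Hypothetical handler table in the same shape as script_handlers.
# The qr// key is stringified when it becomes a hash key.
my %handlers = (
    default                         => sub { 'default handler' },
    qr/(?:^|\/)(?:x-)?javascript/   => sub { 'javascript handler' },
);

# Hypothetical helper: choose a handler for a language string, which may be
# a MIME type, the contents of a language attribute, or undef if unknown.
sub pick_handler {
    my ($handlers, $lang) = @_;
    if (defined $lang) {
        for my $key (grep { $_ ne 'default' } sort keys %$handlers) {
            # The stringified regex is interpolated back into a pattern.
            return $handlers->{$key} if $lang =~ /$key/;
        }
    }
    return $handlers->{default};    # no match, or no language given
}

print pick_handler(\%handlers, 'application/x-javascript')->(), "\n";
print pick_handler(\%handlers, 'text/vbscript')->(), "\n";
```

With this table, 'text/javascript', 'application/x-javascript' and a bare 'javascript' language attribute all select the JavaScript handler, while anything else, including a missing language, falls through to 'default'.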
Each time you move to another page with WWW::Mechanize, a different copy of the DOM plugin object is created. So, if you must refer to it in a callback routine, don't use a closure, but get it from the $mech object that is passed as the first argument.
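The following sketch shows why a closure over the plugin object goes stale. Sketch::Mech is a hypothetical stand-in written only so the example runs without the real modules; it mimics the one behaviour described above, namely that each page load creates a new plugin object. The real $mech would be a WWW::Mechanize object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Mech object. Each call to load_page replaces
# the plugin object, as the DOM plugin is said to do on every new page.
package Sketch::Mech {
    sub new       { bless { page => 0, plugin => undef }, shift }
    sub load_page {
        my $self = shift;
        $self->{plugin} = { page => ++$self->{page} };
        return $self;
    }
    sub plugin    { $_[0]{plugin} }
}

my $mech  = Sketch::Mech->new->load_page;
my $stale = $mech->plugin('DOM');   # captured once; tied to the first page

sub script_handler {
    my ($mech, $dom_tree, $code, $url, $line, $is_inline) = @_;
    # Look the plugin up at call time, via the $mech passed in.
    return $mech->plugin('DOM');
}

$mech->load_page;                   # move to another page
my $fresh = script_handler($mech);

print "stale copy belongs to page $stale->{page}\n";
print "fresh copy belongs to page $fresh->{page}\n";
```

A handler that had closed over $stale would keep working with the first page's plugin object, whereas the lookup through the first argument always sees the current one.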
The line number passed to an event attribute handler requires HTML::DOM 0.012 or higher. It will be undef with lower versions.
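An event attribute handler can guard against the missing line number. The sketch below uses the argument order from the synopsis; the body is purely illustrative (it returns a coderef that builds a message instead of running any script), and the example URL and code string are made up.

```perl
use strict;
use warnings;

# Illustrative handler: returns a coderef, as event_attr_handlers expects,
# and tolerates an undefined $line (HTML::DOM older than 0.012).
sub event_attr_handler {
    my ($mech, $elem, $event_name, $code, $url, $line) = @_;
    my $where = defined $line ? "$url line $line" : "$url (line unknown)";
    return sub { return "would run '$code' for $event_name at $where" };
}

# Made-up arguments; $mech and $elem are not used by this sketch.
my $with_line    = event_attr_handler(undef, undef, 'click', 'go()', 'http://example.com/', 7);
my $without_line = event_attr_handler(undef, undef, 'click', 'go()', 'http://example.com/', undef);

print $with_line->(), "\n";
print $without_line->(), "\n";
```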
This is the usual boring list of methods. Those that are described above are listed here without descriptions.
This returns the window object.
This returns the DOM tree (aka the document object).
This evaluates the code associated with each timeout registered with the window's setTimeout function, if the appropriate interval has elapsed.
This returns a boolean indicating whether scripts are enabled. It is true by default. You can disable scripts by passing a false value.
Bug: This does not disable event handlers that are already registered.
Currently the (on)load event is triggered when the page finishes parsing. This plugin assumes that you're not going to be loading any images, etc.
%Interface
If you are creating your own script binding, you'll probably want to access the hash named %WWW::Mechanize::Plugin::DOM::Interface, which lists, in a machine-readable format, the interface members of the location and navigator objects. It follows the same format as %HTML::DOM::Interface.
See also "THE %Interface HASH" in WWW::Mechanize::Plugin::DOM::Window for a list of members of the window object.
HTML::DOM 0.019 or later
WWW::Mechanize
The current stable release of WWW::Mechanize does not support plugins. See WWW::Mechanize::Plugin::JavaScript for more info.
constant::lexical
Hash::Util::FieldHash::Compat
The onunload event is not yet supported. The window object is not yet part of the event dispatch chain. Some events do not yet do everything they are supposed to; e.g., a link's click method does not go to the next page.
This plugin does not yet provide WWW::Mechanize with all the necessary callback routines (for extract_images, etc.).
Currently, external scripts referenced within a page are always read as Latin-1. This will be fixed.
The location object's replace method does not currently work correctly if the current page is the first page. In that case it acts like an assignment to href.
Disabling scripts does not currently affect event handlers that are already registered.
The window method dies if the page is not HTML.
Copyright (C) 2007 Father Chrysostomos <join '@', sprout => join '.', reverse org => 'cpan'>
This program is free software; you may redistribute it and/or modify it under the same terms as perl.
WWW::Mechanize::Plugin::DOM::Window
WWW::Mechanize::Plugin::DOM::Location
WWW::Mechanize::Plugin::JavaScript
HTML::DOM