WWW::Mechanize::Firefox - use Firefox as if it were WWW::Mechanize
    use WWW::Mechanize::Firefox;
    my $mech = WWW::Mechanize::Firefox->new();
    $mech->get('http://google.com');
    $mech->eval_in_page('alert("Hello Firefox")');
    my $png = $mech->content_as_png();
This module lets you automate Firefox through the MozRepl plugin, which must be installed in your Firefox.
$mech->new( ARGS )
Creates a new instance and connects it to Firefox.
Note that Firefox must already be running and must have the mozrepl extension installed.
The following options are recognized:
tab - regex for the title of the tab to reuse. If no matching tab is found, the constructor dies.
launch - name of the program to launch if we can't connect to it on the first try.
log - array reference to log levels, passed through to MozRepl::RemoteObject.
events - the set of default Javascript events to listen for while waiting for a reply.
repl - a premade MozRepl::RemoteObject instance.
pre_events - the events that are sent to an input field before its value is changed. By default this is [focus].
post_events - the events that are sent to an input field after its value is changed. By default this is [blur, change].
The following example launches Firefox if the program can't connect to the mozrepl plugin in Firefox. It also enables mozrepl in a Firefox process if it is not already running.
my $mech = WWW::Mechanize::Firefox->new( launch => 'firefox', );
$mech->allow OPTIONS
Enables or disables browser features for the current tab. The following options are recognized:
plugins - Whether to allow plugin execution.
javascript - Whether to allow Javascript execution.
metaredirects - Whether refresh-based redirects are allowed.
frames, subframes - Whether subframes (framesets/iframes) are allowed.
images - Whether images should be loaded.
Options not listed remain unchanged.
$mech->allow( javascript => 0 );
$mech->js_errors [PAGE]
An interface to the Javascript Error Console (JEC).
Returns the list of errors in the JEC.
    $mech->get('mypage');
    my @errors = $mech->js_errors();
    if (@errors) {
        die "Found errors on page: @errors";
    };
Maybe this should be called js_messages or js_console_messages instead.
$mech->clear_js_errors
Clears all Javascript messages from the console
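Combining clear_js_errors and js_errors gives a simple per-page error check. The following is a minimal sketch and not part of the module: the helper name errors_for_page is hypothetical, and it assumes that clearing the console before the request leaves only messages produced while loading the new page.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Clears the Javascript console, loads $url and returns whatever
# js_errors reports afterwards.
sub errors_for_page {
    my ($mech, $url) = @_;
    $mech->clear_js_errors;          # start from an empty console
    $mech->get($url);                # load the page under test
    my @errors = $mech->js_errors;   # collect the messages logged since
    return @errors;
}
```

With a connected $mech, a call like errors_for_page($mech, 'mypage') should return an empty list for a page that logs nothing.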
$mech->eval_in_page STR [, ENV]
Evaluates the given Javascript fragment in the context of the web page. Returns a pair of value and Javascript type.
This allows access to variables and functions declared "globally" on the web page.
The returned result needs to be treated with extreme care because it might lead to Javascript execution in the context of your application instead of the context of the webpage. This should be evident for functions and complex data structures like objects. When working with results from untrusted sources, you can only safely use simple types like string.
If you want to modify the environment the code is run under, pass in a hash reference as the second parameter. All keys will be inserted into the this object as well as this.window. Also, complex data structures are only supported if they contain no objects. If you need finer control, you'll have to write the Javascript yourself.
This method is special to WWW::Mechanize::Firefox.
Also, using this method opens a potential security risk as the returned values can be objects and using these objects can execute malicious code in the context of the Firefox application.
For example, to override the Javascript alert() function with a Perl callback:
$mech->eval_in_page('alert("Hello");', { alert => sub { print "Captured alert: '@_'\n" } } );
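Since eval_in_page returns a pair of value and Javascript type, one cautious way to follow the advice about untrusted results is to accept only strings. This is a sketch under assumptions: the helper name page_string is hypothetical, and the type is assumed to be reported as the Javascript typeof name 'string'.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Evaluates $js in the page and returns the result only if it is a
# plain string; anything else (objects, functions, ...) is discarded.
sub page_string {
    my ($mech, $js) = @_;
    my ($value, $type) = $mech->eval_in_page($js);

    # Assumption: the type is the Javascript typeof name, e.g. 'string'
    return unless defined $type && $type eq 'string';

    return "$value";    # force plain Perl string context
}
```

A call such as page_string($mech, 'document.title') then yields either a Perl string or undef.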
$mech->unsafe_page_property_access ELEMENT
Allows you unsafe access to properties of the current page. Using such properties is an incredibly bad idea.
This is why the function dies. If you really want to use this function, edit the source code.
$mech->addTab( OPTIONS )
Creates a new tab. The tab will be automatically closed upon program exit.
If you want the tab to remain open, pass a false value to the autoclose option.
$mech->tab
Gets the object that represents the Firefox tab used by WWW::Mechanize::Firefox.
$mech->progress_listener SOURCE, CALLBACKS
Sets up the callbacks for the nsIWebProgressListener interface to be the Perl subroutines you pass in.
Returns a handle. Once the handle gets released, all callbacks will get stopped. Also, all Perl callbacks will get deregistered from the Javascript bridge, so make sure not to use the same callback in different progress listeners at the same time.
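    my $browser = $mech->repl->expr('window.getBrowser()');

    # onLocationChange must be a Perl subroutine defined in your program
    my $eventlistener = $mech->progress_listener(
        $browser,
        onLocationChange => \&onLocationChange,
    );

    # Poll the connection to Firefox once per second
    while (1) {
        $mech->repl->poll();
        sleep 1;
    };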
$mech->repl
Gets the MozRepl::RemoteObject instance that is used.
$mech->events
Sets or gets the set of Javascript events that WWW::Mechanize::Firefox will wait for after requesting a new page. Returns an array reference.
$mech->cookies
Returns a HTTP::Cookies object that was initialized from the live Firefox instance.
Note: ->set_cookie is not yet implemented, and neither is saving the cookie jar.
$mech->highlight_node NODES
Convenience method that marks all nodes in the arguments with
    background: red;
    border: solid black 1px;
    display: block; /* if the element was display: none before */
This is convenient if you need visual verification that you've got the right nodes.
There currently is no way to restore the nodes to their original visual state except reloading the page.
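A typical use is to check what a CSS selector matches before relying on it. The sketch below is not part of the module: the helper name highlight_matches is hypothetical, and it assumes that ->selector returns the matching nodes as a list.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Highlights every node matching $css and returns how many were found.
sub highlight_matches {
    my ($mech, $css) = @_;
    my @nodes = $mech->selector($css);        # assumed to return a list of nodes
    $mech->highlight_node(@nodes) if @nodes;  # mark them red in the browser
    return scalar @nodes;
}
```

For example, highlight_matches($mech, 'a') would mark every link on the page; reload the page to remove the marks.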
$mech->get(URL)
Retrieves the given URL into the tab.
It returns a faked HTTP::Response object for interface compatibility with WWW::Mechanize. It does not yet support the additional parameters that WWW::Mechanize supports for saving a file etc.
$mech->get_local $filename
Shorthand method to construct the appropriate file:// URI and load it into Firefox.
This method is special to WWW::Mechanize::Firefox but could also exist in WWW::Mechanize through a plugin.
$mech->synchronize( $event, $callback )
Wraps a synchronization semaphore around the callback and waits until the event $event fires on the browser. If you want to wait for one of multiple events to occur, pass an array reference as the first parameter.
Usually, you want to use it like this:
    my $l = $mech->xpath('//a[@onclick]', single => 1);
    $mech->synchronize('DOMFrameContentLoaded', sub { $l->__click() });
It is necessary to synchronize with the browser whenever a click triggers an action that takes some time to complete and fires an event on the browser object.
The DOMFrameContentLoaded event is fired by Firefox when the whole DOM and all iframes have been loaded. If your document doesn't have frames, use the DOMContentLoaded event instead.
If you leave out $event, the value of ->events() will be used instead.
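When it is not known in advance whether the target page uses frames, the array reference form can wait for either event. This is a sketch only: the helper name click_and_wait is hypothetical, and $node is assumed to be a DOM node obtained from ->xpath or ->selector.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Clicks $node and waits until one of the two load events has fired.
sub click_and_wait {
    my ($mech, $node) = @_;
    return $mech->synchronize(
        [ 'DOMFrameContentLoaded', 'DOMContentLoaded' ],  # wait for either event
        sub { $node->__click() },                         # the action to synchronize
    );
}
```

The node could come from a call like $mech->xpath('//a[@onclick]', single => 1), as in the example above.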
$mech->res
$mech->response
Returns the current response as a HTTP::Response object.
$mech->success
Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false.
This is a convenience function that wraps $mech->res->is_success.
$mech->status
Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on.
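The two accessors can be combined into a checked page load. This is a minimal sketch, not part of the module: the helper name fetch_checked is hypothetical, and it assumes that ->get itself does not die on a failed request.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Loads $url and returns the page content, or dies with the HTTP
# status code if the request was not successful.
sub fetch_checked {
    my ($mech, $url) = @_;
    $mech->get($url);
    die sprintf("Request for %s failed with status %s\n", $url, $mech->status)
        unless $mech->success;
    return $mech->content;
}
```

Wrap the call in eval if a failed request should not end the program.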
$mech->reload BYPASS_CACHE
Reloads the current page. If BYPASS_CACHE is a true value, the browser is not allowed to use a cached page. This is the difference between pressing F5 (cached) and shift-F5 (uncached).
Returns the (new) response.
$mech->back
Goes one page back in the page history.
$mech->forward
Goes one page forward in the page history.
$mech->uri
Returns the current document URI.
$mech->document
Returns the DOM document object.
This is WWW::Mechanize::Firefox specific.
$mech->docshell
Returns the docShell Javascript object.
$mech->content
Returns the current content of the tab as a scalar.
This is likely not binary-safe.
It also currently only works for HTML pages.
$mech->update_html $html
Writes $html into the current document. This is mostly implemented as a convenience method for HTML::Display::MozRepl.
$mech->save_content $localname [, $resource_directory] [, %OPTIONS ]
Saves the given URL to the given filename. The URL will be fetched from the cache if possible, avoiding unnecessary network traffic.
If $resource_directory is given, the whole page will be saved. All CSS, subframes and images will be saved into that directory, while the page HTML itself will still be saved in the file pointed to by $localname.
Returns a nsIWebBrowserPersist object through which you can cancel the download by calling its ->cancelSave method. Also, you can poll the download status through the ->{currentState} property.
If you are interested in the intermediate download progress, create a ProgressListener through $mech->progress_listener and pass it in the progress option.
The download will continue in the background. It will not show up in the Download Manager.
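Because the download runs in the background, a script that needs the files on disk has to wait for it. The following sketch polls ->{currentState} and cancels after a timeout. It is not part of the module: the helper name save_page_and_wait is hypothetical, and the value 3 is assumed to be the Mozilla constant PERSIST_STATE_FINISHED of nsIWebBrowserPersist.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Saves the page with all resources and waits up to $timeout seconds.
# Returns 1 when the download finished, 0 when it was cancelled.
sub save_page_and_wait {
    my ($mech, $localname, $resource_directory, $timeout) = @_;
    my $persist = $mech->save_content($localname, $resource_directory);

    # Assumption: nsIWebBrowserPersist.PERSIST_STATE_FINISHED has the value 3
    my $PERSIST_STATE_FINISHED = 3;

    my $waited = 0;
    while ($persist->{currentState} != $PERSIST_STATE_FINISHED) {
        if ($waited >= $timeout) {
            $persist->cancelSave;    # give up and stop the download
            return 0;
        }
        sleep 1;
        $waited++;
    }
    return 1;
}
```

A call like save_page_and_wait($mech, 'page.html', 'page_files', 30) would wait at most about 30 seconds.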
$mech->save_url $url, $localname, [%OPTIONS]
The download will continue in the background. It will also not show up in the Download Manager.
You can use ->save_url to transfer files. $localname can be a local filename, a file:// URL or any other URL that allows uploads, like ftp://.
$mech->save_url('file://path/to/my/file.txt' => 'ftp://myserver.example/my/file.txt');
Uploading is not implemented yet. It requires instantiating and passing a nsIURI object instead of a nsILocalFile.
$mech->base
Returns the URL base for the current page.
The base is either specified through a base tag or is the current URL.
This method is specific to WWW::Mechanize::Firefox
$mech->content_type
Returns the content type of the currently loaded document
$mech->is_html()
Returns true/false on whether our content is HTML, according to the HTTP headers.
$mech->title
Returns the current document title.
$mech->links
Returns all links in the document.
Currently accepts no parameters.
$mech->find_link_dom OPTIONS
A method to find links, like WWW::Mechanize's ->find_links method.
Returns the DOM object as MozRepl::RemoteObject::Instance.
The supported options are:
text - the text of the link
id - the id attribute of the link
name - the name attribute of the link
url - the URL attribute of the link (href, src or content).
class - the class attribute of the link
n - the (1-based) index. Defaults to returning the first link.
single - If true, ensure that only one element is found. Otherwise croak or carp, depending on the autodie parameter.
one - If true, ensure that at least one element is found. Otherwise croak or carp, depending on the autodie parameter.
The method croaks if no link is found. If the single option is true, it also croaks when more than one link is found.
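Since the method croaks when the search fails, callers that want to handle a missing or ambiguous link themselves can trap the exception. This is a sketch, not part of the module: the helper name find_unique_link is hypothetical, and it assumes the croaking behaviour described above is in effect.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Returns the single link node matching %criteria, or undef if the
# search croaked because no link or more than one link matched.
sub find_unique_link {
    my ($mech, %criteria) = @_;
    my $link = eval { $mech->find_link_dom(%criteria, single => 1) };
    return $link;
}
```

For example, find_unique_link($mech, text => 'Download') returns a node only when exactly one link carries that text.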
$mech->find_link OPTIONS
A method quite similar to WWW::Mechanize's method of the same name.
Returns a WWW::Mechanize::Link object.
$mech->find_all_links OPTIONS
Finds all links in the document.
Returns them as list or an array reference, depending on context.
$mech->find_all_links_dom OPTIONS
Finds all matching link-like DOM nodes in the document.
$mech->click NAME [,X,Y]
Has the effect of clicking a button on the current form. The first argument is the name of the button to be clicked. The second and third arguments (optional) allow you to specify the (x,y) coordinates of the click.
If there is only one button on the form, $mech->click() with no arguments simply clicks that one button.
If you pass in a hash reference instead of a name, the following keys are recognized:
selector - Find the element to click by the CSS selector
xpath - Find the element to click by the XPath query
synchronize - Synchronize the click (default is 1)
Returns a HTTP::Response object.
As a deviation from the WWW::Mechanize API, you can also pass a hash reference as the first parameter. In it, you can specify the parameters to search much like for the find_link calls.
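The hash reference form is useful for elements that have no name attribute. The following is a small sketch, not part of the module: the helper name click_by_selector is hypothetical, and the selector passed in is whatever identifies the element on your page.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Clicks the element matching the CSS selector $css and waits for the
# resulting page load.
sub click_by_selector {
    my ($mech, $css) = @_;
    return $mech->click({
        selector    => $css,   # locate the element by CSS selector
        synchronize => 1,      # the documented default, spelled out here
    });
}
```

Pass synchronize => 0 instead if the click does not cause any of the awaited events to fire.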
$mech->follow_link
Follows the given link. Takes the same parameters that find_link uses.
$mech->current_form
Returns the current form.
This method is incompatible with WWW::Mechanize. It returns the DOM <form> object and not a HTML::Form instance.
$mech->form_name NAME [, OPTIONS]
Selects the current form by its name.
$mech->form_id ID [, OPTIONS]
Selects the current form by its id attribute.
This is equivalent to calling
$mech->selector("#$name",single => 1,%options)
$mech->form_number NUMBER [, OPTIONS]
Selects the NUMBERth form.
$mech->form_with_fields [$OPTIONS], FIELDS
Find the form which has the listed fields.
If the first argument is a hash reference, it's taken as options to ->xpath.
$mech->forms OPTIONS
When called in a list context, returns a list of the forms found in the last fetched page. In a scalar context, returns a reference to an array with those forms.
The returned elements are the DOM <form> elements.
$mech->value NAME [, VALUE] [,PRE EVENTS] [,POST EVENTS]
Sets the field with the name to the given value. Returns the value.
Note that this uses the name attribute of the HTML, not the id attribute.
By passing the array reference PRE EVENTS, you can indicate which Javascript events you want to be triggered before setting the value. POST EVENTS contains the events you want to be triggered after setting the value.
By default, the events set in the constructor for pre_events and post_events are triggered.
$mech->value( 'myfield', 'myvalue', [], [] );
$mech->set_visible @values
This method sets fields of the current form without having to know their names. So if you have a login screen that wants a username and password, you do not have to fetch the form and inspect the source (or use the mech-dump utility, installed with WWW::Mechanize) to see what the field names are; you can just say
$mech->set_visible( $username, $password );
and the first and second fields will be set accordingly. The method is called set_visible because it acts only on visible fields; hidden form inputs are not considered.
The specifiers that are possible in WWW::Mechanize are not yet supported.
$mech->clickables
Returns all clickable elements, that is, all elements with an onclick attribute.
$mech->xpath QUERY, %options
Runs an XPath query in Firefox against the current document.
The options allow the following keys:
document - document in which the code is to be executed. Use this to search a node within a subframe of $mech->document.
node - node relative to which the code is to be executed
Returns the matched nodes.
This is a method that is not implemented in WWW::Mechanize.
In the long run, this should go into a general plugin for WWW::Mechanize.
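The node option makes it possible to restrict a query to part of the page. The sketch below is not part of the module: the helper name inputs_of_current_form is hypothetical, and it assumes that a query starting with ./ is evaluated relative to the node passed in.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox.
# Returns the <input> nodes inside the current form, or an empty list
# if there is no current form.
sub inputs_of_current_form {
    my ($mech) = @_;
    my $form = $mech->current_form
        or return;
    # Assumption: './/input' is resolved relative to $form
    return $mech->xpath('.//input', node => $form);
}
```

The same pattern applies to the document option when the nodes live inside a subframe.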
$mech->selector css_selector, %options
Returns all nodes matching the given CSS selector.
$mech->content_as_png [TAB, COORDINATES]
Returns the given tab or the current page rendered as PNG image.
This is specific to WWW::Mechanize::Firefox.
Currently, the data transfer between Firefox and Perl is done Base64-encoded. It would be beneficial to find what's necessary to make JSON handle binary data more gracefully.
If the coordinates are given, that rectangle will be cut out. The coordinates should be a hash with the four usual entries, left,top,width,height.
    my $rect = {
        left   => 0,
        top    => 0,
        width  => 200,
        height => 200,
    };
    my $png = $mech->content_as_png(undef, $rect);
    open my $fh, '>', 'page.png'
        or die "Couldn't save to 'page.png': $!";
    binmode $fh;
    print {$fh} $png;
    close $fh;
$mech->element_as_png $element
Returns PNG image data for a single element
$mech->element_coordinates $element
Returns the page-coordinates of the $element in pixels as a hash with four entries, left, top, width and height.
This function might get moved into another module more geared towards rendering HTML.
Firefox cookies will be read through HTTP::Cookies::MozRepl. This is relatively slow currently.
As this module is in a very early stage of development, there are many incompatibilities. The main thing is that only the most needed WWW::Mechanize methods have been implemented by me so far.
In Firefox, the name attribute seems always to be present on links, even if it's empty. This differs from WWW::Mechanize, where the name attribute can be undef.
->form_with_fields needs tests
->find_all_inputs
This function is likely best implemented through $mech->selector.
->find_all_submits
->images
->find_image
->find_all_images
->field
->select
->set_fields
This is basically a loop over $mech->value.
->tick
->untick
->submit
These functions are unlikely to be implemented because they make little sense in the context of Firefox.
->add_header
->delete_header
->clone
->credentials( $username, $password )
->get_basic_credentials( $realm, $uri, $isproxy )
->clear_credentials()
->put
I have no use for it
->post
Implement download progress via nsIWebBrowserPersist.progressListener and our own nsIWebProgressListener.
Make ->click use ->click_with_options
Make ->selector and ->xpath work across subframes.
Implement "reuse tab if exists, otherwise create new"
Rip out parts of Test::HTML::Content and graft them onto the links() and find_link() methods here. Firefox is a conveniently unified XPath engine.
Preferably, there should be a common API between the two.
Spin off XPath queries (->xpath) and CSS selectors (->selector) into their own Mechanize plugin(s).
The MozRepl Firefox plugin at http://wiki.github.com/bard/mozrepl
WWW::Mechanize - the module whose API grandfathered this module
https://developer.mozilla.org/En/FUEL/Window for JS events relating to tabs
https://developer.mozilla.org/en/Code_snippets/Tabbed_browser#Reusing_tabs for more tab info
The public repository of this module is http://github.com/Corion/www-mechanize-Firefox.
Max Maischein corion@cpan.org
Copyright 2009 by Max Maischein corion@cpan.org.
This module is released under the same terms as Perl itself.