ff-scrape.pl - simple Firefox HTML scraping from the command line
  ff-scrape.pl URL selector selector ...

  # Print page title
  ff-scrape.pl http://perl.org title
  # The Perl Programming Language - www.perl.org

  # Print links with titles on tab CPAN, make links absolute
  ff-scrape.pl --tab CPAN a //a/@href --uri=2

  # Print all links to JPG images on current page, make links absolute
  ff-scrape.pl --current //a[@href=$"jpg"]/@href
Options:
  --tab      title of tab to scrape (instead of URL)
  --current  use currently active tab (instead of URL)
  --sep      separator for the output columns, default is tab-separated
  --uri      force absolute URIs for column number x
  --no-uri   force verbatim output for column number x
  --mozrepl  connection string to Firefox
--tab: Name of the tab to scrape. A substring is enough.
--sep: Separator character to use for columns. Default is tab.
--uri: Numbers of the columns to convert into absolute URIs, if the known attributes do not cover everything you want.
--no-uri: Switches off the automatic translation to absolute URIs for known attributes like href and src.
--mozrepl: Connection information for the mozrepl instance to use.
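To make the column options more concrete, here is a minimal sketch of how a separator and a set of "absolute URI" columns could be applied to scraped rows. It is written in Python rather than Perl and is not the script's actual implementation. The function name format_rows, its parameters, and the sample data are hypothetical, and the 1-based column numbering is an assumption inferred from the --uri=2 example in the synopsis.

```python
from urllib.parse import urljoin


def format_rows(rows, base_url, uri_columns=(), sep="\t"):
    """Join each row's columns with `sep`.

    Columns whose 1-based number is in `uri_columns` are resolved
    against `base_url`, so relative links become absolute ones.
    """
    lines = []
    for row in rows:
        cells = [
            urljoin(base_url, cell) if number in uri_columns else cell
            for number, cell in enumerate(row, start=1)
        ]
        lines.append(sep.join(cells))
    return lines


if __name__ == "__main__":
    # Hypothetical scraped data: link text in column 1, href in column 2.
    rows = [("Modules", "/modules/"), ("Docs", "https://perldoc.perl.org/")]
    for line in format_rows(rows, "http://www.cpan.org/index.html", uri_columns={2}):
        print(line)
```

Already-absolute values pass through unchanged, so listing a column for conversion is harmless when it mixes relative and absolute links.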
This program fetches an HTML page and extracts nodes matched by XPath or CSS selectors from it.
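As a rough illustration of selector-based extraction, the sketch below pulls text and attribute values out of a small, well-formed document using Python's standard library. It is an analogy only: the real program drives Firefox, while this example uses xml.etree.ElementTree, which supports only a limited XPath subset and cannot select attributes directly (hence the separate attribute argument). The scrape function and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
PAGE = """<html><head><title>Example page</title></head>
<body>
  <a href="/one.html">One</a>
  <a href="two.html">Two</a>
  <a name="anchor">No link</a>
</body></html>"""


def scrape(markup, path, attribute=None):
    """Return the text of every node matching `path`, or the value of
    `attribute` on each matching node when an attribute name is given."""
    root = ET.fromstring(markup)
    values = []
    for node in root.findall(path):
        if attribute is None:
            values.append("".join(node.itertext()).strip())
        else:
            values.append(node.get(attribute))
    return values


if __name__ == "__main__":
    print(scrape(PAGE, ".//title"))
    print(scrape(PAGE, ".//a[@href]", attribute="href"))
```

Each call corresponds loosely to one selector on the command line, producing one output column.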
https://github.com/Corion/App-scrape - App::scrape
A similar program without the need for JavaScript.
Mojolicious - also includes a CSS / XPath scraper
The public repository of this module is http://github.com/Corion/www-mechanize-firefox.
The public support forum of this program is http://perlmonks.org/.
Max Maischein corion@cpan.org
Copyright 2011-2011 by Max Maischein corion@cpan.org.
This module is released under the same terms as Perl itself.