NAME
ff-scrape.pl - simple Firefox HTML scraping from the command line
SYNOPSIS
ff-scrape.pl URL selector selector ...
# Print page title
ff-scrape.pl http://perl.org title
# The Perl Programming Language - www.perl.org
# Print links with titles on tab CPAN, make links absolute
ff-scrape.pl --tab CPAN a //a/@href --uri=2
# Print all links to JPG images on current page, make links absolute
ff-scrape.pl --current '//a[contains(@href,".jpg")]/@href'
Options:
  --tab       title of the tab to scrape (instead of URL)
  --current   use the currently active tab (instead of URL)
  --sep       separator for the output columns, default is tab-separated
  --uri       force absolute URIs for column number x
  --no-uri    force verbatim output for column number x
  --mozrepl   connection string to Firefox
OPTIONS
- --tab
Name of the tab to scrape. A substring is enough.
- --sep
Separator character to use for columns. Default is tab.
- --uri COLUMNS
Numbers of the columns to convert into absolute URIs, for cases where the automatic conversion of known attributes does not cover everything you want.
- --no-uri
Switches off the automatic translation to absolute URIs for known attributes like href and src.
- --mozrepl
Connection information for the mozrepl instance to use.
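How one column per selector, --sep and --uri fit together can be pictured with a small Perl sketch. It is not the code of ff-scrape.pl: the helper build_rows, the example URLs, and the padding of shorter columns with empty fields are assumptions made for illustration. Column numbers are counted from 1, as the --uri=2 example in the synopsis suggests, and relative links are resolved against the page address with URI->new_abs from the CPAN module URI, which must be installed.

```perl
use strict;
use warnings;
use URI;    # CPAN module (assumed installed); provides URI->new_abs

# Hypothetical helper, not part of ff-scrape.pl: combine the result
# lists of several selectors into output lines.
#   $base     - URI of the scraped page, used to resolve relative links
#   $sep      - column separator (tab by default, like --sep)
#   $uri_cols - 1-based column numbers to make absolute (like --uri)
#   @columns  - one array ref of extracted strings per selector
sub build_rows {
    my ($base, $sep, $uri_cols, @columns) = @_;
    my %make_abs = map { $_ => 1 } @$uri_cols;

    # The longest column determines how many lines are produced
    my $rows = 0;
    for my $col (@columns) {
        $rows = @$col if @$col > $rows;
    }

    my @lines;
    for my $r (0 .. $rows - 1) {
        my @fields;
        for my $c (0 .. $#columns) {
            # Assumption: missing values in shorter columns become empty fields
            my $value = $columns[$c][$r] // '';
            # Resolve relative URIs against the page URI for the chosen columns
            $value = URI->new_abs($value, $base)->as_string
                if $make_abs{$c + 1} && length $value;
            push @fields, $value;
        }
        push @lines, join $sep, @fields;
    }
    return @lines;
}

# Hypothetical results for the selectors "a" and "//a/@href"
my @lines = build_rows(
    'http://example.com/docs/index.html', "\t", [2],
    ['Home', 'Guide'],       # text of each link
    ['/', 'guide.html'],     # value of each href attribute
);
print "$_\n" for @lines;
```

With these inputs the sketch prints two tab-separated lines, Home with http://example.com/ and Guide with http://example.com/docs/guide.html, because the second column was selected for absolute URIs.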
DESCRIPTION
This program loads an HTML page in Firefox, either from a URL or from an already open tab, and prints the nodes matched by the given XPath or CSS selectors, one output column per selector.
SEE ALSO
https://github.com/Corion/App-scrape - App::scrape
A similar program without the need for JavaScript.
Mojolicious - also includes a CSS selector scraper (mojo get)
REPOSITORY
The public repository of this module is http://github.com/Corion/www-mechanize-firefox.
SUPPORT
The public support forum of this program is http://perlmonks.org/.
AUTHOR
Max Maischein corion@cpan.org
COPYRIGHT (c)
Copyright 2011-2011 by Max Maischein corion@cpan.org.
LICENSE
This module is released under the same terms as Perl itself.