
NAME

ff-scrape.pl - simple Firefox HTML scraping from the command line

SYNOPSIS

  ff-scrape.pl URL selector selector ...

  # Print page title
  ff-scrape.pl http://perl.org title
  # The Perl Programming Language - www.perl.org

  # Print links with titles on tab CPAN, make links absolute
  ff-scrape.pl --tab CPAN a //a/@href --uri=2
  
  # Print all links to JPG images on the current page, make links absolute
  ff-scrape.pl --current '//a[substring(@href, string-length(@href) - 2) = "jpg"]/@href'

Options:
  --tab       title of tab to scrape (instead of URL)
  --current   use the currently active tab (instead of URL)
  --sep       separator for the output columns, default is tab-separated
  --uri       force absolute URIs for column number x
  --no-uri    force verbatim output for column number x
  --mozrepl   connection string to Firefox

OPTIONS

--tab

Title of the tab to scrape, instead of a URL. A substring of the title is enough.

--current

Use the currently active tab instead of a URL.

--sep

Separator character to use for columns. Default is tab.
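How the columns line up is not spelled out here. The following is a minimal sketch, assuming each selector yields one list of strings and that row n of the output combines the n-th match of every selector. format_rows is a hypothetical helper, not part of ff-scrape.pl:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: zip the per-selector result lists into rows and
# join each row with the column separator (tab by default, like --sep).
sub format_rows {
    my ($columns, $sep) = @_;
    $sep = "\t" unless defined $sep;
    # The longest result list determines the number of rows;
    # missing cells in shorter lists become empty strings.
    my $rows = max(0, map { scalar @$_ } @$columns);
    my @lines;
    for my $i (0 .. $rows - 1) {
        push @lines, join $sep, map { defined $_->[$i] ? $_->[$i] : '' } @$columns;
    }
    return @lines;
}

# Two selectors, e.g. link text and link target.
my @text = ('CPAN', 'Perl');
my @href = ('http://cpan.org/', 'http://perl.org/');
print "$_\n" for format_rows([\@text, \@href], ';');
```

With ";" as separator this prints "CPAN;http://cpan.org/" and "Perl;http://perl.org/", one per line.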

--uri COLUMNS

Numbers of the columns to convert into absolute URIs, for cases where the automatic conversion of known attributes does not cover everything you want.
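A minimal sketch of converting selected columns into absolute URIs, assuming the CPAN URI module (which provides URI->new_abs), 1-based column numbers as in --uri=2, and the page URL as the base. absolutize_columns is a hypothetical helper, not the program's own code:

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve the cells in the given 1-based column
# numbers against the page URL; all other cells are left verbatim.
sub absolutize_columns {
    my ($row, $base, @uri_columns) = @_;
    my @out = @$row;
    for my $col (@uri_columns) {
        my $idx = $col - 1;
        # Skip column numbers below 1 and columns this row does not have.
        next if $idx < 0 || !defined $out[$idx];
        $out[$idx] = URI->new_abs($out[$idx], $base)->as_string;
    }
    return @out;
}

my @row = ('Download', '../files/report.pdf');
print join("\t", absolutize_columns(\@row, 'http://example.com/docs/index.html', 2)), "\n";
```

This prints "Download", a tab, and "http://example.com/files/report.pdf".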

--no-uri

Switches off the automatic translation to absolute URIs for known attributes like href and src.

--mozrepl

Connection information for the mozrepl instance to use.

DESCRIPTION

This program loads an HTML page in Firefox (or uses an already open tab) and extracts the nodes matched by XPath or CSS selectors from it. Each selector produces one output column.
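The examples in the SYNOPSIS mix both kinds of selector: title and a are CSS, //a/@href is XPath. One plausible way to tell them apart is to treat anything that starts like an XPath location path as XPath. This is only an assumption about the approach, not the program's actual rule, and selector_kind is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical heuristic: XPath expressions here begin with '/' (absolute
# path), './' or '../' (relative path) or '(' (grouped expression).
# A bare leading '.' is NOT enough, because '.content' is a CSS class selector.
sub selector_kind {
    my ($selector) = @_;
    return $selector =~ m{^\s*(?:/|\.\.?/|\()} ? 'xpath' : 'css';
}

for my $sel ('title', 'a', '//a/@href', './/img/@src', '.content a') {
    printf "%-16s %s\n", $sel, selector_kind($sel);
}
```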

SEE ALSO

https://github.com/Corion/App-scrape - App::scrape

A similar program without the need for JavaScript.

Mojolicious - also includes a CSS selector based scraper (Mojo::DOM)

REPOSITORY

The public repository of this module is http://github.com/Corion/www-mechanize-firefox.

SUPPORT

The public support forum of this program is http://perlmonks.org/.

AUTHOR

Max Maischein corion@cpan.org

COPYRIGHT (c)

Copyright 2011-2011 by Max Maischein corion@cpan.org.

LICENSE

This module is released under the same terms as Perl itself.