NAME

webreaper -- download a web page and its links

SYNOPSIS

        webreaper [-v] [-r referer] [-u username] [-p password] URL

DESCRIPTION

THIS IS ALPHA SOFTWARE

The webreaper program downloads web sites. In the current working directory it creates a directory named after the host of the URL given on the command line, and saves the downloaded files under it.
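The directory-naming rule described above might be sketched like this (a sketch only, not the actual webreaper source; the URL is a placeholder, and a real implementation would use the URI module rather than this simplified regex):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: derive the save directory from the host of the starting
# URL, as the DESCRIPTION says. The URL is a placeholder; the regex is a
# simplification of what a real URL parser (e.g. the URI module) does.
my $start  = 'http://www.example.com/index.html';
my ($host) = $start =~ m{^https?://([^/:?#]+)}
	or die "cannot parse host from $start\n";

mkdir $host unless -d $host;    # created in the current working directory
print "$host\n";                # prints "www.example.com"
```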

Command line switches

-r --- Referer header to send with the first request
-u --- username for basic auth
-p --- password for basic auth
-v --- verbose output

FEATURES SO FAR

limits itself to the starting domain
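The domain restriction might look like the following hypothetical helper (the function names and the simplified host regex are assumptions, not the actual webreaper internals):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper illustrating the feature above: only follow links
# whose host matches the host of the starting URL. Names and the
# simplified host regex are assumptions, not the actual webreaper code.
sub host_of {
	my ($url) = @_;
	my ($host) = $url =~ m{^https?://([^/:?#]+)};
	return $host;
	}

sub same_domain {
	my ($start, $link) = @_;
	my ($ha, $hb) = (host_of($start), host_of($link));
	return defined $ha && defined $hb && lc($ha) eq lc($hb);
	}

print same_domain('http://www.example.com/', 'http://www.example.com/a.html')
	? "follow\n" : "skip\n";   # prints "follow"
print same_domain('http://www.example.com/', 'http://other.example.org/')
	? "follow\n" : "skip\n";   # prints "skip"
```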

WISH LIST

limit directory level
limit content types, file names
specify a set of patterns to ignore
do conditional GETs
Tk or curses interface?
create an error log, report, or something
download stats (clock time, storage space, etc.)
multiple levels of verbosity for output
read items from a config file
allow user to add/delete allowed domains during runtime
specify directory where to save downloads
optional sleep time between requests
ensure that path names are safe (i.e. no "..")
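The last wish-list item, path safety, might be sketched like this (an assumed approach using the core File::Spec module, not webreaper's actual code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Sketch of the path-safety wish-list item: reject absolute paths and any
# path containing an updir ("..") component, so a saved file can never
# escape the download directory. An assumed approach, not webreaper's.
sub path_is_safe {
	my ($path) = @_;
	return 0 if File::Spec->file_name_is_absolute($path);
	my @parts = File::Spec->splitdir($path);
	return 0 if grep { $_ eq File::Spec->updir } @parts;
	return 1;
	}

print path_is_safe('www.example.com/a/b.html') ? "safe\n" : "unsafe\n";
print path_is_safe('../../etc/passwd')         ? "safe\n" : "unsafe\n";
```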

SOURCE AVAILABILITY

This source is part of a SourceForge project which always has the latest sources in CVS, as well as all of the previous releases.

        https://sourceforge.net/projects/brian-d-foy/

If, for some reason, I disappear from the world, one of the other members of the project can shepherd this module appropriately.

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT

Copyright 2003, brian d foy, All rights reserved.

You may use this program under the same terms as Perl itself.