NAME

Gungho - Yet Another High Performance Web Crawler Framework

SYNOPSIS

  use Gungho;
  my $g = Gungho->new($config);
  $g->run;

DESCRIPTION

Gungho is Yet Another Web Crawler Framework, aimed at being extensible and fast. It is meant to be a culmination of lessons learned while building Xango. Xango was *fast*, but it was horribly hard to debug. Gungho tries to build from clean structures, based on principles from the likes of Catalyst and Plagger.

WARNING: *ALL* APIs are still subject to change.

STRUCTURE

Gungho consists of three parts: a Provider, which provides Gungho with requests to process; a Handler, which handles the fetched page; and an Engine, which controls the entire process.
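The division of labour between the three parts can be sketched as follows. This is a toy illustration only: the Sketch::* packages are hypothetical stand-ins rather than Gungho's real classes, and the "fetch" is faked with a string instead of an HTTP request.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration; these are not Gungho's real classes.
package Sketch::Provider;
sub new          { my ($class, @urls) = @_; return bless { queue => [@urls] }, $class }
sub has_requests { return scalar @{ $_[0]{queue} } }
# Hands out everything that is currently queued and empties the queue.
sub get_requests { my ($self) = @_; return splice @{ $self->{queue} } }

package Sketch::Handler;
sub new { return bless { seen => [] }, shift }
# Records what it was given; a real handler would parse or store the page.
sub handle_response {
    my ($self, $request, $response) = @_;
    push @{ $self->{seen} }, "$request => $response";
    return;
}

package Sketch::Engine;
sub new { return bless {}, shift }
# Drives the whole process: pull requests, "fetch" them, pass results on.
sub run {
    my ($self, $provider, $handler) = @_;
    while ($provider->has_requests) {
        for my $request ($provider->get_requests) {
            my $response = "fake body for $request";    # no real HTTP here
            $handler->handle_response($request, $response);
        }
    }
    return;
}

package main;

my $provider = Sketch::Provider->new('http://example.com/a', 'http://example.com/b');
my $handler  = Sketch::Handler->new;
Sketch::Engine->new->run($provider, $handler);

print "$_\n" for @{ $handler->{seen} };
```

Keeping the three roles behind small, separate interfaces is what allows one part to be swapped without touching the other two.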

METHODS

new($config)

Creates a new Gungho instance. It requires either the name of a config file or a hashref.

run

Starts the Gungho process.

setup

Sets up the Gungho environment, including calling the various setup_* methods to configure the provider, engine, handler, etc.

setup_engine

setup_handler

setup_log

setup_provider

Sets up the various components.

has_requests

Delegates to the provider's has_requests.

get_requests

Delegates to the provider's get_requests.

handle_response

Delegates to the handler's handle_response.
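The three delegating methods above amount to simple forwarding. A minimal sketch, assuming the central object keeps references to its provider and handler; the Sketch::* packages and their internals are hypothetical and say nothing about how Gungho stores its components.

```perl
use strict;
use warnings;

package Sketch::Context;

sub new {
    my ($class, %args) = @_;
    return bless { provider => $args{provider}, handler => $args{handler} }, $class;
}

# Each method forwards its arguments to the component that owns the behaviour.
sub has_requests    { my $self = shift; return $self->{provider}->has_requests(@_) }
sub get_requests    { my $self = shift; return $self->{provider}->get_requests(@_) }
sub handle_response { my $self = shift; return $self->{handler}->handle_response(@_) }

# Tiny hypothetical components so the forwarding can be exercised.
package Sketch::QueueProvider;
sub new          { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub has_requests { return scalar @{ $_[0]{items} } }
sub get_requests { return splice @{ $_[0]{items} } }

package Sketch::CountingHandler;
sub new             { return bless { count => 0 }, shift }
sub handle_response { $_[0]{count}++; return }

package main;

my $ctx = Sketch::Context->new(
    provider => Sketch::QueueProvider->new('req-1', 'req-2'),
    handler  => Sketch::CountingHandler->new,
);

# Callers only ever talk to the context object.
while ($ctx->has_requests) {
    $ctx->handle_response($_, 'dummy response') for $ctx->get_requests;
}

print "handled: $ctx->{handler}{count}\n";
```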

load_config($config)

Loads the config from $config via Config::Any.
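Since new() accepts either a file name or a hashref, loading might look roughly like the sketch below. The helper name load_config_sketch is hypothetical, passing a hashref through untouched is an assumption, and the provider/module keys are illustrative only. The Config::Any->load_files call with files and use_ext is that module's documented interface.

```perl
use strict;
use warnings;

# Hypothetical helper, not Gungho's actual implementation.
sub load_config_sketch {
    my ($config) = @_;

    # Assumption: a hashref is already usable configuration.
    return $config if ref $config eq 'HASH';

    # Otherwise treat the value as a file name and let Config::Any
    # pick a parser based on the file extension.
    require Config::Any;
    my $loaded = Config::Any->load_files({ files => [$config], use_ext => 1 });

    # load_files returns an arrayref of { filename => config } hashrefs.
    my ($entry) = @{$loaded};
    die "Could not load config from '$config'\n" unless $entry;

    my ($data) = values %{$entry};
    return $data;
}

# Illustrative keys only; consult the component documentation for real ones.
my $config = load_config_sketch({ provider => { module => 'Simple' } });
print $config->{provider}{module}, "\n";
```

Config::Any is only loaded when a file name is actually given, so the hashref path works without it installed.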

load_gungho_module($name, $prefix)

Loads a Gungho component. Prepends 'Gungho::$prefix::' to the module name, unless the name is prefixed with a '+'. In that case, no transformation is performed, and the module name is used as-is.
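The naming rule can be sketched as a small function. The helper resolve_component_name is hypothetical, the example module names are illustrative, and stripping the leading '+' before use is an assumption borrowed from the similar Catalyst convention.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the naming rule described above.
sub resolve_component_name {
    my ($name, $prefix) = @_;

    # A leading '+' marks a fully qualified module name.
    # Assumption: the '+' itself is removed before the name is used.
    return $name if $name =~ s/^\+//;

    # Otherwise the name is placed under the Gungho::$prefix namespace.
    return "Gungho::${prefix}::${name}";
}

print resolve_component_name('POE', 'Engine'), "\n";
print resolve_component_name('+My::Engine', 'Engine'), "\n";
```

The resolved name would then be handed to require (or a module loader) to bring the component in.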

CODE

You can obtain the current code base from

  http://gungho-crawler.googlecode.com/svn/trunk

AUTHOR

Copyright (c) 2007 Daisuke Maki <daisuke@endeworks.jp>

All rights reserved.

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html