Gungho - Yet Another High Performance Web Crawler Framework
    use Gungho;
    my $g = Gungho->new($config);
    $g->run;
Gungho is Yet Another Web Crawler Framework that aims to be extensible and fast. It is meant to be a culmination of lessons learned while building Xango: Xango was *fast*, but it was horribly hard to debug. Gungho tries to build from clean structures, based on principles from the likes of Catalyst and Plagger.
All components (engine, provider, handler) are overridable and switchable. A plugin mechanism is available to add hooks that are executed during the run.
WARNING: *ALL* APIs are still subject to change.
Gungho consists of three parts: a Provider, which supplies Gungho with requests to process; a Handler, which handles each fetched page; and an Engine, which controls the entire process.
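The division of labour between the three parts can be pictured with a small, self-contained sketch. This is not Gungho's code: the `Sketch::*` packages, the fake "fetch" step, and the example URLs are all assumptions made for illustration. Only the method names `has_requests`, `get_requests`, and `handle_response` are taken from this document.

```perl
use strict;
use warnings;

# Hypothetical provider: hands out queued request URLs.
package Sketch::Provider;
sub new {
    my ($class, @urls) = @_;
    return bless { queue => [@urls] }, $class;
}
sub has_requests { my $self = shift; return scalar @{ $self->{queue} } }
sub get_requests { my $self = shift; return splice @{ $self->{queue} } }

# Hypothetical handler: records what it was given.
package Sketch::Handler;
sub new {
    my ($class) = @_;
    return bless { seen => [] }, $class;
}
sub handle_response {
    my ($self, $req, $res) = @_;
    push @{ $self->{seen} }, "$req => $res";
}

# Hypothetical engine: drives the loop between provider and handler.
package Sketch::Engine;
sub new { my ($class) = @_; return bless {}, $class }
sub run {
    my ($self, $provider, $handler) = @_;
    while ($provider->has_requests) {
        for my $req ($provider->get_requests) {
            my $res = "fake body for $req";   # stand-in for a real HTTP fetch
            $handler->handle_response($req, $res);
        }
    }
}

package main;

my $provider = Sketch::Provider->new('http://example.com/a', 'http://example.com/b');
my $handler  = Sketch::Handler->new;
Sketch::Engine->new->run($provider, $handler);
print "$_\n" for @{ $handler->{seen} };
```

Because each role sits behind a small interface, any one of the three objects could be swapped for another implementation without touching the other two, which is the property the description above is after.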
There are also "hooks". These hooks can be registered from anywhere by invoking the register_hook() method. They are run at particular points, which are specified when you call register_hook().
Currently available hooks are:
Creates a new Gungho instance. It requires either the name of a config file or a hashref.
Starts the Gungho process.
Returns true if Gungho supports the feature named $name.
Sets up the Gungho environment, including calling the various setup_* methods to configure the provider, engine, handler, etc.
Sets up the various components.
Registers a hook to be run under the specified $hook_name.
Runs all the hooks registered under $hook_name.
Delegates to the provider's has_requests.
Delegates to the provider's get_requests.
Delegates to the handler's handle_response.
Calls the provider's dispatch method.
Given a request, prepares it before it is sent to the engine.
Delegates to the engine's send_request.
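As a rough illustration of the register-then-run pattern described in the two entries above, here is a minimal standalone hook registry. It is a sketch under assumptions, not Gungho's implementation: the argument order of `register_hook`, the `run_hooks` helper, and the hook name `before_fetch` are all hypothetical.

```perl
use strict;
use warnings;

my %hooks;   # hook name => array ref of callbacks, in registration order

# Hypothetical signature: a hook name followed by one or more code refs.
sub register_hook {
    my ($hook_name, @callbacks) = @_;
    push @{ $hooks{$hook_name} }, @callbacks;
}

# Hypothetical helper: call every callback registered under $hook_name.
sub run_hooks {
    my ($hook_name, @args) = @_;
    $_->(@args) for @{ $hooks{$hook_name} || [] };
}

my @log;
register_hook('before_fetch', sub { push @log, "first hook saw $_[0]" });
register_hook('before_fetch', sub { push @log, "second hook saw $_[0]" });

run_hooks('before_fetch', 'http://example.com/');
run_hooks('no_such_hook');   # nothing registered, so nothing runs

print "$_\n" for @log;
```

Keeping the callbacks in a per-name list means hooks run in the order they were registered, and running a name with no registered hooks does nothing.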
Loads the config from $config via Config::Any.
Loads a Gungho component. Prepends 'Gungho::$prefix::' to the module name, unless the name is prefixed with a '+'. In that case, no transformation is performed, and the module name is used as-is.
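The naming rule above can be sketched as a small function. `resolve_component_name` is a hypothetical helper, not part of Gungho, and the module names passed to it are only examples. The sketch also assumes that the leading '+' is a marker that gets stripped before the name is used.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the naming rule described above.
sub resolve_component_name {
    my ($prefix, $name) = @_;

    # A leading '+' means the rest is already a full module name
    # (assumption: the '+' marker itself is dropped).
    return $name if $name =~ s/^\+//;

    # Otherwise the name is placed under the Gungho::$prefix:: namespace.
    return "Gungho::${prefix}::${name}";
}

print resolve_component_name('Provider', 'Simple'), "\n";
print resolve_component_name('Provider', '+My::Provider'), "\n";
```

This is the same convention used by frameworks such as Catalyst: short names stay short in configuration, while a '+' gives an escape hatch for modules that live outside the framework's namespace.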
You can obtain the current code base from
http://gungho-crawler.googlecode.com/svn/trunk
Copyright (c) 2007 Daisuke Maki <daisuke@endeworks.jp>
All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html