Changes for version 0.09005_03

  • General
    • Implement $c->shutdown and Engine/Provider/Handler->stop, which together stop the entire system
    • Ahem, *do* index the Japanese documentation (but change the names)
  • Provider::Inline
    • Backwards Incompatible Change
      • Provider::Inline will no longer dispatch your requests merely because you placed them in $p->requests. You need to call send_request() yourself.
  • RobotRules
    • Deprecate usage of "module: RobotRules::Storage::XXXX". Now you don't have to type that much. Just say "module: XXXX". This will break your old code! Beware!
  • Apoptosis
    • Call shutdown() instead of setting is_running

Changes for version 0.09005_02

  • General
    • Tweak deps
    • Don't index Japanese documentation
    • Fix 02_config.t to check for contents rather than the entire structure. It seems some YAML versions read in the '---' at the beginning of the YAML document
    • Add Gungho::Base::mk_virtual_methods()
    • Fix a bunch of typos in Japanese docs
  • RobotRules::Storage
    • Explicitly state methods that should be virtual methods.

Changes for version 0.09005_01

  • General
    • Migrate hooks to Event::Notify. This breaks the input parameter list: you now receive the event name as the first argument
    • Add a TODO.pod
  • Engine::POE
    • Implement a shutdown state.
    • Make it callable from stop()
  • Engine
    • Refactor handle_response to Engine.pm
  • Throttle
    • Changed send_request() to return 1 on success, 0 otherwise.
  • RobotRules
    • DB_File storage now dies if the call to tie() fails
  • Log::Dispatch
    • Clarify in the documentation that the log config should be specified with the "config" key.
    • Backport changes from ja docs

Documentation

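High-Performance Web Crawler Framework (Japanese)
Gungho Component Parent Class (Japanese)
Gungho Authentication Parent Class (Japanese)
Perform Basic Authentication With Gungho (Japanese)
Reject Requests That Resolve To Internal IP Addresses (Japanese)
Add Caching Support (Japanese)
Process robots.txt (Japanese)
robots.txt Storage (Japanese)
Store robots.txt Data In DB_File (Japanese)
Parse Robots META Within Pages (Japanese)
Base Class For Throttling (Japanese)
Throttle Per Domain/Host (Japanese)
Throttle By Total Number Of Requests (Japanese)
POE Engine For Gungho (Japanese)
Gungho Log Module (Japanese)
Gungho Basics (Japanese)
Installation (Japanese)
High-Performance Crawler Framework (Japanese)
Gungho Tutorial (Japanese)
Specify When The Process Should Stop (Japanese)
Log Request History (Japanese)
Gungho Request Object (Japanese)
Gungho Response Object (Japanese)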
Gungho FAQ
TODO Items
An Extensible, High-Performance Web Crawler Framework

Modules

Yet Another High Performance Web Crawler Framework
Base Class For Various Gungho Objects
Base For Classes That Won't Be Instantiated
Component Base Class For Gungho
Base Class For WWW Authentication
Add Basic Auth To Gungho
Block Requests With Private IP Address
Use Cache In Your App
Gungho Core Methods
Respect robots.txt
RobotRules Storage Base Class
Cache Storage For RobotRules
DB_File Storage For RobotRules
Automatically Parse Robots META
Web::Scraper From Within Gungho
Routines To Setup Gungho
Base Class To Throttle Requests
Throttle By Number Of Requests
Data::Throttler Based Throttling
Base Class For Gungho Engine
Gungho Engine Using Danga::Socket
IO::Async Engine
POE Engine For Gungho
Gungho Exceptions
Base Class For Gungho Handlers
Write Out Fetched Contents To File
Inline Handler
A Handler That Does Nothing
Inline Your Providers And Handlers (Deprecated)
Log Base Class For Gungho
Log::Dispatch-Based Log For Gungho
Simple Gungho Log Class
Gungho Plugin Base Class
Stop Execution In Long-Running Processes
Keep Track Of Time To Finish Request
Gather Crawler Statistics
Format Statistics As XML
Base Class For Gungho Providers
Provide Requests From A Simple File
Inline Provider
An In-Memory, Simple Provider
Specify Requests In YAML Format
A Gungho Request Object
HTTP-Specific Utilities
Gungho HTTP Response Object
Gungho General Utilities

Provides

in lib/Gungho/Engine/IO/Async.pm
in lib/Gungho/Plugin/Apoptosis.pm