NAME

WWW::2ch - scraping of a popular bbs of Japan.

SYNOPSIS

  use WWW::2ch;

  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                          cache => '/tmp/www2ch-cache');
  $bbs->load_setting;
  $bbs->load_subject;
  foreach my $dat ($bbs->subject->threads) {
      $dat->load;
      my $one = $dat->res(1);
      print $dat->title . "\n";
      print '>>1: ' . $one->body;
      foreach my $res ($dat->reslist) {
        print $res->resid . ':' . $res->date . "\n";
        print $res->body_text . "\n";
      }
      last;
  }


  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache');
  my $dat = $bbs->subject->thread('1140947283');
  $dat->load;


  # dat in cash is taken out
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                        cache => '/home/ko/cpan/my/WWW-2ch/cache');
  my $dat = $bbs->recall_dat('1141300600');


  # parse dose dat from file
  my $bbs = WWW::2ch->new(url => 'http://live19.2ch.net/ogame/',
                        cache => '/home/ko/cpan/my/WWW-2ch/cache');
  open my $fh, "test.dat" or return;
  my $data = join('', <$fh>);
  close($fh);
  my $dat = $bbs->parse_dat($data);

  # returns it with raw article data.
  $dat->dat;

  #plugin load
  my $bbs = WWW::2ch->new(url => 'http://example.jp/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => 'ExampleJp');

  # plugin file load
  my $bbs = WWW::2ch->new(url => 'http://example.com/test/read.cgi/ogame/1140947283/l50',
                          cache => '/tmp/www2ch-cache',
                          plugin => '/usr/local/www-2ch/lib/ExampleCom.pm');

DESCRIPTION

It is suitable for the scraping of a popular bbs of Japan.

other BBS and the news sites and other sites are also possible by the addition of the plugin for scraping.

Please take care with the flood control to an excessive access.

Method

new(%option)

option

  • url

    set the permalink of top page.

  • cache

    cache directory or Cache module object

  • plugin

    plugin name (default Base)

encoding

encode name of plugin

load_setting

setting is read

load_subject

article list is read

parse_dat($data[, $subject])

parse does $data

recall_dat($key)

recall dat from cache file

SEE ALSO

http://2ch.net/, http://www.monazilla.org/, WWW::2ch::Subject, WWW::2ch::Dat, WWW::2ch::Res

AUTHOR

Kazuhiro Osawa <ko@yappo.ne.jp>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Kazuhiro Osawa

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.

2 POD Errors

The following errors were encountered while parsing the POD:

Around line 166:

You forgot a '=back' before '=head2'

Around line 184:

'=item' outside of any '=over'