NAME

Novel::Robot::Parser - get novel / BBS content from websites

A parsing engine for novel sites.

INIT

site

Supported novel websites:

asxs 爱尚

day66 天天小说

dddbbb 豆豆

dingdian 顶点

hkslg 顺隆书院

jjwxc 绿晋江

kanshu 要看书

kanshuge 看书阁

kanunu 努努

luoqiu 落秋

my285 梦远

qidian 起点

qqxs 千千

shunong 书农

snwx 少年文学

tadu 塔读文学

ttzw 天天中文

yanqingji 言情记

ybdu 一本读

yqhhy 言情后花园

zhonghuawuxia 中华武侠

zilang 紫琅文学

Supported txt files:

txt parses a specified txt file

Supported raw files (serialized data):

raw parses a specified binary file serialized with MessagePack

Supported forum websites:

hjj 红晋江

tieba 百度贴吧

xvna 炫浪网络

new

Initialization function. A site name or a URL must be given.

   # site name: specify the site directly
   my $parser = Novel::Robot::Parser->new( site => 'jjwxc' );

   # url: detect the site automatically from the URL
   my $url = 'http://www.jjwxc.net/onebook.php?novelid=2456';
   $parser = Novel::Robot::Parser->new( site => $url );
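
How the module maps a URL to a site is not described here. As a rough illustration only, the sketch below shows one way such detection could work: look up the URL's host in a domain table and fall back to treating the argument as a site name. The function detect_site, the %SITE_BY_DOMAIN table, and every domain other than jjwxc.net are assumptions, not part of Novel::Robot::Parser.

```perl
use strict;
use warnings;

# Hypothetical lookup table: domain => site key.
# Only jjwxc.net appears in this document; the other entries are assumptions.
my %SITE_BY_DOMAIN = (
    'jjwxc.net'       => 'jjwxc',
    'qidian.com'      => 'qidian',
    'tieba.baidu.com' => 'tieba',
);

# Return the site key for a URL. An argument that is not an http(s) URL is
# assumed to be a site name already and is returned unchanged.
sub detect_site {
    my ($site_or_url) = @_;
    my ($host) = $site_or_url =~ m{^https?://([^/:]+)}i
        or return $site_or_url;
    $host = lc $host;

    # Try longer domains first so a more specific entry wins.
    for my $domain ( sort { length($b) <=> length($a) } keys %SITE_BY_DOMAIN ) {
        return $SITE_BY_DOMAIN{$domain}
            if $host eq $domain or $host =~ /\.\Q$domain\E$/;
    }
    return;    # unknown site
}

print detect_site('http://www.jjwxc.net/onebook.php?novelid=2456'), "\n";    # jjwxc
print detect_site('jjwxc'), "\n";                                            # jjwxc
```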

get_item_ref

Get novel or forum thread data. Returns a hash reference.

   my $r = $parser->get_item_ref($url, %opt);

NOVEL FUNCTION

get_novel_ref

Get novel data.

   my $r = $parser->get_novel_ref($url, %opt);

get_index_ref

Get the data of a novel's index page.

   my $index_ref = $parser->get_index_ref($index_url, %opt);

parse_index

Parse the HTML content of a novel's index page.

   my $index_ref = $parser->parse_index($index_html_ref);

get_chapter_ref

Get the data of a novel chapter page.

    my $chapter_url = 'http://m.jjwxc.net/book2/2456/2';
    my $chapter_ref = $parser->get_chapter_ref($chapter_url, 2);

parse_chapter

Parse the HTML content of a novel chapter page.

   my $chapter_ref = $parser->parse_chapter($chapter_html_ref);
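
The variable names in these examples ($index_html_ref, $chapter_html_ref) suggest that the parse_* methods take a reference to the HTML string and return a hash reference. Assuming that convention, a minimal stand-in could look like the sketch below. The function parse_chapter_sketch, the HTML markup, and the title and content keys are all hypothetical; the real structure depends on the site-specific parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a site-specific parse_chapter.
# Takes a reference to the HTML string, so the page is not copied on each call.
# The 'title' and 'content' keys are assumptions.
sub parse_chapter_sketch {
    my ($html_ref) = @_;
    my %chapter;

    # Each key stays undef when its pattern does not match.
    ( $chapter{title} )   = $$html_ref =~ m{<h1[^>]*>\s*(.*?)\s*</h1>}is;
    ( $chapter{content} ) = $$html_ref =~ m{<div\s+class="content"[^>]*>\s*(.*?)\s*</div>}is;

    return \%chapter;
}

my $html = '<html><h1>Chapter 2</h1><div class="content">It was raining.</div></html>';
my $chapter_ref = parse_chapter_sketch( \$html );
print "$chapter_ref->{title}: $chapter_ref->{content}\n";    # Chapter 2: It was raining.
```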

TIEZI FUNCTION

get_tiezi_ref

Get forum thread data.

   my $r = $parser->get_tiezi_ref($url, %opt);

parse_tiezi

Parse the HTML content of a forum thread.

   my $tz_ref = $parser->parse_tiezi($tz_html_ref);

parse_tiezi_floors

Parse the floors (individual posts) in a forum thread's HTML.

   my $floors = $parser->parse_tiezi_floors($tz_html_ref);

parse_tiezi_urls

Get the page URLs of a forum thread.

   my $urls = $parser->parse_tiezi_urls($tz_html_ref);
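
To illustrate what collecting thread pages can involve, the sketch below gathers the distinct pagination links from a page, in order of first appearance. It is not the module's implementation: parse_page_urls_sketch, the sample markup, and the "page=N" URL shape are assumptions, and real sites each need their own pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse_tiezi_urls: collect the distinct page links
# of a thread. The "page=N" query parameter is an assumed URL shape.
sub parse_page_urls_sketch {
    my ($html_ref) = @_;
    my ( %seen, @urls );
    while ( $$html_ref =~ m{<a\s[^>]*href="([^"]*[?&]page=\d+[^"]*)"}ig ) {
        push @urls, $1 unless $seen{$1}++;    # skip repeated links such as "next"
    }
    return \@urls;
}

my $html = join '',
    '<a href="/thread/42?page=1">1</a>',
    '<a href="/thread/42?page=2">2</a>',
    '<a class="next" href="/thread/42?page=2">next</a>',
    '<a href="/help">help</a>';

my $urls = parse_page_urls_sketch( \$html );
print "$_\n" for @$urls;
```

The same approach would apply to board and query result pagination, with a pattern suited to each site.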

BOARD FUNCTION

writer -> multiple books

forum board -> multiple threads

get_board_ref

Get writer / board info.

   my $r = $parser->get_board_ref($url, %opt);

parse_board

Parse writer column / forum board info.

   my $board_ref = $parser->parse_board($board_html_ref);

parse_board_tiezis

Parse the thread URLs of a board.

   my $tzs = $parser->parse_board_items($board_html_ref);

parse_board_urls

Parse the page URLs of a board.

   my $urls = $parser->parse_board_urls($board_html_ref);

parse_board_subboards

Parse the subboard URLs of a forum board.

   my $subboards = $parser->parse_board_subboards($board_html_ref);

QUERY FUNCTION

get_query_ref

Run a query and get the results.

    my $query_type = '作者';       # query type: "author"
    my $query_keyword = '顾漫';    # keyword: an author's name

    my ($info, $items_ref) = $parser->get_query_ref( $query_keyword,
        query_type => $query_type );

make_query_request

Build the HTTP data (URL and POST data) for a query request.

  my ($query_url, $post_data) = 
        $parser->make_query_request( $query_keyword, 
        query_type => $query_type );
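
The call above returns a URL and POST data as a pair. As a sketch of what building such a pair can look like, the code below maps the query type to a form value and percent-encodes the UTF-8 keyword. The function make_query_request_sketch, the search.example.com endpoint, the form field names, and the %QUERY_TYPE_CODE table are all hypothetical; each real site defines its own request format.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);

# Hypothetical mapping from the human-readable query type to a form value.
my %QUERY_TYPE_CODE = ( '作者' => 'author', '作品' => 'title' );

# Hypothetical stand-in for make_query_request: returns (url, post_data).
sub make_query_request_sketch {
    my ( $keyword, %opt ) = @_;
    my $type = $QUERY_TYPE_CODE{ $opt{query_type} // '' } // 'title';

    # Percent-encode the UTF-8 bytes of the keyword (unreserved characters stay as-is).
    my $escaped = encode_utf8($keyword);
    $escaped =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;

    return ( 'http://search.example.com/query', "type=$type&kw=$escaped" );
}

my ( $query_url, $post_data ) =
    make_query_request_sketch( '顾漫', query_type => '作者' );
print "$query_url\n$post_data\n";
```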

parse_query

Parse the query result HTML.

  my $query_title = $parser->parse_query($query_html_ref); 

parse_query_items

Parse the query result list, for example novel / thread URLs.

  my $items_ref = $parser->parse_query_items($query_html_ref); 

parse_query_urls

Parse the page URLs of the query results.

  my $urls_ref = $parser->parse_query_urls($query_html_ref);