NAME
WWW::Crawler::Mojo::ScraperUtil - Scraper utilities
SYNOPSIS
DESCRIPTION
This class inherits from Mojo::UserAgent and overrides the start method to store user info.
ATTRIBUTES
WWW::Crawler::Mojo::ScraperUtil implements the following attributes.
METHODS
WWW::Crawler::Mojo::ScraperUtil implements the following methods.
collect_urls_css
Collects URLs out of CSS.
    @urls = collect_urls_css($dom);
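As an illustration of what collecting URLs from CSS involves, here is a minimal sketch that scans a CSS string for url(...) tokens. The helper name extract_css_urls and the regular expression are assumptions made for this example; they are not the module's implementation, and the argument here is a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Crawler::Mojo::ScraperUtil:
# return the targets of every url(...) token found in a CSS string.
sub extract_css_urls {
    my ($css) = @_;
    my @urls;
    # Capture an optional opening quote, the URL, then the same quote again.
    while ($css =~ /url\(\s*(['"]?)(.*?)\1\s*\)/g) {
        push @urls, $2;
    }
    return @urls;
}

my $css  = q{body { background: url("/img/bg.png") } @font-face { src: url(font.woff) }};
my @urls = extract_css_urls($css);
print "$_\n" for @urls;
```

Quoted and unquoted forms are both handled; escaped characters inside the URL are not, which a fuller implementation would need to consider.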
decoded_body
Returns the decoded response body for the given Mojo::Message::Request, using guess_encoding and encoder.
encoder
Generates an Encode instance for the given name. Defaults to Encode::utf8.
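The relationship between an encoding name and a decoded body can be sketched with the core Encode module. The helper find_encoder below is a hypothetical stand-in written for this example, and the UTF-8 fallback is an assumption modelled on the default described above; it is not the module's own code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for encoder(): look up an Encode object by name,
# falling back to UTF-8 when the name is missing or unknown.
sub find_encoder {
    my ($name) = @_;
    my $enc = defined $name ? Encode::find_encoding($name) : undef;
    return $enc || Encode::find_encoding('UTF-8');
}

# Decode raw bytes into a Perl character string.
my $bytes = "\xe3\x81\x82";    # UTF-8 bytes for HIRAGANA LETTER A
my $text  = find_encoder('utf-8')->decode($bytes);
printf "U+%04X\n", ord $text;
```

Encode::find_encoding returns a false value for names it does not recognise, which is what makes the fallback in the sketch work.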
html_handler_presets
Returns the common HTML handlers as a hash reference.
    my $handlers = html_handler_presets();
reduce_html_handlers
Narrows HTML handler selectors by prefixing them with container CSS selectors.
    my $handlers = reduce_html_handlers($handlers, ['#header', '#footer li']);

    # Override the handler for img elements: return the src attribute.
    $handlers->{img} = sub {
        my $dom = shift;
        return $dom->{src};
    };

    # Apply each handler to the elements matching its selector.
    my @urls;
    for my $selector (sort keys %{$handlers}) {
        $dom->find($selector)->each(sub {
            push(@urls, $handlers->{$selector}->(shift));
        })->to_array;
    }
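The narrowing step itself can be pictured as plain hash manipulation. This is a minimal sketch, assuming that every handler selector is prefixed once per container; prefix_selectors is a hypothetical name chosen for the example and may differ from how the module actually builds its selectors.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): build one narrowed
# selector per container/selector pair, reusing the original callback.
sub prefix_selectors {
    my ($handlers, $containers) = @_;
    my %narrowed;
    for my $selector (keys %$handlers) {
        for my $container (@$containers) {
            $narrowed{"$container $selector"} = $handlers->{$selector};
        }
    }
    return \%narrowed;
}

my $handlers = { img => sub { $_[0]->{src} } };
my $narrowed = prefix_selectors($handlers, ['#header', '#footer li']);
print "$_\n" for sort keys %$narrowed;
```

With the containers '#header' and '#footer li', the img handler would then only fire for images inside those two regions of the page.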
resolve_href
Resolves URLs with a base URL.
    WWW::Crawler::Mojo::resolve_href($base, $uri);
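For readers unfamiliar with URL resolution, the sketch below shows the general idea using Mojo::URL's to_abs method from Mojolicious. The helper to_absolute is a hypothetical name for this example; resolve_href may apply further normalisation that is not shown here.

```perl
use strict;
use warnings;
use Mojo::URL;

# Hypothetical stand-in for resolve_href(): resolve $href against $base
# with Mojo::URL's to_abs, which returns already-absolute URLs unchanged.
sub to_absolute {
    my ($base, $href) = @_;
    return Mojo::URL->new($href)->to_abs(Mojo::URL->new($base));
}

# A relative link found on http://example.com/a/b/c.html
print to_absolute('http://example.com/a/b/c.html', '../img/logo.png'), "\n";
```

The returned object is a Mojo::URL, which stringifies to the absolute URL when printed or compared as a string.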
guess_encoding
Guesses the encoding of HTML or CSS from the given Mojo::Message::Response instance.
    $encode = WWW::Crawler::Mojo::guess_encoding($res) || 'utf-8';
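To make the guessing step concrete, here is a minimal sketch that looks for a charset in three common places: the Content-Type header value, an HTML meta tag, and a CSS @charset rule. The helper guess_charset, its two plain-string arguments, and the order of the checks are assumptions for this example, not the module's actual logic.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's guess_encoding(): return the
# first charset name found, or nothing when no hint is present.
sub guess_charset {
    my ($content_type, $body) = @_;
    # 1. Content-Type header, e.g. "text/html; charset=Shift_JIS"
    return $1 if defined $content_type && $content_type =~ /charset=["']?([\w\-]+)/i;
    # 2. HTML meta tag, e.g. <meta charset="euc-jp">
    return $1 if defined $body && $body =~ /<meta[^>]+charset=["']?([\w\-]+)/i;
    # 3. CSS rule, e.g. @charset "utf-8";
    return $1 if defined $body && $body =~ /\@charset\s+["']([\w\-]+)["']/i;
    return;
}

my $encode = guess_charset('text/html', '<meta charset="euc-jp">') || 'utf-8';
print "$encode\n";
```

Because the helper returns nothing when no hint is found, the `|| 'utf-8'` fallback used in the call above applies in the same way.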
AUTHOR
Keita Sugama, <sugama@jamadam.com>
COPYRIGHT AND LICENSE
Copyright (C) Keita Sugama.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.