NAME

WWW::Crawler::Mojo::ScraperUtil - Scraper utilities

SYNOPSIS

DESCRIPTION

This class inherits from Mojo::UserAgent and overrides the start method to store user info.

ATTRIBUTES

WWW::Crawler::Mojo::ScraperUtil implements the following attributes.

METHODS

WWW::Crawler::Mojo::ScraperUtil implements the following methods.

collect_urls_css

Collects URLs out of CSS.

    @urls = collect_urls_css($dom);
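As a rough illustration of what collecting URLs out of CSS involves, the sketch below pulls url(...) references from a stylesheet string with a regular expression. The helper name extract_css_urls is hypothetical, it takes a plain string, and it is not the module's own implementation; it assumes core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Crawler::Mojo::ScraperUtil.
# Returns every url(...) target found in a CSS string, with quotes stripped.
sub extract_css_urls {
    my $css = shift // '';
    my @urls;
    # Optional quote is captured in \1 so the closing quote must match it.
    while ($css =~ m{url\(\s*(['"]?)([^'")\s]+)\1\s*\)}ig) {
        push @urls, $2;
    }
    return @urls;
}

my @urls = extract_css_urls(q{
    body { background: url("/img/bg.png"); }
    @font-face { src: url(fonts/a.woff); }
});
print "$_\n" for @urls;
```

The extracted values are still relative to the stylesheet's own URL, so a crawler would normally pass each one through resolve_href before queuing it.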

decoded_body

Returns the decoded response body for the given Mojo::Message::Response, using guess_encoding and encoder.

encoder

Generates an Encode instance for the given name. Defaults to Encode::utf8.
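The lookup-with-fallback idea can be sketched with the core Encode module alone. The helper name encoder_for is hypothetical and the fallback shown here is an assumption about the behaviour described above, not the module's actual code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: look up an Encode object by name, falling back to
# UTF-8 when the name is missing or unknown to Encode.
sub encoder_for {
    my $name = shift;
    my $enc  = defined $name ? Encode::find_encoding($name) : undef;
    return $enc // Encode::find_encoding('UTF-8');
}

my $bytes = "caf\xc3\xa9";                          # UTF-8 encoded octets
my $text  = encoder_for('utf-8')->decode($bytes);   # character string
print length($text), "\n";                          # 4 characters
```

A function such as decoded_body can then be thought of as the same decode call applied to a response body, with the name supplied by guess_encoding.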

html_handlers

Returns the preset HTML element handlers used for scraping, as a hash reference keyed by CSS selector. An optional argument narrows the preset selectors to elements inside the given containers.

    my $handlers = html_handlers(['#header', '#footer li']);

    # Add or override a handler; each one receives a matched element.
    $handlers->{img} = sub {
        my $dom = shift;
        return $dom->{src};
    };

    # $dom is assumed to be the Mojo::DOM of the page being scraped.
    my @urls;
    for my $selector (sort keys %{$handlers}) {
        $dom->find($selector)->each(sub {
            push(@urls, $handlers->{$selector}->(shift));
        });
    }

resolve_href

Resolves URLs with a base URL.

    WWW::Crawler::Mojo::resolve_href($base, $uri);

guess_encoding

Guesses the encoding of an HTML or CSS document from the given Mojo::Message::Response instance.

    $encode = WWW::Crawler::Mojo::guess_encoding($res) || 'utf-8';
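To make the guessing step more concrete, here is a minimal sketch that works on plain strings rather than a Mojo::Message::Response. The helper name guess_charset, its two parameters, and the order of checks (Content-Type value first, then a declaration in the body) are assumptions made for illustration, not the module's documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: look for a charset in the Content-Type header value,
# then in the first part of the body (HTML <meta> or CSS @charset).
sub guess_charset {
    my ($content_type, $body) = @_;
    for my $src ($content_type // '', substr($body // '', 0, 1024)) {
        return lc $1
            if $src =~ /charset\s*=\s*["']?([\w\-]+)/i
            or $src =~ /\@charset\s+["']([\w\-]+)/i;
    }
    return;    # nothing found; the caller picks a default
}

my $charset = guess_charset('text/html', '<meta charset="Shift_JIS">') || 'utf-8';
print "$charset\n";
```

Returning nothing when no declaration is found keeps the default in the caller's hands, which matches the `|| 'utf-8'` idiom used above.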

AUTHOR

Keita Sugama, <sugama@jamadam.com>

COPYRIGHT AND LICENSE

Copyright (C) Keita Sugama.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.