NAME

HTTP::Any - a common interface for HTTP clients (LWP, AnyEvent::HTTP, Curl)

SYNOPSIS

 use HTTP::Any::...
 use ...

 sub do_http {
        ...
        HTTP::Any::...
 }

 my $opt = { ... };

 my $cb = sub {
        my ($is_success, $body, $headers, $redirects) = @_;
        ...
 };

 do_http($url, $opt, $cb);

MOTIVATION

LWP, AnyEvent::HTTP and Curl each have their own advantages, disadvantages and peculiarities. The HTTP::Any modules were created while investigating the strengths and weaknesses of these HTTP clients. They make it quick to switch between clients, so you can use the best one for each particular case.

DESCRIPTION

IMPORT

I recommend wrapping HTTP::Any in a separate module of your own and using that module from every part of your project.

Why not simply call HTTP::Any directly everywhere? A wrapper gives more flexibility and lets you replace the modules used in one place, for example using LWP::RobotUA instead of LWP::UserAgent.

LWP

 use LWP;
 use HTTP::Any::LWP;
 sub do_http {
        my $ua = LWP::UserAgent->new;
        HTTP::Any::LWP::do_http($ua, @_);
 }

AnyEvent

 use EV;
 use AnyEvent::HTTP;
 use HTTP::Any::AnyEvent;
 sub do_http {
        HTTP::Any::AnyEvent::do_http(\&http_request, @_);
 }

Curl

 use Net::Curl::Easy;
 use HTTP::Any::Curl;
 sub do_http {
        my ($url, $opt, $cb) = @_;
        my $easy = Net::Curl::Easy->new();
        HTTP::Any::Curl::do_http(undef, $easy, $url, $opt, $cb);
 }

Curl with Multi

 use Net::Curl::Easy;
 use Net::Curl::Multi;
 use Net::Curl::Multi::EV;
 use HTTP::Any::Curl;
 my $multi = Net::Curl::Multi->new();
 my $curl_ev = Net::Curl::Multi::EV::curl_ev($multi);
 sub do_http {
        my ($url, $opt, $cb) = @_;
        my $easy = Net::Curl::Easy->new();
        HTTP::Any::Curl::do_http($curl_ev, $easy, $url, $opt, $cb);
 }
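The per-client do_http subs above can sit behind one wrapper module, so that switching clients means editing a single file. The sketch below is only an illustration of that idea: the package name MyHTTP and the set_backend helper are hypothetical, and a fake backend stands in for HTTP::Any so the sketch runs without network access.

```perl
use strict;
use warnings;

# Hypothetical wrapper package; the name MyHTTP is an assumption.
package MyHTTP;

# The active backend: a CODE ref with the same ($url, $opt, $cb)
# signature as the do_http() subs shown for each client.
my $backend = sub { die "MyHTTP: no backend configured\n" };

# Swap the backend in one place (e.g. Curl for LWP) without touching callers.
sub set_backend {
    my ($code) = @_;
    die "MyHTTP: backend must be a CODE ref\n" unless ref $code eq 'CODE';
    $backend = $code;
    return;
}

# The single entry point the rest of the project calls.
sub do_http {
    my ($url, $opt, $cb) = @_;
    return $backend->($url, $opt, $cb);
}

package main;

# In real code the backend would call e.g. HTTP::Any::LWP::do_http($ua, @_);
# here a fake backend answers immediately so the sketch runs offline.
MyHTTP::set_backend(sub {
    my ($url, $opt, $cb) = @_;
    $cb->(1, "hello from $url", { Status => 200, Reason => 'OK', URL => $url }, []);
});

my @got;
MyHTTP::do_http('http://example.com/', {}, sub { @got = @_ });
print "$got[1]\n";
```

Callers only ever see MyHTTP::do_http, so replacing LWP::UserAgent with LWP::RobotUA, or LWP with Curl, stays a one-line change inside the wrapper.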

CALL

 my $opt = { ... };

 my $cb = sub {
        my ($is_success, $body, $headers, $redirects) = @_;
        ...
 };

 do_http($url, $opt, $cb);

where:

url

URL as a string.

opt

Hash ref of options; HTTP headers go in its headers entry.

cb

Callback function that receives the result.

options

referer

Referer URL.

agent

User-Agent string.

timeout

Timeout in seconds.

gzip

This option adds an 'Accept-Encoding' header with the value gzip to the HTTP request and has the response decoded. If you do not want the response decoded, set the 'Accept-Encoding' header yourself in the 'headers' option instead.
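A sketch of the two ways to ask for gzip, plus decoding a raw body yourself when you take the headers route. It uses the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules; decode_body is a hypothetical helper, and it assumes response header names are lowercase, as stated for the finish callback.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Option 1: HTTP::Any adds Accept-Encoding: gzip and decodes the response.
my %opt_decoded = (gzip => 1);

# Option 2: ask for gzip yourself; the body arrives still compressed.
my %opt_raw = (headers => { 'Accept-Encoding' => 'gzip' });

# Hypothetical helper for option 2: gunzip the body when the server
# reports gzip content encoding, otherwise return it unchanged.
sub decode_body {
    my ($body, $headers) = @_;
    my $enc = $headers->{'content-encoding'} // '';
    return $body unless lc($enc) eq 'gzip';
    my $plain;
    gunzip(\$body => \$plain) or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Simulate a compressed response body so the sketch runs offline.
my $text = "hello, world";
my $packed;
gzip(\$text => \$packed) or die "gzip failed: $GzipError\n";
print decode_body($packed, { 'content-encoding' => 'gzip' }), "\n";
```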

headers

Hash ref of HTTP headers:

 {
   'Accept' => '*/*',
   ...
 }

cookie

Enables cookie support. The value "" enables session cookie support without saving the cookies. Any other value is passed through as is: a hash ref (LWP, AnyEvent::HTTP) or a file name (Curl).

persistent

1 or 0. Whether to try to create or reuse a persistent connection. When not specified, the client's default applies: for Curl, the inverse of CURLOPT_FORBID_REUSE; for AnyEvent::HTTP, its persistent option.

proxy

HTTP or SOCKS proxy:

 proxy => "$host:$port"
 or
 proxy => "$scheme://$host:$port"

where scheme is one of: http, socks (same as socks5), socks5, socks4.

Install LWP::Protocol::socks to use a SOCKS proxy with LWP.

Use AnyEvent::HTTP::Socks instead of AnyEvent::HTTP for a SOCKS proxy.
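Before passing a proxy string on, it can help to check that it matches one of the two forms above. The parse_proxy helper below is a hypothetical sketch, not part of HTTP::Any: it only checks the format, and it leaves the scheme undef for the bare host:port form, since which scheme each client assumes there is not stated here.

```perl
use strict;
use warnings;

# Schemes listed for the proxy option.
my %SCHEMES = map { $_ => 1 } qw(http socks socks5 socks4);

# Hypothetical helper: split a proxy string into (scheme, host, port).
# Returns an empty list when the string matches neither documented form.
sub parse_proxy {
    my ($proxy) = @_;
    my ($scheme, $host, $port) =
        $proxy =~ m{\A(?:([a-z0-9]+)://)?([^:/]+):(\d+)\z}i
        or return;
    return if defined $scheme && !$SCHEMES{ lc $scheme };
    return (defined $scheme ? lc $scheme : undef, $host, $port);
}

for my $p ('127.0.0.1:3128', 'socks5://localhost:1080', 'ftp://host:21') {
    my $verdict = parse_proxy($p) ? 'ok' : 'rejected';
    print "$p -> $verdict\n";
}
```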

max_size

Size limit for the response content, in bytes.

Note: when the gzip option is used and the max_size limit is triggered, the current behavior is: HTTP::Any::Curl returns the partial result, HTTP::Any::LWP returns "", HTTP::Any::AnyEvent returns "".

This behavior may change in the future.

When the max_size limit is triggered, a 'client-aborted' header with the value 'max_size' is added.
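A finish callback can look for that marker before trusting the body. A minimal sketch; classify_result is a hypothetical helper, and the call at the end feeds the callback simulated data so the sketch runs offline.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a finished request, spotting the
# 'client-aborted' => 'max_size' marker header.
sub classify_result {
    my ($is_success, $headers) = @_;
    my $aborted = $headers->{'client-aborted'} // '';
    return 'truncated' if $aborted eq 'max_size';
    return $is_success ? 'ok' : 'error';
}

my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;
    my $state = classify_result($is_success, $headers);
    # When truncated, $body may be partial or "" depending on the client,
    # so it must not be treated as the full document.
    printf "%s: %s\n", $headers->{URL} // '?', $state;
};

# Simulated call so the sketch runs without a network.
$cb->(1, '', { Status => 200, URL => 'http://example.com/big',
               'client-aborted' => 'max_size' }, []);
```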

max_redirect

Maximum number of redirects to follow for a single request.

The default is 7.

body

Request body for the POST method.

Either a string or a CODE ref that returns the body piece by piece; returning undef marks the end of the body data.

N.B. A CODE ref body is not supported with AnyEvent::HTTP (as of v2.21).
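A CODE ref body is handy for streaming an upload from a file. The sketch below assumes only the contract described above (each call returns the next string, undef at the end); body_from_fh is a hypothetical helper, and an in-memory filehandle replaces a real file so it runs anywhere. With LWP or Curl it would be passed as body => body_from_fh($fh, 65536).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a filehandle in a CODE ref for the body option.
# Each call returns the next chunk of up to $size bytes, then undef at EOF.
sub body_from_fh {
    my ($fh, $size) = @_;
    return sub {
        my $n = read($fh, my $buf, $size);
        die "read failed: $!\n" unless defined $n;
        return $n ? $buf : undef;
    };
}

my $data = "name=perl&version=5&os=linux";
open my $fh, '<', \$data or die "cannot open in-memory file: $!\n";

# Drain the reader the way a client would while sending the request.
my $reader = body_from_fh($fh, 8);
my @chunks;
while (defined(my $chunk = $reader->())) {
    push @chunks, $chunk;
}
print scalar(@chunks), " chunks: ", join('|', @chunks), "\n";
```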

method

When the method option is "POST", a POST request is sent with the body option as its data, and a 'Content-Type' header with the value 'application/x-www-form-urlencoded' is added.
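Putting several of the options above together, an options hash for a form POST might look like the sketch below. The host names, agent string and proxy address are placeholders, and the values are only examples.

```perl
use strict;
use warnings;

# Example option set built from the options described above.
my %opt = (
    referer      => 'http://example.com/',
    agent        => 'MyCrawler/0.1',            # placeholder agent string
    timeout      => 30,                         # seconds
    gzip         => 1,                          # request and decode gzip
    headers      => { 'Accept' => '*/*' },
    persistent   => 0,                          # do not reuse connections
    proxy        => 'http://127.0.0.1:3128',    # placeholder proxy
    max_size     => 1_000_000,                  # bytes
    max_redirect => 5,
    method       => 'POST',                     # form-urlencoded Content-Type is added
    body         => 'q=perl&page=1',
);

# With a do_http() wrapper as in the IMPORT section this would be:
#   do_http('http://example.com/search', \%opt, $cb);
print join(', ', sort keys %opt), "\n";
```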

finish callback function

 my $cb = sub {
        my ($is_success, $body, $headers, $redirects) = @_;
        ...
 };

where:

is_success

True when the HTTP status code is 2XX.

body

Response body. When the on_header callback function is defined, body is undef.

headers

Hash ref of the HTTP headers (names in lowercase) plus additional info: Status, Reason, URL.

redirects

Headers of the previous responses in the redirect chain, from last to first.
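A finish callback that reports the redirect chain in the order it happened. This is a sketch under assumptions not spelled out above: that redirects is an array ref of header hash refs, and that each carries Status and URL like the final headers. redirect_chain is a hypothetical helper, and the call at the end uses simulated data.

```perl
use strict;
use warnings;

# Assumption: $redirects is an array ref of header hash refs ordered
# last to first, each with Status and URL keys like the final $headers.
sub redirect_chain {
    my ($headers, $redirects) = @_;
    my @hops = reverse @{ $redirects || [] };    # first hop first
    return map { "$_->{Status} $_->{URL}" } @hops, $headers;
}

my $cb = sub {
    my ($is_success, $body, $headers, $redirects) = @_;
    print "$_\n" for redirect_chain($headers, $redirects);
    if ($is_success) {
        printf "OK, %d bytes\n", defined $body ? length $body : 0;
    } else {
        print "failed: $headers->{Status} $headers->{Reason}\n";
    }
};

# Simulated response: http://a/ -> http://b/ -> http://c/
$cb->(1, '<html></html>',
      { Status => 200, Reason => 'OK', URL => 'http://c/' },
      [ { Status => 302, Reason => 'Found',             URL => 'http://b/' },
        { Status => 301, Reason => 'Moved Permanently', URL => 'http://a/' } ]);
```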

on_header callback function

When specified, this callback is called once all response headers have been received.

 $opt{on_header} = sub {
        my ($is_success, $headers, $redirects) = @_;
        ...
 };

on_body callback function

When specified, this callback is called for each chunk of the response body.

 $opt{on_body} = sub {
        my ($body) = @_; # body chunk
        ...
 };
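The two callbacks combine naturally for streaming downloads: on_header decides whether the body is worth keeping, and on_body consumes it chunk by chunk instead of buffering the whole response. A sketch, assuming on_header's headers hash uses lowercase names as the finish callback's does; the simulated calls at the end stand in for a real client driving the callbacks.

```perl
use strict;
use warnings;

my %opt;
my ($keep, $received, $saved) = (0, 0, '');

# Called once all headers are in: decide whether to keep the body.
$opt{on_header} = sub {
    my ($is_success, $headers, $redirects) = @_;
    $keep = $is_success ? 1 : 0;
    my $len = $headers->{'content-length'};   # assumed lowercase name
    print "expecting ", $len // 'unknown', " bytes\n";
};

# Called for each body chunk; a real program might print to a file here.
$opt{on_body} = sub {
    my ($body) = @_;
    $received += length $body;
    $saved .= $body if $keep;
};

# Simulated delivery so the sketch runs offline. With a real client,
# do_http($url, \%opt, $cb) drives these callbacks, and the finish
# callback then receives an undef body.
$opt{on_header}->(1, { Status => 200, 'content-length' => 11 }, []);
$opt{on_body}->($_) for 'hello', ' ', 'world';
print "received $received bytes\n";
```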

NOTES

Turn off the persistent option when downloading pages from many different sites.

Use a libcurl built with "Asynchronous DNS resolution via c-ares".

AUTHOR

Nick Kostyria <kni@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2013 by Nick Kostyria

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

Net::Curl AnyEvent::HTTP LWP

Net::Curl::Multi::EV