=head1 LWPng

This note describes the redesign of the LWP Perl modules in order to
add full support for the HTTP/1.1 protocol.  The main change is the
adoption of an event driven framework.  This allows us to support
multiple connections within a single client program.  It was also a
prerequisite for supporting HTTP/1.1 features like persistent
connections and pipelining.

=head1 HTTP/1.1

RFC 2068 is the proposed standard for the Hypertext Transfer Protocol
version 1.1, usually denoted HTTP/1.1.  The document is currently
being revised by the IETF, and a draft standard document is expected
soon.

The HTTP/1.1 protocol uses the same basic message format as earlier
versions of the protocol, and HTTP/1.1 clients and servers can easily
adapt to peers which only know about the old protocol.  HTTP/1.1 adds
some new methods, some new status codes, and some new headers.  One
important change is that the Host header is now mandatory.  Other
changes include support for partial content, through the specification
of byte ranges, and improved support for caching and proxies.  There
is also a standard way of switching from HTTP/1.1 to some other (more
suitable) protocol on the wire.
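
For example, a minimal HTTP/1.1 request now looks like this (the host
name is just a placeholder); the Host header must always be present,
and an empty line terminates the header section:

  GET /index.html HTTP/1.1
  Host: www.example.com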

The most important change is the introduction of persistent
connections.  This means that more than one request/response exchange
can take place on a single TCP connection between a client and a
server.  This improves performance and generally interacts better with
how TCP works.  It also means that the peers must be able to tell the
extent of the messages on the wire.  In HTTP/1.0 the only ways to do
this were using the Content-Length header or closing the connection
(which was only an option for the server).  Use of the Content-Length
header is not appropriate when the length of the message cannot be
determined in advance.  HTTP/1.1 introduces two new ways to delimit
messages: the chunked transfer encoding and self-delimiting multipart
content types.  With the chunked transfer encoding the message is
broken into chunks of arbitrary sizes, and each chunk is preceded by a
line specifying the number of bytes in the chunk.  The multipart types
use a special boundary byte pattern as a delimiter for the message.
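
For illustration, a chunked body could be decoded with something like
this simplified sketch (chunk extensions and trailers are ignored):

  sub decode_chunked {
      my $raw  = shift;
      my $body = "";
      while ($raw =~ s/^([0-9A-Fa-f]+)[^\r\n]*\r\n//) {
          my $size = hex($1);              # chunk size is given in hex
          last if $size == 0;              # a zero sized chunk ends the body
          $body .= substr($raw, 0, $size);
          substr($raw, 0, $size + 2) = ""; # drop chunk data and trailing CRLF
      }
      return $body;
  }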

With persistent connections one can improve performance even more by
use of a technique called "pipelining".  This means that the client
sends multiple requests without waiting for the response to the first
request before sending the next.  This can have a dramatic effect on
the throughput for links with high round-trip delay.
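
As a wire-level illustration (bypassing LWP entirely; the host and
paths are just placeholders), pipelining simply means writing several
requests before reading any response:

  use IO::Socket::INET;

  my $host = "www.example.com";
  my $sock = IO::Socket::INET->new(PeerAddr => $host,
                                   PeerPort => 80,
                                   Proto    => "tcp")
      or die "connect: $!";

  my @paths = ("/index.html", "/about.html");
  for my $i (0 .. $#paths) {
      # ask the server to close the connection after the last response,
      # so that the read loop below terminates
      my $close = $i == $#paths ? "Connection: close\r\n" : "";
      print $sock "GET $paths[$i] HTTP/1.1\r\n",
                  "Host: $host\r\n", $close, "\r\n";
  }

  # responses arrive in the same order as the requests were sent
  print while <$sock>;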


=head1 Event driven programming model

Let's investigate what impact the event driven framework has on the
programming model.  The basic model for sending requests and receiving
responses used to be:

  $res = $ua->request($req);   # return when response is available
  if ($res->is_success) {
      #...
  }

With the new event driven framework it becomes:

  $ua->spool($req1);   # returns immediately
  $ua->spool($req2);   # can send multiple requests in parallel
  #...

  mainloop->run;       # return when all connections are gone

Request objects are created and then handed off to the $ua, which will
queue them up for processing.  As you can see, there is no longer any
natural place to test the outcome of the requests.  What happens is
that the requests live their own lives, and they will be notified
(through a method call) when the corresponding response is available.
You will have to set up event handlers (in the requests) that react to
these events.

Luckily, this does not mean that all old programs must be rewritten.
The following shows one way to emulate something very close to the old
behaviour:

  my $res;
  my $req = LWP::Request->new(GET => $url);
  $req->{'done_cb'} = sub { $res = shift };

  $ua->spool($req);
  mainloop->one_event until $res;

  if ($res->is_success) {
      #...
  }

and this will in fact be used to emulate the old $ua->request() and
$ua->simple_request() interfaces.  The goal is to be completely
backwards compatible with the current LWP modules.
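
A sketch of how the old blocking interface might be emulated on top of
the new one (not the actual implementation, just the example above
wrapped in a method):

  sub request {
      my($ua, $req) = @_;
      my $res;
      $req->{'done_cb'} = sub { $res = shift };
      $ua->spool($req);
      mainloop->one_event until $res;
      return $res;
  }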

=head2 LWP::Request

As you can see from the example above, we use the class name
LWP::Request (as opposed to HTTP::Request) for the requests created.
LWP::Request is a subclass of HTTP::Request, so it has all the same
methods and attributes as HTTP::Request, and then some more.  The most
important of these are two callback methods that will be invoked as
the response is received:

   $req->response_data($data, $res);
   $req->done($res);

The response_data() callback method is invoked repeatedly as parts of
the content of the response become available.  The first time it is
invoked, $res will be a reference to an HTTP::Response object with the
response code and headers initialized, and empty content.  The default
implementation of response_data() just appends the data passed to the
content of the $res object.  It also supports a registered callback
function ('data_cb') that can be invoked.
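
For instance, assuming the registered 'data_cb' callback is passed the
same arguments as response_data() itself, it could be used to stream
the content directly to a file instead of collecting it in memory:

  my $req = LWP::Request->new(GET => $url);
  open(my $fh, ">", "saved.html") or die "open: $!";
  $req->{'data_cb'} = sub {
      my($data, $res) = @_;
      print $fh $data;      # write each piece of content as it arrives
  };
  $ua->spool($req);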

The done() callback method is invoked when the whole response has been
received.  It is guaranteed to be invoked once for each request
spooled (even if the request fails).  The default implementation will
set up the $res->request and $res->previous links and will
automatically handle redirects and unauthorized responses by
respooling a slightly modified copy of the original request.  It also
supports a registered callback function ('done_cb') that will be
invoked, but only for the last response in the case of redirect
chains.

As an application programmer you can either subclass LWP::Request to
provide your own versions of response_data() and done(), or you can
just register callback functions.
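
A sketch of the subclassing approach might look like this (calling the
base class method keeps the default redirect handling and the
'done_cb' support):

  package MyRequest;
  use LWP::Request;
  our @ISA = qw(LWP::Request);

  sub done {
      my($self, $res) = @_;
      print "Finished with status ", $res->code, "\n";
      $self->SUPER::done($res);
  }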

The LWP::Request objects also provide a few more attributes that might
be of interest.  The $req->priority is a number between 1 and 100 that
can be used to select which request goes first when multiple requests
are spooled at the same time.  Requests with the lowest numbers go
first.

The $req->proxy attribute tells us if we are going to pass the request
to a proxy server instead of the server implied by the URL.  If
$req->proxy is TRUE, then it should be the URL of the proxy.
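
Assuming both attributes work as plain get/set accessor methods, they
might be used like this (the URLs are just placeholders):

  my $req = LWP::Request->new(GET => "http://www.example.com/big.tar.gz");
  $req->priority(90);    # near the bottom; lower numbers go first
  $req->proxy("http://proxy.example.com:8080/");
  $ua->spool($req);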


=head2 LWP::MainLoop

The event oriented framework is based on a single common object,
provided by the LWP::MainLoop module, that watches external file
descriptors (sockets) and timers.  When events occur, registered
functions are called, and these call other event handling functions,
and so on.

In order for this to work, the mainloop object needs to be in control
when nothing else is happening and you expect protocol handling to take
place.  This is achieved by repeatedly calling the mainloop->one_event
method until we are satisfied.  Each call will wait until the next
event is available, then invoke the corresponding callback function
and then return.  The one_event() interface is handy because it can be
applied recursively and you can set up event loops in event handlers
invoked by outer event loops.
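
For example, an event handler (itself invoked from an outer loop) can
spool a batch of requests and wait for all of them using only the
pieces shown earlier:

  my $pending = @reqs;
  $_->{'done_cb'} = sub { $pending-- } for @reqs;
  $ua->spool($_) for @reqs;
  mainloop->one_event while $pending;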

The call mainloop->run is a shorthand for a common form of this loop.
It will call mainloop->one_event until there are no registered sockets
and no timers left.

The following program shows how you can register your own callbacks.
For instance, the application might want to be able to read commands
from the terminal.

  use LWP::MainLoop qw(mainloop);

  # $ua is assumed to be a user agent object created elsewhere
  mainloop->readable(\*STDIN, \&read_and_do_cmd);
  mainloop->run;

  sub read_and_do_cmd
  {
     my $cmd;
     my $n = sysread(STDIN, $cmd, 512);
     exit unless $n;   # EOF on the terminal
     chomp($cmd);

     if ($cmd eq "q") {
         exit;
     } elsif ($cmd =~ /^(get|head|trace)\s+(\S+)/i) {
         $ua->spool(LWP::Request->new(uc($1) => $2));
     } ...

  }

Currently LWPng uses its own private event loop implementation.  The
plan is to adopt the event loop implementation used by the Tk
extension.  This should allow happier mixing of Tk and LWPng.