NAME

Feersum - A scary-fast HTTP engine for Perl based on EV/libev

SYNOPSIS

    use Feersum;
    my $ngn = Feersum->endjinn; # singleton
    $ngn->use_socket($io_socket);
    
    # register a PSGI handler
    $ngn->psgi_request_handler(sub {
        my $env = shift;
        return [200,
            ['Content-Type'=>'text/plain'],
            ["You win one cryptosphere!\n"]];
    });
    
    # register a Feersum handler:
    $ngn->request_handler(sub {
        my $req = shift;
        my $t; $t = EV::timer 2, 0, sub {
            $req->send_response(
                200,
                ['Content-Type' => 'text/plain'],
                \"You win one cryptosphere!\n"
            );
            undef $t;
        };
    });

DESCRIPTION

Feersum is an HTTP server built on EV. It fully supports the PSGI 1.03 spec including the psgi.streaming interface and is compatible with Plack. Feersum also has its own "native" interface which is similar in a lot of ways to PSGI, but is not compatible with PSGI or PSGI middleware.

Feersum uses a single-threaded, event-based programming architecture to scale and can handle many concurrent connections efficiently in both CPU and RAM. It skips doing a lot of sanity checking with the assumption that a "front-end" HTTP/HTTPS server is placed between it and the Internet.

How It Works

All of the request-parsing and I/O marshalling is done using C or XS code. HTTP parsing is done by picohttpparser, which is the core of HTTP::Parser::XS. The network I/O is done via the libev library. This is made possible by EV::MakeMaker, which allows extension writers to link against the same libev that EV is using. This means that one can write an evented app using EV or AnyEvent from Perl that completely co-operates with the server's event loop.

Since the Perl "app" (handler) is executed in the same thread as the event loop, one needs to be careful not to block this thread. Standard techniques include using AnyEvent or EV idle and timer watchers, using Coro to multitask, and using sub-processes to do heavy lifting (e.g. AnyEvent::Worker and AnyEvent::DBI).

Feersum also attempts to do as little copying of data as possible. Feersum uses the low-level writev system call to avoid having to copy data into a buffer. For response data, references to scalars are kept in order to avoid copying the string values (once the data is written to the socket, the reference is dropped and the data is garbage collected).

A trivial hello-world handler can process in excess of 5000 requests per second on a 4-core Intel(R) Xeon(R) E5335 @ 2.00GHz using TCPv4 on the loopback interface, OS Ubuntu 6.06LTS, Perl 5.8.7. Your mileage will likely vary.

INTERFACE

There are two handler interfaces for Feersum: The PSGI handler interface and the "Feersum-native" handler interface. The PSGI handler interface is fully PSGI 1.03 compatible and supports psgi.streaming. The Feersum-native handler interface is "inspired by" PSGI, but does some things differently for speed.

Feersum will use "Transfer-Encoding: chunked" for HTTP/1.1 clients and "Connection: close" streaming as a fallback. Technically "Connection: close" streaming isn't part of the HTTP/1.0 or 1.1 spec, but many browsers and agents support it anyway.
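
To make the chunked framing concrete, here is a minimal pure-Perl sketch of the wire format defined by HTTP/1.1. The helper name encode_chunked is hypothetical and exists only for illustration; Feersum does this framing internally, so an app never has to.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Feersum): frame a list of body
# pieces using the HTTP/1.1 chunked transfer coding.
# Assumes the pieces are byte strings, so length() counts bytes.
sub encode_chunked {
    my @pieces = @_;
    my $out = '';
    for my $piece (@pieces) {
        # a zero-length chunk would terminate the body early, so skip it
        next unless length($piece);
        # each chunk: size in hex, CRLF, data, CRLF
        $out .= sprintf("%x\r\n%s\r\n", length($piece), $piece);
    }
    # the last chunk has size zero and is followed by an empty trailer
    return $out . "0\r\n\r\n";
}

my $wire = encode_chunked("You win ", "one cryptosphere!\n");
```

With "Connection: close" streaming there is no framing at all: the pieces are written as-is and the end of the body is signalled by closing the socket.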

Currently POST/PUT does not stream input, but read() can be called on psgi.input to get the body (which has been buffered up before the request callback is called and therefore will never block). The psgix.input.buffered env var is set to reflect this. In the future, read() will likely change to raise EAGAIN and allow a callback to be registered for the arrival of more data.

PSGI interface

Feersum fully supports the PSGI 1.03 spec including psgi.streaming.

See also Plack::Handler::Feersum, which provides a way to use Feersum with plackup and Plack::Runner.

Call psgi_request_handler($app) to register $app as a PSGI handler.

    my $app = do $filename;
    Feersum->endjinn->psgi_request_handler($app);

The env hash passed in will always have the following keys in addition to dynamic ones:

    psgi.version      => [1,0],
    psgi.nonblocking  => 1,
    psgi.multithread  => '', # i.e. false
    psgi.multiprocess => '',
    psgi.streaming    => 1,
    psgi.errors       => \*STDERR,
    SCRIPT_NAME       => "",
    # see below for info on these extensions:
    psgix.input.buffered   => 1,
    psgix.output.buffered  => 1,
    psgix.body.scalar_refs => 1,

Note that SCRIPT_NAME is always blank (but defined). PATH_INFO will contain the path part of the requested URI.

For requests with a body (e.g. POST) psgi.input will contain a valid file-handle. Feersum currently passes undef for psgi.input when there is no body to avoid unnecessary work.

    my $r = delete $env->{'psgi.input'};
    $r->read($body, $env->{CONTENT_LENGTH});
    # optional: choose to stop receiving further input:
    # $r->close();

The psgi.streaming interface is fully supported, including the writer-object poll_cb callback feature defined in PSGI 1.03. Feersum calls the poll_cb callback after all data has been flushed out and the socket is write-ready. The data is buffered until the callback returns at which point it will be immediately flushed to the socket.

    my $app = sub {
        my $env = shift;
        return sub {
            my $starter = shift;
            my $w = $starter->([
                200, ['Content-Type' => 'application/json']
            ]);
            my $n = 0;
            $w->poll_cb(sub {
                $_[0]->write(get_next_chunk());
                $_[0]->close if ($n++ >= 100);
            });
        };
    };
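
A streaming app of this shape can be exercised without a server. The sketch below uses a hypothetical test double, StandInWriter, which is not Feersum's writer object: it records writes and keeps calling the poll_cb callback until the app closes the stream, whereas a real server only calls back once the socket is write-ready.

```perl
use strict;
use warnings;

# Hypothetical test double, NOT Feersum's writer object.
package StandInWriter;
sub new {
    my $class = shift;
    return bless { out => [], closed => 0, cb => undef }, $class;
}
sub write {
    my ($self, $data) = @_;
    # accept plain scalars and scalar refs alike
    push @{ $self->{out} }, ref($data) ? $$data : $data;
}
sub close   { $_[0]{closed} = 1 }
sub poll_cb { $_[0]{cb} = $_[1] }
sub drain {
    my $self = shift;
    # keep polling until the app closes the stream or unsets the callback
    $self->{cb}->($self) while $self->{cb} && !$self->{closed};
}

package main;

my @chunks = map { "chunk $_\n" } 1 .. 3;

my $app = sub {
    my $env = shift;
    return sub {
        my $starter = shift;
        my $w = $starter->([200, ['Content-Type' => 'text/plain']]);
        $w->poll_cb(sub {
            my $writer = shift;
            $writer->write(shift @chunks);
            $writer->close unless @chunks;   # nothing left to send
        });
    };
};

my $fake = StandInWriter->new;
$app->({})->(sub { $fake });   # the "starter" hands back the writer
$fake->drain;
```

Closing the writer from inside the callback is what ends the polling, both here and in the handler shown above.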

PSGI extensions

Scalar refs in the response body are supported; this is indicated via the psgix.body.scalar_refs env variable. Passing by reference is significantly faster than copying a value onto the return stack or into an array. It's also very useful when broadcasting a message to many connected clients.
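
As a rough illustration of the broadcast case, the sketch below builds one message and places the same reference in many response bodies. The list of 1000 responses is a hypothetical stand-in for a set of connected clients; no string data is duplicated on the Perl side.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# One message, built once and shared by reference.
my $message = \"event: tick\n";

# Hypothetical stand-ins for connected clients: every PSGI response
# body holds the same reference instead of its own copy of the string.
my @responses = map {
    [200, ['Content-Type' => 'text/plain'], [$message]]
} 1 .. 1000;

# All bodies point at the very same scalar.
my %distinct;
$distinct{ refaddr($_->[2][0]) } = 1 for @responses;
my $copies = keys %distinct;
```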

Calls to $w->write() will never block. This behaviour is indicated by psgix.output.buffered in the PSGI env hash.

psgix.input.buffered is also set, which means that calls to read on the input handle will also never block. Feersum currently buffers the entire input before calling the callback.

This input behaviour will probably change so that input is not completely buffered. Users of Feersum should expect that, when no data is available to read, calls to get data from the input filehandle will return an empty string and set $! to EAGAIN. Feersum may also allow for registering a poll_cb() handler that works similarly to the method on the "writer" object, although that isn't currently part of the PSGI 1.03 spec. The callback will be called once data has been buffered.

The Feersum-native interface

The Feersum-native interface is inspired by PSGI, but is inherently incompatible with it. Apps written against this API will not work as a PSGI app.

This interface is not stable and may have features removed until Feersum reaches version 1.0. At that point the API will become stable and will only change for bug fixes or new additions. The stable API will retain backwards compatibility until at least the next major release.

The main entry point is a sub-ref passed to request_handler. This sub is passed a reference to an object that represents an HTTP connection. Currently the request_handler is called during the "check" and "idle" phases of the EV event loop. The handler is always called after request headers have been read. Currently, the handler will only be called after a full request entity has been received for POST/PUT/etc.

The simplest way to send a response is to use send_response:

    my $req = shift;
    $req->send_response(200, \@headers, ["body ", \"parts"]);

Or, if the app has everything packed into a single scalar already, just pass it in by reference.

    my $req = shift;
    $req->send_response(200, \@headers, \"whole body");

Both of the above will generate a Content-Length header (replacing any that were pre-defined in @headers).

An environment hash is easy to obtain, but is a method call instead of a parameter to the callback. (In PSGI, there is no $req object; the env hash is the first parameter to the callback). The hash contains the same items as it would for a PSGI handler (see above for those).

    my $req = shift;
    my $env = $req->env();

To read input from a POST/PUT, use the psgi.input item of the env hash.

    if ($env->{REQUEST_METHOD} eq 'POST') {
        my $body = '';
        my $r = delete $env->{'psgi.input'};
        $r->read($body, $env->{CONTENT_LENGTH});
        # optional: choose to stop receiving further input:
        # $r->close();
    }

Starting a response in stream mode enables the write() method (which really acts more like a buffered 'print'). Calls to write() will never block.

    my $req = shift;
    my $w = $req->start_streaming(200, \@headers);
    $w->write(\"this is a reference to some shared chunk\n");
    $w->write("regular scalars are OK too\n");
    $w->close(); # close off the stream

The writer object supports poll_cb as also specified in PSGI 1.03. Feersum will call the callback only when all data has been flushed out at the socket level. Use close() or unset the handler ($w->poll_cb(undef)) to stop the callback from getting called.

    my $req = shift;
    my $w = $req->start_streaming(
        "200 OK", ['Content-Type' => 'application/json']);
    my $n = 0;
    $w->poll_cb(sub {
        # $_[0] is a copy of $w so a closure doesn't need to be made
        $_[0]->write(get_next_chunk());
        $_[0]->close if ($n++ >= 100);
    });

METHODS

These are methods on the global Feersum singleton.

use_socket($sock)

Use the file-descriptor attached to a listen-socket to accept connections.

TLS sockets are NOT supported nor are they detected. Feersum needs to use the socket at a low level and will ignore any encryption that has been established (data is always sent in the clear). The intended use of Feersum is over localhost-only sockets.

A reference to $sock is kept as Feersum->endjinn->{socket}.

accept_on_fd($fileno)

Use the specified fileno to accept connections. May be used as an alternative to use_socket.
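
A listen-socket suitable for either method can be created with the core IO::Socket::INET module. This is a minimal sketch: the loopback address, the port number 0 (any free port) and the backlog of 1024 are assumptions chosen so the example runs anywhere, and the Feersum calls are left as comments.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on the loopback interface only; port 0 asks the OS for any
# free port (an assumption made so that this sketch runs anywhere).
my $sock = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1024,
    ReuseAddr => 1,
) or die "can't listen: $@";

# non-blocking, as is usual for sockets driven by an event loop
$sock->blocking(0);

my $fd   = fileno($sock);
my $port = $sock->sockport;

# With Feersum loaded, either of these would hand the socket over:
#   Feersum->endjinn->use_socket($sock);
#   Feersum->endjinn->accept_on_fd($fd);
```

When passing only the file descriptor, keep $sock in scope for as long as the server runs, since Perl closes the socket when the last reference to it goes away.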

request_handler(sub { my $req = shift; ... })

Sets the global request handler. Any previous handler is replaced.

The handler callback is passed a Feersum::Connection object.

Subject to change: if the request has an entity body then the handler will be called only after receiving the body in its entirety. The headers *must* specify a Content-Length of the body otherwise the request will be rejected. The maximum size is hard coded to 2147483647 bytes (this may be considered a bug).

read_timeout()
read_timeout($duration)

Get or set the global read timeout.

Feersum will wait about this long to receive all headers of a request (within the tolerances provided by libev). If an entity body is part of the request (e.g. POST or PUT) it will wait this long between successful read() system calls.

graceful_shutdown(sub { .... })

Causes Feersum to initiate a graceful shutdown of all outstanding connections. No new connections will be accepted. The reference to the socket provided in use_socket() is kept.

The sub parameter is a completion callback. It will be called when all connections have been flushed and closed. This allows one to do something like this:

    my $cv = AE::cv;
    my $death = AE::timer 2.5, 0, sub {
        fail "SHUTDOWN TOOK TOO LONG";
        exit 1;
    };
    Feersum->endjinn->graceful_shutdown(sub {
        pass "all gracefully shut down, supposedly";
        undef $death;
        $cv->send;
    });
    $cv->recv;

DIED

Not really a method so much as a static function. Works similarly to EV's/AnyEvent's error handler.

To install a handler:

    no strict 'refs';
    *{'Feersum::DIED'} = sub { warn "nuts $_[0]" };

The handler will get called for any errors that happen before the request handler callback is called, when the request handler callback throws an exception, and potentially for other not-in-a-request-context errors.

It will not get called for read timeouts that occur while waiting for a complete header (and also, until Feersum supports otherwise, time-outs while waiting for a request entity body).

Any exceptions thrown in the handler will generate a warning and will not be propagated.
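
A slightly fuller handler might collect errors for later inspection rather than just warning. In this sketch the @errors array and the message format are hypothetical, and the final call only simulates the server reporting an error.

```perl
use strict;
use warnings;

my @errors;   # hypothetical in-memory error log

{
    no strict 'refs';
    *{'Feersum::DIED'} = sub {
        my $err = shift;
        # avoid dying in here: an exception thrown by this handler
        # only produces a warning and goes no further
        push @errors, scalar(localtime) . " feersum: $err";
    };
}

# Simulate the server reporting an error to the handler.
Feersum::DIED("handler blew up\n");
```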

LIMITS

listening sockets

1 - this may be considered a bug

body length

2147483647 - about 2GiB.

request headers

64

request header name length

128 bytes

bytes read per read() system call

4096 bytes

BUGS

Please report bugs using http://github.com/stash/Feersum/issues/

Keep-alive is ignored completely.

Currently there's no way to limit the request entity length of a POST/PUT/etc. This could lead to a DoS attack on a Feersum server. Suggested remedy is to only run Feersum behind some other web server and to use that to limit the entity size.

Something isn't getting set right with the TCP socket options and the last chunk in a streamed response is sometimes lost. This happens more frequently under high concurrency. Fiddling with TCP_NODELAY and SO_LINGER doesn't seem to help. Maybe threads are needed to do blocking close() and shutdown() calls?

SEE ALSO

http://en.wikipedia.org/wiki/Feersum_Endjinn

Feersum Git: http://github.com/stash/Feersum git://github.com/stash/Feersum.git

picohttpparser Git: http://github.com/kazuho/picohttpparser git://github.com/kazuho/picohttpparser.git

AUTHOR

Jeremy Stashewsky, stash@cpan.org

THANKS

Tatsuhiko Miyagawa for PSGI and Plack.

Marc Lehmann for EV and AnyEvent (not to mention JSON::XS and Coro).

Kazuho Oku for picohttpparser.

lukec, konobi, socialtexters and van.pm for initial feedback and ideas.

Audrey Tang and Graham Termarsch for XS advice.

confound for docs input.

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Jeremy Stashewsky & Socialtext Inc.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.

picohttpparser is Copyright 2009 Kazuho Oku. It is released under the same terms as Perl itself.