NAME

Schedule::AdaptiveThrottler - Throttle just about anything with ease

VERSION

Version 0.06

SYNOPSIS

Limit resource use, according to arbitrary parameters, using a bucket algorithm with counters stored in memcached.

Protect an HTML authentication form

Ban for 10 minutes if there are more than 5 login attempts for a given username in less than a minute, OR more than 50 login attempts from a single IP address in less than 5 minutes.

    use Schedule::AdaptiveThrottler;

    Schedule::AdaptiveThrottler->set_client(Cache::Memcached::Fast->new(...));

    my ( $status, $msg ) = Schedule::AdaptiveThrottler->authorize(
        either => {
            ip    => {
                max     => 50,
                ttl     => 300,
                message => 'ip_blocked',
                value   => $client_ip_address,
            },
            login => {
                max     => 5,
                ttl     => 60,
                message => 'login_blocked',
                value   => $username,
            },
        },
        lockout    => 600,
        identifier => 'user_logon',
    );

    return HTTP_FORBIDDEN if $status == SCHED_ADAPTHROTTLE_BLOCKED;

    ...

Robot throttling

Allow at most 10 connections per second for a robot, but do not ban.

    my ( $status, $msg ) = Schedule::AdaptiveThrottler->authorize(
        all => {
            'ip_ua' => {
                max     => 10,
                ttl     => 1,
                message => 'ip_ua_blocked',
                value   => $client_ip_address .'_'. $user_agent_or_something,
            },
        },
        identifier => 'robot_connect',
    );

    return HTTP_BANDWIDTH_LIMIT_EXCEEDED, '...' if $status == SCHED_ADAPTHROTTLE_BLOCKED;

OO-style

    use Schedule::AdaptiveThrottler;

    my $SAT = Schedule::AdaptiveThrottler->new(
        memcached_client => Cache::Memcached::Fast->new(...));

    my ( $status, $msg ) = $SAT->authorize(...);

EXPLANATION

This module was originally designed to throttle access to web forms and to help prevent brute-force attacks and DoS conditions. What it does is very simple: it stores lists of timestamps, one for each set of parameters defined in the authorize() call; checks that the number of timestamps in the previously stored list is not over the threshold set in the call; removes expired timestamps from the list; and puts the list back in memcached.

It is really a simple bucket algorithm, helped by some of memcached's features (specifically the automatic cleanup of expired records, particularly useful when a ban has been specified).
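The steps described above can be sketched in a few lines of Perl. This is only an illustration of the idea, not the module's actual code: the %store hash stands in for memcached, record_hit() is a hypothetical name, and the exact order of the steps and the "more than max" comparison are assumptions.

```perl
use strict;
use warnings;

# Stand-in for memcached: a plain hash (assumption, for illustration only).
my %store;

# Record one hit for $key and report whether the threshold is exceeded.
# $now is passed in explicitly so the behaviour is easy to follow.
sub record_hit {
    my ( $key, $max, $ttl, $now ) = @_;

    # 1. Fetch the list of timestamps stored for this key, if any.
    my @stamps = @{ $store{$key} || [] };

    # 2. Drop timestamps that fall outside the ttl window.
    @stamps = grep { $_ > $now - $ttl } @stamps;

    # 3. Add the current hit and store the list back.
    push @stamps, $now;
    $store{$key} = \@stamps;

    # 4. More hits than $max within the window means "blocked".
    return @stamps > $max ? 'blocked' : 'authorized';
}

# Six hits within one minute against a limit of five:
# the first five are authorized, the sixth is blocked.
my @results = map { record_hit( 'login:alice', 5, 60, 1000 + $_ ) } 1 .. 6;
```

Because the criteria are part of the key ('login:alice' here), each username or IP address gets its own independent list.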

The interesting thing about it is it can count and throttle anything: if you need to restrict access to a DB layer to a certain number of calls per minute per process, for instance, you can do it the exact same way as in the examples above. Simply use the PID as the 'value' key, and you're set. The possible applications are endless.
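As a sketch of the per-process case, the arguments could be built as follows. The helper name db_throttle_args(), the condition name db_calls, the message and the identifier are all made up for this example; only the argument layout comes from the authorize() description in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: builds authorize() arguments that limit one process
# to $max_calls DB calls per minute, using the PID as the grouping value.
sub db_throttle_args {
    my ( $pid, $max_calls ) = @_;
    return (
        all => {
            db_calls => {
                max     => $max_calls,
                ttl     => 60,
                message => 'db_rate_exceeded',
                value   => $pid,
            },
        },
        identifier => 'db_layer',
    );
}

my %args = db_throttle_args( $$, 100 );

# With the module loaded and a memcached client set, the call would be:
#   my ( $status, $msg ) = Schedule::AdaptiveThrottler->authorize(%args);
```

No lockout is given here, so a process over its limit is refused only until older hits fall out of the one-minute window.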

It was written to be fast, efficient, and simpler than other throttling modules found on CPAN. Everything we found was either too complicated or not fast enough. Using memcached, a list, and a grep on timestamps, with the criteria (an IP address, for instance) made part of the object key, proved satisfactory in all respects. In particular, we didn't want anything that uses locks, which would introduce a DoS risk of its own.

CLASS METHODS

These methods can be used as functions as well, since they are in the @EXPORT_OK list.

set_client

Set the memcached instance to be used. Takes a Cache::Memcached or Cache::Memcached::Fast object as its only parameter. The value is stored in a class variable, so only one call is needed. Any other object that behaves like a Cache::Memcached instance will do (only get() and set() are needed, really).
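For tests or small scripts, an object exposing just get() and set() might look like the sketch below. My::TinyCache is a hypothetical class written for this example; it mimics the set($key, $value, $expiry) argument order of Cache::Memcached, and whether the module accepts it without further checks is an assumption based on the paragraph above.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in offering the two methods mentioned above.
package My::TinyCache;

sub new { return bless { data => {} }, shift }

# set($key, $value, $ttl_in_seconds): same argument order as Cache::Memcached.
sub set {
    my ( $self, $key, $value, $ttl ) = @_;
    my $expires_at = $ttl ? time() + $ttl : 0;    # 0 means "never expires"
    $self->{data}{$key} = [ $value, $expires_at ];
    return 1;
}

# get($key): returns the stored value, or undef if missing or expired.
sub get {
    my ( $self, $key ) = @_;
    my $entry = $self->{data}{$key} or return;
    my ( $value, $expires_at ) = @$entry;
    if ( $expires_at && $expires_at <= time() ) {
        delete $self->{data}{$key};    # emulate memcached's automatic cleanup
        return;
    }
    return $value;
}

package main;

my $cache = My::TinyCache->new;
$cache->set( 'greeting', 'hello', 60 );

# It could then be handed over in place of a real client:
#   Schedule::AdaptiveThrottler->set_client($cache);
```

Such an object lives in a single process, so it does not share counters between workers the way memcached does.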

authorize

Takes a hash or hashref as argument, along these lines:

    authorize(
        <'either'|'all'> => {
            <arbitrary_parameter_name> => {
                max     => <maximum tries>,
                ttl     => <seconds before a record is wiped>,
                message => '<arbitrary message sent back to caller on "blocked">',
                value   => <arbitrarily defined value for grouping>,
            },
            ...
        },
        [ lockout => <ban duration in seconds, if any>, ]
        identifier => '<disambiguation string for memcached key>',
    )

The returned value is a list. The first element is a constant (see "EXPORTED CONSTANTS") and the second element is an arrayref of all the messages (individually defined in the parameter list for each condition, see above) for which a block/ban was decided by the counter mechanism.

If the conditions hashref is passed under the 'all' key, every condition has to go over its limit for a block or ban to be issued. If it is passed under 'either', any single condition going over its limit will trigger it.
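The difference between the two modes can be illustrated with a small stand-alone function. should_block() and its input format are hypothetical and only model the rule stated above; they are not part of the module.

```perl
use strict;
use warnings;

# $mode is 'either' or 'all'. $exceeded maps each condition name to a true
# value when that condition is over its limit (hypothetical structure).
sub should_block {
    my ( $mode, $exceeded ) = @_;
    my @names = keys %$exceeded;
    my $hits  = grep { $exceeded->{$_} } @names;

    if ( $mode eq 'all' ) {
        # Every condition must be over its limit.
        return ( @names && $hits == @names ) ? 1 : 0;
    }

    # 'either': one condition over its limit is enough.
    return $hits ? 1 : 0;
}

# Only the login counter is over its limit:
my $either = should_block( 'either', { ip => 0, login => 1 } );    # 1
my $all    = should_block( 'all',    { ip => 0, login => 1 } );    # 0
```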

Since this is meant to be as non-blocking as possible, failure to communicate with the memcached backend will not issue a ban. The return value of the get/set memcached calls could probably benefit from a more clever approach.

new

Use new() for the OO style. A Schedule::AdaptiveThrottler object can be initialized with a memcached object as a single argument, with a hashref of parameters (one of which may be memcached_client), or with a hash of the same parameters.

EXPORTED CONSTANTS

SCHED_ADAPTHROTTLE_AUTHORIZED
SCHED_ADAPTHROTTLE_BLOCKED

These two constants are meant to be compared with the first element of the list returned by authorize(). They are currently 1 and 0, but that may change, and more constants could be added in the future. Do not test the result of authorize() for truth, since it won't tell you what you think it will.
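The sketch below shows why a plain truth test is misleading. It runs without the module: DEMO_AUTHORIZED, DEMO_BLOCKED and demo_authorize() are local stand-ins, and mapping 1 to "authorized" and 0 to "blocked" is an assumption. How the real authorize() behaves in scalar context depends on its implementation.

```perl
use strict;
use warnings;

# Local stand-ins for the exported constants (values are an assumption).
use constant {
    DEMO_AUTHORIZED => 1,
    DEMO_BLOCKED    => 0,
};

# Hypothetical stand-in for authorize(): always reports a block.
sub demo_authorize {
    return ( DEMO_BLOCKED, ['login_blocked'] );
}

# Misleading: in boolean (scalar) context this sub yields its last element,
# the message arrayref, which is always true.
my $looks_authorized = demo_authorize() ? 1 : 0;

# Reliable: take the list and compare its first element to the constant.
my ( $status, $msg ) = demo_authorize();
my $is_blocked = ( $status == DEMO_BLOCKED ) ? 1 : 0;
```

Here $looks_authorized ends up true even though the request was blocked, while the explicit comparison gives the right answer.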

NOTES

At one point in the discussion we thought it would be more efficient to store the data as timestamp:count:timestamp:count:... However, benchmarks showed no difference in performance, only in storage size (and even that only under certain conditions, such as many hits in the same second).
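For reference, the alternative layout could be produced and read back as in the sketch below. compact_stamps() and expand_stamps() are hypothetical names; this only illustrates the format discussed above, which the module does not use.

```perl
use strict;
use warnings;

# Collapse a list of epoch timestamps into "timestamp:count:timestamp:count".
sub compact_stamps {
    my @stamps = @_;
    my %count;
    $count{$_}++ for @stamps;

    my @fields;
    for my $ts ( sort { $a <=> $b } keys %count ) {
        push @fields, $ts, $count{$ts};
    }
    return join ':', @fields;
}

# Reverse operation: rebuild the sorted list of individual timestamps.
sub expand_stamps {
    my ($packed) = @_;
    my %count = split /:/, $packed;

    my @stamps;
    for my $ts ( sort { $a <=> $b } keys %count ) {
        push @stamps, ($ts) x $count{$ts};
    }
    return @stamps;
}

# Three hits in second 100 and one in second 101:
my $packed = compact_stamps( 100, 100, 100, 101 );    # "100:3:101:1"
```

The packed form is shorter only when several hits share the same second, which matches the storage-size observation above.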

CAVEATS

Since there is no locking mechanism (it would introduce a serious DoS risk), two sequences of get() and set() calls can be interleaved, causing one of the hits to be ignored. This should not be very common, given the typical time between a get() and a set() plus the memcached round-trip, but there is no guarantee that the hit count will always be exact. That should not be a problem for the typical use cases. If you need a precise count, however, use a different module (and be prepared to try and solve the tricky locking/DoS conditions mentioned above...)
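The lost hit can be reproduced with a deterministic simulation. The %store hash and the client_get()/client_set() helpers are stand-ins invented for this sketch; they only model two clients that both read before either writes.

```perl
use strict;
use warnings;

# Stand-in for the shared memcached record (assumption, for illustration).
my %store = ( hits => [] );

# Reading returns a copy, as a network round-trip would.
sub client_get { return [ @{ $store{hits} } ] }

# Writing replaces the whole stored list.
sub client_set { my ($list) = @_; $store{hits} = $list; return 1 }

# Interleaving: both clients read before either one writes.
my $seen_by_a = client_get();
my $seen_by_b = client_get();

client_set( [ @$seen_by_a, 1000 ] );    # A records its hit
client_set( [ @$seen_by_b, 1001 ] );    # B overwrites A's write

# Two hits occurred, but only one timestamp is stored.
my $recorded = scalar @{ $store{hits} };
```

The count is off by one for that window only; once timestamps expire, the error disappears with them.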

BUGS

Please report any bugs or feature requests to bug-schedule-adaptivethrottler at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Schedule::AdaptiveThrottler. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Schedule::AdaptiveThrottler

ACKNOWLEDGEMENTS

Philippe "BooK" Bruhat
Dennis Kaarsemaker
Kristian Köhntopp
Elizabeth Mattijsen
Ruud Van Tol

This module really is the product of collective thinking.

AUTHOR

David Morel, <david.morel at amakuru.net>

LICENSE AND COPYRIGHT

Copyright 2010 David Morel & Booking.com.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.