Plack::Middleware::DetectRobots - Automatically set a flag in the environment if a robot client is detected
version 0.03
    use Plack::Builder;

    my $app = sub { ... };    # as usual

    builder {
        enable 'DetectRobots';
        # or: enable 'DetectRobots', env_key => 'psgix.robot_client';
        # or: enable 'DetectRobots', extended_check => 1, generic_check => 1;
        $app;
    };

    # ... and later ...

    if ( $env->{'robot_client'} ) {
        # ... do something ...
    }
This Plack middleware uses the list of robots that is part of the AWStats log analyzer software package to analyse the User-Agent HTTP header and to set an environment flag to either a true or false value depending on the detection of a robot client.
Once activated it checks the User-Agent HTTP header against a basic list of patterns for common bots.
If you activate the appropriate options, it can also use an extended list for the detection of less common bots (cf. extended_check) and / or a list of quite generic patterns to detect unknown bots (cf. generic_check).
You may also pass in your own regular expression for further checks (cf. local_regexp).
The checks are executed in this order:
1. Local regular expression
2. Basic check
3. Extended check
4. Generic check
If a check yields a positive result (i.e. it detects a bot), the remaining checks are skipped.
Depending on the check which detected a bot, the environment flag is set to one of these values: LOCAL, BASIC, EXTENDED, or GENERIC.
If no bot is detected, the flag is set to 0.
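The check order and the resulting flag values can be sketched in plain Perl. This is only an illustration of the behaviour described above, not the module's code: the function name classify_user_agent and all four patterns are hypothetical stand-ins, and the real AWStats-derived lists are far longer.

```perl
use strict;
use warnings;

# Hypothetical stand-in patterns (assumptions for illustration only);
# the module itself uses the much longer AWStats-derived lists.
my %pattern = (
    LOCAL    => qr/my-internal-monitor/i,
    BASIC    => qr/googlebot|msnbot/i,
    EXTENDED => qr/bingbot/i,
    GENERIC  => qr/spider|crawler/i,
);

# Checks run in this fixed order; the first match wins.
my @order = qw(LOCAL BASIC EXTENDED GENERIC);

# Returns the name of the first enabled check that matches, or 0.
sub classify_user_agent {
    my ( $ua, %enabled ) = @_;
    return 0 unless defined $ua;
    for my $check (@order) {
        next unless $enabled{$check};
        return $check if $ua =~ $pattern{$check};
    }
    return 0;
}

my %all = map { $_ => 1 } @order;
print classify_user_agent( 'Mozilla/5.0 (compatible; Googlebot/2.1)', %all ), "\n";
print classify_user_agent( 'SomeUnknownCrawler/1.0', %all ), "\n";
print classify_user_agent( 'Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0', %all ), "\n";
```

Because every non-zero result is a non-empty string, a simple truth test on the flag is enough when the kind of check that matched does not matter.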
The default name of the flag in the environment is robot_client, but this can be customized by setting the env_key option when enabling this middleware.
It might make sense to use psgix.robot_client by default instead, but the PSGI spec states that the "'psgix.' prefix is reserved for officially blessed extensions" - which does not apply to this module. You may, however, set the key to psgix.robot_client yourself by using the env_key option mentioned before.
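To show how the key name matters to downstream code, here is a minimal sketch that needs only core Perl. The wrapper wrap_with_flag is a hypothetical stand-in that mimics the documented behaviour with a single assumed pattern; it is not the middleware itself. The point is that the application has to read the same key that was passed as env_key.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the middleware: a plain PSGI wrapper that stores
# a detection result under a configurable key (default: robot_client).
sub wrap_with_flag {
    my ( $app, %opt ) = @_;
    my $key = defined $opt{env_key} ? $opt{env_key} : 'robot_client';
    return sub {
        my ($env) = @_;
        my $ua = defined $env->{HTTP_USER_AGENT} ? $env->{HTTP_USER_AGENT} : '';
        # Single assumed pattern; the real module checks full bot lists.
        $env->{$key} = $ua =~ /msnbot/i ? 'BASIC' : 0;
        return $app->($env);
    };
}

# The downstream application reads the flag under the customized key.
my $app = sub {
    my ($env) = @_;
    my $body = $env->{'psgix.robot_client'} ? "hello, robot\n" : "hello, human\n";
    return [ 200, [ 'Content-Type' => 'text/plain' ], [$body] ];
};

my $wrapped = wrap_with_flag( $app, env_key => 'psgix.robot_client' );
my $res     = $wrapped->( { HTTP_USER_AGENT => 'msnbot/2.0b' } );
print $res->[2][0];
```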
This software is currently considered BETA and still needs to be seriously tested!
Based on Revision 2d289e, 2014-11-20 of http://sourceforge.net/p/awstats/code/ci/develop/tree/wwwroot/cgi-bin/lib/robots.pm.
Note that the list might be somewhat dated: bingbot does not appear in the list of common bots (only in the extended list), while its predecessor msnbot is considered common.
You may specify the following options when enabling the middleware:
env_key
Set the name of the entry in the environment hash. The default is robot_client.
basic_check
You may deactivate the standard checks by setting this option to a false value, e.g. if you are only interested in obscure bots or in your local pattern checks.
By setting this option to a false value while simultaneously passing a regular expression to local_regexp one can imitate the behaviour of Plack::Middleware::BotDetector.
extended_check
Determines if an extended list of less often seen robots is also checked for. By default, only common robots are checked for, because the extended check requires a rather large and complex regular expression. Set this param to a true value to change the default behaviour.
generic_check
Determines if the User-Agent string is also analysed to determine if it contains certain strings that generically identify the client as a bot, e.g. "spider" or "crawler". By default, this check is not performed, even though it uses only a relatively short and simple regex. Set this param to a true value to change the default behaviour.
local_regexp
You may optionally pass in your own regular expression (as a Regexp object created with qr//) to check for additional patterns in the User-Agent string.
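As a configuration sketch, the options above might be combined as follows to run only a local pattern and skip the built-in list, which is the BotDetector-like setup mentioned under basic_check. It assumes that Plack and this module are installed. The pattern is a hypothetical example and the trivial application is a placeholder.

```perl
use strict;
use warnings;
use Plack::Builder;

# Placeholder application; any PSGI app works here.
my $app = sub {
    my ($env) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["ok\n"] ];
};

builder {
    enable 'DetectRobots',
        basic_check  => 0,                               # skip the built-in list
        local_regexp => qr/my-crawler|site-monitor/i;    # hypothetical patterns
    $app;
};
```

With this setup a matching client should end up with the flag set to LOCAL, following the check order described earlier.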
Plack, Plack::Middleware, Plack::Middleware::BotDetector, http://awstats.org/
The functionality provided by Plack::Middleware::BotDetector is basically the same as that of this module, but it requires you to pass in your own regular expression and does not include a default list of known bots.
Heiko Jansen <hjansen@cpan.org>
This software is copyright (c) 2015 by Heiko Jansen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.