exclude_robot.pl - a simple script to filter robots out of httpd logfiles
exclude_robot.pl -url <robot exclusions URL> [ -exclusions_file <exclusions file> ] <httpd log file>

OR

cat <httpd log file> | exclude_robot.pl -url <robot exclusions URL>
This script filters HTTP log files to exclude entries that correspond to known webbots, spiders, and other undesirables. The script requires a URL as a command-line option, which should point to a text file containing a linebreak-separated list of lowercase strings to match on for bots. This is based on the format used by ABC (http://www.abc.org.uk/exclusionss/exclude.html).
The script filters httpd logfile entries either from a filename specified on the command line, or from STDIN. It writes the filtered entries to STDOUT.
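The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the exclusions list is hard-coded here (in the real script it is fetched from the -url file), and the helper name is_robot is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ABC-style exclusions: lowercase substrings matched
# against each log entry (the real script fetches these from -url).
my @exclusions = ( 'googlebot', 'slurp', 'spider' );

# Build one alternation regex from the literal substrings.
my $exclude_re = join '|', map { quotemeta } @exclusions;

# Return true if a log line matches any exclusion substring.
sub is_robot
{
    my ( $line ) = @_;
    return lc( $line ) =~ /$exclude_re/;
}

# Filter STDIN to STDOUT, dropping entries that match a robot string.
while ( my $line = <STDIN> )
{
    print $line unless is_robot( $line );
}
```

Lowercasing the log line before matching mirrors the ABC convention of keeping the exclusions list itself in lowercase.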
-url: Specify the URL of the file to grab, which contains the list of agents to exclude. This option is REQUIRED.
-exclusions_file: Specify a file in which to save the excluded entries from the logfile. This option is OPTIONAL.
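The net effect is similar to a case-insensitive inverse grep of the logfile against the exclusions list. As a rough sanity check without the script itself (the file paths and log lines below are made up for illustration):

```shell
# An ABC-style exclusions file: one lowercase substring per line
printf 'googlebot\nslurp\n' > /tmp/exclusions.txt

# A toy logfile with one robot entry and one browser entry
printf '1.1.1.1 "Googlebot/2.1"\n2.2.2.2 "Mozilla/5.0"\n' > /tmp/access.log

# Keep only entries that match none of the exclusion strings
grep -i -v -f /tmp/exclusions.txt /tmp/access.log
```

Unlike the script, this one-liner has no way to save the excluded entries separately, which is what -exclusions_file provides.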
Ave Wrigley <Ave.Wrigley@itn.co.uk>
Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install HTTPD::Log::Filter, copy and paste the appropriate command into your terminal.
cpanm
cpanm HTTPD::Log::Filter
CPAN shell
perl -MCPAN -e shell
install HTTPD::Log::Filter
For more information on module installation, please visit the detailed CPAN module installation guide.