
NAME

spamtrap - Manage a spamtrap and produce DNS::BL commands to respond

SYNOPSIS

  spamtrap [-A accept-subnets] [-a archive-dir] [-C complaint-template]
    [-S spam-whitelist] [-s spam-dnsbl-commands] [-D dul-whitelist] 
    [-d dul-dnsbl-commands] [-F] [-f sample-list] [-H policy-string] 
    [-h] [-I path-substitute] [-i index-file] [-k score-file] [-l] 
    [-m match-regexp] [-O own-subnets] [-o own-abuse-address] 
    [-r mail-relay] [-T tag] [-t] [-u] [-v] [-V] [-W whois-cache-dir] 
    [-w whois-server] [-X exclude-subnets] 

DESCRIPTION

This program is meant to be used in procmail recipes serving spamtrap and/or spam reporting addresses. Its main functions include:

Evidence archiving

When instructed to do so, each processed spam sample is stored in its own file under the archive directory. The file name is unique and encodes the time at which the sample was processed.

Multiple DNSBL listings per spam sample

When specified, commands can be produced to update two dnsbls. The first dnsbl, referred to as 'spam-dnsbl', will contain entries for /32s that sent spam to our known mail servers.

The second dnsbl, referred to as 'dul-dnsbl', will include entries for the /24s that contain those /32s, whenever the /32 is flagged by a set of heuristic tests designed to detect dynamically assigned address space.

Understands various forms of complex complaints

When deployed on an address that users report spam to, the program will attempt to find the spam headers within attachments, decoding uuencoded and base64 parts where possible. Multiple header sets can be analyzed in a single complaint, and each will be archived separately.

Keeps an index of spam samples

An external index is maintained with MLDBM, Storable and DB_File. This is very useful to quickly locate evidence related to a given IP address.

Flexible whitelisting

Various whitelists can be specified in configuration files. Files are composed of one-line regular expressions. Perl comments and whitespace can be added for documentation purposes.
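
The whitelist format described above can be illustrated with a short Python sketch. It is not the script's own (Perl) code. It assumes that each line is compiled in an "extended" regular expression mode, so that whitespace and trailing # comments are ignored, and that blank or comment-only lines are skipped. The function names load_whitelist and is_whitelisted are hypothetical.

```python
import re

def load_whitelist(lines):
    """Compile one pattern per line, skipping blank and comment-only lines."""
    patterns = []
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        # Assumption: re.VERBOSE stands in for Perl's extended (/x) syntax,
        # so inline whitespace and trailing "# ..." comments are ignored.
        patterns.append(re.compile(text, re.VERBOSE))
    return patterns

def is_whitelisted(candidate, patterns):
    """True if any whitelist pattern matches the IP address or host name."""
    return any(pattern.search(candidate) for pattern in patterns)

# Hypothetical whitelist contents, using documentation-only addresses.
SAMPLE = r"""
# mail servers we trust
^192\.0\.2\.\d+$      # documentation range
\.example\.org$
"""

patterns = load_whitelist(SAMPLE.splitlines())
```

Both IP addresses and names can be checked against the same compiled list, which mirrors how the -S and -D whitelists are described below.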

Score-based blacklisting

Optionally, a score or "spam history" can be kept for the IP space and domains identified with analyzed samples. This history can be used to implement thresholds for blacklisting, adding hosts only after a certain number of spam samples have been collected.

The following options control the behaviour of this script.

-A accept-subnets

accept-subnets is a comma-separated list of subnets, specified in any format that NetAddr::IP will understand. IP addresses found in matching Received: headers are rejected if they do not fall within the networks given by this option.

When the option is unspecified, all IP addresses are accepted.

-a archive-dir

Causes the current message to be archived at the supplied directory, which must exist and be writeable. If these conditions are not met, processing is aborted.

The file name will be of the form

  <timestamp>-<hash>

Where <timestamp> is the number of seconds since the epoch and <hash> is the MD5 in hex of the message. Note that this feature requires the Digest::MD5 module.
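
As an illustration of the naming scheme, the following Python sketch builds a name of the same shape. It assumes the MD5 is computed over the raw bytes of the whole message; the function name archive_name is hypothetical and this is not the script's own code.

```python
import hashlib
import time

def archive_name(message_bytes, now=None):
    """Return '<timestamp>-<hash>' for a raw message.

    The timestamp is whole seconds since the epoch. The hash is the
    hexadecimal MD5 digest of the message bytes (an assumption about
    exactly which bytes are hashed).
    """
    timestamp = int(time.time() if now is None else now)
    digest = hashlib.md5(message_bytes).hexdigest()
    return f"{timestamp}-{digest}"
```

Because the hash depends on the message contents, two different samples processed within the same second still receive different names.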

-C complaint-template

If specified, use the supplied file as a Text::Template for producing an automated abuse complaint. See the supplied example for guidance in writing your own.

-D dul-whitelist

If specified, dul-whitelist is the name of a whitelist file that is applied to the IP addresses and names being considered for dul listing. If a match occurs, the entry is not listed.

-d dul-dnsbl-commands

Test the name associated with each IP address eligible for listing in the spam dnsbl. If no name is associated or the name "seems" dynamic, add commands to this list for the /24 that encloses the given IP address.

-h

Output this documentation and terminate the program.

-F

When specified together with -s or -d, causes the "without checking" clause to be added to the list commands. Note that in certain environments, this might lead to overlapping entries.

-f sample-list

Normally, a single spam sample is read from standard input. When this option is specified, spam samples are assumed to be in files whose names are stored, one per line, in a file named sample-list. If sample-list is -, then the names of the files to process are read from the standard input.

Note that -a can still be used with -f. You should be careful to delete spam samples already processed and archived according to -a.
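
The way a sample list is resolved can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and skipping blank lines is an assumption rather than documented behaviour.

```python
import sys

def read_names(handle):
    """Return one file name per line, with the trailing newline removed."""
    return [line.rstrip("\n") for line in handle if line.strip()]

def sample_paths(sample_list, stdin=None):
    """Resolve a -f style argument into the list of sample file names.

    A value of "-" means the names are read from standard input;
    anything else is treated as the path of a file holding the names.
    """
    if sample_list == "-":
        return read_names(stdin if stdin is not None else sys.stdin)
    with open(sample_list, encoding="utf-8") as handle:
        return read_names(handle)
```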

-H policy-string

Specify a policy string for use with -k score based blacklisting. A policy string has the form:

    <max-age>,<host-threshold>,<net-threshold>

Where the components have the following semantics:

<max-age>

Maximum age allowed for the scores, which are "forgotten" after this time has elapsed. The suffixes 'd', 'm', 'w' or 'y' can be used to specify the units to mean 'days', 'months', 'weeks' or 'years' respectively.

The default value is one week, meaning that spam older than one week will be forgiven.

<host-threshold>

What score is required to list a specific host. The score is calculated by adding the individual host score, the network score and the domain score. If the result is greater than this threshold, the host will be listed.

This value defaults to 1, which will list the host immediately.

<net-threshold>

What score is required to list a network. The score of the network is compared to the given threshold.

The default value for this threshold is 3, which causes a /24 network to be listed after receiving the third piece of spam.

Any element can be left unspecified, in which case it will assume the default value.
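
A policy string of this form could be parsed along the following lines. This Python sketch is illustrative only and is not the script's own code. The defaults (one week, 1 and 3) come from the text above; the lengths used for months (30 days) and years (365 days), the requirement that max-age carries a unit suffix, and the names Policy, parse_age and parse_policy are assumptions.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400
# Assumption: a month is 30 days and a year is 365 days.
UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}

@dataclass
class Policy:
    max_age_seconds: int = 7 * SECONDS_PER_DAY  # documented default: one week
    host_threshold: int = 1                     # documented default
    net_threshold: int = 3                      # documented default

def parse_age(text):
    """Convert strings such as '10d' or '2w' into seconds."""
    number, unit = text[:-1], text[-1].lower()
    if unit not in UNIT_DAYS or not number.isdigit():
        raise ValueError(f"unrecognised max-age: {text!r}")
    return int(number) * UNIT_DAYS[unit] * SECONDS_PER_DAY

def parse_policy(policy_string=""):
    """Parse '<max-age>,<host-threshold>,<net-threshold>' with defaults."""
    fields = [field.strip() for field in policy_string.split(",")]
    if len(fields) > 3:
        raise ValueError("expected at most three comma-separated fields")
    fields += [""] * (3 - len(fields))
    policy = Policy()
    if fields[0]:
        policy.max_age_seconds = parse_age(fields[0])
    if fields[1]:
        policy.host_threshold = int(fields[1])
    if fields[2]:
        policy.net_threshold = int(fields[2])
    return policy
```

For example, the string 2w,,5 would keep scores for two weeks, leave the host threshold at its default and raise the network threshold to 5.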

-I path-substitute

When -i is used to update an index, replaces the pathname up to the parent of the path component given in the -a option with path-substitute. This is used to hide the real location in the filesystem where the samples are stored.

-i index-file

Causes the index specified by index-file to be updated with the currently stored spam sample. This is only useful when using the -a option to specify archiving of the messages. See also -I.

This index provides a convenient reference between an IP address, and the spamtrap hit or spam complaints that mention it. These references are updated, regardless of the blacklisting of the address. That is, the reference will be recorded even when the IP address is not eligible for listing due to other criteria such as whitelists or scores.

This feature provides for a simple evidence archive.

-k score-file

When specified, score-file will be used to create a "spam history" index for each analyzed sample. The index works as follows: When an IP address has been identified out of a spam sample...

The /32 that originated the spam receives one point.
The /24 enclosing the sending /32 receives one point.
The domain name specified in the PTR record (if any) receives one point.

The scores are removed if older than the parameters set with the -H option (or the default policy, if this is not set).
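
The point assignment can be sketched in Python as shown below. This is an illustration under stated assumptions, not the script's implementation: it handles IPv4 only, takes the last two labels of the PTR name as "the domain" (the text does not say how the domain is derived), and leaves out the timestamps needed for expiry. The names score_keys and record_sample are hypothetical.

```python
import ipaddress
from collections import Counter

def score_keys(ip, ptr_name=None):
    """Return the keys that each receive one point for a spam sample."""
    addr = ipaddress.ip_address(ip)
    host = f"{addr}/32"
    # strict=False lets us derive the enclosing /24 from a host address.
    net = str(ipaddress.ip_network(f"{addr}/24", strict=False))
    keys = [host, net]
    if ptr_name:
        # Assumption: the domain is the last two labels of the PTR name.
        labels = ptr_name.rstrip(".").lower().split(".")
        keys.append(".".join(labels[-2:]))
    return keys

def record_sample(scores, ip, ptr_name=None):
    """Add one point to the host, network and (if known) domain scores."""
    for key in score_keys(ip, ptr_name):
        scores[key] += 1
    return scores

scores = Counter()
record_sample(scores, "192.0.2.17", "dsl-17.pool.example.net.")
record_sample(scores, "192.0.2.99")
```

After these two samples, each /32 holds one point while the shared /24 holds two, which is how a network can reach its threshold before any single host does.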

-l

Turns on logging via syslog(3) if your system supports it. This is recommended. Logging is done to the unix socket.

-m match-regexp

If provided, match-regexp must be a Perl regular expression that must match the contents of a Received: header before it is processed. This is useful to restrict the matches to those headers actually produced by your servers.
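
The effect of such a filter can be illustrated with Python's standard email module. The sketch below is not the script's code; the sample message, host names and the function name matching_received are hypothetical, and Python regular expression syntax is used where the script expects a Perl one.

```python
import re
from email import message_from_string

def matching_received(raw_message, match_regexp=None):
    """Return the Received: headers that the regular expression matches.

    Folded headers are unfolded onto a single line before matching.
    With no regular expression, every Received: header is returned.
    """
    message = message_from_string(raw_message)
    headers = [" ".join(value.split())
               for value in message.get_all("Received", [])]
    if match_regexp is None:
        return headers
    pattern = re.compile(match_regexp)
    return [value for value in headers if pattern.search(value)]

# Hypothetical sample: only the first header was written by "our" server.
RAW = (
    "Received: from unknown (HELO spammer) (192.0.2.17)\n"
    "\tby mx1.example.org with SMTP; Fri, 24 Dec 2004 11:45:45 +0000\n"
    "Received: from forged.example.com (198.51.100.9)\n"
    "\tby relay.invalid with ESMTP; Fri, 24 Dec 2004 11:40:00 +0000\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

trusted = matching_received(RAW, r"by mx1\.example\.org")
```

Restricting the match to headers written by your own servers keeps forged Received: lines, which a spammer can insert freely, from producing listings.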

-O own-subnets

own-subnets is a comma-separated list of subnets, specified in any format that NetAddr::IP will understand. IP addresses found in matching Received: headers that fall within these ranges are reported to the abuse address (defaults to abuse, but can be changed with the -o option below).

-o own-abuse-address

Email address to which complaints about abuse from our own networks are forwarded. Defaults to abuse.

-r mail-relay

When using -C, Email::Send is used to send the email. If -r is not used, then Email::Send::Sendmail is used. Otherwise, Email::Send::SMTP is used, specifying mail-relay as the host to send email to.

-S spam-whitelist

If specified, spam-whitelist is the name of a whitelist file that is applied to the IP addresses and names being considered for spam listing. If a match occurs, the entry is not listed.

Note that the IP address will still be considered for addition to dul, even when matching this rule.

-s spam-dnsbl-commands

For each IP address found in the headers and satisfying all the filtering criteria, output DNS::BL commands into the file named spam-dnsbl-commands. The commands will list the /32.

An attempt will be made to lock the file with flock(2) prior to updating it. File contents will not be clobbered.
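
The lock-then-append pattern can be sketched in Python as follows (Unix only). This is an illustration and not the script's code: it blocks until the lock is granted, whereas the script may behave differently when the lock cannot be obtained, and the function name append_locked is hypothetical. The lines written are opaque to the sketch; it makes no claim about DNS::BL command syntax.

```python
import fcntl

def append_locked(path, lines):
    """Append lines to a file while holding an exclusive flock(2) lock.

    Opening in append mode means existing contents are never truncated.
    """
    with open(path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)   # wait for exclusive access
        try:
            for line in lines:
                handle.write(line.rstrip("\n") + "\n")
            handle.flush()                   # write out before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Locking matters here because several copies of the program may run at once when procmail delivers messages in parallel.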

-T tag

Add the given tag to the text of the DNS::BL commands generated. This is useful to include codes or instructions in a specific listing.

-t

Enter spamtrap mode. In spamtrap mode, the header of the message passed is processed. The rest of the message is considered as body. No attempts are made to find other headers within the body.

By default (i.e., without specifying -t) the header of the message passed to this program is ignored. The body is searched for new headers, as if processing spam complaints from users.

-u

Unlink the given spam sample after successful processing.

-v

Be verbose about progress. Verbose output is sent to STDERR.

-V

Be even more verbose.

-W whois-cache-dir

When complaints must be sent about processed spam samples, WHOIS is used to find out the contacts to notify. This option allows for a cache of WHOIS information to be stored somewhere in the filesystem. The default place is /tmp/whois-cache.

The path will be created if it does not exist. Old entries must be removed by an external process after they expire, which eases the interaction with complex scripted environments such as the ones this program is designed to be a part of.

-w whois-server

Specifies the WHOIS server to query for the list of contacts to send complaints to. Defaults to whois.cyberabuse.org

-X exclude-subnets

exclude-subnets is a comma-separated list of subnets, specified in any format that NetAddr::IP will understand. IP addresses found in matching Received: headers are rejected if they fall within the networks given by this option.

When the option is unspecified, all IP addresses are accepted.
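
The combined effect of the -A and -X subnet lists can be sketched in Python with the standard ipaddress module. This is an illustration only: the script itself relies on NetAddr::IP, which accepts more notations than the CIDR form used here, and the function names parse_subnets and ip_allowed are hypothetical.

```python
import ipaddress

def parse_subnets(spec):
    """Turn a comma-separated string of CIDR blocks into network objects."""
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in spec.split(",") if item.strip()]

def ip_allowed(ip, accept=None, exclude=None):
    """Apply an accept list and an exclude list to one IP address.

    With no accept list every address is accepted; with no exclude
    list nothing is excluded.
    """
    addr = ipaddress.ip_address(ip)
    if accept and not any(addr in net for net in accept):
        return False
    if exclude and any(addr in net for net in exclude):
        return False
    return True

# Hypothetical values using documentation-only address ranges.
accept = parse_subnets("192.0.2.0/24, 198.51.100.0/24")
exclude = parse_subnets("192.0.2.128/25")
```

In this sketch an address must pass both tests, so an excluded range inside an accepted network is still rejected.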

EXAMPLES

HISTORY

$Log: spamtrap,v $

Revision 1.30 2004/12/24 11:45:45 lem Corrected typo on sending abuse complaints. ($opt_t -> $opt_o)

Revision 1.29 2004/12/21 20:46:50 lem Small fix when sending complaints to our own abuse address

Revision 1.28 2004/12/18 12:15:51 lem -m works now with -f. Note that the reading of the sample files is now quite slower

Revision 1.27 2004/12/18 11:58:33 lem Return values are now correct, even when -f is used Fixed syslog() messages

Revision 1.26 2004/12/18 00:56:47 lem Make it less prone to die(). Replaced with warn()s to keep it running longer. This might cause data loss in some chronic cases, but is important to improve the flow.

Revision 1.25 2004/12/17 22:20:09 lem Added a warning when the unlink fails. Also, added a missing chomp() to remove newlines in the filenames.

Revision 1.24 2004/12/17 22:00:44 lem Added -u

Revision 1.23 2004/12/17 21:52:25 lem Added a From: address to abuse complaints. Set to the same value of -o

Revision 1.22 2004/12/16 23:23:22 lem Added the capability to note abuse complaints automatically.

Revision 1.21 2004/12/16 22:51:11 lem Added our own do_whois function to better handle the caching of WHOIS information.

Revision 1.20 2004/12/16 21:37:26 lem Added -W/-w to perform WHOIS queries to locate the source of an abuse (thanks to Luis Moreno for part of the code) TODO: Better caching of the results. Added Text::Template based complaint composition. Added -V

Revision 1.19 2004/12/16 16:21:48 lem Added -V and fixed leaking MIME temporary files.

Revision 1.18 2004/12/16 15:41:58 lem Added -f for processing of multiple spam samples in one single run.

Revision 1.17 2004/12/04 02:09:24 lem -I was non-functional

Revision 1.16 2004/11/10 20:58:36 lem Added ability to parse HTML

Revision 1.15 2004/11/09 13:27:42 lem Remove the .txt extension

Revision 1.14 2004/11/08 22:42:58 lem Added linear decay of existing scores

Revision 1.13 2004/11/03 23:36:45 lem Improved DUL messages

Revision 1.12 2004/11/03 23:34:52 lem Improved DUL messages

Revision 1.11 2004/11/03 23:27:05 lem Corrected HTML-entities in POD documentation Verbose print is now sent to STDERR

Revision 1.10 2004/11/03 19:07:43 lem Added an exit value based in the matching of IP addresses

Revision 1.9 2004/10/29 19:05:52 lem Added -A to check for addresses within a set of subnets.

Revision 1.8 2004/10/28 21:08:39 lem Fixed minor bug with __DIE__ and syslog

Revision 1.7 2004/10/28 21:00:35 lem Added basic comment at the beginning. Actual test starts.

Revision 1.6 2004/10/28 20:58:25 lem Added EXAMPLES section. Added -i and -I for producing useable indexes

Revision 1.5 2004/10/27 23:56:24 lem Added score keeping and policy, dynamic heuristics, whitelisting and various other checks. Defined -l for logging. -i is still missing. We need some backward compatibility here.

Revision 1.4 2004/10/27 02:51:53 lem Added -F. Implemented -s and -d. Stubs for whitelisting and dynamic heuristics are in place. -a now works.

Revision 1.3 2004/10/27 00:47:27 lem Removed -p

Revision 1.2 2004/10/27 00:45:33 lem Minor updates to the parser. All reg-tests verified.

Revision 1.1 2004/10/25 05:00:31 lem Interim version of spamtrap. Under development

LICENSE AND WARRANTY

This code and all accompanying software comes with NO WARRANTY. You use it at your own risk.

This code and all accompanying software can be used freely under the same terms as Perl itself.

AUTHOR

Luis E. Muñoz <luismunoz@cpan.org>

SEE ALSO

perl(1), procmail(1), MLDBM(3), Storable(3), DB_File(3), NetAddr::IP(3), Digest::MD5(3), DNS::BL(3), Sys::Syslog(3), Text::Template(3).
