Mail::SpamTest::Bayesian - Perl extension for Bayesian spam-testing
  use Mail::SpamTest::Bayesian;

  my $j = Mail::SpamTest::Bayesian->new(dir => '.');
  $j->init_db;
  $j->merge_mbox_spam($scalar_spam_box);
  $j->merge_mbox_nonspam($scalar_nonspam_box);
  $message = $j->markup_message($message);
This module implements the Bayesian spam-testing algorithm described by Paul Graham at:
http://www.paulgraham.com/spam.html
In short: the system is trained by exposure to mailboxes of known spam and non-spam messages. Each message is (1) MIME-decoded, with non-text parts deleted, and then (2) tokenised. The database files spam.db and nonspam.db contain lists of tokens and the number of messages in which each has occurred; general.db holds a message count.
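The training step described above can be sketched in a few lines of Perl. This is a minimal illustration and not the module's own code: the helper names tokenise and train, the in-memory hash standing in for spam.db and general.db, and the token pattern (borrowed from Graham's essay) are all assumptions, and MIME decoding is left out.

```perl
use strict;
use warnings;

# Split a text body into distinct tokens. Assumption: tokens are runs of
# alphanumerics, dashes, apostrophes and dollar signs, and all-digit
# tokens are ignored, as suggested in Graham's essay.
sub tokenise {
    my ($text) = @_;
    my %seen;
    for my $token ($text =~ /[A-Za-z0-9'\$-]+/g) {
        next if $token =~ /^[0-9]+$/;
        $seen{$token} = 1;    # count each token once per message
    }
    return sort keys %seen;
}

# Record one message in a corpus: per-token message counts plus a total.
sub train {
    my ($corpus, $text) = @_;
    $corpus->{tokens}{$_}++ for tokenise($text);
    $corpus->{messages}++;
    return $corpus;
}

# Hypothetical in-memory stand-in for the spam database.
my $spam = { tokens => {}, messages => 0 };
train($spam, 'Buy cheap pills, cheap cheap!');
train($spam, 'cheap loans 4 you');
```

Because a token is counted once per message, "cheap" ends up with a count of 2 here even though it appears four times in total.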
This module is in early development; it is functional but basic. It is expected that more mailbox parsing routines will be added, probably using Mail::Box; and that ancillary programs will be supplied for use of the module as a personal mail filter.
Standard constructor. Pass a hash or hashref with parameters.
Useful parameters:

  dir         -> database directory (.)
  significant -> number of significant tokens to consider (15)
  threshold   -> spam threshold (0.9)
  fudgefactor -> Non-spam priority (2)
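To show how the significant, threshold and fudgefactor parameters might enter the calculation, here is a sketch of the scoring rule from Graham's essay. It is not taken from the module's source: the function names token_probability and combine are hypothetical, the 0.4 score for rare tokens and the 0.01 to 0.99 clamp come from the essay, and treating fudgefactor as the multiplier on non-spam counts is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Spam probability for one token, given the number of non-spam and spam
# messages containing it and the size of each corpus.
sub token_probability {
    my ($good, $bad, $ngood, $nbad, $fudge) = @_;
    return 0.4 unless $ngood && $nbad;    # untrained corpus: neutral score
    my $g = $fudge * $good;               # weight non-spam evidence more
    return 0.4 if $g + $bad < 5;          # too rare to be informative
    my $pg = min(1, $g / $ngood);
    my $pb = min(1, $bad / $nbad);
    return max(0.01, min(0.99, $pb / ($pg + $pb)));
}

# Combine the most significant token probabilities into one score.
sub combine {
    my ($probs, $significant) = @_;
    # Most significant means furthest from the neutral value 0.5.
    my @top = sort { abs($b - 0.5) <=> abs($a - 0.5) } @$probs;
    splice @top, $significant if @top > $significant;
    my ($spam, $ham) = (1, 1);
    for my $p (@top) {
        $spam *= $p;
        $ham  *= 1 - $p;
    }
    return $spam / ($spam + $ham);
}

my $score   = combine([0.99, 0.99, 0.5], 15);
my $is_spam = $score > 0.9 ? 1 : 0;       # compare against the threshold
```

Doubling the non-spam counts biases the result towards "not spam", which reduces false positives at the cost of letting a little more spam through.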
Deletes and re-initialises databases. Call this only once, when you first set up the database.
Train the system by giving it a mailbox full of spam.
Pass a scalar, array, or arrayref containing raw messages.
Train the system by giving it a mailbox full of legitimate email.
Pass a stream (pointing to an mbox file) from which to read messages. For example, an IO::File object.
Pass a stream (pointing to an mbox file) from which to read messages.
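For readers unfamiliar with the mbox format, the sketch below shows one way messages could be read from such a stream. It is an illustration only: split_mbox_stream is a hypothetical helper, not part of the module, and it assumes that every message starts with a line beginning "From ". An in-memory filehandle is used here so the example is self-contained; an IO::File object opened on a real mbox file would be read the same way.

```perl
use strict;
use warnings;

# Read an mbox-format stream and return the raw messages it contains.
sub split_mbox_stream {
    my ($fh) = @_;
    my (@messages, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^From /) {
            # A new separator line closes the previous message, if any.
            push @messages, $current if defined $current;
            $current = '';
        }
        # Lines before the first separator are ignored.
        $current .= $line if defined $current;
    }
    push @messages, $current if defined $current;
    return @messages;
}

my $mbox = <<'END';
From alice@example.org Mon Jan  1 00:00:00 2024
Subject: hello

First body.

From bob@example.org Mon Jan  1 00:01:00 2024
Subject: offer

Second body.
END

open my $fh, '<', \$mbox or die "cannot open in-memory mbox: $!";
my @messages = split_mbox_stream($fh);
close $fh;
```

Real mbox files escape body lines that begin with "From " (typically as ">From "), which this sketch does not undo.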
As merge_mbox_spam, but for a single message; pass in a scalar.
As merge_mbox_nonspam, but for a single message; pass in a scalar.
Test a message for possible spammishness. Pass a scalar containing a single message. Will return the original message with inserted headers:
  X-Bayesian-Spam: (YES|NO) (probability%)
  X-Bayesian-Test: the significant tests and their weights
Pass a scalar containing a single message. Returns a list:
  0: spam status (1 for spam, 0 for non-spam)
  1: probability of spam
  2: listref of significant tests
Roger Burton West, <roger@firedrake.org>
Erwin Harte provided useful feedback and the de-MIMEing code.
perl, BerkeleyDB.