News::GnusFilter - package for scoring usenet posts
Version: 0.55 ($Revision: 1.6 $)
    # ~/.gnusfilter - scoring script

    require 5.006;
    use strict;
    use News::GnusFilter qw/:tests groan references NSLOOKUP VERBOSE/;

    NSLOOKUP = "";  # disables nslookups for bogus_address test
    VERBOSE  = 1;   # noisier output for debugging

    my $goof = News::GnusFilter->set_score( {
        rethreaded => 80,
        no_context => 60,
    } );

    # standard tests - see MESSAGE TESTS for details

    missing_headers;
    bogus_address;
    annoying_subject;
    cross_post;
    mimes;
    lines_too_long;
    control_characters;
    miswrapped;
    misattribution;
    jeopardy_quoted;
    check_quotes;   # runs multiple tests on quoted paragraphs
    bad_signature;

    # custom tests - see WRITING HEADERS and SCORING

    if (check_quotes and not references) {
        $goof->{rethreaded} = groan "Callously rethreaded";
    }

    if (references and not check_quotes) {
        $goof->{no_context} = groan "Missing context";
    }

    __END__
Your GnusFilter script should be installed as a mime-decoder hook for gnus.
News::GnusFilter is a pure-Perl package for scripting an inline message filter. It adds "Gnus-Warning:" headers when presented with evidence of atypical content or otherwise nonstandard formatting for usenet messages.
News::GnusFilter should be drop-in compatible with other newsreaders that are capable of filtering a usenet posting through an external application prior to display. See the CONFIGURATION section below for descriptions of tunable parameters, and the MESSAGE TESTS section for descriptions of the exported subroutines.
The strange yet powerful correlation between usenet cluelessness and bunk-peddling is best summarised in the following quote:
"Opinions may of course differ on this topic, but wouldn't it be better to persuade the hon. Usenaut, as a first priority, to post accurate information, before persuading them to abandon this remarkably accurate indicator of usenet bogosity?"
-- Alan Flavell in comp.lang.perl.misc
    (add-hook 'gnus-article-decode-hook
      '(lambda ()
         (gnus-article-decode-charset)
         (let ((coding-system-for-read last-coding-system-used)
               (coding-system-for-write last-coding-system-used))
           (call-process-region (point-min) (point-max)
             "/path/to/gnusfilter" t (current-buffer)))))
The recommended installation path for your script is ~/.gnusfilter.
These are the export lists for News::GnusFilter. See the Exporter manpage for more details.
    my %parameters = (
        HEADER         => "Gnus-Warning", # header added
        NSLOOKUP       => "nslookup",     # '' avoids DNS lookups
        PASSTHRU_BYTES => 8192,           # filter disabled
        LINE_LEN       => 80,             # columns
        EGO            => 10,             # self-ref's in new text
        TOLERANCE      => 50,             # % quoted text
        MAX_CONTROL    => 5,              # control chars
        MIN_LINES      => 20,             # short posts are OK
        SIG_LINES      => 4,              # acceptable sig lines
        NEWSGROUPS     => 2,              # spam cutoff
        FBI            => 100,            # tolerable bogosity level
        VERBOSE        => 0,              # toggles debugging output
    );

    @EXPORT_OK = keys %parameters;

    %EXPORT_TAGS = (
        params => \@EXPORT_OK,
        tests  => [ qw/ missing_headers bogus_address annoying_subject
                        cross_post lines_too_long control_characters
                        miswrapped check_quotes jeopardy_quoted
                        misattribution bad_signature mimes
                      / ],
    );

    @EXPORT = ( @{$EXPORT_TAGS{tests}},
                qw/ groan groanf lines references newsgroups
                    head body paragraphs sig
                  / );
By default, GnusFilter exports all the standard :tests. It also provides access to the message itself via the head(), body(), lines(), paragraphs(), and sig() functions. See WRITING HEADERS and SCORING for details on groan() and groanf().
The parameters are not exported by default. If you need to tune some of them, import them either by name or all at once with the :params tag:
    use News::GnusFilter qw/ :tests :params /;
    FBI     = 200;        # raise tolerable bogosity level to 200
    VERBOSE = 1;          # enable debugging output
    HEADER  = "X-Filter";
    ...
The parameters are exported as lvalued subs. This is the only place where this module uses special features of perl 5.6+.
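As a rough illustration of what an "lvalued sub" is, the sketch below defines two accessors over a private hash so that they can be assigned to like barewords. The hash and the sub bodies here are assumptions made for the example; the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical parameter table; the names mirror two of the real parameters.
my %parameters = ( FBI => 100, VERBOSE => 0 );

# Each accessor is an lvalue sub with an empty prototype, so it can be
# read and assigned to without parentheses.
sub FBI     () : lvalue { $parameters{FBI} }
sub VERBOSE () : lvalue { $parameters{VERBOSE} }

FBI     = 200;    # stores 200 in $parameters{FBI}
VERBOSE = 1;      # stores 1 in $parameters{VERBOSE}

print "FBI is now ", FBI, "\n";
```

The empty prototype is what lets `FBI = 200;` parse as a call followed by an assignment, and the `lvalue` attribute is the perl 5.6 feature that makes the assignment legal.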
groan() and groanf() are the analogs of print and printf, and are exported by default. The value of the warning header may be changed globally via HEADER:
    HEADER = "X-Format-Warning"; # overrides default "Gnus-Warning"
    groan "mycheck failed" unless mycheck(body);
These settings are modifiable through the set_score sub. See the description in Scoring API below for details.
    # scoring parameters

    my %goof;     # counts occurrence of each error type
    my %weight =  # error type => default score
    (                             # typical range of %goof value:
        totalquote       => 100,  #
        jeopardy_quoted  =>  80,  # boolean (0-1)
        misattribution   =>  60,  #
        lines_too_long   =>  50,  #
        missing_headers  =>  50,  # 0-2
        mime_crap        =>  40,  # 0-3? :
        annoying_subject =>  40,  # ~0-4
        cross_post       =>  30,  # 0,~2-4
        bogus_address    =>  30,  # 0-3 : 822, dns
        miswrapped       =>  30,  # ~0-5 : lines (up to 5)
        control_chars    =>  20,  # 0-5 : up to 5 chars
        ego              =>   5,  # 0,~10-20 : I me my count
        overquoted       =>   2,  # 0-50 : percentage over TOLERANCE
        bad_signature    =>   2,  # 0,5-20 : lines
        code             =>  -5,  # 0,~10-30
    );

    # set_score - scripter's interface to %goof and %weight

    sub set_score {
        my $href = pop @_; # override weight table
        @weight{ keys %$href } = values %$href if ref $href;
        return bless \%goof;
    }

    # score - returns Flavell Bogosity Index

    sub score {
        my $score = 0;
        $score += $goof{$_} * $weight{$_}
            for grep {exists $weight{$_}} keys %goof;
        return $score;
    }
set_score() provides access to the %goof and %weight hashes, which form the basis of the Flavell Bogosity Index calculator score(). The SYNOPSIS contains a sample usage.
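To make the arithmetic concrete, here is a small standalone sketch of the same weighted sum. It copies the logic of set_score() and score() into hypothetical `demo_` subs with a three-entry weight table, so it runs without the module; the test results assigned to the hash are invented for the example.

```perl
use strict;
use warnings;

# Subset of the default weights, plus an empty result table.
my %weight = ( jeopardy_quoted => 80, missing_headers => 50, code => -5 );
my %goof;

# Same hash-slice override as set_score(): existing weights are
# replaced and new error types are added.
sub demo_set_score {
    my $href = pop @_;
    @weight{ keys %$href } = values %$href if ref $href;
    return \%goof;
}

# Same weighted sum as score(): results without a weight are ignored.
sub demo_score {
    my $score = 0;
    $score += $goof{$_} * $weight{$_}
        for grep { exists $weight{$_} } keys %goof;
    return $score;
}

my $goof = demo_set_score( { rethreaded => 80, code => -10 } );
$goof->{jeopardy_quoted} = 1;   # 1 *  80 =  80
$goof->{rethreaded}      = 1;   # 1 *  80 =  80 (custom error type)
$goof->{code}            = 4;   # 4 * -10 = -40 (overridden weight)
$goof->{unweighted}      = 7;   # no weight, so it does not count

my $fbi = demo_score();         # 80 + 80 - 40 = 120
print "bogosity index: $fbi\n";
```

With the default FBI of 100, a message scoring 120 would be over the tolerable bogosity level.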
score() calculates the current bogosity index based on the rules applied so far. Neither set_score nor score is importable, so script writers should use OO-like syntax or the package-qualified names.
Note: GnusFilter is not an OO package. Although set_score() returns a blessed reference to %goof, the final automatic score() calculation is not OO. If necessary, that automatic calculation can be disabled by setting FBI = 0 in your script:
    use News::GnusFilter qw/:tests FBI/;
    FBI = 0;
These are the exported functions that form the basis of a GnusFilter script. These functions are memoized to avoid repeat warnings and overscoring.
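The effect of memoization can be sketched with the core Memoize module. Whether GnusFilter uses Memoize or its own cache is not stated here, so treat this as an assumption; `demo_test` and its counter are hypothetical stand-ins for a test that emits a warning.

```perl
use strict;
use warnings;
use Memoize;

my $warnings = 0;

# Hypothetical test: each real invocation "emits" one warning.
sub demo_test {
    $warnings++;
    return 1;
}

memoize('demo_test');   # later calls are answered from the cache

for (1 .. 3) {
    my $result = demo_test();   # the body runs on the first call only
}

print "warnings emitted: $warnings\n";
```

This is why a script can call a test once in the list of standard tests and again inside a custom condition without the warning or its score being counted twice.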
Checks for proper attribution in quoted text.
Warns of newsgroup spamming (level determined by NEWSGROUPS). On an original post, it returns the total number of groups posted to; on followups, it just returns 1.
Validates the Reply-To: header (or From:, if Reply-To: is not present) using RFC 822 syntax and a DNS lookup on the domain. Setting NSLOOKUP to a false value disables the DNS lookup; otherwise NSLOOKUP should point to the location of your nslookup(8) binary.
Looks for control characters in the message body. Returns their number (up to MAX_CONTROL).
Check for oversized lines as set by LINE_LEN. The return value is boolean.
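A check of this kind could look roughly like the sketch below. It assumes that "oversized" means strictly longer than LINE_LEN characters, which is a guess, and `demo_lines_too_long` is a hypothetical name rather than the module's own code.

```perl
use strict;
use warnings;

my $LINE_LEN = 80;   # default LINE_LEN from the parameter table

# Returns 1 if any line of the body is longer than $LINE_LEN, else 0.
sub demo_lines_too_long {
    my ($body) = @_;
    return ( grep { length($_) > $LINE_LEN } split /\n/, $body ) ? 1 : 0;
}

my $ok_body   = join "\n", ("x" x 72) x 3;
my $long_body = join "\n", "short line", "y" x 120;

print demo_lines_too_long($ok_body),   "\n";
print demo_lines_too_long($long_body), "\n";
```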
Verifies the existence of the Subject: and References: headers as necessary.
Tests for miswrapped lines in quoted and regular text. Returns the number of occurrences, which may be excessive for things like posted logfiles.
Tests for upside-down posting style (newsgroup replies should follow quoted text, not vice versa). The return value is boolean.
Overtaxed sub that checks for overquoted messages. Also looks for over-opinionated text (too many I's) and lots of code (oft considered a good thing :). In scalar context, it returns the total number of quoted lines. Resulting warnings are subject to VERBOSE, MIN_LINES, EGO, and TOLERANCE settings.
Checks for a standard signature block. If the signature is longer than SIG_LINES lines, it returns the number of lines in the signature (up to 20). Otherwise it returns 0.
10 is added to the return value for nonstandard signature separators.
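The return-value rules for the signature check can be written out as a small sketch. It assumes that the standard separator is the usual "-- " line (dash, dash, space) and that anything else counts as nonstandard; how the module itself detects separators is not described here, and `demo_sig_score` is a hypothetical helper.

```perl
use strict;
use warnings;

my $SIG_LINES = 4;   # default SIG_LINES from the parameter table

# $sep is the separator line as found (without its newline),
# @sig holds the signature lines that follow it.
sub demo_sig_score {
    my ($sep, @sig) = @_;
    my $count = @sig > 20 ? 20 : scalar @sig;        # capped at 20
    my $score = @sig > $SIG_LINES ? $count : 0;      # short sigs score 0
    $score += 10 if $sep ne '-- ';                   # assumed nonstandard
    return $score;
}

print demo_sig_score( '-- ', ('line') x 3 ), "\n";   # short, standard
print demo_sig_score( '-- ', ('line') x 6 ), "\n";   # too long
print demo_sig_score( '--',  ('line') x 6 ), "\n";   # too long, bad separator
```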
Looks for the attribution text preceding the quoted text and returns it.
Complains if the subject contains useless words. Returns the number of faux pas if this is an original post; for followups it returns a false value.
    my @patterns = (
        qr/ ( [?!]{3,} ) /x,
        qr/ (   HELP   ) /x,
        qr/ (  PLEASE  ) /x,
        qr/ (NEWB[IE]{2})/xi,
        qr/ (   GURU   ) /xi,
    );
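As an illustration of how these patterns behave, the sketch below counts how many of them match a given subject line. Counting one faux pas per matching pattern is an assumption about the scoring, and `demo_count_faux_pas` is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Patterns copied from the list above.
my @patterns = (
    qr/ ( [?!]{3,} ) /x,
    qr/ (   HELP   ) /x,
    qr/ (  PLEASE  ) /x,
    qr/ (NEWB[IE]{2})/xi,
    qr/ (   GURU   ) /xi,
);

# Number of patterns that match the subject at least once.
sub demo_count_faux_pas {
    my ($subject) = @_;
    return scalar grep { $subject =~ $_ } @patterns;
}

print demo_count_faux_pas("HELP!!! newbie question PLEASE"), "\n";  # 4
print demo_count_faux_pas("Sorting a hash by value"), "\n";         # 0
```

Note that HELP and PLEASE are matched case-sensitively, so only the shouted forms count, while the NEWBIE and GURU patterns ignore case.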
Warns if the message is MIME-encoded.
Terribly slow on large messages.
Etiquette rules may need adjusting for normal e-mail.
Does not (currently) look for quoted sigs.
Manually wrapped logfiles are heavily penalized.
Some context-sensitive stuff (original, request, newsgroup, mail) is wrong.
Return values, default settings, and especially regexps are subject to change. Please send bug reports and patches to the author.
Joe Schaefer <joe+cpan@sunstarsys.com>. This package borrows heavily from Tom Christiansen's msgchk script.
Copyright 2001 Joe Schaefer. This code is free software; it is freely modifiable and redistributable under the same terms as Perl itself.