NAME

Message::Style - Perl module to perform stylistic analysis of messages

SYNOPSIS

  use Message::Style;

  my $score=Message::Style::score(\@article);
  # or
  my $score=Message::Style::score(@article);

DESCRIPTION

This Perl library analyses an RFC 2822 format message (typically an email message or a Usenet post) and produces a score that, in the author's opinion, gives a good indication of whether the poster is a fsckwit, and therefore whether their message should be ignored.

SCORING MECHANISM

This module takes a Usenet article (or other RFC 822 formatted text) and attempts to identify whether the sender is a fsckwit. It does this by analysing quoting style, line length, spelling, and various other criteria.

Several things are annoying about Usenet posts, and the scores reflect the "cost" of each. There are Byte Points (bandwidth wasted in transmitting pointless material) and Line Points (time wasted scrolling through pointless material). The criteria and their justifications are:

  1. Article has excessively long lines.

    Some newsreaders wrap long lines, others truncate them, and others present a horizontal scrollbar. In every case the reader has to do extra work to scroll. A Line Point is given for every block of 80 characters (or part of one) beyond character 80.

  2. Article is not completely in plain text.

    Non-plain Content-Type, e.g. text/html, or a non-text Content-Encoding is unreadable to many. Byte Points are given for the entire article.

  3. Article has a very large signature.

    Signatures are generally a waste of bandwidth, and long ones need to be paged through. It is considered bad form to have a signature larger than the McQuary limit of 80x4. Byte Points and Line Points are therefore scored for every character and line outside the 80x4 box.

  4. Article contains a Big Ugly ASCII Graphic (BUAG)

    BUAGs are those annoying graphics that always seem to come with "cute" extra-long signatures. These are warned about but not scored, since they have already been accounted for in item 3 (and also because BUAGs in the body of the message are sometimes useful).

  5. Article has incorrectly-formatted quoted material.

    Quoted material is expected to precede the original material, and scoring is based on this. The first four lines of quoted material don't score at all. The original material is then counted for lines and bytes, and half of each is also allowed for quoted material. Beyond that, Byte and Line scores are applied. Top-posted articles are expected to score badly under this heuristic.
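
As a rough illustration of rules 1 and 3, the following sketch computes Line Points for over-long lines, and the Byte and Line Points for a signature that overflows the 80x4 box. It is not the module's actual code. The function names long_line_points and signature_excess are hypothetical, and the reading of "outside the 80x4 box" (every character past column 80 on the first four lines, plus everything on later lines) is an assumption.

```perl
use strict;
use warnings;

# Rule 1: one Line Point per block of 80 characters (or part of one)
# beyond character 80. Hypothetical helper, not part of Message::Style.
sub long_line_points {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;              # ignore the line terminator
    my $excess = length($line) - 80;
    return 0 if $excess <= 0;
    return int(($excess + 79) / 80);   # ceiling of $excess / 80
}

# Rule 3: count what falls outside the 80x4 McQuary box.
# Assumption: characters past column 80 on the first four lines, and
# every character on later lines, cost Byte Points; each line past the
# fourth costs one Line Point.
sub signature_excess {
    my @sig = @_;
    my ($bytes, $lines) = (0, 0);
    for my $i (0 .. $#sig) {
        (my $text = $sig[$i]) =~ s/\r?\n\z//;
        if ($i < 4) {
            my $over = length($text) - 80;
            $bytes += $over if $over > 0;
        }
        else {
            $bytes += length($text);
            $lines++;
        }
    }
    return ($bytes, $lines);
}
```

Under these assumptions, a 161-character line earns 2 Line Points: characters 81 to 160 fill one block and character 161 starts a second.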

In addition, Byte and Line scores are multiplied by the number of newsgroups crossposted to.

For final scoring, a Line point equals 40 Byte points.
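
The conversion and the crossposting multiplier can be sketched as follows. This assumes that the final score is expressed in Byte Points and that the multiplier applies to both totals before they are combined. The documentation does not spell out either detail, and final_score is a hypothetical name.

```perl
use strict;
use warnings;

use constant BYTES_PER_LINE_POINT => 40;   # "a Line point equals 40 Byte points"

# Hypothetical helper: combine the two totals into one figure,
# expressed in Byte Points (an assumption about the unit).
sub final_score {
    my ($byte_points, $line_points, $newsgroups) = @_;
    $newsgroups = 1 if !defined $newsgroups || $newsgroups < 1;
    return $newsgroups * ($byte_points + BYTES_PER_LINE_POINT * $line_points);
}

# 120 Byte Points and 3 Line Points, crossposted to 2 groups:
# 2 * (120 + 40 * 3) = 480
print final_score(120, 3, 2), "\n";
```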

FUNCTIONS

score
  my $score=Message::Style::score(@article);

Performs a scoring operation on the article, and returns the score.
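
A slightly fuller usage sketch, assuming the article is passed one line per array element. The synopsis does not say whether lines should keep their newlines, so they are left in place here. The helper read_article is hypothetical, and the call to score is the only part taken from this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: read a message from a filehandle, one line per element.
sub read_article {
    my ($fh) = @_;
    my @lines = <$fh>;
    return @lines;
}

# Score a message read from standard input, if the module is installed.
if (eval { require Message::Style; 1 }) {
    my @article = read_article(\*STDIN);
    my $score   = Message::Style::score(\@article);
    print "Style score: $score\n";
}
```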

WARNINGS

This module is basically the result of ripping out the core of a really nasty script I wrote early in my Perl career and wrapping the minimum around it to pass CPAN muster. So the code is a bit crufty, although it certainly works and has heard of strict and warnings.

It was, however, reasonably well tested at the time, thanks to plenty of fsckwit source material on birmingham.misc / uk.local.birmingham.

SEE ALSO

AUTHOR

All code and documentation by Peter Corlett <abuse@cabal.org.uk>.

COPYRIGHT

Copyright (C) 2000-2004 Peter Corlett <abuse@cabal.org.uk>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SUPPORT / WARRANTY

This is free software. IT COMES WITHOUT WARRANTY OF ANY KIND.