Email::Fingerprint - Calculate a digest for recognizing duplicate emails
Version 0.46
Email::Fingerprint calculates a checksum that uniquely identifies an email, for use in spotting duplicate messages. The checksum is based on: the Message-ID: header; or if it doesn't exist, on the Date:, From:, To: and Cc: headers together; or if those don't exist, on the body of the message.
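The fallback order described above can be illustrated with a minimal sketch in core Perl. This is not the module's implementation: `fingerprint_basis` and `sketch_fingerprint` are hypothetical names, the headers are assumed to arrive as a plain hashref, and the sketch assumes that whichever of Date:, From:, To: and Cc: are present get combined.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: choose the text a fingerprint would be based on.
# $headers is a hashref of header name => value; $body is a string.
sub fingerprint_basis {
    my ($headers, $body) = @_;

    # 1. Prefer the Message-ID: header when it is present.
    my $id = $headers->{'Message-ID'};
    return $id if defined $id && length $id;

    # 2. Otherwise combine whichever of Date:, From:, To:, Cc: exist.
    my @parts = map  { $_ . ':' . $headers->{$_} }
                grep { defined $headers->{$_} && length $headers->{$_} }
                qw(Date From To Cc);
    return join '', @parts if @parts;

    # 3. With no usable headers, fall back to the message body.
    return $body;
}

# Hypothetical wrapper: digest whatever basis was chosen.
sub sketch_fingerprint {
    my ($headers, $body) = @_;
    return md5_hex( fingerprint_basis($headers, $body) );
}

# Two copies of one message share a Message-ID, so the digests match
# even though other details differ.
my $first  = sketch_fingerprint({ 'Message-ID' => '<1@example.com>' }, "copy one\n");
my $second = sketch_fingerprint({ 'Message-ID' => '<1@example.com>' }, "copy two\n");
print "duplicate\n" if $first eq $second;
```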
    use Email::Fingerprint;

    my $foo = Email::Fingerprint->new();
    ...
    $fp = Email::Fingerprint->new({
        input           => \*INPUT,         # Or $string, \@lines, etc.
        checksum        => "Digest::SHA",   # Or "Digest::MD5", etc.
        strict_checking => 1,               # If true, use message bodies
        %mail_header_opts,
    });
Create a new fingerprinting object. If the input option is used, Email::Fingerprint attempts to intelligently read the email message given by that option, whether it's a string, an array of lines or a filehandle.
If $opts{checksum} is not supplied, then Email::Fingerprint will use the first checksum module that it finds. If it finds no modules, it will use unpack in a ghastly manner you don't want to think about.
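Picking "the first checksum module that it finds" could look roughly like the sketch below. The candidate list and the function name `first_available_module` are assumptions for illustration; the module's real search order is not documented here and may differ.

```perl
use strict;
use warnings;

# Hypothetical candidate list; the module's actual list may differ.
my @candidates = qw(Digest::SHA Digest::MD5);

# Return the first module name that can be loaded, or nothing at all.
sub first_available_module {
    my @modules = @_;
    for my $module (@modules) {
        # Turn Foo::Bar into Foo/Bar.pm so require can load it by path.
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;    # nothing found; the caller needs some other fallback
}

my $chosen = first_available_module(@candidates);
print defined $chosen ? "Using $chosen\n" : "No digest module found\n";
```

Passing `checksum` explicitly to the constructor avoids depending on whichever module happens to be installed.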
Any %opts are also passed along to Mail::Header->new; see the perldoc for Mail::Header options.
    # Uses original/default settings to take checksum
    $checksum = $fp->checksum;

    # Can use any options accepted by constructor
    $options = {
        input           => \*INPUT,         # Or $string, \@lines, etc.
        checksum        => "Digest::SHA",   # Or "Digest::MD5", etc.
        strict_checking => 1,               # If true, use message bodies
        %mail_header_opts,
    };

    # Overrides one or more original/default settings
    $checksum = $fp->checksum($options);
Calculates the actual email fingerprint. The optional hashref argument will permanently override the object's previous settings.
    $fingerprint->read( $email );
    $fingerprint->read( $email, \%mh_args );
Accepts the email message $email and attempts to read it intelligently, distinguishing strings, array references and file handles. If supplied, the optional hash reference is passed on to Mail::Header.
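Telling strings, array references and filehandles apart might be done along these lines. This is a sketch under assumptions, not the module's code: `classify_input` is a hypothetical name, and the use of `Scalar::Util::openhandle` to detect filehandles is a choice made for this example.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical classifier: report which kind of input was supplied.
sub classify_input {
    my ($input) = @_;
    return 'none'       unless defined $input;
    return 'arrayref'   if ref($input) eq 'ARRAY';
    return 'filehandle' if defined openhandle($input);
    return 'string'     if !ref($input);
    return 'unknown';
}

my $message = "From: foo\@example.com\n\nHello\n";

# An in-memory filehandle keeps the example free of external files.
open my $fh, '<', \$message or die "Cannot open in-memory handle: $!";

print classify_input($message), "\n";            # string
print classify_input([ split /^/, $message ]), "\n";   # arrayref
print classify_input($fh), "\n";                 # filehandle
```

A caller that already knows the input type can skip this guesswork by calling read_string, read_filehandle or read_arrayref directly.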
    $fingerprint->read_string( $email_string );
    $fingerprint->read_string( $email_string, \%mh_args );
Accepts the email message $email_string and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
    $fingerprint->read_filehandle( $email_fh );
    $fingerprint->read_filehandle( $email_fh, \%mh_args );
Accepts the email message $email_fh and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
    $fingerprint->read_arrayref( \@email_lines );
    $fingerprint->read_arrayref( \@email_lines, \%mh_args );
Accepts the email message \@email_lines and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
Returns true if an email message has been loaded and is ready for checksum, or false if no message has been loaded or an error has occurred.
Specifies the checksum method to be used.
A constructor helper method called from the Class::Std framework. To execute BUILD, use new().
Extract the Message-ID: header. If that does not exist, extract the Date:, From:, To: and Cc: headers. If those do not exist, then force strict checking so that the message body will be fingerprinted.
$body = $fp->_extract_body;
Gets the body of the message, as a string. Line-endings are preserved, so the body can, e.g., be printed.
This method must only be called after a message has been read. No validation is done in the method itself, so this is the user's responsibility.
    @headers = qw( foo@example.com bar@example.com );
    $delim   = 'To:';
    $string  = $fp->_concat( \@headers, $delim );

    # $string is now 'To:foo@example.comTo:bar@example.com'
Returns the concatenation of \@headers, with $delim prepended to each element of \@headers. If $delim is omitted, the empty string is used. \@headers elements are all chomped before concatenation.
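The described behavior can be reproduced in a few lines of plain Perl. The function below is a stand-in written from the description above, named `concat_sketch` (a hypothetical name) to make clear it is not the module's private `_concat` method.

```perl
use strict;
use warnings;

# Stand-in for the behavior described above: chomp each element,
# prepend the delimiter to each one, then join everything together.
sub concat_sketch {
    my ($headers, $delim) = @_;
    $delim = '' unless defined $delim;    # omitted delimiter means ''

    my @clean = @{$headers};    # copy so the caller's array is untouched
    chomp @clean;
    return join '', map { $delim . $_ } @clean;
}

my @headers = ( "foo\@example.com\n", "bar\@example.com\n" );
my $string  = concat_sketch( \@headers, 'To:' );
print "$string\n";    # To:foo@example.comTo:bar@example.com
```

Prepending the delimiter to every element, rather than joining with it, keeps the header name attached to the first value as well.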
Len Budney, <lbudney at pobox.com>
Please report any bugs or feature requests to bug-email-fingerprint at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Email-Fingerprint. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Email::Fingerprint
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Email-Fingerprint
CPAN Ratings
http://cpanratings.perl.org/d/Email-Fingerprint
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Email-Fingerprint
Search CPAN
http://search.cpan.org/dist/Email-Fingerprint
See Mail::Header for options governing the parsing of email headers.
Email::Fingerprint is based on the eliminate_dups script by Peter Samuel and available at http://www.qmail.org/.
Copyright 2006-2011 Len Budney, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.