NAME

NexTrieve::Mbox - convert Unix mailbox to NexTrieve Document sequence

SYNOPSIS

 use NexTrieve;
 $ntv = NexTrieve->new( | {method => value} );

 $converter = $ntv->Mbox( | {method => value} );

 $docseq = $converter->Docseq( $ntv->Index( $resource )->Docseq,<*.mbox> );
 $docseq->done;

DESCRIPTION

The Mbox object of the Perl support for NexTrieve. Do not create it directly, but through the Mbox method of the NexTrieve object.

The NexTrieve::Mbox module is a wrapper around the NexTrieve::RFC822 object. For more information about handling messages, please check the documentation of the NexTrieve::RFC822 module.

The "mailbox2ntvml" script is a directly configurable and executable wrapper for the NexTrieve::Mbox module.

CONVERSION PROCESS

The NexTrieve::Mbox module creates a NexTrieve::RFC822 object inside itself. That object describes the format of the NexTrieve::Document XML that should be generated from each message in a mailbox.

Before indexing starts, three attributes are added to the NexTrieve::RFC822 object. They are:

- mailbox string key-duplicate 1

Either the name of the Unix mailbox file, or the string specified with the conceptualmailbox method.

- offset number notkey 1

The offset of the message in the (conceptual) mailbox. Each time a message is finished processing, its length is added to the internally kept offset value.

- length number notkey 1

The length of the message in the (real) mailbox.

If you are not using the conceptualmailbox feature, then the combination of the mailbox, offset and length attributes (as e.g. returned as attributes in a hit of a hitlist) can be directly applied to obtain a copy of the original message.

If the conceptualmailbox feature is used, you are more on your own: you, as the developer, know how the conceptualmailbox string maps to a real file or database entry.
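When the conceptualmailbox feature is not used, retrieving the original message from the three attributes could look like the sketch below. The helper name fetch_message is hypothetical and not part of NexTrieve. The sketch assumes that "offset" and "length" are byte counts and that "offset" is measured from the start of the mailbox file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NexTrieve): return the raw text of one
# message, given the "mailbox", "offset" and "length" attributes of a hit.
# Assumption: offset and length are byte counts within the mailbox file.
sub fetch_message {
    my ($mailbox, $offset, $length) = @_;

    # Read raw bytes so that offsets are not shifted by I/O layers
    open my $fh, '<:raw', $mailbox or die "Cannot open $mailbox: $!";
    seek $fh, $offset, 0 or die "Cannot seek to $offset in $mailbox: $!";

    my $message = '';
    my $read = read $fh, $message, $length;
    die "Cannot read from $mailbox: $!" unless defined $read;
    die "Short read from $mailbox: wanted $length bytes, got $read"
        unless $read == $length;
    close $fh;

    return $message;
}
```

The helper would be called with the attribute values taken from a hit, in the order mailbox, offset, length.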

The start of a new message in a mailbox is indicated by the string "From " at the beginning of a line. An attempt is made to handle even broken mailboxes that do not contain complete messages and/or attachments. Depending on how badly the mailbox is broken, anywhere from none to all of its messages might be ignored in the conversion process.
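The "From " rule above can be illustrated with a minimal sketch that reports where each message starts and how long it is. The helper name message_boundaries is hypothetical and this is not the module's own code: it assumes byte offsets, skips any lines before the first "From " line, and makes no attempt at the broken-mailbox handling described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NexTrieve): return one [offset, length]
# pair per message found in a Unix mailbox file.
sub message_boundaries {
    my ($mailbox) = @_;

    open my $fh, '<:raw', $mailbox or die "Cannot open $mailbox: $!";

    my @messages;        # list of [offset, length] pairs
    my $position = 0;    # byte offset of the line being examined
    my $start;           # byte offset of the current message, if any

    while ( my $line = <$fh> ) {
        if ( $line =~ /^From / ) {
            # A "From " line closes the previous message and opens a new one
            push @messages, [ $start, $position - $start ] if defined $start;
            $start = $position;
        }
        $position += length $line;
    }
    close $fh;

    # The last message runs to the end of the file
    push @messages, [ $start, $position - $start ] if defined $start;

    return @messages;
}
```

Each returned pair corresponds to the "offset" and "length" values of one message, under the assumptions stated above.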

OBJECT METHODS

The following methods return objects.

Docseq

 $docseq = $converter->Docseq( @mbox );
 $docseq->write_file( filename );

 $index = $ntv->Index( $resource );
 $converter->Docseq( $index->Docseq,@mbox );

The Docseq method allows you to create a NexTrieve document sequence object (or NexTrieve::Docseq object) out of the messages in one or more Unix mailboxes. This can either be used to be directly indexed by NexTrieve (through the NexTrieve::Index object) or to create the XML of the document sequence in a file for indexing at a later stage.

The first (optional) input parameter is an (already existing) NexTrieve::Docseq object that should be used. This can either be a special purpose NexTrieve::Docseq object as created by the NexTrieve::Index module, or a NexTrieve::Docseq object that was created earlier and to which a second run of messages from mailboxes needs to be added.

The rest of the input parameters indicate the mailboxes that should be indexed. These can either be just filenames, or URLs in the form: file://directory/mail.mbx or http://server/mail.mbx.

For more information, see the NexTrieve::Docseq module.

Resource

 $resource = $converter->Resource( | {method => value} );

The "Resource" method allows you to create a NexTrieve::Resource object from the internal structure of the NexTrieve::RFC822 object that lives inside of the NexTrieve::Mbox object.

For more information, see the documentation of the NexTrieve::RFC822 and NexTrieve::Resource modules themselves.

RFC822

 $converter->RFC822( {method => value} );
 $rfc822 = $converter->RFC822;

The "RFC822" method allows you to access the NexTrieve::RFC822 object that lives inside of the NexTrieve::Mbox object and which is created when the NexTrieve::Mbox object is created.

To facilitate access, a reference to a method-value pair hash can be specified as the input parameter.

For more information, see the documentation of the NexTrieve::RFC822 module itself.

OTHER METHODS

The following methods change aspects of the NexTrieve::Mbox object.

archive

 $converter->archive( $archive );
 $archive = $converter->archive;

Although the NexTrieve::Mbox module is meant to be just a filter, the "archive" method allows you to do some message archive management with this module as well.

The input parameter specifies the name of the file to which all of the messages that are read (which could be from multiple mailbox or rfc822 files) are added at the end. Combined with the conceptualmailbox method and the baseoffset method, a basic email management system can be made.

baseoffset

 $converter->baseoffset( $offset | -e filename ? -s _ : 0 );
 $baseoffset = $converter->baseoffset;

The "baseoffset" method can only be used if the conceptualmailbox method is also used. It specifies the value of the "offset" attribute of the first message to be read from the mailbox. The value you typically specify is the size of the file in which the messages will eventually be stored, which you can e.g. specify with the archive method.
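As an illustration of why the current size of the archive file is the natural base offset, the sketch below appends messages to an archive and returns the offset at which each one ends up. Both helper names are hypothetical and not part of NexTrieve; the sketch only mirrors the "-e filename ? -s _ : 0" idiom from the synopsis above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NexTrieve): the byte offset at which the
# next appended message will start, which is the current size of the archive.
sub archive_baseoffset {
    my ($archive) = @_;

    # "-s _" reuses the stat() result of the preceding "-e" test
    return -e $archive ? ( (-s _) || 0 ) : 0;
}

# Hypothetical helper: append one message to the archive and return the
# offset at which it was stored.
sub append_message {
    my ($archive, $message) = @_;

    my $offset = archive_baseoffset($archive);

    open my $fh, '>>:raw', $archive or die "Cannot open $archive: $!";
    print {$fh} $message or die "Cannot write to $archive: $!";
    close $fh or die "Cannot close $archive: $!";

    return $offset;
}
```

With a NexTrieve::Mbox object at hand, the first helper could be combined with the documented method as $converter->baseoffset( archive_baseoffset( $archive ) ).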

conceptualmailbox

 $converter->conceptualmailbox( filename );
 $conceptualmailbox = $converter->conceptualmailbox;

The "conceptualmailbox" method allows you to specify the value that should be saved in the "mailbox" attribute of all messages processed by this NexTrieve::Mbox object. When used with the archive method, it is usually the relative filename of the mailbox archive (where the archive filename is the absolute filename).

If a conceptual mailbox is specified, all messages being processed are considered to be part of the same (virtual) mailbox. This means that the offset attribute value is not reset when another mailbox is processed.

AUTHOR

Elizabeth Mattijsen, <liz@dijkmat.nl>.

Please report bugs to <perlbugs@dijkmat.nl>.

COPYRIGHT

Copyright (c) 1995-2002 Elizabeth Mattijsen <liz@dijkmat.nl>. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

http://www.nextrieve.com, the NexTrieve.pm and the other NexTrieve::xxx modules.