NAME

MIME::Structure - determine structure of MIME messages

SYNOPSIS

    use MIME::Structure;
    $parser = MIME::Structure->new;
    $message = $parser->parse($filehandle);
    print $message->{'header'};
    $parts = $message->{'parts'};
    foreach (@$parts) {
        $offset  = $_->{'offset'};
        $type    = $_->{'type'};
        $subtype = $_->{'subtype'};
        $line    = $_->{'line'};
        $header  = $_->{'header'};
    }
    print $parser->concise_structure($message), "\n";

METHODS

new
    $parser = MIME::Structure->new;

parse
    $message = $parser->parse($filehandle);
    ($message, @other_entities) = $parser->parse($filehandle);

Parses the message found in the given filehandle.

A MIME message takes the form of a non-empty tree, each of whose nodes is termed an entity (see RFCs 2045-2049). The root entity is the message itself; the children of a multipart message are the parts it contains. (A non-multipart message has no children.)

When called in list context, the parse method returns a list of references to hashes; each hash contains information about a single entity in the message.

The first hash represents the message itself; if it is a multipart message, subsequent entities are its parts and subparts in the order in which they occur in the message -- in other words, in pre-order. If called in scalar context, only a reference to the hash containing information about the message itself is returned.
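The pre-order listing described above can be sketched without the module at all. The following is only an illustration: the entity tree is hand-built, preorder is a hypothetical helper (not part of MIME::Structure), and the only assumption is that each multipart entity keeps its children in a parts array reference, as described below.

```perl
use strict;
use warnings;

# Hypothetical hand-built entity tree; a real one would come from parse().
my $message = {
    type => 'multipart', subtype => 'mixed',
    parts => [
        {
            type => 'multipart', subtype => 'alternative',
            parts => [
                { type => 'text', subtype => 'plain' },
                { type => 'text', subtype => 'html'  },
            ],
        },
        { type => 'image', subtype => 'png' },
    ],
};

# Return the entity itself followed by all of its descendants, in pre-order:
# each entity appears before its parts, and parts keep their original order.
sub preorder {
    my ($entity) = @_;
    return ($entity, map { preorder($_) } @{ $entity->{parts} || [] });
}

my @entities = preorder($message);
print "$_->{type}/$_->{subtype}\n" for @entities;
```

The first element of the list is the message itself, which corresponds to what parse returns in scalar context.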

The following elements may appear in these hashes:

body_offset

The offset, in bytes, of the entity's body.

content_length

The length, in bytes, of the entity's body. Currently only set for the message itself.

encoding

The value of the entity's Content-Transfer-Encoding field.

fields

If the keep_fields option is set, this will be a reference to a hash whose keys are the names (converted to lower case) of all fields present in the entity's header. The form of the values is not yet documented.

header

The entity's full header as it appeared in the message, not including the final blank line. This will be present only if the keep_header option is set.

kind

message if the entity is the message, or part if it is a part within a message (or within another part).

length

The length, in bytes, of the entire entity, including its header and body. Currently only set for the message itself.

level

The level at which the entity is found. The message itself is at level 0, its parts (if any) are at level 1, their parts are at level 2, and so on.

line

The line number (1-based) of the first line of the entity's header. The message itself always, by definition, is at line 1.

number

A dotted-decimal notation that indicates the entity's place within the message. The root entity (the message itself) has number 1; its parts (if it has any) are numbered 1.1, 1.2, 1.3, etc., and the numbers of their parts in turn (if they have any) are constructed in like manner.

offset

The offset in bytes of the first line of the entity's header, measured from the first line of the message's header. The message itself always, by definition, is at offset 0.

parent

A reference to the hash representing the entity's parent. If the entity is the message itself, this is undefined.

parts

A reference to an array of the entity's parts. This will be present only if the entity is of type multipart.

parts_boundary

The string used as a boundary to delimit the entity's parts. Present only in multipart entities.

subtype

The MIME media subtype of the entity's content, e.g., plain or jpeg.

type

The MIME media type of the entity's content, e.g., text or image.

type_params

A reference to a hash containing the attributes (if any) found in the Content-Type: header field. For example, given the following Content-Type header:

    Content-Type: text/html; charset=UTF-8

The entity's type_params element will be this:

    $entity{'type_params'} = {
        'charset' => 'UTF-8',
    }
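To make the relationship between the Content-Type field and the type, subtype, and type_params elements concrete, here is a simplified sketch. It is not the module's own parser: parse_content_type is a hypothetical helper, lower-casing the type, subtype, and attribute names is a choice made for this example, and RFC 822 comments and RFC 2231 continuations are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MIME::Structure): split a Content-Type
# value into type, subtype and a hash of attributes.
sub parse_content_type {
    my ($value) = @_;
    my ($type, $subtype, $rest) = $value =~ m{^\s*([^/;\s]+)/([^/;\s]+)(.*)$}s
        or return;
    my %params;
    # Each attribute is ";name=value"; the value may be a quoted string.
    while ($rest =~ /;\s*([^=\s;]+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^;\s]*))/g) {
        my ($name, $quoted, $bare) = ($1, $2, $3);
        my $val = defined $quoted ? $quoted : $bare;
        $val =~ s/\\(.)/$1/g if defined $quoted;   # undo backslash escapes
        $params{lc $name} = $val;
    }
    return {
        type        => lc $type,
        subtype     => lc $subtype,
        type_params => \%params,
    };
}

my $ct = parse_content_type('text/html; charset=UTF-8');
print "$ct->{type}/$ct->{subtype} charset=$ct->{type_params}{charset}\n";
```

For the header shown above, the sketch yields type text, subtype html, and a type_params hash with the single key charset.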

Besides parsing the message, this method may also be used to print the message, or portions thereof, as it parses; the print method (q.v.) may be used to specify what to print.

keep_header
    $keep_header = $parser->keep_header;
    $parser->keep_header(1);

Set (or get) whether headers should be remembered during parsing.

keep_fields

Set (or get) whether fields (normalized headers) should be remembered.

print
    $print = $parser->print;
    $parser->print($MIME::Structure::PRINT_HEADER | $MIME::Structure::PRINT_BODY);
    $parser->print('header,body');

Set (or get) what should be printed. This may be specified either as symbolic constants (such as $MIME::Structure::PRINT_HEADER and $MIME::Structure::PRINT_BODY) ORed together, or as any of the following strings joined by any delimiter:

none
header
body
preamble
epilogue

print_header
    $print_header = $parser->print_header;
    $parser->print_header(1);

Set (or get) whether headers should be printed.

print_body
    $print_body = $parser->print_body;
    $parser->print_body(1);

Set (or get) whether bodies should be printed.

print_preamble
    $print_preamble = $parser->print_preamble;
    $parser->print_preamble(1);

Set (or get) whether preambles should be printed.

print_epilogue
    $print_epilogue = $parser->print_epilogue;
    $parser->print_epilogue(1);

Set (or get) whether epilogues should be printed.

entities
    $parser->parse($filehandle);
    print "$_->{type}/$_->{subtype} $_->{offset}\n"
        for @{ $parser->entities };

Returns a reference to an array of all the entities in a message, in the order in which they occur in the message. Thus the first entity is always the root entity (i.e., the message itself).

concise_structure
    $parser->parse($filehandle);
    print $parser->concise_structure;
    # e.g., '(multipart/alternative:0 (text/html:291) (text/plain:9044))'

Returns a string showing the structure of a message, including the content type and offset of each entity (i.e., the message and [if it's multipart] all of its parts, recursively). Each entity is printed in the form:

    "(" content-type ":" byte-offset [ " " parts... ] ")"

Offsets are byte offsets of the entity's header from the beginning of the message. (If parse() was called with an offset parameter, this is added to the offset of the entity's header.)

N.B.: The first offset is always 0.
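The output format can be illustrated with a short recursive sketch. This is not the module's implementation: concise is a hypothetical helper, and the entity tree is hand-built, with its types and offsets taken from the example string shown earlier in this section.

```perl
use strict;
use warnings;

# Hypothetical entity tree; a real one would come from parse().
my $message = {
    type => 'multipart', subtype => 'alternative', offset => 0,
    parts => [
        { type => 'text', subtype => 'html',  offset => 291  },
        { type => 'text', subtype => 'plain', offset => 9044 },
    ],
};

# Render "(" content-type ":" byte-offset, then each part (recursively)
# preceded by a single space, then the closing ")".
sub concise {
    my ($entity) = @_;
    my $str = "($entity->{type}/$entity->{subtype}:$entity->{offset}";
    $str .= ' ' . concise($_) for @{ $entity->{parts} || [] };
    return $str . ')';
}

print concise($message), "\n";
```

Because the root entity comes first and sits at offset 0, the string always begins with the message's own content type followed by :0.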

BUGS

Documentation is sketchy.

AUTHOR

Paul Hoffman <nkuitse (at) cpan (dot) org>

COPYRIGHT

Copyright 2008 Paul M. Hoffman. All rights reserved.

This program is free software; you can redistribute it and modify it under the same terms as Perl itself.