
NAME

Image::ExifTool::MIE - Read/write MIE meta information

SYNOPSIS

This module is used by Image::ExifTool

DESCRIPTION

This module contains routines required by Image::ExifTool to read and write information in MIE files.

WHAT IS MIE?

MIE stands for Meta Information Encapsulation. The MIE format is an extensible, dedicated meta information format which supports storage of binary as well as textual meta information. MIE can be used to encapsulate meta information from many sources and bundle it together with any type of file.

Features

Below is a very subjective scorecard comparing the features of a number of common file and meta information formats with those of MIE. The following features are rated for each format with a score of 0 to 10:

  1) Extensible (can incorporate user-defined information).
  2) Tag IDs meaningful (hints to meaning of unknown information).
  3) Sequential read/write ability (streamable).
  4) Hierarchical information structure.
  5) Easy to implement reader/writer/editor.
  6) Data order well defined.
  7) Large data lengths supported: >64kB (+5) and >4GB (+5).
  8) Localized text strings.
  9) Multiple documents in a single file.
 10) Compact format doesn't squander disk space or bandwidth.
 11) Compressed meta information supported.
 12) Relocatable data elements.
 13) Binary meta information (+7) with variable byte order (+3).
 14) Mandatory tags not required (because that would be stupid).
 15) Append information to end of file without editing.

                          Feature number                   Total
     Format  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15   Score
     ------ ---------------------------------------------  -----
     MIE    10 10 10 10 10 10 10 10 10 10 10 10 10 10 10    150
     PDF    10 10  0 10  0  0 10  0 10 10 10  0  7 10 10     97
     PNG    10 10 10  0  8  0  5 10  0 10 10 10  0 10  0     93
     XMP    10 10 10 10  2  0 10 10 10  0  0 10  0 10  0     92
     AIFF    0  5 10 10 10  0  5  0  0 10  0 10  7 10  0     77
     RIFF    0  5 10 10 10  0  5  0  0 10  0 10  7 10  0     77
     JPEG   10  0 10  0 10  0  0  0  0 10  0 10  7 10  0     67
     EPS    10 10 10  0  0  0 10  0 10  0  0  5  0 10  0     65
     TIFF    0  0  0 10  5 10  5  0 10 10  0  0 10  0  0     60
     EXIF    0  0  0 10  5 10  0  0  0 10  0  0 10  0  0     45
     IPTC    0  0 10  0  8  0  0  0  0 10  0 10  7  0  0     45

By design, MIE ranks highest by a significant margin. Other formats with reasonable scores are PDF, PNG and XMP, but each has significant weak points. What may be surprising is that TIFF, EXIF and IPTC rank so low.

As well as scoring high in all these features, the MIE format has the unique ability to encapsulate any other type of file, and provides a non-invasive method of adding meta information to a file. The meta information is logically separated from the original file data, which is extremely important because meta information is routinely lost when files are edited.

Also, the MIE format supports multiple files by simple concatenation, enabling all kinds of wonderful features such as linear databases, edit histories or non-intrusive file updates.

MIE FORMAT SPECIFICATION

NOTE: The MIE format specification is currently under development. Until version 1.00 is released these specifications may be subject to changes which may not be backwardly compatible.

File Structure

A MIE file consists of a series of MIE elements. A MIE element may contain either data or a group of MIE elements, providing a hierarchical format for storing data. Each MIE element is identified by a human-readable tag name, and may store data from zero to 2^64-1 bytes in length.

File Signature

The first element in the MIE file must be an uncompressed MIE group element with a tag name of "0MIE". This restriction allows the first 8 bytes of a MIE file to be used to identify a MIE format file. The following tables list these byte sequences for big-endian and little-endian MIE-format files:

    Byte Number:      0    1    2    3    4    5    6    7

    C Characters:     ~ \x10 \x04    ?    0    M    I    E
        or            ~ \x18 \x04    ?    0    M    I    E

    Hexadecimal:     7e   10   04    ?   30   4d   49   45
        or           7e   18   04    ?   30   4d   49   45

    Decimal:        126   16    4    ?   48   77   73   69
        or          126   24    4    ?   48   77   73   69

Note that byte 1 may have one of two possible values (0x10 or 0x18), and byte 3 may have any value (0x00 to 0xff).
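The signature test above can be sketched in Python as follows. This is a minimal illustration, not code from Image::ExifTool; the function name `is_mie_signature` is hypothetical. It checks the fixed bytes and the two allowed values of byte 1 (rejecting the compressed bit, since the first group must be uncompressed) and ignores byte 3.

```python
def is_mie_signature(header: bytes) -> bool:
    """Check the first 8 bytes of a file against the MIE signature (hypothetical helper)."""
    return (
        len(header) >= 8
        and header[0] == 0x7E              # SyncByte '~'
        and header[1] in (0x10, 0x18)      # uncompressed big- or little-endian group
        and header[2] == 0x04              # TagLength of "0MIE"
        and header[4:8] == b"0MIE"         # header[3] is the DataLength: any value
    )
```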

Element Structure

    1 byte  SyncByte = 0x7e (decimal 126, character '~')
    1 byte  FormatCode (see below)
    1 byte  TagLength (T)
    1 byte  DataLength (gives D if DataLength < 253)
    T bytes TagName (T given by TagLength)
    2 bytes DataLength2 [exists only if DataLength == 255]
    4 bytes DataLength4 [exists only if DataLength == 254]
    8 bytes DataLength8 [exists only if DataLength == 253]
    D bytes DataBlock (D given by DataLength)

The minimum element length is 4 bytes (for a group terminator). The maximum DataBlock size is 2^64-1 bytes. TagLength and DataLength are unsigned integers, and the byte ordering for multi-byte DataLength fields is specified by the containing MIE group element. The SyncByte is byte aligned, so no padding is added to align on an N-byte boundary.
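A minimal Python sketch of reading one element header with the layout above. The names `read_element_header` and `ElementHeader` are hypothetical, and the sketch assumes the element is already in memory. The caller passes the byte order of the containing group; following note (1) under FormatCode, a MIE group element instead applies its own byte order to its extended DataLength field, which is this sketch's reading of the spec.

```python
import struct
from typing import NamedTuple


class ElementHeader(NamedTuple):
    format_code: int
    tag_name: str
    data_length: int
    header_size: int  # bytes from the SyncByte up to the start of the DataBlock


# DataLength codes that select an extended 2, 4 or 8 byte length field
_EXTENDED = {255: "H", 254: "I", 253: "Q"}


def read_element_header(buf: bytes, pos: int, big_endian: bool) -> ElementHeader:
    """Parse the MIE element header starting at buf[pos] (hypothetical helper)."""
    if len(buf) < pos + 4 or buf[pos] != 0x7E:
        raise ValueError("no MIE SyncByte at offset %d" % pos)
    format_code, tag_len, data_len = buf[pos + 1], buf[pos + 2], buf[pos + 3]
    if (format_code & 0xF0) == 0x10:
        # a MIE group element uses its own byte order (note 1)
        big_endian = not format_code & 0x08
    p = pos + 4
    tag_name = buf[p:p + tag_len].decode("ascii")
    p += tag_len  # the TagName precedes any DataLength2/4/8 field
    if data_len in _EXTENDED:
        fmt = (">" if big_endian else "<") + _EXTENDED[data_len]
        (data_len,) = struct.unpack_from(fmt, buf, p)
        p += struct.calcsize(fmt)
    return ElementHeader(format_code, tag_name, data_len, p - pos)
```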

FormatCode

The format code is a bitmask that defines the format of the data:

    7654 3210
    ++++ ----  FormatType
    ---- +---  TypeModifier
    ---- -+--  Compressed
    ---- --++  FormatSize

    FormatType (bitmask 0xf0):

        0x00 - other (unknown) format data
        0x10 - MIE group
        0x20 - text string
        0x30 - list of null-separated text strings
        0x40 - integer
        0x50 - rational
        0x60 - fixed point
        0x70 - floating point
        0x80 - free space

    TypeModifier (bitmask 0x08):

    Modifies the meaning of certain FormatTypes (0x00-0x60):

        0x08 - data may be byte swapped according to FormatSize
        0x18 - MIE group with little-endian byte ordering
        0x28 - UTF encoded text string
        0x38 - UTF encoded text string list
        0x48 - signed integer
        0x58 - signed rational (denominator is always unsigned)
        0x68 - signed fixed-point

    Compressed (bitmask 0x04):

    If this bit is set, the data block is compressed using Zlib deflate. An entire MIE group may be compressed, with the exception of file-level groups.

    FormatSize (bitmask 0x03):

    Gives the byte size of each data element:

        0x00 - 8 bits  (1 byte)
        0x01 - 16 bits (2 bytes)
        0x02 - 32 bits (4 bytes)
        0x03 - 64 bits (8 bytes)

    The number of bytes in a single value for this format is given by 2**FormatSize (or 1 << FormatSize). The number of values is the data length divided by this number of bytes. It is an error if the data length is not an integer multiple of the format size in bytes.
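Splitting a FormatCode into the four bit fields described above is a matter of masking. The Python sketch below is illustrative only; `FormatInfo`, `decode_format_code` and `value_count` are hypothetical names. `value_count` applies the rule that the data length must be a whole multiple of the value size.

```python
from typing import NamedTuple


class FormatInfo(NamedTuple):
    format_type: int   # FormatType, bitmask 0xf0 (e.g. 0x40 = integer)
    modified: bool     # TypeModifier, bitmask 0x08
    compressed: bool   # Compressed, bitmask 0x04
    value_size: int    # bytes per value: 1 << FormatSize (bitmask 0x03)


def decode_format_code(code: int) -> FormatInfo:
    return FormatInfo(
        format_type=code & 0xF0,
        modified=bool(code & 0x08),
        compressed=bool(code & 0x04),
        value_size=1 << (code & 0x03),
    )


def value_count(code: int, data_length: int) -> int:
    """Number of values in a data block; raises if the length is not a whole multiple."""
    size = decode_format_code(code).value_size
    if data_length % size:
        raise ValueError("data length %d is not a multiple of %d" % (data_length, size))
    return data_length // size
```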

The following is a list of all currently defined MIE FormatCode values for uncompressed data (add 0x04 to each value for compressed data):

    0x00 - unknown data (byte order must be preserved)
    0x08 - other 8-bit data (not affected by byte swapping)
    0x09 - other 16-bit data (may be byte swapped)
    0x0a - other 32-bit data (may be byte swapped)
    0x0b - other 64-bit data (may be byte swapped)
    0x10 - MIE group with big-endian values (1)
    0x18 - MIE group with little-endian values (1)
    0x20 - ASCII string (2,3)
    0x28 - UTF-8 string (2,3)
    0x29 - UTF-16 string (2,3)
    0x2a - UTF-32 string (2,3)
    0x30 - ASCII string list (2,4)
    0x38 - UTF-8 string list (2,4)
    0x39 - UTF-16 string list (2,4)
    0x3a - UTF-32 string list (2,4)
    0x40 - unsigned 8-bit integer
    0x41 - unsigned 16-bit integer
    0x42 - unsigned 32-bit integer
    0x43 - unsigned 64-bit integer (5)
    0x48 - signed 8-bit integer
    0x49 - signed 16-bit integer
    0x4a - signed 32-bit integer
    0x4b - signed 64-bit integer (5)
    0x52 - unsigned 32-bit rational (16-bit numerator then denominator) (6)
    0x53 - unsigned 64-bit rational (32-bit numerator then denominator) (6)
    0x5a - signed 32-bit rational (denominator is unsigned) (6)
    0x5b - signed 64-bit rational (denominator is unsigned) (6)
    0x61 - unsigned 16-bit fixed-point (high 8 bits is integer part) (7)
    0x62 - unsigned 32-bit fixed-point (high 16 bits is integer part) (7)
    0x69 - signed 16-bit fixed-point (high 8 bits is signed integer) (7)
    0x6a - signed 32-bit fixed-point (high 16 bits is signed integer) (7)
    0x72 - 32-bit IEEE float (not recommended for portability reasons)
    0x73 - 64-bit IEEE double (not recommended for portability reasons) (5)
    0x80 - free space (value data does not contain useful information)

 1) The byte ordering specified by the MIE group TypeModifier applies to the
    MIE group element as well as all elements in the group.

 2) The TagName of a string element may have a 6-character suffix to
    indicate a specific locale (e.g. "Title-en_US" or "Keywords-de_DE").

 3) Text strings are not normally null terminated, however they may be
    padded with one or more null characters to the end of the data block to
    allow strings to be edited within fixed-length data blocks.

 4) A list of text strings separated by null characters.  These lists must
    not be null padded or null terminated, since this would be interpreted
    as additional zero-length strings.  For ASCII and UTF-8 strings, the
    null character is a single zero (0x00) byte.  For UTF-16 or UTF-32
    strings, the null character is 2 or 4 zero bytes respectively.

 5) 64-bit integers and doubles are subject to the specified byte ordering
    for both 32-bit words and bytes within these words.  For instance, the
    high order byte is always the first byte if big-endian, and the eighth
    byte if little-endian.  This means that some swapping is always
    necessary for these values on systems where the byte order differs from
    the word order (e.g. some ARM systems), regardless of the endian-ness of
    the stored values.

 6) Rational values are treated as two separate integers.  The numerator
    always comes first regardless of the byte ordering.

 7) Fixed-point values are converted to floating point by treating
    them as an integer and dividing by an appropriate value, i.e.:

        16-bit fixed value = 16-bit integer value / 256.0
        32-bit fixed value = 32-bit integer value / 65536.0
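Notes 4, 6 and 7 translate fairly directly into decoding helpers. The Python sketch below is illustrative, not ExifTool's implementation, and all function names are hypothetical. The caller supplies the byte order of the containing group; string lists are returned as raw bytes so that the caller can decode them as ASCII, UTF-8, UTF-16 or UTF-32, and `struct.iter_unpack` raises `struct.error` if the data length is not a whole multiple of the value size.

```python
import struct


def split_string_list(data: bytes, char_size: int = 1) -> list:
    """Split a null-separated MIE string list (note 4) into raw byte strings.

    char_size is 1 for ASCII/UTF-8, 2 for UTF-16 and 4 for UTF-32; nulls are
    only recognized on character boundaries.
    """
    null = b"\x00" * char_size
    parts, start = [], 0
    for i in range(0, len(data), char_size):
        if data[i:i + char_size] == null:
            parts.append(data[start:i])
            start = i + char_size
    parts.append(data[start:])
    return parts


def decode_rationals(data: bytes, code: int, big_endian: bool) -> list:
    """Decode FormatCodes 0x52, 0x53, 0x5a, 0x5b into (numerator, denominator) pairs."""
    half = "H" if (code & 0x03) == 0x02 else "I"       # 16- or 32-bit halves
    numerator = half.lower() if code & 0x08 else half  # signed numerator for 0x5a/0x5b
    order = ">" if big_endian else "<"
    # the numerator always comes first; the denominator is always unsigned (note 6)
    return list(struct.iter_unpack(order + numerator + half, data))


def decode_fixed_point(data: bytes, code: int, big_endian: bool) -> list:
    """Decode FormatCodes 0x61, 0x62, 0x69, 0x6a into floats (note 7)."""
    is_16 = (code & 0x03) == 0x01
    fmt = "H" if is_16 else "I"
    if code & 0x08:
        fmt = fmt.lower()                                # signed variants
    order = ">" if big_endian else "<"
    divisor = 256.0 if is_16 else 65536.0
    return [value / divisor for (value,) in struct.iter_unpack(order + fmt, data)]
```

Splitting only on character boundaries matters for UTF-16 and UTF-32: for example, the UTF-16 bytes 01 00 00 41 contain two adjacent zero bytes that are not a null character.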

TagLength

Gives the length of the TagName string. Any value between 0 and 255 is valid, but a TagLength of 0 is valid only for the MIE group terminator.

DataLength

DataLength is an unsigned byte that gives the number of bytes in the data block. A value between 0 and 252 gives the data length directly, and numbers from 253 to 255 are reserved for special codes. Codes of 255, 254 and 253 indicate that the element contains an additional 2, 4 or 8 byte unsigned integer representing the data length.

    0-252 = length of data block
    255   = use DataLength2
    254   = use DataLength4
    253   = use DataLength8

A DataLength of zero is valid for any element except a compressed MIE group. A zero DataLength for an uncompressed MIE group indicates that the group length is unknown. For other elements, a zero length indicates there is no associated data.
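Choosing the DataLength encoding for a given block size follows directly from the table above. The hypothetical Python helpers below pick the smallest encoding that fits (a longer one is also permitted, as described under DataLength2/4/8) and assemble a complete element. The caller passes the byte order that applies to the element.

```python
import struct


def encode_data_length(length: int, big_endian: bool) -> tuple:
    """Return (DataLength byte, DataLength2/4/8 bytes) for a data block size."""
    if not 0 <= length < 1 << 64:
        raise ValueError("MIE data length out of range")
    order = ">" if big_endian else "<"
    if length <= 252:
        return length, b""                            # stored directly
    if length < 1 << 16:
        return 255, struct.pack(order + "H", length)  # DataLength2
    if length < 1 << 32:
        return 254, struct.pack(order + "I", length)  # DataLength4
    return 253, struct.pack(order + "Q", length)      # DataLength8


def build_element(tag: str, format_code: int, data: bytes, big_endian: bool = True) -> bytes:
    """Assemble one MIE element (hypothetical helper)."""
    tag_bytes = tag.encode("ascii")
    if len(tag_bytes) > 255:
        raise ValueError("TagName longer than 255 bytes")
    code, extended = encode_data_length(len(data), big_endian)
    # the TagName comes before the extended DataLength field
    return bytes([0x7E, format_code, len(tag_bytes), code]) + tag_bytes + extended + data
```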

TagName

The TagName string is 0 to 255 bytes long, and is composed of the ASCII characters A-Z, a-z, 0-9 and underline ('_'). Also, a dash ('-') is used to separate the language/country code in the TagName of a localized text string. The TagName string is NOT null terminated. A MIE element with a tag string of zero length is reserved for the group terminator.

MIE elements are sorted alphabetically by TagName within each group. Multiple elements with the same TagName are allowed, even within the same group.

Tag names for localized text strings have a 6-character suffix with the following format: The first character is a dash ('-'), followed by a 2-character lower case ISO 639-1 language code, then an underline ('_'), and ending with a 2-character upper case ISO 3166-1 alpha 2 country code (e.g. "-en_US", "-en_GB", "-de_DE" or "-fr_FR". Note that "GB", and not "UK", is the code for Great Britain, although "UK" should be recognized for compatibility reasons). The suffix is included when sorting the tags alphabetically, so the default locale (with no tag-name suffix) always comes first. If the country is unknown or not applicable, a country code of "XX" should be used.
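A TagName can be checked and split into its base name and locale with a pair of regular expressions. This Python sketch is hypothetical; in particular, rewriting "UK" to "GB" is one possible way to "recognize" the legacy code and is an assumption, not something the spec prescribes.

```python
import re
from typing import Optional, Tuple

_LOCALIZED = re.compile(r"([A-Za-z0-9_]+)-([a-z]{2})_([A-Z]{2})")
_PLAIN = re.compile(r"[A-Za-z0-9_]+")


def split_localized_tag(tag: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (base, language, country); language and country are None for plain tags."""
    if not 0 < len(tag) <= 255:
        raise ValueError("TagName must be 1 to 255 characters")
    m = _LOCALIZED.fullmatch(tag)
    if m:
        base, language, country = m.groups()
        if country == "UK":
            country = "GB"   # assumption: normalize the legacy code to GB
        return base, language, country
    if _PLAIN.fullmatch(tag):
        return tag, None, None
    raise ValueError("invalid MIE TagName: %r" % tag)
```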

TagNames should be meaningful. Words should be lowercase with an uppercase first character, and acronyms should be all upper case. The underline ("_") is provided to allow separation of two acronyms or two numbers, but it shouldn't be used unless necessary. No separation is necessary between an acronym and a word (e.g. "ISOSetting").

All TagNames should start with an uppercase letter. An exception to this rule allows tags to begin with a digit (0-9) if they must come before other tags in the sort order, or a lowercase letter (a-z) if they must come after. For instance, the '0Type' element begins with a digit so it comes before, and the 'data' element begins with a lowercase letter so that it comes after meta information tags in the main '0MIE' group.
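The sort-order conventions above are consistent with plain ASCII byte ordering: '-' (0x2d) sorts before digits (0x30-0x39), digits before uppercase letters, and uppercase letters and '_' before lowercase letters. Assuming that "alphabetically" means exactly this ordering (the spec does not say so explicitly), a Python sketch is simply:

```python
def sort_tag_names(tags):
    """Sort TagNames by ASCII byte value (assumed meaning of 'alphabetically')."""
    return sorted(tags, key=lambda tag: tag.encode("ascii"))


print(sort_tag_names(["data", "Title-de_DE", "ISOSetting", "0Type", "Title"]))
# ['0Type', 'ISOSetting', 'Title', 'Title-de_DE', 'data']
```

Under this ordering, the unsuffixed default-locale tag precedes its localized variants because it is a prefix of them.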

Sets of tags which would require a common prefix should be added in a separate MIE group instead of adding the prefix to all tag names. For example, instead of these TagNames:

    ExternalFlashType
    ExternalFlashSerialNumber
    ExternalFlashFired

one would instead designate a separate "ExternalFlash" MIE group to contain the following elements:

    Type
    SerialNumber
    Fired

DataLength2/4/8

These extended DataLength fields exist only if DataLength is 255, 254 or 253, and are respectively 2, 4 or 8 byte unsigned integers giving the data block length. One of these values must be used if the data block is larger than 252 bytes, but they may be used if desired for smaller blocks too (although this may add a few unnecessary bytes to the MIE element).

DataBlock

The data for the MIE element. The format of the data is given by the FormatCode. For MIE group elements, the data includes all contained elements and the group terminator.

MIE groups

All MIE data elements must be contained within a group. A group begins with a MIE group element, and ends with a group terminator. Groups may be nested in a hierarchy to arbitrary depth.

A MIE group element is identified by a format code of 0x10 (big endian byte ordering) or 0x18 (little endian). The group terminator is distinguished by a zero TagLength (it is the only element allowed to have a zero TagLength), and has a FormatCode of 0x00.

The MIE group element is permitted to have a zero DataLength only if the data is uncompressed. This special value indicates that the group length is unknown (otherwise the minimum value for DataLength is 4, corresponding to the minimum group size which includes a terminator of at least 4 bytes). If DataLength is zero, all elements in the group must be parsed until the group terminator is found. If non-zero, DataLength includes the length of all elements contained within the group, including the group terminator. Use of a non-zero DataLength is encouraged because it allows readers to quickly skip over entire MIE groups. For compressed groups DataLength must be non-zero, and is the length of the compressed group data (which includes the compressed group terminator).
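The difference between known and unknown group lengths shows up when a reader needs to find the end of a group. The Python sketch below (hypothetical names, illustrative only) repeats a compact header parser so that it stands alone. It skips any element with a known DataLength in one step and recurses only into subgroups whose DataLength is zero. It assumes the data is in memory and well formed, and does not decompress anything.

```python
import struct

_EXTENDED = {255: "H", 254: "I", 253: "Q"}


def _read_header(buf, pos, big_endian):
    """Return (format_code, tag_name, data_length, data_offset) for the element at pos."""
    if buf[pos] != 0x7E:
        raise ValueError("no MIE SyncByte at offset %d" % pos)
    code, tag_len, length = buf[pos + 1], buf[pos + 2], buf[pos + 3]
    if (code & 0xF0) == 0x10:
        big_endian = not code & 0x08   # group elements use their own byte order
    p = pos + 4 + tag_len
    tag = buf[pos + 4:p].decode("ascii")
    if length in _EXTENDED:
        fmt = (">" if big_endian else "<") + _EXTENDED[length]
        (length,) = struct.unpack_from(fmt, buf, p)
        p += struct.calcsize(fmt)
    return code, tag, length, p


def end_of_group(buf: bytes, pos: int, big_endian: bool) -> int:
    """Return the offset just past the terminator of a group whose contents start at pos.

    big_endian is the byte order of that group.
    """
    while True:
        code, tag, length, p = _read_header(buf, pos, big_endian)
        if tag == "":
            return p + length          # terminator, plus any optional data block
        if (code & 0xF0) == 0x10 and length == 0:
            # unknown-length (necessarily uncompressed) subgroup: parse into it
            pos = end_of_group(buf, p, not code & 0x08)
        else:
            pos = p + length           # known length: skip the whole data block
```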

The group terminator has a FormatCode and TagLength of zero. Terminators usually also have a DataLength of zero. Hence, the byte sequence for a terminator is commonly 7e 00 00 00 (hex). However, the terminator may also have a DataLength of 6 or 10 bytes, and an associated data block containing information about the length and byte ordering of the preceding group. This additional information is recommended for file-level groups, and is used in multi-document MIE files to allow the file to be scanned backwards to quickly locate the last documents in the file, and may also allow some documents to be recovered if part of the file is corrupted. The structure of this optional terminator data block is as follows:

    4 or 8 bytes  GroupLength (unsigned integer)
    1 byte        FormatCode (0x10 or 0x18, same as MIE group element)
    1 byte        GroupLengthSize (0x04 or 0x08)

The FormatCode and GroupLengthSize give the byte ordering and number of bytes in the GroupLength integer. The GroupLength gives the total length of the group ending with this terminator, including the lengths of the MIE group element and the terminator itself.
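Building a terminator that carries this optional data block can be sketched in Python as below; `build_terminator` is a hypothetical name. Because GroupLength includes the terminator itself, a caller whose group element and contents total N bytes would pass N + 10 for a 4-byte GroupLength, or N + 14 for an 8-byte one.

```python
import struct


def build_terminator(group_length: int, big_endian: bool, length_size: int = 4) -> bytes:
    """Build a group terminator with the optional 6- or 10-byte data block.

    group_length must already include the group element and this terminator.
    """
    if length_size not in (4, 8):
        raise ValueError("GroupLengthSize must be 4 or 8")
    order = ">" if big_endian else "<"
    fmt = order + ("I" if length_size == 4 else "Q")
    block = struct.pack(fmt, group_length) + bytes([0x10 if big_endian else 0x18, length_size])
    # FormatCode and TagLength are zero; DataLength (6 or 10) fits in one byte
    return bytes([0x7E, 0x00, 0x00, len(block)]) + block
```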

File-level MIE groups

File-level MIE groups may NOT be compressed.

All elements in a MIE file are contained within a special group with a TagName of "0MIE". The purpose of the "0MIE" group is to provide a unique signature at the start of the file, and to encapsulate information allowing files to be easily combined. The "0MIE" group must be terminated like any other group, but it is recommended that the terminator of a file-level group include the optional data block (defined above) to provide information about the group length and byte order.

It is valid to have more than one "0MIE" group at the file level, allowing multiple documents in a single MIE file. Furthermore, the MIE structure enables multi-document files to be generated by simply concatenating two or more MIE files.

Scanning Backwards through a MIE File

The steps below give an algorithm to quickly locate the last document in a MIE file:

1) Read the last 10 bytes of the file. A valid MIE file must be a minimum of 12 bytes long.

2) If the last byte of the file is zero, then it is not possible to scan backward through the file, so the file must be scanned from the beginning. Otherwise, proceed to the next step.

3) If the last byte is 4 or 8, the terminator contains information about the byte ordering and length of the group. Otherwise, stop here because this isn't a valid MIE file.

4) The next-to-last byte must be either 0x10 indicating big-endian byte ordering or 0x18 for little-endian ordering, otherwise this isn't a valid MIE file.

5) The preceding 4 or 8 bytes give the length of the complete file-level MIE group, including the leading MIE group element and the terminator element. The value is an unsigned integer stored with the specified byte order. From the current file position (at the end of the 10 bytes we read in step 1), seek backward by this number of bytes to find the start of the MIE group element for this document.

This algorithm may be repeated again beginning at this point in the file to locate the next-to-last document, etc.

The table below lists all 5 valid patterns for the last 10 bytes of a file-level MIE group (numbers in hex):

  ?? ?? ?? ?? ?? ?? ?? ?? 00 00  - can not seek backwards
  ?? ?? ?? ?? GG GG GG GG 10 04  - 4 byte group length (G), big endian
  ?? ?? ?? ?? GG GG GG GG 18 04  - 4 byte group length (G), little endian
  GG GG GG GG GG GG GG GG 10 08  - 8 byte group length (G), big endian
  GG GG GG GG GG GG GG GG 18 08  - 8 byte group length (G), little endian
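The backward-scanning steps and the table of tail patterns can be combined into a short Python sketch. The function `scan_backwards` is hypothetical and works on any seekable binary file object. It returns the document start offsets it found (last document first), together with a flag that is False when it reached a terminator without length information, in which case the remaining documents must be found by scanning forward from the beginning.

```python
import io
import os
import struct
from typing import BinaryIO, List, Tuple


def scan_backwards(f: BinaryIO) -> Tuple[List[int], bool]:
    """Locate file-level MIE documents by scanning backwards from the end of f."""
    f.seek(0, os.SEEK_END)
    end = f.tell()
    offsets = []
    while end > 0:
        if end < 12:
            raise ValueError("not a valid MIE file (too short)")
        f.seek(end - 10)
        tail = f.read(10)                          # step 1
        size, code = tail[9], tail[8]
        if size == 0:
            return offsets, False                  # step 2: scan from the beginning
        if size not in (4, 8) or code not in (0x10, 0x18):
            raise ValueError("not a valid MIE file")   # steps 3 and 4
        fmt = (">" if code == 0x10 else "<") + ("I" if size == 4 else "Q")
        (group_length,) = struct.unpack(fmt, tail[8 - size:8])
        if not 12 <= group_length <= end:
            raise ValueError("invalid group length %d" % group_length)
        end -= group_length                        # step 5: start of this document
        offsets.append(end)
    return offsets, True


if __name__ == "__main__":
    # minimal document: 8-byte "0MIE" group element + 10-byte terminator
    doc = b"~\x10\x04\x000MIE~\x00\x00\x06" + struct.pack(">I", 18) + b"\x10\x04"
    print(scan_backwards(io.BytesIO(doc + doc)))   # ([18, 0], True)
```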

MIE Date/Time Format

All MIE dates are in the form "YYYY:mm:dd HH:MM:SS+HH:MM". The timezone is recommended but not required.
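A Python sketch of producing this date format from a `datetime` (the function name is hypothetical). The offset is formatted by hand so that it always contains the colon, and it is omitted for naive datetimes, since the timezone is optional.

```python
from datetime import datetime, timedelta, timezone


def format_mie_date(dt: datetime) -> str:
    """Format dt as "YYYY:mm:dd HH:MM:SS+HH:MM"; the zone is omitted for naive datetimes."""
    text = dt.strftime("%Y:%m:%d %H:%M:%S")
    offset = dt.utcoffset()
    if offset is None:
        return text
    seconds = int(offset.total_seconds())
    sign = "+" if seconds >= 0 else "-"
    minutes = abs(seconds) // 60
    return "%s%s%02d:%02d" % (text, sign, minutes // 60, minutes % 60)


print(format_mie_date(datetime(2006, 3, 1, 14, 5, 9, tzinfo=timezone(timedelta(hours=-5)))))
# 2006:03:01 14:05:09-05:00
```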

MIE File MIME Type

The basic MIME type for a MIE file is "application/x-mie"; however, the specific MIME type depends on the type of subfile, and is obtained by inserting "x-mie-" at the start of the subtype in the MIME type of the subfile. For example, with a subfile of type "image/jpeg", the MIE file MIME type is "image/x-mie-jpeg". But note that the "x-" is not duplicated if the subfile MIME type already starts with "x-". So a subfile with MIME type "image/x-raw" is contained within a MIE file of type "image/x-mie-raw", not "image/x-mie-x-raw". In the case of multiple documents in a MIE file, the MIME type is taken from the first document.
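The MIME type rule can be sketched in Python as follows. The helper name is hypothetical, and returning the basic "application/x-mie" type when no subfile type is known is an assumption about how the basic type is meant to be used.

```python
from typing import Optional


def mie_mime_type(subfile_type: Optional[str]) -> str:
    """Derive a MIE MIME type from the subfile's MIME type (hypothetical helper)."""
    if not subfile_type:
        return "application/x-mie"     # assumption: no subfile, basic type
    major, _, minor = subfile_type.partition("/")
    if minor.startswith("x-"):
        minor = minor[2:]              # don't duplicate the "x-" prefix
    return "%s/x-mie-%s" % (major, minor)
```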

AUTHOR

Copyright 2003-2006, Phil Harvey (phil at owl.phy.queensu.ca)

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. The MIE format itself is also copyright Phil Harvey, and is covered by the same free-use license.

SEE ALSO

"MIE Tags" in Image::ExifTool::TagNames, Image::ExifTool(3pm)