NAME

Parse::Matroska::Reader

VERSION

version 0.001

SYNOPSIS

    use Parse::Matroska::Reader;
    my $reader = Parse::Matroska::Reader->new($path);
    $reader->close;
    $reader->open(\$string_with_matroska_data);

    my $elem = $reader->read_element;
    print "Element ID: $elem->{elid}\n";
    print "Element name: $elem->{name}\n";
    if ($elem->{type} ne 'sub') {
        print "Element value: ", $elem->get_value, "\n";
    } else {
        while (my $child = $elem->next_child) {
            print "Child element: $child->{name}\n";
        }
    }
    $reader->close;

DESCRIPTION

Reads EBML data, the binary format used in Matroska files. This is a low-level reader meant to be used as a backend for higher-level readers. TODO: write the high-level readers :)

NOTE

The API of this module is not yet considered stable.

METHODS

new

Creates a new reader. Calls "open(arg)" with its arguments if provided.

open(arg)

Creates the internal filehandle. The argument can be:

  • An open filehandle or IO::Handle object. The filehandle is not dup()ed, so calling "close" on this object will close the given filehandle as well.

  • A scalar containing a path to a file.

  • On perl v5.14 or newer, a scalarref pointing to EBML data. For similar functionality in older perls, give an IO::String object.
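
The three argument kinds listed above can be told apart with core Perl alone. The following is a minimal sketch, not the module's actual implementation; sketch_open is a hypothetical helper name. It relies on the fact that Perl's open creates an in-memory filehandle when given a scalar reference instead of a path.

```perl
use strict;
use warnings;
use Scalar::Util qw(openhandle);

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_open {
    my ($arg) = @_;

    # An already open filehandle or IO::Handle object is used as-is (no dup).
    return $arg if defined openhandle($arg);

    # A plain scalar is treated as a path; a scalar reference makes Perl
    # open an in-memory filehandle over the referenced string.
    open(my $fh, '<:raw', $arg)
        or die "cannot open EBML source: $!";
    return $fh;
}

my $ebml = "\x1A\x45\xDF\xA3";          # the four bytes of the EBML header ID
my $fh   = sketch_open(\$ebml);         # scalarref: in-memory filehandle
my $got  = read($fh, my $buffer, 4);    # number of bytes actually read
my $same = sketch_open($fh);            # open handle: returned unchanged
```

Because the handle is returned unchanged in the first case, closing it through the wrapper also closes it for the caller, which matches the behaviour described in the first bullet.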

close

Closes the internal filehandle.

readlen(length)

Reads length bytes from the internal filehandle.

read_id

Reads an EBML ID atom and returns it as a hexadecimal string, suitable for passing to "elem_by_hexid" in Parse::Matroska::Definitions.
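
To illustrate what an EBML ID atom looks like on the wire, here is a minimal core-Perl sketch. It is not the module's code: sketch_read_id is a hypothetical name, and the lowercase hexadecimal output is an assumption about the string format. In EBML, the number of leading zero bits in the first byte gives the number of extra bytes, and the ID keeps its length marker bit.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_read_id {
    my ($fh) = @_;
    read($fh, my $first, 1) == 1 or return;
    my $byte = ord $first;
    return unless $byte;            # no marker bit in the first byte: give up

    # Count leading zero bits to find the total length of the ID.
    my ($len, $mask) = (1, 0x80);
    until ($byte & $mask) { $mask >>= 1; $len++ }

    my $rest = '';
    if ($len > 1) {
        read($fh, $rest, $len - 1) == $len - 1 or return;
    }

    # IDs keep the marker bit, so the raw bytes are hex-encoded as they are.
    return unpack('H*', $first . $rest);
}

my $data = "\x1A\x45\xDF\xA3\x84";  # EBML header ID followed by one more byte
open(my $fh, '<:raw', \$data) or die "cannot open in-memory handle: $!";
my $id = sketch_read_id($fh);
print "ID: $id\n";
```

The first byte 0x1A has three leading zero bits, so the ID is four bytes long and the trailing 0x84 is left unread for the next call.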

read_size

Reads an EBML Data Size atom, which immediately follows an EBML ID atom.

This returns an array consisting of:

  0. The length of the Data Size atom.

  1. The value encoded in the Data Size atom, which is the length of all the data following it.
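
The two return values can be sketched in core Perl as follows. This is an illustration under assumptions, not the module's implementation: sketch_read_size is a hypothetical name, the length is counted in bytes, the reserved "unknown size" encoding is not handled, and values from 8-byte atoms need a perl with 64-bit integers. Unlike an ID, a Data Size has its length marker bit stripped before the value is assembled.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_read_size {
    my ($fh) = @_;
    read($fh, my $first, 1) == 1 or return;
    my $byte = ord $first;
    return unless $byte;            # no marker bit in the first byte: give up

    # Leading zero bits give the number of extra bytes.
    my ($len, $mask) = (1, 0x80);
    until ($byte & $mask) { $mask >>= 1; $len++ }

    # Strip the marker bit, then append the remaining bytes big-endian.
    my $value = $byte & ($mask - 1);
    for (2 .. $len) {
        read($fh, my $next, 1) == 1 or return;
        $value = ($value << 8) | ord $next;
    }
    return ($len, $value);
}

# Three Data Size atoms back to back: 1, 2 and 4 bytes long.
my $data = "\x84\x40\x02\x10\x00\x00\x2A";
open(my $fh, '<:raw', \$data) or die "cannot open in-memory handle: $!";
my @one  = sketch_read_size($fh);
my @two  = sketch_read_size($fh);
my @four = sketch_read_size($fh);
print "length $_->[0], value $_->[1]\n" for \@one, \@two, \@four;
```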

read_str(length)

Reads a string of length length bytes from the internal filehandle. The string is already decoded from UTF-8 (see "decode" in Encode), which is the standard Matroska string encoding.
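
Since length counts bytes rather than characters, the returned string can be shorter than length. A minimal sketch of the same idea in core Perl, reading from an in-memory filehandle rather than through the module (the variable names are illustrative only):

```perl
use strict;
use warnings;
use Encode qw(decode);

# "caf" plus an e-acute, which takes two bytes in UTF-8, then unrelated data.
my $raw = "caf\xC3\xA9 and more";
open(my $fh, '<:raw', \$raw) or die "cannot open in-memory handle: $!";

# Read exactly 5 bytes, then turn the UTF-8 octets into Perl characters.
read($fh, my $bytes, 5) == 5 or die "short read";
my $byte_count = length($bytes);
my $string     = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", $byte_count, length($string);
```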

read_uint(length)

Reads an unsigned integer of length length bytes from the internal filehandle.

Returns a Math::BigInt object if length is greater than 4.
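
EBML stores integers big-endian. The sketch below shows one way to get the behaviour described above with core modules only; sketch_read_uint is a hypothetical name that works on an already read byte string, and the module may build its values differently.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_read_uint {
    my ($bytes) = @_;

    # More than 4 bytes may not fit a native integer: use Math::BigInt,
    # which accepts a "0x"-prefixed hexadecimal string.
    return Math::BigInt->new('0x' . unpack('H*', $bytes))
        if length($bytes) > 4;

    # Otherwise shift the bytes in, most significant first.
    my $value = 0;
    $value = ($value << 8) | $_ for unpack('C*', $bytes);
    return $value;
}

my $small = sketch_read_uint("\x01\x00");               # 2 bytes
my $max32 = sketch_read_uint("\xFF\xFF\xFF\xFF");       # 4 bytes
my $big   = sketch_read_uint("\x01\x00\x00\x00\x00");   # 5 bytes
print "$small $max32 $big\n";
```

Callers that do arithmetic on the result should be prepared for either a plain number or a Math::BigInt object, depending on length.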

read_sint(length)

Reads a signed integer of length length bytes from the internal filehandle.

Returns a Math::BigInt object if length is greater than 4.
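
Signed EBML integers are big-endian two's complement. As a rough sketch (sketch_read_sint is a hypothetical name operating on an already read byte string, not the module's code), the value can be read as unsigned and corrected by 2**(8 * length) when the top bit is set:

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_read_sint {
    my ($bytes)  = @_;
    my $len      = length($bytes);
    my $negative = ord(substr($bytes, 0, 1)) & 0x80;   # sign bit set?

    if ($len > 4) {
        my $value = Math::BigInt->new('0x' . unpack('H*', $bytes));
        $value -= Math::BigInt->new(2)->bpow(8 * $len) if $negative;
        return $value;
    }

    my $value = 0;
    $value = ($value << 8) | $_ for unpack('C*', $bytes);
    $value -= 2 ** (8 * $len) if $negative;
    return $value;
}

my $plus  = sketch_read_sint("\x7F");           # largest 1-byte value
my $minus = sketch_read_sint("\xFF\xFE");       # 2-byte negative value
my $wide  = sketch_read_sint("\xFF" x 8);       # 8 bytes: Math::BigInt
print "$plus $minus $wide\n";
```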

read_float(length)

Reads an IEEE floating point number of length length bytes from the internal filehandle.

Only lengths '4' and '8' are supported (C 'float' and 'double').
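
The two supported lengths map directly onto Perl's unpack templates for big-endian floats (the '>' modifier needs perl 5.10 or newer). A minimal sketch, assuming the bytes have already been read; sketch_read_float is a hypothetical name and the error message is illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_read_float {
    my ($bytes) = @_;
    my $len = length($bytes);
    return unpack('f>', $bytes) if $len == 4;   # single precision, big-endian
    return unpack('d>', $bytes) if $len == 8;   # double precision, big-endian
    die "unsupported float length: $len\n";
}

my $single = sketch_read_float("\x3F\xC0\x00\x00");
my $double = sketch_read_float("\x3F\xF8\x00\x00\x00\x00\x00\x00");
my $ok     = eval { sketch_read_float("\x00\x00\x00"); 1 };
print "single=$single double=$double\n";
print "3-byte float rejected: $@" unless $ok;
```

Both byte strings encode the value 1.5, which is exactly representable in either precision.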

read_ebml_id(length)

Reads an EBML ID when it's encoded as the data inside another EBML element, that is, when the enclosing element's type is "ebml_id".

This returns a hashref with the EBML element description as defined in Parse::Matroska::Definitions.

skip(length)

Skips length bytes in the internal filehandle.
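
How the module skips is not documented here. One plausible approach, shown purely as an assumption, is to seek forward when the filehandle supports it and to read and discard otherwise, which also works on pipes and sockets. sketch_skip and the 4096-byte chunk size are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

# Hypothetical helper, not part of Parse::Matroska::Reader.
sub sketch_skip {
    my ($fh, $length) = @_;

    # Seekable handle: move the position forward in one step.
    return 1 if seek($fh, $length, SEEK_CUR);

    # Unseekable handle: read the bytes and throw them away.
    while ($length > 0) {
        my $chunk = $length < 4096 ? $length : 4096;
        my $got   = read($fh, my $discard, $chunk);
        return 0 unless $got;       # EOF or read error before we were done
        $length -= $got;
    }
    return 1;
}

my $data = "abcdef";
open(my $fh, '<:raw', \$data) or die "cannot open in-memory handle: $!";
my $skipped = sketch_skip($fh, 2);
read($fh, my $next, 1);
print "after skipping 2 bytes: $next\n";
```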

getpos

Wrapper for "$io->getpos" in IO::Seekable, called on the internal filehandle.

Returns undef if the internal filehandle can't getpos.

setpos(pos)

Wrapper for "$io->setpos" in IO::Seekable, called on the internal filehandle.

Returns undef if the internal filehandle can't setpos.

Croaks if setpos does not seek to the requested position, that is, if calling getpos does not yield the same object as the pos argument.
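
The getpos/setpos pair underneath can be tried directly on an IO::File handle. The sketch below uses a temporary file and plain IO::Seekable calls rather than the reader object; the final comparison with eq is an assumption about how "the same object" might be checked, since the value returned by getpos is opaque.

```perl
use strict;
use warnings;
use IO::File;
use Fcntl qw(SEEK_SET);

# A real (seekable) file with an EBML ID, a 1-byte Data Size and a payload.
my $fh = IO::File->new_tmpfile or die "cannot create temporary file: $!";
binmode $fh;
print {$fh} "\x1A\x45\xDF\xA3\x84rest";
$fh->seek(0, SEEK_SET) or die "cannot rewind: $!";

read($fh, my $id, 4);               # consume the ID
my $mark = $fh->getpos;             # remember where the Data Size starts
read($fh, my $ahead, 1);            # peek at the next byte

# Go back to the mark and verify that we really got there.
$fh->setpos($mark) or die "setpos failed: $!";
die "setpos did not reach the requested position"
    unless $fh->getpos eq $mark;

read($fh, my $again, 1);            # the same byte is read a second time
printf "peeked 0x%02X, re-read 0x%02X\n", ord $ahead, ord $again;
```

Treat the saved position as a token to hand back to setpos only; its contents are platform specific.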

read_element(read_bin)

Reads a full EBML element from the internal filehandle.

Returns a Parse::Matroska::Element initialized with the read data. If read_bin is not present or is false, the contents of 'binary' type elements are delay-loaded, that is, they are only loaded when get_value is called on the returned Parse::Matroska::Element object.

Does not read the children of the element if its type is 'sub'. Look into the Parse::Matroska::Element interface for details on how to read child elements.

Pass a true read_bin if the stream being read is not seekable (getpos is undef) and the contents of 'binary' elements are desired; otherwise seeking errors or internal filehandle corruption might occur.

CAVEATS

Child elements have to be processed as soon as an element with children is found, or ignored with "skip" in Parse::Matroska::Element. Not doing so doesn't cause errors but results in an invalid structure, with a constant depth of '0'.

To work correctly on unseekable streams, either the contents of 'binary'-type elements have to be ignored or the read_bin flag to read_element has to be true.

AUTHOR

Diogo Franco <diogomfranco@gmail.com>, aka Kovensky. Initially based on a python script by Uoti Urpala.

SEE ALSO

Parse::Matroska::Definitions, Parse::Matroska::Element.

LICENSE

The FreeBSD license, equivalent to the ISC license.