NAME

HTTP::Promise::Stream - Data Stream Encoding and Decoding

SYNOPSIS

    use HTTP::Promise::Stream;
    my $this = HTTP::Promise::Stream->new || 
        die( HTTP::Promise::Stream->error, "\n" );

VERSION

    v0.1.0

DESCRIPTION

HTTP::Promise::Stream sets up a stream of data that may optionally need to be encoded or decoded, and reads data from it or writes data to it, compressing or decompressing as needed.

Once those parameters are set, you can use the methods of this class to read or write the data, and the necessary encoding or decoding is done automatically.

CONSTRUCTOR

new

Provided with a stream source and some optional parameters, this returns a new HTTP::Promise::Stream object.

Currently supported stream sources are: scalar reference, glob and file path.

If an error occurred, this sets an error and returns undef.

Supported parameters are:

  • decoding

    A string representing the encoding to use for decoding data. Currently supported encodings are: gzip, bzip2, deflate/inflate and zip

  • encoding

    A string representing the encoding to use for encoding data. Currently supported encodings are: gzip, bzip2, deflate/inflate and zip

METHODS

as_string

Returns the source stream as a string or, if an error occurred, sets an error and returns undef.

compress_params

Sets or gets a hash of parameter-value pairs to be used by the compression algorithm.

decodable

Provided with a target, this returns an array object of the decoders that are installed.

The target can be a string or an array reference of decoder names. If the target string all is specified, this will check all supported encodings. See "supported". If the target string browser is specified, this will check only the supported encodings that are also supported by web browsers. If no target is specified, it defaults to all.

If the target is an array reference, it will return the list of supported decoders in the order provided.

    my $all = $stream->decodable;
    # Same as above
    my $all = $stream->decodable( 'all' );
    my $all = $stream->decodable( 'browser' );
    my $all = $stream->decodable( [qw( gzip br deflate )] );
    # $all could contain gzip and br for example

Note that for most encodings, encoding and decoding are done by different classes.
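As a rough illustration of what such an availability check involves, the sketch below tests whether the decoder class for each requested name can be loaded, and keeps the order requested by the caller. The %DECODERS map and the installed_decoders helper are hypothetical names made up for this example. Only the module names come from the "supported" section, and this is not necessarily how HTTP::Promise::Stream performs the check.

```perl
use strict;
use warnings;

# Hypothetical map of encoding names to decoder classes, using the
# module names listed under "supported".
my %DECODERS =
(
    gzip    => 'IO::Uncompress::Gunzip',
    deflate => 'IO::Uncompress::Inflate',
    bzip2   => 'IO::Uncompress::Bunzip2',
    br      => 'IO::Uncompress::Brotli',
);

# Returns an array reference of the names whose decoder class loads,
# in the order requested. Unknown names are skipped.
sub installed_decoders
{
    my @wanted = @_;
    my @found;
    foreach my $name ( @wanted )
    {
        my $class = $DECODERS{ $name } or next;
        # String eval, so that a missing module does not abort the program
        push( @found, $name ) if( eval( "require $class; 1" ) );
    }
    return( \@found );
}

my $avail = installed_decoders( qw( gzip br deflate ) );
print( join( ', ', @$avail ), "\n" );
```

Whether br shows up in the output depends on IO::Uncompress::Brotli being installed on the system.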

decode

    $stream->decode( $data );
    $stream->decode( $data, { encoding => 'bzip2' } );
    $stream->decode( $data, { decoding => 'bzip2' } );
    my $decoded = $stream->decode;
    my $decoded = $stream->decode( { encoding => 'bzip2' } );
    my $decoded = $stream->decode( { decoding => 'bzip2' } );

This behaves in two different ways depending on the parameters provided:

1. with data provided

This will decode the data provided using the encoding specified and write the decoded data to the source stream.

2. without data provided

This will decode the source stream directly and return the data thus decoded.

This method will take the required encoding in the following order: from the decoding parameter, from the encoding parameter, or from "decoding" as set during object instantiation.

If the encoding specified is not supported, this will return an error.

It returns true upon success, or sets an error and returns undef.
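The lookup order described above can be sketched as a small helper. The name pick_decoding and its arguments are hypothetical and only illustrate the documented precedence. They are not part of the HTTP::Promise::Stream API.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the encoding for decode() in the documented
# order: "decoding" option, then "encoding" option, then the object default.
sub pick_decoding
{
    my( $opts, $default ) = @_;
    $opts //= {};
    return( $opts->{decoding} // $opts->{encoding} // $default );
}

print( pick_decoding( { encoding => 'gzip', decoding => 'bzip2' }, 'deflate' ), "\n" ); # bzip2
print( pick_decoding( { encoding => 'gzip' }, 'deflate' ), "\n" );                      # gzip
print( pick_decoding( {}, 'deflate' ), "\n" );                                          # deflate
```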

decoding

This is a string. Sets or gets the encoding used for decoding. Supported encodings are: gzip, bzip2, inflate/deflate and zip

encodable

Provided with a target, this returns an array object of the encoders that are installed.

The target can be a string or an array reference of encoder names. If the target string all is specified, this will check all supported encodings. See "supported". If the target string browser is specified, this will check only the supported encodings that are also supported by web browsers. If no target is specified, it defaults to all.

If the target is an array reference, it will return the list of supported encoders in the order provided.

    my $all = $stream->encodable;
    # Same as above
    my $all = $stream->encodable( 'all' );
    my $all = $stream->encodable( 'browser' );
    my $all = $stream->encodable( [qw( gzip br deflate )] );
    # $all could contain gzip and br for example

Note that for most encodings, encoding and decoding are done by different classes.

encode

    $stream->encode( $data );
    $stream->encode( $data, { encoding => 'bzip2' } );
    $stream->encode( $data, { decoding => 'bzip2' } );
    my $encoded = $stream->encode;
    my $encoded = $stream->encode( { encoding => 'bzip2' } );
    my $encoded = $stream->encode( { decoding => 'bzip2' } );

This is the counterpart of "decode".

This behaves in two different ways depending on the parameters provided:

1. with data provided

This will encode the data provided using the encoding specified and write the encoded data to the source stream.

2. without data provided

This will encode the source stream directly and return the data thus encoded.

This method will take the required encoding in the following order: from the encoding parameter, or from "encoding" as set during object instantiation.

If the encoding specified is not supported, this will return an error.

It returns true upon success, or sets an error and returns undef.

encoding

This is a string. Sets or gets the encoding used for encoding. Supported encodings are: gzip, bzip2, inflate/deflate and zip

encoding2suffix

Provided with a string of comma-separated encodings, or an array reference of encodings, this will return an array object of the associated file extensions.

For example:

    my $a = HTTP::Promise::Stream->encoding2suffix( [qw( base64 gzip )] );
    # $a contains: b64 and gz

    my $a = HTTP::Promise::Stream->encoding2suffix( 'gzip' );
    # $a contains: gz

load

This attempts to load the class related to the specified encoding and returns true upon success or false otherwise.

It sets an error and returns undef upon error.

For example:

    if( HTTP::Promise::Stream->load( 'bzip2' ) )
    {
        my $s = HTTP::Promise::Stream->new( \$data, encoding => 'bzip2' );
        my $output = Module::Generic::Scalar->new;
        my $len = $s->read( $output, { Transparent => 0 } );
        die( $s->error ) if( !defined( $len ) );
        say "Ok, $len bytes were encoded.";
    }
    else
    {
        say "Encoder/decoder bzip2 related modules are not installed on this system.";
    }

See also "supported", which will tell you if HTTP::Promise::Stream even supports the specified encoding.

read

    $stream->read( $buffer );
    $stream->read( $buffer, $len );
    $stream->read( $buffer, $len, $offset );
    $stream->read( *buffer );
    $stream->read( *buffer, $len );
    $stream->read( sub{} );
    $stream->read( sub{}, $len );
    $stream->read( \$buffer );
    $stream->read( \$buffer, $len );
    $stream->read( \$buffer, $len, $offset );
    $stream->read( '/some/where/file.txt' );
    $stream->read( '/some/where/file.txt', $len );

Provided with a read buffer and some optional parameters, as detailed below, this will encode or decode the stream, if any encoding was provided, and write the result into the read buffer specified.

Possible read buffers are:

  • scalar

  • scalar reference

  • file handle (glob)

  • subroutine reference or anonymous subroutine

  • file path

It takes as optional parameters the length of data to read, possibly encoded or decoded if any encoding was provided, and an offset. However, note that the offset argument is ignored if the data buffer is not a string or a scalar reference.

You can also specify a hash reference of options as the last parameter. Recognised options are:

  • autoflush

    Boolean value. If true, this will set the auto flush.

  • binmode

    The encoding to be used when opening the file specified, if one is specified. See "binmode"

  • mode

    The mode in which to open the file specified, if one is specified.

    Possible modes can be >, +>, >>, +<, w, w+, r+, a, a+, < and r, or an integer representing a bitwise value such as O_APPEND, O_ASYNC, O_CREAT, O_DEFER, O_EXCL, O_NDELAY, O_NONBLOCK, O_SYNC, O_TRUNC, O_RDONLY, O_WRONLY, O_RDWR. For example: O_WRONLY|O_APPEND. For those constants, see Fcntl.

  • other parameters starting with an uppercase letter

    Those parameters will be passed directly to the encoder/decoder.

        my $s = HTTP::Promise::Stream->new( \$data, decoding => 'inflate' );
        # Transparent and its value are passed directly to IO::Uncompress::Inflate
        $s->read( \$output, { Transparent => 0 } );

A typically recommended parameter for the IO::Compress and IO::Uncompress families is Transparent set to 0. Otherwise, the default is 1, which is lenient: any encoding or decoding issue with the data is ignored.

For example, when using inflate to uncompress data compressed with deflate, some encoders do not format the data correctly, or declare it as deflate when they really mean rawdeflate, i.e. without the zlib headers and trailers. By default, with Transparent set to 1, IO::Uncompress::Inflate will simply pass the data through. However, you are better off catching the error and resorting to rawinflate instead.

For example:

    use v5.17;
    use HTTP::Promise::Stream;
    my $data = '80jNyclXCM8vyklRBAA=';
    my $buff = '';
    my $s = HTTP::Promise::Stream->new( \$data, decoding => 'base64' ) ||
        die( HTTP::Promise::Stream->error );
    my $len = $s->read( \$buff );
    die( $s->error ) if( !defined( $len ) );
    
    say "Now inflating data.";
    $data = $buff;
    $buff = '';
    $s = HTTP::Promise::Stream->new( \$data, decoding => 'deflate' ) ||
        die( HTTP::Promise::Stream->error );
    $len = $s->read( \$buff, { Transparent => 0 } );
    if( !defined( $len ) )
    {
        # Trying with rawinflate instead
        if( $s->error->message =~ /Header Error: CRC mismatch/ )
        {
            say "Found deflate encoding error (", $s->error->message, "), trying with rawinflate instead.";
            my $s = HTTP::Promise::Stream->new( \$data, decoding => 'rawdeflate' ) ||
                die( HTTP::Promise::Stream->error );
            $len = $s->read( \$buff, { Transparent => 0 } );
            die( $s->error ) if( !defined( $len ) );
        }
        else
        {
            die( $s->error );
        }
    }
    say $buff; # Hello World!
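To see the underlying behaviour without HTTP::Promise::Stream in between, here is a minimal sketch that uses only the IO::Uncompress modules named above, together with MIME::Base64. It reuses the same base64 sample. The variable names are illustrative, and the sketch only shows what Transparent set to 0 changes, not how this class calls those modules internally.

```perl
use strict;
use warnings;
use MIME::Base64 qw( decode_base64 );
use IO::Uncompress::Inflate qw( $InflateError );
use IO::Uncompress::RawInflate qw( rawinflate $RawInflateError );

# Same sample as above: raw deflate data, i.e. no zlib header or trailer
my $raw = decode_base64( '80jNyclXCM8vyklRBAA=' );

# With Transparent => 0, a missing zlib header is reported as an error
# instead of the data being passed through untouched.
my $z = IO::Uncompress::Inflate->new( \$raw, Transparent => 0 );
my $strict_error = defined( $z ) ? '' : "$InflateError";

my $out = '';
if( !defined( $z ) )
{
    # Fall back to raw inflate, which expects no header at all
    rawinflate( \$raw => \$out ) ||
        die( "rawinflate failed: $RawInflateError\n" );
}
print( "inflate reported: $strict_error\n" );
print( "$out\n" );
```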

source

Set or get the source stream.

suffix2encoding

Provided with a filename, this will return an array object containing the encoding names associated with the extensions found.

For example:

    my $a = HTTP::Promise::Stream->suffix2encoding( 'file.html.gz' );
    # $a contains: gzip

    my $a = HTTP::Promise::Stream->suffix2encoding( 'file.html' );
    # $a contains nothing
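The two lookups, "encoding2suffix" and "suffix2encoding", can be pictured as a pair of tables. The sketch below is an illustration only: the table holds just the two pairs shown in this document, the function names are hypothetical, and the handling of several stacked extensions is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Illustrative table limited to the two pairs shown in this document
my %SUFFIX_FOR = ( base64 => 'b64', gzip => 'gz' );
my %ENCODING_FOR = reverse( %SUFFIX_FOR );

# Accepts a comma-separated string or an array reference of encodings
sub encodings_to_suffixes
{
    my $enc = shift;
    my @names = ref( $enc ) eq 'ARRAY' ? @$enc : split( /\s*,\s*/, $enc );
    my @known = grep { exists( $SUFFIX_FOR{ $_ } ) } map { lc( $_ ) } @names;
    return( [ @SUFFIX_FOR{ @known } ] );
}

# Collects the encodings for the trailing extensions, stopping at the
# first extension that is not in the table (assumed behaviour).
sub suffixes_to_encodings
{
    my $file = shift;
    my @parts = split( /\./, $file );
    shift( @parts ); # drop the base name
    my @found;
    while( @parts && exists( $ENCODING_FOR{ lc( $parts[-1] ) } ) )
    {
        unshift( @found, $ENCODING_FOR{ lc( pop( @parts ) ) } );
    }
    return( \@found );
}

print( join( ', ', @{ encodings_to_suffixes( [qw( base64 gzip )] ) } ), "\n" ); # b64, gz
print( join( ', ', @{ suffixes_to_encodings( 'file.html.gz' ) } ), "\n" );       # gzip
```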

supported

Provided with an encoding name, this returns true if it is supported, or false otherwise.

Currently supported encodings are:

Base64

Supported natively. See HTTP::Promise::Stream::Base64

Brotli

Requires IO::Compress::Brotli for encoding and IO::Uncompress::Brotli for decoding.

See also caniuse

Bzip2

Requires IO::Compress::Bzip2 for encoding and IO::Uncompress::Bunzip2 for decoding.

Deflate and Inflate

Requires IO::Compress::Deflate for encoding and IO::Uncompress::Inflate for decoding.

This is the same as rawdeflate and rawinflate, except it has zlib headers and trailers.

See also its rfc1950, the Wikipedia page and Mozilla documentation about Content-Encoding

Note that some web servers announce data encoded with deflate whereas they really mean rawdeflate, so you might want to set the Transparent parameter to 0 when using "read".

Gzip

Requires IO::Compress::Gzip for encoding and IO::Uncompress::Gunzip for decoding.

See also caniuse

Lzf

This is Lempel-Ziv-Free compression.

Requires IO::Compress::Lzf for encoding and IO::Uncompress::UnLzf for decoding.

See Stackoverflow discussion

Lzip

Requires IO::Compress::Lzip for encoding and IO::Uncompress::UnLzip for decoding.

Lzma

Requires IO::Compress::Lzma for encoding and IO::Uncompress::UnLzma for decoding.

See Wikipedia page

Lzop

Requires IO::Compress::Lzop for encoding and IO::Uncompress::UnLzop for decoding.

"lzop is a file compressor which is very similar to gzip. lzop uses the LZO data compression library for compression services, and its main advantages over gzip are much higher compression and decompression speed (at the cost of some compression ratio)."

See the compressor home page and Wikipedia page

Lzw

This is Lempel-Ziv-Welch compression.

Requires Compress::LZW for both encoding and decoding.

A.k.a compress, this was used commonly until some corporation purchased the patent and started asking everyone for royalties. The patent expired in 2003. Gzip took over since then.

QuotedPrint

Requires the XS module MIME::QuotedPrint for encoding and decoding.

This encodes and decodes the quoted-printable data according to rfc2045, section 6.7

See also the Wikipedia page

Raw deflate

Requires IO::Compress::RawDeflate for encoding and IO::Uncompress::RawInflate for decoding.

This is the same as deflate and inflate, but without the zlib headers and trailers.

See also its rfc1951 and Mozilla documentation about Content-Encoding

UU encoding and decoding

Supported natively. See HTTP::Promise::Stream::UU

Xz

Requires IO::Compress::Xz for encoding and IO::Uncompress::UnXz for decoding.

Reportedly, "xz achieves higher compression rates than alternatives like gzip and bzip2. Decompression speed is higher than bzip2, but lower than gzip. Compression can be much slower than gzip, and is slower than bzip2 for high levels of compression, and is most useful when a compressed file will be used many times."

See compressor home page and Wikipedia page

Zip

Requires IO::Compress::Zip for encoding and IO::Uncompress::Unzip for decoding.

Zstd

Requires IO::Compress::Zstd for encoding and IO::Uncompress::UnZstd for decoding.

See rfc8878 and Wikipedia page

See also "load", which will tell you if the specified encoding related modules are installed on the system or not.
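The relationship between deflate and rawdeflate described above can be checked with the IO::Compress modules directly. The sketch below assumes the framing defined by rfc1950 (a 2-byte header, the raw deflate data, then a 4-byte Adler-32 trailer, with no preset dictionary). The variable names are illustrative.

```perl
use strict;
use warnings;
use IO::Compress::Deflate qw( deflate $DeflateError );
use IO::Uncompress::RawInflate qw( rawinflate $RawInflateError );

my $text = "Hello World!\n" x 20;

# "deflate" as an HTTP content encoding is a zlib stream (rfc1950)
my $zlib = '';
deflate( \$text => \$zlib ) || die( "deflate failed: $DeflateError\n" );

# rfc1950: the first two bytes, read as a big-endian integer, are a multiple of 31
my $header_ok = ( unpack( 'n', $zlib ) % 31 == 0 ) ? 1 : 0;

# Strip the 2-byte header and the 4-byte Adler-32 trailer
my $body = substr( $zlib, 2, length( $zlib ) - 6 );

# What remains is a raw deflate stream (rfc1951)
my $back = '';
rawinflate( \$body => \$back ) || die( "rawinflate failed: $RawInflateError\n" );

print( "zlib header valid: $header_ok\n" );
print( "raw deflate body decodes to the original text\n" ) if( $back eq $text );
```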

write

    $stream->write( $data );
    $stream->write( \$data );
    $stream->write( *$data );
    $stream->write( '/some/where/file.txt' );
    $stream->write( sub{} );

Provided with some data, this will read the data provided and write it to the stream source, possibly encoded or decoded, depending on whether a decoding or encoding was provided.

It returns the number of bytes written to the source stream, counted before any encoding or decoding.

The data that can be provided are:

  • string

    Note that the difference between a file and a string is slim. To distinguish the two, this method will treat as a string any value that is not a reference and that either contains a CRLF sequence, or that does not contain a CRLF sequence and is not an existing file.

  • scalar reference

  • file handle (glob)

  • file path

    Note that the difference between a file and a string is slim. To distinguish the two, this method will treat as a file any value that is not a reference, contains no CRLF sequence, and is an existing file.

  • code reference (anonymous subroutine or subroutine reference)

    It will be called once and is expected to return data. If the code executed dies, the exception will be trapped using a try-catch block from Nice::Try.
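The rule for telling a string from a file path, as stated in the list above, can be sketched as follows. The data_kind function is hypothetical and only restates that rule. The actual checks performed by "write" may differ in detail.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical classifier following the rule stated above: a plain value
# is a file path only if it has no CRLF sequence and names an existing file.
sub data_kind
{
    my $data = shift;
    return( 'code reference' )   if( ref( $data ) eq 'CODE' );
    return( 'scalar reference' ) if( ref( $data ) eq 'SCALAR' );
    return( 'glob' )             if( ref( $data ) eq 'GLOB' || ref( \$data ) eq 'GLOB' );
    return( 'string' )           if( $data =~ /\r\n/ );
    return( -f $data ? 'file path' : 'string' );
}

# A temporary file stands in for an existing file path
my( $fh, $path ) = tempfile( UNLINK => 1 );

print( data_kind( $path ), "\n" );              # file path
print( data_kind( "Hello\r\nWorld" ), "\n" );   # string
print( data_kind( \"Hello" ), "\n" );           # scalar reference
print( data_kind( sub{ 'Hello' } ), "\n" );     # code reference
print( data_kind( $fh ), "\n" );                # glob
```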

The behaviour is different depending on the source type and the data type being provided. Below is an in-depth explanation:

1. Source stream is a code reference
1.1 Data is to be encoded

Data is encoded with "encode", then the source code reference is executed, passing it the encoded data

1.2 Data is to be decoded

Data is decoded with "decode", then the source code reference is executed, passing it the decoded data

1.3 Data is scalar reference

The source code reference is executed, passing it the content of the scalar reference

1.4 Data is a glob

The file handle is read by chunks of 10Kb (10240 bytes) and each time the source code reference is called passing it the data chunk read.

1.5 Data is a file path

The file is opened in read mode, and all its content is provided in one pass to the source code reference.

2. Data is to be encoded

The appropriate encoder is called to encode the data and write to the source stream.

3. Data is to be decoded

The appropriate decoder is called to decode the data and write to the source stream.

4. Source stream is a scalar reference
4.1 Data is a scalar reference

The provided data is simply appended to the source stream.

4.2 Data is a glob

The file handle is read by chunks of 10Kb (10240 bytes) and appended to the source stream.

4.3 Data is a file path

The file is opened in read mode and its content appended to the source stream.

5. Source stream is a glob
5.1 Data is a scalar reference

The file handle of the source stream is called with "print" and the data is printed to it.

5.2 Data is a glob

The data file handle is read by chunks of 10Kb (10240 bytes) and each one printed to the source stream file handle.

5.3 Data is a file path

The given file path is opened in read mode and each chunk of 10Kb (10240 bytes) read is printed to the source stream file handle.

6. Source stream is a file path

The source file is opened in write clobbering mode.

6.1 Data is a scalar reference

The data is printed to the source stream

6.2 Data is a glob

Data from the glob is read by chunks of 10Kb (10240 bytes) and each one printed to the source stream

6.3 Data is a file path

The file is opened in read mode and its content is read by chunks of 10Kb (10240 bytes) and each chunk printed to the source stream.
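Several of the cases above read a file handle by chunks of 10240 bytes and hand each chunk to the destination. A minimal sketch of that loop in plain Perl is shown below. The copy_in_chunks name and the callback argument are hypothetical, and an in-memory file handle stands in for the data glob.

```perl
use strict;
use warnings;

# Reads a file handle in chunks of 10240 bytes and hands each chunk to
# a callback. Returns the total bytes read, or undef upon read error.
sub copy_in_chunks
{
    my( $in, $callback ) = @_;
    my $total = 0;
    while( 1 )
    {
        my $n = read( $in, my $chunk, 10240 );
        return if( !defined( $n ) );
        last if( $n == 0 );
        $callback->( $chunk );
        $total += $n;
    }
    return( $total );
}

# In-memory file handle standing in for the data glob
my $data = 'x' x 25000;
open( my $in, '<', \$data ) || die( "Unable to open in-memory handle: $!\n" );
my @sizes;
my $total = copy_in_chunks( $in, sub{ push( @sizes, length( $_[0] ) ) } );
close( $in );
print( "Read $total bytes in chunks of: @sizes\n" );
```

The callback plays the role of the destination: it could append to a scalar, print to another file handle, or call a code reference source.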

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Mozilla documentation, Content-Encoding documentation

Wikipedia page

PerlIO::via::gzip, PerlIO::via::Bzip2, PerlIO::via::Base64, PerlIO::via::QuotedPrint, PerlIO::via::xz

HTTP::Promise, HTTP::Promise::Request, HTTP::Promise::Response, HTTP::Promise::Message, HTTP::Promise::Entity, HTTP::Promise::Headers, HTTP::Promise::Body, HTTP::Promise::Body::Form, HTTP::Promise::Body::Form::Data, HTTP::Promise::Body::Form::Field, HTTP::Promise::Status, HTTP::Promise::MIME, HTTP::Promise::Parser, HTTP::Promise::IO, HTTP::Promise::Stream, HTTP::Promise::Exception

COPYRIGHT & LICENSE

Copyright(c) 2022 DEGUEST Pte. Ltd.

All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.