NAME

ClickHouse::Encoder::TCP - Pack/unpack a useful subset of the ClickHouse native TCP protocol

SYNOPSIS

use IO::Socket::INET;
use ClickHouse::Encoder;
use ClickHouse::Encoder::TCP;

my $s = IO::Socket::INET->new(PeerAddr => 'db:9000') or die;
binmode $s;

# A caller-owned buffer threaded through every read_packet call -
# required when reading more than one packet, since a single
# sysread can pull in several packets at once.
my $rbuf = '';

# 1. Handshake
print $s ClickHouse::Encoder::TCP->pack_hello(
    user => 'default', password => '', database => 'default');
my $hello = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
die "expected Hello, got type $hello->{type}\n"
    unless $hello->{type} == ClickHouse::Encoder::TCP::SERVER_HELLO;

# 2. insert query
print $s ClickHouse::Encoder::TCP->pack_query(
    query => 'insert into events format native');

# 3. Read TableColumns / empty Data packets the server sends back
while (1) {
    my $p = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
    last if $p->{type} == ClickHouse::Encoder::TCP::SERVER_DATA;
}

# 4. Send our data block(s)
my $enc = ClickHouse::Encoder->new(columns => [
    ['ev', 'String'], ['ts', 'DateTime']]);
my $block = $enc->encode([ ['login', time()] ]);
print $s ClickHouse::Encoder::TCP->pack_data($block);

# 5. End of insert
print $s ClickHouse::Encoder::TCP->pack_data_end();

# 6. Wait for EndOfStream / Exception
while (1) {
    my $p = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
    last if $p->{type} == ClickHouse::Encoder::TCP::SERVER_END_OF_STREAM;
    die "server exception: $p->{message}\n"
        if $p->{type} == ClickHouse::Encoder::TCP::SERVER_EXCEPTION;
}
close $s;

DESCRIPTION

A pure-Perl helper module that packs the few client packets needed to drive an insert pipeline over the ClickHouse native TCP protocol (port 9000), plus a decoder for the most common server packets: Hello, Data, Exception, Progress, Pong, EndOfStream, ProfileInfo, TableColumns, ProfileEvents.

Targets protocol revision 54429 - the same revision ClickHouse clients have used since ~2020. Newer fields the server may include (timezone, display_name, version_patch) are read opportunistically when present.

Transport (socket / TLS / framing) is the caller's responsibility. "read_packet" is provided as a convenience for blocking IO::Socket-style use; for non-blocking transports, call "unpack_packet" on a sliding byte buffer directly.

Out of scope:

  • Settings with typed values (newer flexible-setting wire form).

  • select result streaming (covers what's needed for inserts).

  • Server's prepared-query parameters protocol.

Wire compression is supported as an opt-in: "pack_data" / "pack_data_end" accept compress => 'lz4' or 'zstd', and "read_packet" accepts compressed => 1. See CAVEATS for the negotiation handshake the caller must perform first.

For select, prefer HTTP - it's simpler and well-supported by ClickHouse::Encoder's decode_block / decode_stream.

PACKET ENCODERS

pack_hello %opts

my $bytes = ClickHouse::Encoder::TCP->pack_hello(
    client_name => 'my-app',  # default 'ClickHouse::Encoder'
    major       => 1,          # default 1
    minor       => 0,          # default 0
    revision    => 54429,      # default DEFAULT_REVISION
    database    => 'default',
    user        => 'default',
    password    => '',
);

pack_query %opts

my $bytes = ClickHouse::Encoder::TCP->pack_query(
    query    => 'insert into t format native',
    query_id => '',                       # let server generate
    settings => { max_memory_usage => '1000000000' },  # optional
    stage    => STAGE_COMPLETE,           # default
    compression => COMPRESSION_DISABLE,   # default
);

settings may be a hashref (legacy string-value form) or a raw byte string already in the right shape.

pack_data $block_bytes, %opts

my $bytes = ClickHouse::Encoder::TCP->pack_data($block);
my $bytes = ClickHouse::Encoder::TCP->pack_data($block,
    compress => 'lz4');   # or 'zstd'

Wraps an encoded Native block (from "encode" in ClickHouse::Encoder) in a Data packet with an optional table_name (usually empty for inserts).

With compress => 'lz4' (or 'zstd', or 'auto' - any mode "compress_native_block" in ClickHouse::Encoder accepts) the block is first wrapped in ClickHouse's compressed-block framing (16-byte CityHash128 + 9-byte header + compressed payload) before being placed inside the Data packet. The server must already be expecting compressed data via the corresponding pack_query(... compression => COMPRESSION_ENABLE); sending compressed Data without negotiating compression in the Query packet will be rejected as a parse error by CompressedReadBuffer. compress absent, or 'none' / 'raw', emits the bare uncompressed block (the default).

pack_data_end %opts

my $bytes = ClickHouse::Encoder::TCP->pack_data_end();
my $bytes = ClickHouse::Encoder::TCP->pack_data_end(
    compress => 'lz4');

Sends an empty Data packet; this is how an insert pipeline tells the server "no more data". When compression was negotiated in the Query, the empty block must be sent through the same compressed framing so CompressedReadBuffer parses it the same way - compress takes the same values as "pack_data".

pack_ping, pack_cancel

my $b = ClickHouse::Encoder::TCP->pack_ping;

Trivial control packets.

PACKET DECODER

unpack_packet $bytes, $offset

my ($pkt, $new_offset) = ClickHouse::Encoder::TCP->unpack_packet(
    $buffer, $offset);

Parse one server packet from $bytes starting at $offset. Returns a hashref with at least type (numeric, one of the SERVER_* constants) and packet-specific fields. Croaks on truncated input - catch with eval on a sliding-buffer reader.

For Data/Totals/Extremes/ProfileEvents packets the hashref carries block_offset: the byte offset where the inner Native block begins. Pass substr($bytes, $block_offset) to ClickHouse::Encoder->decode_block to extract rows.

read_packet $fh

my $rbuf = '';
my $pkt = ClickHouse::Encoder::TCP->read_packet($fh, buffer => \$rbuf);
my $pkt = ClickHouse::Encoder::TCP->read_packet($fh, compressed => 1);

Blocking read from a filehandle. sysreads in chunks until it has one whole packet, parses it, and returns the hashref. For Data-shaped packets, also reads the inner block and surfaces block_bytes + pre-decoded block hashref. Convenience only; non-blocking transports should call "unpack_packet" on their own buffer.

A single sysread may pull in more than one packet. Pass buffer => \my $buf and thread the same scalar ref through every read_packet call on the filehandle: read_packet seeds itself from that buffer and leaves any over-read bytes there for the next call. This is required to read more than one packet - without a caller buffer, over-read bytes are dropped and a second read_packet call may block or misparse. The compression and buffer options are independent and may be combined.

When the caller has negotiated compression (via pack_query(... compression => COMPRESSION_ENABLE)), pass compressed => 1 so read_packet peels the inner compressed-block framing (16-byte CityHash128 + 9-byte header + LZ4/ZSTD payload) via "decompress_native_block" in ClickHouse::Encoder before decoding the Native block. block_bytes on the returned hashref is the decompressed inner block; compressed_consumed reports how many on-the-wire bytes the framed block occupied.

WIRE CODECS

The varint and length-prefixed-string codecs the packet builders use are XS-backed and callable directly. They are a semi-public surface: stable and reusable, but secondary to the packet-level API above. They are plain functions (not class methods) - invoke them fully qualified, e.g. ClickHouse::Encoder::TCP::pack_varint($n).

pack_varint $uint

Encode a non-negative integer in ClickHouse's LEB128 varuint form; returns the byte string.

unpack_varint $bytes, $offset

Decode one varint at $offset; returns ($value, $new_offset). Croaks on truncated input or a varint wider than 64 bits.

pack_string $str

Encode a length-prefixed string: a varint length followed by the UTF-8 bytes of $str. Returns the byte string.

unpack_string $bytes, $offset

Decode one length-prefixed string at $offset; returns ($string, $new_offset) with the raw (undecoded) string bytes.

CONSTANTS

Client packet types: CLIENT_HELLO, CLIENT_QUERY, CLIENT_DATA, CLIENT_CANCEL, CLIENT_PING.

Server packet types: SERVER_HELLO, SERVER_DATA, SERVER_EXCEPTION, SERVER_PROGRESS, SERVER_PONG, SERVER_END_OF_STREAM, SERVER_PROFILE_INFO, SERVER_TOTALS, SERVER_EXTREMES, SERVER_TABLE_COLUMNS, SERVER_PROFILE_EVENTS.

Other: STAGE_COMPLETE, STAGE_WITH_MERGEABLE, STAGE_FETCH_COLUMNS, COMPRESSION_DISABLE, COMPRESSION_ENABLE, DEFAULT_REVISION.

All constants live in the package namespace and are not exported; reference them as ClickHouse::Encoder::TCP::SERVER_DATA etc.

CAVEATS

  • Modern server cutoff. The default revision (54429) predates the chunking-negotiation extension introduced in ClickHouse 24.10 (protocol revision >= 54475). Newer servers send a chunking offer right after SERVER_HELLO that this subset does not respond to; the connection then fails with a fast protocol-mismatch error. For integration with recent servers, prefer HTTP transport.

  • String encoding. Inputs to pack_string (any string field: query, names, settings) are encoded as UTF-8 bytes; passing a byte-mode string with non-ASCII bytes will be reinterpreted as Latin-1 by Perl's UTF-8 upgrade rules. If you need raw bytes, encode to UTF-8 yourself first. unpack_string conversely returns the raw byte string the server sent - it does not set the UTF-8 flag, so callers wanting characters should decode_utf8 the result.

  • Settings values are strings. Each setting is emitted with flags=0 (ordinary) and the value as a string. Numeric/typed-setting encoding (flexible settings, available at higher revisions) is out of scope.

  • Wire compression is opt-in. pack_query still defaults to COMPRESSION_DISABLE; to negotiate compression pass compression => COMPRESSION_ENABLE in pack_query, then compress => 'lz4' (or 'zstd') to both pack_data and pack_data_end. Compressed Data packets coming back from the server are decoded by read_packet($fh, compressed => 1). The compressed-block framing (16-byte CityHash128 v1.0.2 + 9-byte header + payload) lives in "compress_native_block" in ClickHouse::Encoder.

SEE ALSO

ClickHouse::Encoder - the wire-format encoder these packets carry, plus compress_native_block / decompress_native_block for the matching block-framing helpers.

EV::ClickHouse - full async ClickHouse client (TCP + HTTP) for select result streaming, prepared queries with parameter binding, and chunking-negotiation against modern CH revisions.

AUTHOR

vividsnow

LICENSE

Same terms as Perl itself.