NAME
ClickHouse::Encoder::TCP - Pack/unpack a useful subset of the ClickHouse native TCP protocol
SYNOPSIS
use IO::Socket::INET;
use ClickHouse::Encoder;
use ClickHouse::Encoder::TCP;
my $s = IO::Socket::INET->new(PeerAddr => 'db:9000') or die;
binmode $s;
# A caller-owned buffer threaded through every read_packet call -
# required when reading more than one packet, since a single
# sysread can pull in several packets at once.
my $rbuf = '';
# 1. Handshake
print $s ClickHouse::Encoder::TCP->pack_hello(
user => 'default', password => '', database => 'default');
my $hello = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
die "expected Hello, got type $hello->{type}\n"
unless $hello->{type} == ClickHouse::Encoder::TCP::SERVER_HELLO;
# 2. insert query
print $s ClickHouse::Encoder::TCP->pack_query(
query => 'insert into events format native');
# 3. Read TableColumns / empty Data packets the server sends back
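while (1) {
my $p = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
# Fail fast if the server rejects the query instead of blocking forever.
die "server exception: $p->{message}\n"
if $p->{type} == ClickHouse::Encoder::TCP::SERVER_EXCEPTION;
last if $p->{type} == ClickHouse::Encoder::TCP::SERVER_DATA;
}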
# 4. Send our data block(s)
my $enc = ClickHouse::Encoder->new(columns => [
['ev', 'String'], ['ts', 'DateTime']]);
my $block = $enc->encode([ ['login', time()] ]);
print $s ClickHouse::Encoder::TCP->pack_data($block);
# 5. End of insert
print $s ClickHouse::Encoder::TCP->pack_data_end();
# 6. Wait for EndOfStream / Exception
while (1) {
my $p = ClickHouse::Encoder::TCP->read_packet($s, buffer => \$rbuf);
last if $p->{type} == ClickHouse::Encoder::TCP::SERVER_END_OF_STREAM;
die "server exception: $p->{message}\n"
if $p->{type} == ClickHouse::Encoder::TCP::SERVER_EXCEPTION;
}
close $s;
DESCRIPTION
A pure-Perl helper module that packs the few client packets needed to drive an insert pipeline over the ClickHouse native TCP protocol (port 9000), plus a decoder for the most common server packets: Hello, Data, Exception, Progress, Pong, EndOfStream, ProfileInfo, TableColumns, ProfileEvents.
Targets protocol revision 54429 - the same revision ClickHouse clients have used since ~2020. Newer fields the server may include (timezone, display_name, version_patch) are read opportunistically when present.
Transport (socket / TLS / framing) is the caller's responsibility. "read_packet" is provided as a convenience for blocking IO::Socket-style use; for non-blocking transports, call "unpack_packet" on a sliding byte buffer directly.
Out of scope:
Settings with typed values (newer flexible-setting wire form).
select result streaming (the decoder covers only what inserts need).
Server's prepared-query parameters protocol.
Wire compression is supported as an opt-in: "pack_data" / "pack_data_end" accept compress => 'lz4' or 'zstd', and "read_packet" accepts compressed => 1. See CAVEATS for the negotiation handshake the caller must perform first.
For select, prefer HTTP - it's simpler and well-supported by ClickHouse::Encoder's decode_block / decode_stream.
PACKET ENCODERS
pack_hello %opts
my $bytes = ClickHouse::Encoder::TCP->pack_hello(
client_name => 'my-app', # default 'ClickHouse::Encoder'
major => 1, # default 1
minor => 0, # default 0
revision => 54429, # default DEFAULT_REVISION
database => 'default',
user => 'default',
password => '',
);
pack_query %opts
my $bytes = ClickHouse::Encoder::TCP->pack_query(
query => 'insert into t format native',
query_id => '', # let server generate
settings => { max_memory_usage => '1000000000' }, # optional
stage => STAGE_COMPLETE, # default
compression => COMPRESSION_DISABLE, # default
);
settings may be a hashref (legacy string-value form) or a raw byte string already in the right shape.
pack_data $block_bytes, %opts
my $bytes = ClickHouse::Encoder::TCP->pack_data($block);
my $bytes = ClickHouse::Encoder::TCP->pack_data($block,
compress => 'lz4'); # or 'zstd'
Wraps an encoded Native block (from "encode" in ClickHouse::Encoder) in a Data packet with an optional table_name (usually empty for inserts).
With compress => 'lz4' (or 'zstd', or 'auto' - any mode "compress_native_block" in ClickHouse::Encoder accepts) the block is first wrapped in ClickHouse's compressed-block framing (16-byte CityHash128 + 9-byte header + compressed payload) before being placed inside the Data packet. The server must already be expecting compressed data via the corresponding pack_query(... compression => COMPRESSION_ENABLE); sending compressed Data without negotiating compression in the Query packet will be rejected as a parse error by CompressedReadBuffer. compress absent, or 'none' / 'raw', emits the bare uncompressed block (the default).
pack_data_end %opts
my $bytes = ClickHouse::Encoder::TCP->pack_data_end();
my $bytes = ClickHouse::Encoder::TCP->pack_data_end(
compress => 'lz4');
Sends an empty Data packet; this is how an insert pipeline tells the server "no more data". When compression was negotiated in the Query, the empty block must be sent through the same compressed framing so CompressedReadBuffer parses it the same way - compress takes the same values as "pack_data".
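A compressed insert can be sketched as follows. It uses only the calls documented in this file, but it is an untested outline: the connected socket $s, the events table, and the column list are assumptions carried over from the SYNOPSIS, and insert_compressed is a hypothetical wrapper name. The modules are loaded lazily so the file compiles even where they are not installed.

```perl
use strict;
use warnings;

# Sketch only: needs ClickHouse::Encoder and ClickHouse::Encoder::TCP
# installed, a connected socket $s, and a server that accepts the
# default revision. Not verified against a live server.
sub insert_compressed {
    my ($s, $rows) = @_;
    require ClickHouse::Encoder;
    require ClickHouse::Encoder::TCP;
    my $tcp  = 'ClickHouse::Encoder::TCP';
    my $rbuf = '';

    # After the Query packet, every read goes through compressed => 1.
    my $read = sub {
        my $p = $tcp->read_packet($s, buffer => \$rbuf, compressed => 1);
        die "server exception: $p->{message}\n"
            if $p->{type} == ClickHouse::Encoder::TCP::SERVER_EXCEPTION();
        return $p;
    };

    print {$s} $tcp->pack_hello(
        user => 'default', password => '', database => 'default');
    $tcp->read_packet($s, buffer => \$rbuf);    # server Hello

    # Negotiate compression in the Query packet first ...
    print {$s} $tcp->pack_query(
        query       => 'insert into events format native',
        compression => ClickHouse::Encoder::TCP::COMPRESSION_ENABLE());
    1 until $read->()->{type} == ClickHouse::Encoder::TCP::SERVER_DATA();

    # ... then use the same codec for the data and for the terminator.
    my $enc = ClickHouse::Encoder->new(columns => [
        ['ev', 'String'], ['ts', 'DateTime']]);
    print {$s} $tcp->pack_data($enc->encode($rows), compress => 'lz4');
    print {$s} $tcp->pack_data_end(compress => 'lz4');
    1 until $read->()->{type}
        == ClickHouse::Encoder::TCP::SERVER_END_OF_STREAM();
    return 1;
}
```

The point of the sketch is the ordering: compression is switched on once in pack_query, and from then on pack_data, pack_data_end, and read_packet all have to agree with it.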
pack_ping, pack_cancel
my $b = ClickHouse::Encoder::TCP->pack_ping;
Trivial control packets.
PACKET DECODER
unpack_packet $bytes, $offset
my ($pkt, $new_offset) = ClickHouse::Encoder::TCP->unpack_packet(
$buffer, $offset);
Parse one server packet from $bytes starting at $offset. Returns a hashref with at least type (numeric, one of the SERVER_* constants) and packet-specific fields. Croaks on truncated input - catch with eval on a sliding-buffer reader.
For Data/Totals/Extremes/ProfileEvents packets the hashref carries block_offset: the byte offset where the inner Native block begins. Pass substr($bytes, $pkt->{block_offset}) to ClickHouse::Encoder->decode_block to extract rows.
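The "catch with eval on a sliding-buffer reader" pattern might look like the sketch below. To stay runnable without the module it parses a toy frame format (one length byte followed by the payload). Both toy_unpack and drain are hypothetical names, and toy_unpack only imitates the documented contract of returning ($pkt, $new_offset) and dying on truncated input.

```perl
use strict;
use warnings;

# Hypothetical stand-in for unpack_packet: parses one frame of the form
# <1-byte length><payload> and dies on truncated input.
sub toy_unpack {
    my ($bytes, $offset) = @_;
    die "truncated\n" if length($bytes) < $offset + 1;
    my $len = ord substr($bytes, $offset, 1);
    die "truncated\n" if length($bytes) < $offset + 1 + $len;
    return ({ payload => substr($bytes, $offset + 1, $len) },
            $offset + 1 + $len);
}

# Parse every complete packet in $$bufref, trim the consumed bytes, and
# leave any partial tail in place for the next chunk.
sub drain {
    my ($bufref, $parse) = @_;
    my @pkts;
    my $offset = 0;
    while ($offset < length $$bufref) {
        my ($pkt, $next) = eval { $parse->($$bufref, $offset) };
        last unless defined $pkt;    # incomplete: wait for more bytes
        push @pkts, $pkt;
        $offset = $next;
    }
    substr($$bufref, 0, $offset, '');
    return @pkts;
}

my $buf = '';
$buf .= "\x02hi\x03wo";              # one whole frame plus a partial one
my @first = drain(\$buf, \&toy_unpack);
$buf .= "w";                         # the rest of the second frame arrives
my @second = drain(\$buf, \&toy_unpack);
```

With the real module the parser argument would presumably be sub { ClickHouse::Encoder::TCP->unpack_packet(@_) }. Note that this eval swallows every error, so a production reader should inspect $@ to tell truncation apart from a genuine protocol error.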
read_packet $fh, %opts
my $rbuf = '';
my $pkt = ClickHouse::Encoder::TCP->read_packet($fh, buffer => \$rbuf);
my $pkt = ClickHouse::Encoder::TCP->read_packet($fh, compressed => 1);
Blocking read from a filehandle. sysreads in chunks until it has one whole packet, parses it, and returns the hashref. For Data-shaped packets, also reads the inner block and surfaces block_bytes + pre-decoded block hashref. Convenience only; non-blocking transports should call "unpack_packet" on their own buffer.
A single sysread may pull in more than one packet. Pass buffer => \my $buf and thread the same scalar ref through every read_packet call on the filehandle: read_packet seeds itself from that buffer and leaves any over-read bytes there for the next call. This is required to read more than one packet - without a caller buffer, over-read bytes are dropped and a second read_packet call may block or misparse. The compressed and buffer options are independent and may be combined.
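Why the caller-owned buffer matters can be shown with a small self-contained sketch. It is not read_packet itself: read_frame is a hypothetical reader for toy frames (one length byte plus payload), and a pipe stands in for the socket.

```perl
use strict;
use warnings;

# Hypothetical blocking reader; over-read bytes stay in the caller's
# buffer instead of being thrown away.
sub read_frame {
    my ($fh, $bufref) = @_;
    while (1) {
        if (length $$bufref) {
            my $len = ord substr($$bufref, 0, 1);
            if (length($$bufref) >= 1 + $len) {
                my $payload = substr($$bufref, 1, $len);
                substr($$bufref, 0, 1 + $len, '');   # keep the remainder
                return $payload;
            }
        }
        my $n = sysread($fh, $$bufref, 4096, length $$bufref);
        die "read error: $!\n" unless defined $n;
        die "unexpected EOF\n" unless $n;
    }
}

pipe(my $r, my $w) or die "pipe: $!";
binmode $_ for $r, $w;
syswrite($w, "\x03one\x03two");    # two frames written back to back
close $w;

my $rbuf   = '';
my $first  = read_frame($r, \$rbuf);   # sysread typically pulls in both
my $second = read_frame($r, \$rbuf);   # served from $rbuf
```

Had read_frame used a private buffer per call, the bytes of the second frame would have been lost after the first call, and the second call would have hit EOF (or, on a live socket, blocked).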
When the caller has negotiated compression (via pack_query(... compression => COMPRESSION_ENABLE)), pass compressed => 1 so read_packet peels the inner compressed-block framing (16-byte CityHash128 + 9-byte header + LZ4/ZSTD payload) via "decompress_native_block" in ClickHouse::Encoder before decoding the Native block. block_bytes on the returned hashref is the decompressed inner block; compressed_consumed reports how many on-the-wire bytes the framed block occupied.
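For orientation, the framing around a compressed block can be inspected with plain unpack. The field layout used here comes from the upstream ClickHouse compressed-block format rather than from this module's documentation, so treat it as an assumption: a method byte (0x82 for LZ4, 0x90 for ZSTD), then the compressed size including the 9-byte header, then the uncompressed size, both little-endian 32-bit. parse_frame_header is a hypothetical helper, and it does not verify the checksum.

```perl
use strict;
use warnings;

# Assumed layout: 16-byte checksum, then a 9-byte header of method
# (1 byte), compressed size (uint32 LE, header included) and
# uncompressed size (uint32 LE).
my %METHOD = (0x82 => 'lz4', 0x90 => 'zstd');

sub parse_frame_header {
    my ($bytes) = @_;
    die "need at least 25 bytes\n" if length($bytes) < 25;
    my ($checksum, $method, $csize, $usize) = unpack 'a16 C V V', $bytes;
    return {
        checksum          => $checksum,
        method            => $METHOD{$method} // sprintf('0x%02x', $method),
        compressed_size   => $csize,       # header + payload, no checksum
        uncompressed_size => $usize,
        wire_size         => 16 + $csize,  # whole framed block
    };
}

# Synthetic frame: zeroed checksum, LZ4 marker, 5-byte dummy payload.
my $frame = pack('a16 C V V', "\0" x 16, 0x82, 9 + 5, 11) . 'xxxxx';
my $h = parse_frame_header($frame);
```

Under this layout wire_size is the number of on-the-wire bytes one framed block occupies, which is presumably what compressed_consumed reports; that mapping is a guess, not something stated above.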
WIRE CODECS
The varint and length-prefixed-string codecs the packet builders use are XS-backed and callable directly. They are a semi-public surface: stable and reusable, but secondary to the packet-level API above. They are plain functions (not class methods) - invoke them fully qualified, e.g. ClickHouse::Encoder::TCP::pack_varint($n).
- pack_varint $uint
Encode a non-negative integer in ClickHouse's LEB128 varuint form; returns the byte string.
- unpack_varint $bytes, $offset
Decode one varint at $offset; returns ($value, $new_offset). Croaks on truncated input or a varint wider than 64 bits.
- pack_string $str
Encode a length-prefixed string: a varint length followed by the UTF-8 bytes of $str. Returns the byte string.
- unpack_string $bytes, $offset
Decode one length-prefixed string at $offset; returns ($string, $new_offset) with the raw (undecoded) string bytes.
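The two wire forms are simple enough to write out in pure Perl. The sketch below is a reference illustration, not the module's XS code, and the function names are deliberately different from the real ones. It assumes a 64-bit perl for large values.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# LEB128 varuint: 7 payload bits per byte, high bit set on every byte
# except the last.
sub varint_encode {
    my ($n) = @_;
    my $out = '';
    while ($n >= 0x80) {
        $out .= chr(($n & 0x7f) | 0x80);
        $n >>= 7;
    }
    return $out . chr($n);
}

sub varint_decode {
    my ($bytes, $offset) = @_;
    my ($value, $shift) = (0, 0);
    while (1) {
        die "truncated varint\n" if $offset >= length $bytes;
        my $byte = ord substr($bytes, $offset++, 1);
        $value |= ($byte & 0x7f) << $shift;
        return ($value, $offset) unless $byte & 0x80;
        $shift += 7;
        die "varint wider than 64 bits\n" if $shift > 63;
    }
}

# Length-prefixed string: varint byte length, then the UTF-8 octets.
sub string_encode {
    my ($str) = @_;
    my $octets = encode_utf8($str);
    return varint_encode(length $octets) . $octets;
}

# Returns the raw octets, mirroring the "undecoded" contract above.
sub string_decode {
    my ($bytes, $offset) = @_;
    (my $len, $offset) = varint_decode($bytes, $offset);
    die "truncated string\n" if $offset + $len > length $bytes;
    return (substr($bytes, $offset, $len), $offset + $len);
}
```

For example, varint_encode(300) yields the two bytes 0xAC 0x02, and string_encode("caf\x{e9}") yields a length byte of 5 followed by five octets, because the accented character takes two bytes in UTF-8.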
CONSTANTS
Client packet types: CLIENT_HELLO, CLIENT_QUERY, CLIENT_DATA, CLIENT_CANCEL, CLIENT_PING.
Server packet types: SERVER_HELLO, SERVER_DATA, SERVER_EXCEPTION, SERVER_PROGRESS, SERVER_PONG, SERVER_END_OF_STREAM, SERVER_PROFILE_INFO, SERVER_TOTALS, SERVER_EXTREMES, SERVER_TABLE_COLUMNS, SERVER_PROFILE_EVENTS.
Other: STAGE_COMPLETE, STAGE_WITH_MERGEABLE, STAGE_FETCH_COLUMNS, COMPRESSION_DISABLE, COMPRESSION_ENABLE, DEFAULT_REVISION.
All constants live in the package namespace and are not exported; reference them as ClickHouse::Encoder::TCP::SERVER_DATA etc.
CAVEATS
Modern server cutoff. The default revision (54429) predates the chunking-negotiation extension introduced in ClickHouse 24.10 (protocol revision >= 54475). Newer servers send a chunking offer right after SERVER_HELLO that this subset does not respond to; the connection then fails with a fast protocol-mismatch error. For integration with recent servers, prefer HTTP transport.
String encoding. Inputs to pack_string (any string field: query, names, settings) are encoded as UTF-8 bytes; passing a byte-mode string with non-ASCII bytes will be reinterpreted as Latin-1 by Perl's UTF-8 upgrade rules. If you need raw bytes, encode to UTF-8 yourself first. unpack_string conversely returns the raw byte string the server sent - it does not set the UTF-8 flag, so callers wanting characters should decode_utf8 the result.
Settings values are strings. Each setting is emitted with flags=0 (ordinary) and the value as a string. Numeric/typed-setting encoding (flexible settings, available at higher revisions) is out of scope.
Wire compression is opt-in. pack_query still defaults to COMPRESSION_DISABLE; to negotiate compression pass compression => COMPRESSION_ENABLE in pack_query, then compress => 'lz4' (or 'zstd') to both pack_data and pack_data_end. Compressed Data packets coming back from the server are decoded by read_packet($fh, compressed => 1). The compressed-block framing (16-byte CityHash128 v1.0.2 + 9-byte header + payload) lives in "compress_native_block" in ClickHouse::Encoder.
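The string-encoding caveat above is easy to reproduce with the core Encode module alone. This sketch assumes pack_string treats its argument the way encode_utf8 does, which is how the caveat describes it. It makes no calls into ClickHouse::Encoder::TCP.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $chars  = "caf\x{e9}";            # 4 characters
my $octets = encode_utf8($chars);    # 5 bytes: c a f 0xC3 0xA9

# Treating already-encoded octets as characters encodes them a second
# time: each high byte is read as a Latin-1 character and becomes two.
my $double = encode_utf8($octets);   # 7 bytes

# Bytes read back carry no UTF-8 flag; decode them explicitly to get
# characters again.
my $round_trip = decode_utf8($octets);
```

Here length($octets) is 5 and length($double) is 7, which is the silent growth to watch for when a field that should hold non-ASCII text comes back garbled.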
SEE ALSO
ClickHouse::Encoder - the wire-format encoder these packets carry, plus compress_native_block / decompress_native_block for the matching block-framing helpers.
EV::ClickHouse - full async ClickHouse client (TCP + HTTP) for select result streaming, prepared queries with parameter binding, and chunking-negotiation against modern CH revisions.
AUTHOR
vividsnow
LICENSE
Same terms as Perl itself.