DBD::Cassandra - DBI database backend for Cassandra
version 0.57
DBD::Cassandra is a Perl5 Database Interface driver for Cassandra, using the CQL3 query language.
use DBI; my $dbh = DBI->connect("dbi:Cassandra:host=localhost;keyspace=test", $user, $password, { RaiseError => 1 }); my $rows = $dbh->selectall_arrayref("SELECT id, field_one, field_two FROM some_table"); for my $row (@$rows) { # Do something with your row } $dbh->do("INSERT INTO some_table (id, field_one, field_two) VALUES (?, ?, ?)", { Consistency => "quorum" }, 1, "String value", 38962986 ); $dbh->disconnect;
use DBI; $dsn = "dbi:Cassandra:database=$database"; $dsn = "dbi:Cassandra:keyspace=$keyspace;host=$hostname;port=$port"; $dsn = "dbi:Cassandra:keyspace=$keyspace;consistency=local_quorum"; my $dbh = DBI->connect($dsn, $username, $password);
Optionally, a keyspace to use by default. If this is not specified, all queries must include the keyspace name.
Hostname to initially connect to. Defaults to localhost. Can be comma-separated to specify multiple hosts.
localhost
Port number to connect to. Defaults to 9042
9042
The compression method we should use for the connection. Currently Cassandra allows lz4 and snappy. Defaults to the algorithm with the best compression ratio, if the server supports it. Compression can be disabled by setting compression=none.
lz4
snappy
compression=none
Only used for data frames longer than 512 bytes, smaller frames get sent uncompressed.
There are several versions of the CQL language and this option lets you pick one. Defaults to the highest available version. Consult your Cassandra manual to see which versions your database supports.
See the chapter on consistency levels
Maximum amount of time (in seconds) to wait for a Cassandra network operation to finish.
Deprecated. These two are summed and used as request_timeout.
request_timeout
Boolean (1|0); whether to use TLS. Defaults to off.
my $sth= $dbh->prepare('SELECT "id", "field1", "field2" FROM table_name WHERE id=?', { Consistency => 'one' });
See "asynchronous queries".
See "consistency levels".
Cassandra supports pagination through result sets, to avoid having the entire result set in memory.
my $sth = $dbh->prepare('SELECT id FROM tablename', { PerPage => 1000 }); $sth->execute; while (my $row = $sth->fetchrow_arrayref()) { print "$row->[0]\n"; }
It is important to keep in mind that this mode can cause errors while fetching rows, as extra queries may be executed by the driver internally.
Cassandra supports collection types natively, eg. list and map. DBD::Cassandra translates them to native Perl types, eg. hashes and arrays.
list
map
When doing queries, placeholders can be substituted by these collections. For example, inserting a map into a table is done by passing a Perl hash.
my $sth= $dbh->prepare('INSERT INTO some_table (id, value) VALUES (?,?);'); $sth->execute(5, { days => 15 });
This will also work for IN queries, which accept an array.
IN
my $sth= $dbh->prepare('SELECT id, value FROM some_table WHERE id IN ?'); $sth->execute([1, 2, 3]); my $rows= $sth->fetchall_arrayref();
my $sth= $dbh->prepare("SELECT id FROM some_table WHERE x=?", { async => 1 }); $sth->execute(5); some_other_function(); while (my $row = $sth->fetchrow_arrayref()) { print "$row->[0]\n"; }
DBD::Cassandra supports asynchronous queries in an easy to use form. When async = 1> is passed to prepare(), any subsequent executes on the handle are not read back immediately. Instead, these are delayed until the result is actually needed.
async =
prepare()
For inserts and other writes, a convenience method x_finish_async is provided, which returns an approximation to what execute() would have returned in an non-asynchronous context. This method also raises errors, if needed.
x_finish_async
execute()
my $sth= $dbh->prepare("INSERT INTO table (a, b) VALUES (?, ?)", { async => 1 }); $sth->execute(5, 6); some_other_function_that_takes_a_while(); $sth->x_finish_async;
When using asynchronous queries, some previously premature optimizations become relevant. For example, it is very helpful to re-use statement handles in large volumes of inserts :
my @dataset_to_insert= ([1, 2, 3, 4], [5, 6, 7, 8]); my (@pending, @reusable); while (my $row= shift @dataset_to_insert) { my $sth= (shift @reusable) || $dbh->prepare( "INSERT INTO some_table (a, b, c, d) VALUES (?, ?, ?, ?)" ); $sth->execute(@$row); push @pending, $sth; if (@pending > 500) { # Tune this number! my $pending_sth= shift @pending; $pending_sth->x_finish_async; push @reusable, $pending_sth; } } $_->x_finish_async for @pending;
$dbh->do("INSERT INTO some_table (id, field_name) VALUES (?, ?)", { Consistency => "quorum" }, @values );
DBD::Cassandra accepts a Consistency attribute for statements. Supported consistency levels are any, one, two, three, quorum, all, local_quorum, each_quorum, serial, local_serial and local_one.
any
one
two
three
quorum
all
local_quorum
each_quorum
serial
local_serial
local_one
This attribute is ignored on statements that do not support it, such as CREATE.
CREATE
A global consistency level can be defined as part of the DSN.
There is currently no support for transactions. begin_work will die if you try to use it.
begin_work
Thread support is untested. Use at your own risk.
The timestamp format is implemented naively by returning milliseconds since the UNIX epoch. In Perl you get this number through time() * 1000. Trying to save times as DateTime objects or strings will not work, and will likely result in warnings and unexpected behavior.
timestamp
time() * 1000
DateTime
When using asynchronous queries, more functions than just execute() may throw errors. It is recommended that you enable RaiseError. If this is not possible, it should also suffice to call $sth-x_finish_async> and check its return value before reading any data from the handle.
$sth-
Cassandra/CQL3 is strict about the queries you write. When switching from other databases, such as MySQL, this may come as a surprise. This module supports quote(..), but try to use prepared statements instead. They will save you a lot of trouble.
quote(..)
As of DBD::Cassandra 0.51, this module uses Cassandra::Client internally. The unit tests from the previous release all still pass, but there are subtle changes :
DBD::Cassandra
Cassandra::Client
Instead of only connecting to the one specified host, multiple hosts can be passed as seed-hosts. These are then used to bootstrap the actual internal pool of connections.
Prior to version 0.25 there was a bug corrupting float and double values as they were stored in the database. The endianness on these values was wrong, which only shows when reading stored data back in an application written using a different driver.
If you were writing float or double values using a DBD::Cassandra prior to 0.25, please be careful with this upgrade. A way to rewrite your values between the two formats is :
my $good_float = unpack('f>', pack('f', $bad_float)); my $good_double= unpack('d>', pack('d', $bad_double));
If you never used a DBD::Cassandra version prior to 0.25, or do not use floats or doubles, this bug does not affect you and upgrading to 0.25 is safe.
Tom van der Woerdt <tvdw@cpan.org>
This software is copyright (c) 2017 by Tom van der Woerdt.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install DBD::Cassandra, copy and paste the appropriate command in to your terminal.
cpanm
cpanm DBD::Cassandra
CPAN shell
perl -MCPAN -e shell install DBD::Cassandra
For more information on module installation, please visit the detailed CPAN module installation guide.