NAME

DBM::Deep::Cookbook

DESCRIPTION

This is the Cookbook for DBM::Deep. It contains useful tips and tricks, plus some examples of how to do common tasks.

RECIPES

UTF8 data

When you're using UTF8 data, you may run into the "Wide character in print" warning. To fix that in Perl 5.8 or later, do the following:

  my $db = DBM::Deep->new( ... );
  binmode $db->_fh, ":utf8";

In Perl 5.6, you will have to set a pair of filters instead:

  my $db = DBM::Deep->new( ... );
  $db->set_filter( 'store_value' => sub { pack "U0C*", unpack "C*", $_[0] } );
  $db->set_filter( 'fetch_value' => sub { pack "C*", unpack "U0C*", $_[0] } );

In a future version, you will be able to specify utf8 => 1 and DBM::Deep will do these things for you.
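
As a rough sketch of the same idea with the filter parameters used elsewhere in this Cookbook, the core Encode module can turn characters into UTF-8 bytes on store and back into characters on fetch. The helper names to_bytes and from_bytes and the file name are hypothetical, and the DBM::Deep part only runs if the module is installed. This has not been checked against every DBM::Deep release, so treat it as a starting point rather than the recommended approach.

```perl
use strict;
use warnings;
use Encode qw( encode decode );
use File::Temp qw( tempdir );

# Hypothetical helpers: characters to UTF-8 bytes on store, and back on fetch.
# Each copies its argument so the caller's string is never modified.
sub to_bytes   { my ($chars) = @_; return encode( 'UTF-8', $chars ) }
sub from_bytes { my ($bytes) = @_; return decode( 'UTF-8', $bytes ) }

# Round-trip check using core modules only.
my $text = "caf\x{e9} \x{263a}";
die "round trip failed" unless from_bytes( to_bytes($text) ) eq $text;

# This part runs only if DBM::Deep is installed.
if ( eval { require DBM::Deep; 1 } ) {
    my $dir = tempdir( CLEANUP => 1 );
    my $db  = DBM::Deep->new(
        file               => "$dir/foo-utf8.db",   # illustrative file name
        filter_store_value => \&to_bytes,
        filter_fetch_value => \&from_bytes,
    );
    $db->{greeting} = $text;
    die "unexpected value" unless $db->{greeting} eq $text;
}
```

Only values are filtered here. Hash keys that contain wide characters would need the same treatment through filter_store_key and filter_fetch_key.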

Real-time Encryption Example

NOTE: This is just an example of how to write a filter. This most definitely should NOT be taken as a proper way to write a filter that does encryption.

Here is a working example that uses the Crypt::Blowfish module to do real-time encryption / decryption of keys & values with DBM::Deep Filters. Please visit http://search.cpan.org/search?module=Crypt::Blowfish for more on Crypt::Blowfish. You'll also need the Crypt::CBC module.

  use DBM::Deep;
  use Crypt::Blowfish;
  use Crypt::CBC;

  my $cipher = Crypt::CBC->new({
      'key'             => 'my secret key',
      'cipher'          => 'Blowfish',
      'iv'              => '$KJh#(}q',
      'regenerate_key'  => 0,
      'padding'         => 'space',
      'prepend_iv'      => 0
  });

  my $db = DBM::Deep->new(
      file => "foo-encrypt.db",
      filter_store_key => \&my_encrypt,
      filter_store_value => \&my_encrypt,
      filter_fetch_key => \&my_decrypt,
      filter_fetch_value => \&my_decrypt,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_encrypt {
      return $cipher->encrypt( $_[0] );
  }
  sub my_decrypt {
      return $cipher->decrypt( $_[0] );
  }

Real-time Compression Example

Here is a working example that uses the Compress::Zlib module to do real-time compression / decompression of keys & values with DBM::Deep Filters. Please visit http://search.cpan.org/search?module=Compress::Zlib for more on Compress::Zlib.

  use DBM::Deep;
  use Compress::Zlib;

  my $db = DBM::Deep->new(
      file => "foo-compress.db",
      filter_store_key => \&my_compress,
      filter_store_value => \&my_compress,
      filter_fetch_key => \&my_decompress,
      filter_fetch_value => \&my_decompress,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_compress {
      return Compress::Zlib::memGzip( $_[0] ) ;
  }
  sub my_decompress {
      return Compress::Zlib::memGunzip( $_[0] ) ;
  }

Note: Filtering of keys only applies to hashes. Array "keys" are actually numerical index numbers, and are not filtered.
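
One thing worth measuring before compressing keys as well as values: the gzip format adds a fixed header and trailer, so very short strings come out larger than they went in. The sketch below uses the same Compress::Zlib calls as the example above to compare a short key with a long, repetitive value. The sample strings are made up for illustration, and the wrappers copy their argument first so the caller's data is left alone.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Same filters as above, with a defensive copy of the argument.
sub my_compress   { my ($data) = @_; return Compress::Zlib::memGzip($data) }
sub my_decompress { my ($data) = @_; return Compress::Zlib::memGunzip($data) }

my $short = "key1";            # a typical short hash key
my $long  = "value" x 1000;    # 5000 bytes of repetitive data

printf "short: %d -> %d bytes\n", length($short), length( my_compress($short) );
printf "long:  %d -> %d bytes\n", length($long),  length( my_compress($long) );

# Whatever the sizes, the round trip must give back the original data.
die "round trip failed" unless my_decompress( my_compress($long) ) eq $long;
```

If the short case grows on your data, it may make sense to pass only filter_store_value and filter_fetch_value to new() and leave keys unfiltered.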

Custom Digest Algorithm

By default, DBM::Deep uses the Message Digest 5 (MD5) algorithm for hashing keys. You can override this and use another algorithm (such as SHA-256), or even write your own. Note, however, that DBM::Deep currently expects zero collisions, so your algorithm has to be perfect, so to speak. Collision detection may be introduced in a later version.

To specify a custom digest algorithm, pass two parameters to new(): 'digest', a reference to a subroutine that computes the digest, and 'hash_size', the length of the algorithm's hashes in bytes. Here is a working example that uses a 256-bit hash from the Digest::SHA256 module. Please see http://search.cpan.org/search?module=Digest::SHA256 for more information.

  use DBM::Deep;
  use Digest::SHA256;

  my $context = Digest::SHA256::new(256);

  my $db = DBM::Deep->new(
      file => "foo-sha.db",
      digest => \&my_digest,
      hash_size => 32,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_digest {
      return substr( $context->hash($_[0]), 0, 32 );
  }

Note: Your returned digest strings must be EXACTLY the number of bytes you specify in the hash_size parameter (in this case 32). Undefined behavior will occur otherwise.

Note: If you do choose to use a custom digest algorithm, you must set it every time you access this file. Otherwise, the default (MD5) will be used.
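
If Digest::SHA256 is not available, the same idea can be sketched with Digest::SHA, which ships with Perl 5.10 and later. Its sha256() function returns the raw binary digest, which is 32 bytes long, so no substr() is needed. Swapping in Digest::SHA is an assumption made for this sketch, not something the example above uses. The file name is illustrative, and the DBM::Deep part only runs if the module is installed.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );
use File::Temp qw( tempdir );

# sha256() returns the raw (binary) digest: 32 bytes for any input.
sub my_digest { return sha256( $_[0] ) }

# Derive hash_size from the digest itself so the two can never disagree.
my $hash_size = length( my_digest("key1") );
die "digest must be exactly 32 bytes" unless $hash_size == 32;

# This part runs only if DBM::Deep is installed.
if ( eval { require DBM::Deep; 1 } ) {
    my $dir = tempdir( CLEANUP => 1 );
    my $db  = DBM::Deep->new(
        file      => "$dir/foo-sha.db",   # illustrative file name
        digest    => \&my_digest,
        hash_size => $hash_size,
    );
    $db->{key1} = "value1";
    die "unexpected value" unless $db->{key1} eq "value1";
}
```

Computing hash_size from an actual digest is one way to guard against the length mismatch described in the first note above.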

PERFORMANCE

Because DBM::Deep is a concurrent datastore, every change is flushed to disk immediately and every read goes to disk. This means that DBM::Deep functions at the speed of disk (generally 10-20ms) vs. the speed of RAM (generally 50-70ns), or at least 150-200x slower than the comparable in-memory data structure in Perl.

There are several techniques you can use to speed up how DBM::Deep functions.

  • Put it on a ramdisk

    The easiest and quickest way to make DBM::Deep run faster is to create a ramdisk and put the DBM::Deep file there. Doing this as an option may become a feature of DBM::Deep, assuming there is a good ramdisk wrapper on CPAN.

  • Work at the tightest level possible

    It is much faster to assign the level of your db that you are working with to an intermediate variable than to look it up again every time. For example:

      # BAD
      while ( my ($k, $v) = each %{$db->{foo}{bar}{baz}} ) {
        ...
      }
    
      # GOOD
      my $x = $db->{foo}{bar}{baz};
      while ( my ($k, $v) = each %$x ) {
        ...
      }

  • Make your file as tight as possible

    If you know that your database is never going to use more than 65K (65,536 bytes), consider using the pack_size => 'small' option. This instructs DBM::Deep to use 16-bit addresses, which makes the file smaller and so reduces seek times.
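
The 65K limit in the last tip follows directly from the address width: a 16-bit address occupies 2 bytes and can name 2**16 distinct offsets. The sketch below works that out with Perl's own pack() and then opens a database with pack_size => 'small'. The helper name max_file_size, the use of the 'n' template to stand in for a 16-bit address, and the file name are all assumptions made for illustration. The DBM::Deep part only runs if the module is installed.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper: how many bytes can an address of this width reach?
sub max_file_size { my ($addr_bytes) = @_; return 2**( 8 * $addr_bytes ) }

# 'n' packs an unsigned 16-bit integer, so the result is 2 bytes long.
my $addr_bytes = length( pack( 'n', 0 ) );
my $limit      = max_file_size($addr_bytes);

print "address width: $addr_bytes bytes, addressable size: $limit bytes\n";
# prints "address width: 2 bytes, addressable size: 65536 bytes"

# This part runs only if DBM::Deep is installed.
if ( eval { require DBM::Deep; 1 } ) {
    my $dir = tempdir( CLEANUP => 1 );
    my $db  = DBM::Deep->new(
        file      => "$dir/foo-small.db",   # illustrative file name
        pack_size => 'small',
    );
    $db->{key1} = "value1";
}
```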