
NAME

DBIx::FileStore - Module to store files in a DBI backend

VERSION

Version 0.12

SYNOPSIS

Ever wanted to store files in a database? Yeah, it's probably a bad idea, but maybe you want to do it anyway.

This code helps you do that.

All the fdb tools in script/ use this library to get at file names and contents in the database.

To get started, see the README file (which includes a QUICKSTART guide) from the DBIx-FileStore distribution.

This document details the DBIx::FileStore module implementation.

FILENAME NOTES

The name of the file in the filestore cannot contain spaces.

The maximum length of the name of a file in the filestore is 75 characters.

You can store files under any name you wish in the filestore. The name need not correspond to the original name on the filesystem.

All filenames in the filestore are in one flat address space. You can use / in filenames, but it does not represent an actual directory. (Although fdbls has some support for viewing files in the filestore as if they were in folders. See the docs on 'fdbls' for details.)

IMPLEMENTATION CAVEATS

Note that DBIx::FileStore is a proof-of-concept demo. It was not designed as production code.

That having been said, it works quite nicely.

There are some design choices that we've debated, though.

In particular, we might reconsider having one row in the 'files' table for each block stored in the 'fileblocks' table. Perhaps instead, we'd have one entry in the 'files' table per file.

In concrete terms, though, the storage overhead of doing it this way is about 100 bytes per block, and each block can be up to 512K. Assuming an average block size of 256K, the total storage overhead is still only about 0.04%.
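The arithmetic behind that estimate can be checked with a few lines of Perl. This is only a worked calculation using the approximate figures quoted above (100 bytes of overhead per block, 256K average block size); the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Figures taken from the text above; the 100-byte row cost is approximate.
my $overhead_bytes_per_block = 100;
my $average_block_bytes      = 256 * 1024;    # 256K

# Overhead expressed as a percentage of the stored block data.
my $overhead_percent = 100 * $overhead_bytes_per_block / $average_block_bytes;

printf "overhead: %.3f%%\n", $overhead_percent;    # prints "overhead: 0.038%"
```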

IMPLEMENTATION

The data is stored in the database using two tables: 'files' and 'fileblocks'. All meta-data is stored in the 'files' table, and the file contents are stored in the 'fileblocks' table.

fileblocks table

The fileblocks table has only three fields:

name

The name of the block. Always looks like "filename.txt <BLOCKNUMBER>", for example "filestorename.txt 00000".

block

The contents of the named block. The block size is currently set to 512K. Blocks must not be larger than the MySQL buffers can handle (in particular, max_allowed_packet).

lasttime

The timestamp of when this block was inserted into the DB or updated.
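As an illustration of the block naming scheme described above, here is a minimal sketch that builds and parses block names. The helper names block_name_sketch and parse_block_name_sketch are hypothetical and are not part of DBIx::FileStore; the "five or more digits" rule is taken from the b_num description in the files table section.

```perl
use strict;
use warnings;

# Hypothetical helper: build the name of block $block_number of $filename,
# as "<filename> <zero-padded block number>".
sub block_name_sketch {
    my ( $filename, $block_number ) = @_;
    return sprintf( "%s %05d", $filename, $block_number );
}

# Hypothetical helper: split a block name back into filename and block number.
# Returns an empty list if the name does not look like a block name.
sub parse_block_name_sketch {
    my ($block_name) = @_;
    my ( $filename, $digits ) = $block_name =~ /\A(\S+) (\d{5,})\z/
        or return;
    return ( $filename, $digits + 0 );
}

print block_name_sketch( "filestorename.txt", 0 ), "\n";    # prints "filestorename.txt 00000"
```

Because filenames in the filestore cannot contain spaces, the single space before the block number is an unambiguous separator.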

files table

The files table has several fields. There is one row in the files table for each row in the fileblocks table-- not one per file (see IMPLEMENTATION CAVEATS, above). The fields in the files table are:

name

The name of the block, exactly as used in the fileblocks table. Always looks like "filename.txt <BLOCKNUMBER>", for example "filestorename.txt 00000".

c_len

Content length. The content length of the complete file (sum of length of all the file's blocks).

b_num

Block number. The number of the block this row represents. The b_num is repeated as a five (or more) digit number at the end of the name field (see above). We denormalize the data like this so we can quickly find blocks by name or block number.

b_md5

Block md5. The md5 checksum for the block (b is for 'block') represented by this row. We use base64 encoding (which uses 0-9, a-z, A-Z, and a few other characters) to represent md5's because it's a little shorter than the hex representation. (22 vs. 32 characters)

c_md5

Content md5. The base64 md5 checksum for the whole file (c is for 'content') represented by this row.

lasttime

The timestamp of when this row was inserted into the DB or updated.

See the file 'table-definitions.sql' for more details about the db schema used.
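To make the relationship between the fields concrete, the following sketch builds the per-block 'files' rows for a string of content held in memory. It assumes only the field descriptions above; files_rows_sketch is a hypothetical helper, not the module's own code, and it produces no rows for empty content (how the module itself treats empty files is not covered here). Digest::MD5 is a core Perl module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64 md5_hex);

# Hypothetical helper: describe the 'files' rows for $content when it is
# split into blocks of at most $block_size bytes. One row per block.
sub files_rows_sketch {
    my ( $filename, $content, $block_size ) = @_;
    my $c_len = length($content);
    my $c_md5 = md5_base64($content);    # same value repeated in every row
    my @rows;
    my $b_num = 0;
    for ( my $offset = 0; $offset < $c_len; $offset += $block_size ) {
        my $block = substr( $content, $offset, $block_size );
        push @rows, {
            name  => sprintf( "%s %05d", $filename, $b_num ),
            c_len => $c_len,
            b_num => $b_num,
            b_md5 => md5_base64($block),
            c_md5 => $c_md5,
        };
        $b_num++;
    }
    return @rows;
}

# Tiny block size for demonstration; the module uses 512K blocks.
my @rows = files_rows_sketch( "filestorename.txt", "x" x 10, 4 );
printf "%s b_md5=%s\n", $_->{name}, $_->{b_md5} for @rows;

# The base64 form is shorter than the hex form of the same checksum.
printf "base64: %d chars, hex: %d chars\n",
    length( md5_base64("x") ), length( md5_hex("x") );
```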

METHODS

my $filestore = new DBIx::FileStore()

Returns a new DBIx::FileStore object.

my $fileinfo_ref = $filestore->get_all_filenames()

Returns a list of references to data about all the files in the filestore.

Each row consists of the following columns: name, c_len, c_md5, lasttime_as_int

my $fileinfo_ref = $filestore->get_filenames_matching_prefix( $prefix );

Returns a list of references to data about the files in the filestore whose names begin with the prefix $prefix. The list is in the same format as the one returned by get_all_filenames().

my $bytecount = $filestore->read_from_db( "filesystemname.txt", "filestorename.txt" );

Copies the file 'filestorename.txt' from the filestore to the file filesystemname.txt on the local filesystem.

my $bytecount = $filestore->print_blocks_from_db_to_filehandle( $fh, $fdbname );

Prints the file stored under $fdbname in the filestore to the filehandle $fh.

my $bytecount = $filestore->_read_blocks_from_db( $callback_function, $fdbname );

** Intended for internal use by this module. **

Fetches the blocks from the database for the file stored under $fdbname, and calls the $callback_function on the data from each one after it is read.

Locks the relevant tables while data is extracted. Locking should probably be configurable by the caller, or at least finer-grained.

It also confirms that the base64 md5 checksums for each block and for the file contents as a whole are correct. Dies with an error if a checksum doesn't match.
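The callback-and-verify pattern described above can be sketched without a database. In this sketch the rows are plain hashes standing in for joined 'files' and 'fileblocks' rows, read_blocks_sketch is a hypothetical name, and table locking is left out entirely. Whether the module checks each block before or after invoking the callback is not stated here; this sketch checks first.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical stand-in for the database read: each element of @rows is a
# hash with the keys name, b_num, b_md5, c_md5 and block.
sub read_blocks_sketch {
    my ( $callback, @rows ) = @_;
    my $whole     = Digest::MD5->new;    # running checksum of the whole file
    my $bytecount = 0;
    my $c_md5;
    for my $row ( sort { $a->{b_num} <=> $b->{b_num} } @rows ) {
        die "block md5 mismatch for '$row->{name}'\n"
            unless md5_base64( $row->{block} ) eq $row->{b_md5};
        $whole->add( $row->{block} );
        $bytecount += length( $row->{block} );
        $c_md5 = $row->{c_md5};
        $callback->( $row->{block} );
    }
    die "content md5 mismatch\n"
        if defined $c_md5 && $whole->b64digest ne $c_md5;
    return $bytecount;
}

# Build two demo rows for the content "hello, filestore".
my @parts = ( "hello, ", "filestore" );
my $whole_md5 = md5_base64( join "", @parts );
my @demo_rows = map {
    {   name  => sprintf( "demo.txt %05d", $_ ),
        b_num => $_,
        block => $parts[$_],
        b_md5 => md5_base64( $parts[$_] ),
        c_md5 => $whole_md5,
    }
} 0 .. $#parts;

my $collected = "";
my $bytes = read_blocks_sketch( sub { $collected .= $_[0] }, @demo_rows );
print "$bytes bytes: $collected\n";
```

Passing a callback that prints to a filehandle instead of appending to a string gives the behaviour described for print_blocks_from_db_to_filehandle().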

my $bytecount = $self->write_to_db( $localpathname, $filestorename );

Copies the file $localpathname from the filesystem to the name $filestorename in the filestore.

Locks the relevant tables while data is written. Locking should probably be configurable by the caller.

Returns the number of bytes written. Dies with a message if the source file could not be read.

Note that it currently reads the file twice: once to compute the md5 checksum before inserting it, and a second time to insert the blocks.
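The two-pass approach can be sketched as follows. This is not the module's code: two_pass_read_sketch is a hypothetical helper that returns the blocks instead of inserting them into a database, and the demo uses a tiny block size rather than 512K. Digest::MD5 and File::Temp are core Perl modules.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helper: returns the base64 md5 of the whole file followed by
# the blocks that would be inserted, each at most $block_size bytes.
sub two_pass_read_sketch {
    my ( $path, $block_size ) = @_;

    # Pass 1: checksum of the whole file.
    open( my $fh, '<:raw', $path ) or die "Cannot read '$path': $!\n";
    my $c_md5 = Digest::MD5->new->addfile($fh)->b64digest;
    close $fh;

    # Pass 2: read the file again, one block at a time.
    open( $fh, '<:raw', $path ) or die "Cannot read '$path': $!\n";
    my @blocks;
    my $block;
    while ( read( $fh, $block, $block_size ) ) {
        push @blocks, $block;
    }
    close $fh;

    return ( $c_md5, @blocks );
}

# Demo on a temporary file that is removed when the program exits.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "abcdefghij";
close $tmp_fh;

my ( $file_md5, @file_blocks ) = two_pass_read_sketch( $tmp_path, 4 );
print "c_md5=$file_md5 blocks=@file_blocks\n";
```

Reading twice means the whole-file checksum is known before the first block is stored, at the cost of extra I/O on large files.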

my $ok = $self->rename_file( $from, $to );

Renames the file in the database from $from to $to. Returns 1.

my $ok = $self->delete_file( $filename );

Removes data named $filename from the filestore.

FUNCTIONS

my $filename_ok = DBIx::FileStore::name_ok( $fdbname )

Checks that the name $fdbname is acceptable for using as a name in the filestore. Must not contain spaces or be over 75 chars.
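A check along these lines could look like the sketch below. It implements only the two rules stated above (no spaces, at most 75 characters); name_ok_sketch is a hypothetical stand-in, rejecting an undefined name is an added assumption, and the real name_ok() may apply further rules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DBIx::FileStore::name_ok().
sub name_ok_sketch {
    my ($name) = @_;
    return 0 unless defined $name;     # assumption: undef is never acceptable
    return 0 if $name =~ / /;          # no spaces allowed
    return 0 if length($name) > 75;    # at most 75 characters
    return 1;
}

for my $candidate ( "dir/file.txt", "my file.txt", "x" x 76 ) {
    printf "%-20.20s %s\n", $candidate,
        name_ok_sketch($candidate) ? "ok" : "rejected";
}
```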

AUTHOR

Josh Rabinowitz

SUPPORT

You should probably read the documentation for the various filestore command-line tools:

  fdbcat, fdbget, fdbls, fdbmv, fdbput, fdbrm, fdbstat, and fdbtidy.
  fdbslurp (which is the opposite of fdbcat) was not completed.

You can also read the documentation at:

LICENSE AND COPYRIGHT

Copyright 2010 Josh Rabinowitz.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.