NAME

DataStore::CAS::Simple - Simple file/directory based CAS implementation

VERSION

version 0.08

DESCRIPTION

This implementation of DataStore::CAS uses a directory tree where the filenames are the hexadecimal value of the digest hashes. The files are placed into directories named with a prefix of the digest hash to prevent too many entries in the same directory (which is actually only a concern on certain filesystems).

Opening a File returns a real perl filehandle, and copying a File object from one instance to another is optimized by hard-linking the underlying file (which requires both stores to be on the same filesystem).

  # This is particularly fast:
  my $cas1= DataStore::CAS::Simple->new( path => 'foo' );
  my $cas2= DataStore::CAS::Simple->new( path => 'bar' );
  $cas1->put( $cas2->get( $hash ) );

This class does not optimize the storage of the content in any way: it neither combines common sections of files nor runs compression algorithms on the data.

TODO: write DataStore::CAS::Compressor or DataStore::CAS::Splitter for those features.

ATTRIBUTES

path

Read-only. The filesystem path where the store is rooted.

digest

Read-only. Algorithm used to calculate the hash values. This can only be set in the constructor when a new store is being created. Default is SHA-1.

fanout

Read-only. Returns an arrayref describing the pattern used to split digest hashes into directories. Each number is a count of characters taken from the front of the hash, which then become a directory name. The final element may be the character '=' to indicate the filename is the full hash, or '*' to indicate the filename is the remaining digits of the hash. '*' is the default behavior if the fanout does not include one of these characters.

For example, [ 2, 2 ] would turn a hash of "1234567890" into a path of "12/34/567890". [ 2, 2, '=' ] would turn a hash of "1234567890" into a path of "12/34/1234567890".
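
The splitting rule above can be sketched in a few lines of plain Perl. This is only an illustration of the rule as documented here, not the module's own code: split_hash_by_fanout is a hypothetical helper name, and the "too short" check is an assumption about what an invalid hash might look like.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DataStore::CAS::Simple: split a hex digest
# into directory parts plus a filename according to a fanout pattern.
sub split_hash_by_fanout {
    my ($hash, @fanout) = @_;
    # The last element may be '=' (filename is the full hash) or
    # '*' (filename is whatever remains); '*' is assumed when absent.
    my $tail = (@fanout && $fanout[-1] =~ /^[=*]$/) ? pop @fanout : '*';
    my @parts;
    my $rest = $hash;
    for my $len (@fanout) {
        # Assumption: refuse a hash that would leave nothing after this prefix.
        die "hash '$hash' is too short for this fanout\n" if length($rest) <= $len;
        # 4-arg substr returns the prefix and removes it from $rest.
        push @parts, substr($rest, 0, $len, '');
    }
    push @parts, ($tail eq '=' ? $hash : $rest);
    return @parts;
}

print join('/', split_hash_by_fanout('1234567890', 2, 2)), "\n";       # 12/34/567890
print join('/', split_hash_by_fanout('1234567890', 2, 2, '=')), "\n";  # 12/34/1234567890
```

On a real store, path_parts_for_hash is the supported way to obtain these parts.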

fanout_list

Convenience accessor for @{ $cas->fanout }

copy_buffer_size

Number of bytes to copy at a time when saving data from a filehandle to the CAS. This is a performance hint, and the default is usually fine.
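
To show what a copy buffer size controls in general, here is a minimal sketch of a chunked copy that hashes the data as it passes through. It uses only core modules; copy_and_digest is a hypothetical helper, and whether the module structures its own copy loop this way is an assumption.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: copy $in to $out in chunks of up to $buf_size bytes,
# hashing the data as it passes through. Returns the hex SHA-1 digest.
sub copy_and_digest {
    my ($in, $out, $buf_size) = @_;
    my $sha = Digest::SHA->new(1);    # 1 selects SHA-1
    while (1) {
        my $got = read($in, my $buf, $buf_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;            # end of input
        $sha->add($buf);
        print {$out} $buf or die "write failed: $!";
    }
    return $sha->hexdigest;
}

# In-memory filehandles keep the example self-contained.
my $data = 'hello world' x 1000;
open my $in,  '<', \$data    or die "open: $!";
open my $out, '>', \my $copy or die "open: $!";
my $hash = copy_and_digest($in, $out, 4096);
close $out;
print "$hash\n";
```

A larger buffer means fewer read and write calls at the cost of more memory per copy, which is why the attribute is described as a performance hint.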

storage_format_version

Hashref of version information about the modules that created the store. Newer library versions can use this information to determine whether the store uses an older format.

METHODS

new

  $class->new( \%params | %params )

Constructor. It will load (and possibly create) a CAS Store.

If create is specified and path refers to an empty directory, a fresh store will be initialized. If create is specified and the directory is already a valid CAS, create is ignored, as are digest and fanout.

path points to the CAS directory. Trailing slashes don't matter. You might want to use an absolute path in case you chdir later.

copy_buffer_size initializes the respective attribute.

The digest and fanout attributes can only be initialized if the store is being created. Otherwise, they are loaded from the store's configuration.

ignore_version allows you to load a Store even if it was created with a newer version of the DataStore::CAS::Simple package than the one you are now using (or with a different package entirely).

path_parts_for_hash

  my (@path)= $cas->path_parts_for_hash($digest_hash);

Given a hash string, return the directory parts and filename where that content would be found. They are returned as a list. If the hash is not valid for this digest algorithm, this will throw an exception.

path_for_hash

  my $path= $cas->path_for_hash($digest_hash);
  my $path= $cas->path_for_hash($digest_hash, $create_dirs);

Given a hash string, return the path to the file, including $self->path. The second argument can be set to true to create any missing directories in this path.
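
The effect of the second argument can be illustrated with core modules. This is a sketch under assumptions: sketch_path_for_hash is a hypothetical stand-in that takes the store root and the already-split parts explicitly, and it is not the module's implementation.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for path_for_hash: join the store root with the
# directory parts and filename, optionally creating missing directories.
sub sketch_path_for_hash {
    my ($root, $parts, $create_dirs) = @_;
    my @dirs = @$parts;
    my $file = pop @dirs;             # last part is the filename
    my $dir  = File::Spec->catdir($root, @dirs);
    make_path($dir) if $create_dirs && !-d $dir;
    return File::Spec->catfile($dir, $file);
}

my $root = tempdir(CLEANUP => 1);     # throwaway directory standing in for the store
my $path = sketch_path_for_hash($root, [ '12', '34', '567890' ], 1);
print "$path\n";
```

Only the directories are created; the file itself does not exist until content is written to it.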

create_store

  $class->create_store( %configuration | \%configuration )

Create a new store at a specified path. Configuration must include path, and may include digest and fanout. path must be an empty writeable directory, and it must exist. digest currently defaults to SHA-1. fanout currently defaults to [1, 2], resulting in paths like "a/bc/defg".

This method can be called on classes or instances.

You may also specify create => 1 in the constructor to implicitly call this method using the relevant parameters you supplied to the constructor.

get

See "get" in DataStore::CAS for details.

put

See "put" in DataStore::CAS for details.

put_scalar

See "put_scalar" in DataStore::CAS for details.

put_file

See "put_file" in DataStore::CAS for details. In particular, heed the warnings about using the 'hardlink' and 'reuse_hash' flags.

DataStore::CAS::Simple has special support for the flags 'move' and 'hardlink'. If your source is a real file on the same filesystem by the same owner and/or group, { move => 1 } will move the file instead of copying it. (If it is on a different filesystem or ownership can't be changed, it gets copied and the original gets unlinked.) If the file is a real file on the same filesystem with correct owner and permissions, { hardlink => 1 } will link the file into the CAS instead of copying it.
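
The general "link if possible, otherwise copy" pattern behind the hardlink flag might look like the following sketch. It uses only Perl's built-in link and the core File::Copy module; link_or_copy is a hypothetical helper, and the ownership and permission checks described above are deliberately left out.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: try to hard-link $src to $dst and fall back to a
# plain copy when link() fails (for example, across filesystems).
sub link_or_copy {
    my ($src, $dst) = @_;
    return 'linked' if link($src, $dst);
    copy($src, $dst) or die "copy $src -> $dst failed: $!";
    return 'copied';
}

my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/source", "$dir/dest");
open my $fh, '>', $src or die "open: $!";
print {$fh} "example content\n";
close $fh or die "close: $!";

my $how = link_or_copy($src, $dst);
print "$how\n";    # which branch was taken depends on the filesystem
```

A hard-linked file shares its data with the original, so later writes to the source would change the stored content too. That is one reason to heed the warnings in DataStore::CAS before using the flag.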

new_write_handle

See "new_write_handle" in DataStore::CAS for details.

commit_write_handle

See "commit_write_handle" in DataStore::CAS for details.

validate

See "validate" in DataStore::CAS for details.

open_file

See "open_file" in DataStore::CAS for details.

iterator

See "iterator" in DataStore::CAS for details.

delete

See "delete" in DataStore::CAS for details.

FILE OBJECTS

File objects returned by DataStore::CAS::Simple have two additional attributes:

local_file

The filename of the disk file within DataStore::CAS::Simple's path which holds the requested data.

block_size

The blksize value from stat() (the filesystem's preferred I/O size in bytes), which might be useful for accessing the file efficiently.

AUTHOR

Michael Conrad <mconrad@intellitree.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2023 by Michael Conrad, and IntelliTree Solutions llc.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.