NAME

Store::Directories - Manage a key/value store of directories with controls for concurrent access and locking.

SYNOPSIS

    use Store::Directories;

    # Create a new store at given directory
    # (or adopt one that is already there)
    my $store = Store::Directories->init("path/to/store/");

    # (In this example, we create a new directory containing a text file
    # and then atomically increment the value written in the file)

    my $value = 1;

    # Get a directory with the key 'foo' in the store,
    # creating it if it doesn't exist yet
    my $lock;
    my $dir = $store->get_or_add('foo', {

        # as an option, we can provide a subroutine to use to
        # initialize the directory contents if we create it
        # (but if the directory already exists, this won't be called)
        init => sub {
            my $dir = shift;
            open(my $fh, '>', "$dir/hello.txt") or die "could not open file: $!";
            print $fh $value;
            close $fh;
        }
    });

    {
        # Get an exclusive lock on the directory before reading/writing to it.
        # This ensures no other process is reading or modifying the directory
        # contents while we're working.
        my $lock = $store->lock_ex('foo');

        open(my $fh, '<', "$dir/hello.txt") or die "could not open file: $!";
        $value = <$fh>;
        open($fh, '>', "$dir/hello.txt")    or die "could not re-open file: $!";
        print $fh $value + 1;
        close $fh;

        # The lock is released once $lock is out-of-scope
    }

DESCRIPTION

Store::Directories manages a key/value store of directories and allows processes to assert shared (read-only) or exclusive (writable) locks on those directories.

Directories in a Store::Directories Store are referenced by unique string "keys". Internally, the directories are named with hexadecimal UUIDs, so the keys you use to identify them can contain characters that are illegal or unusual in filenames (web URLs are a common example).

Processes can perform operations on these directories in parallel by requesting "locks" on particular directories. These locks can be either shared or exclusive (to borrow flock(2) terminology). Lock objects are obtained with the lock_sh or lock_ex methods of a Store::Directories instance and are automatically released once they go out of scope.

Shared locks are used when a process wants to read, but not modify, the contents of a directory while being sure that no other process can modify the contents while it's reading. There can be multiple shared locks from different processes on a directory at once, but never at the same time as an exclusive lock.

Exclusive locks are used when a process wants to read and modify the contents of a directory while being sure that no other process can modify or read the contents while it's working. There can only be one exclusive lock on a directory at a time, and it cannot coexist with any shared locks.

If a process requests a lock that is unavailable at the moment (due to another process already having an incompatible lock), then the process will block until the lock can be obtained (either by the other process dying or releasing its locks). Be aware that the order in which locks are granted is not necessarily the order in which they were requested.

WARNING: The guarantees around locking make the assumption that every process is using this package and playing by its rules. Unrelated processes are free to ignore the rules and mess things up as much as they like.
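
The shared and exclusive rules described above follow flock(2) terminology, so a small core-Perl sketch using flock itself may help make them concrete. This is only an analogy under stated assumptions: it does not use Store::Directories and says nothing about how the module implements its locks. It assumes a platform where Perl's flock is backed by flock(2) (Linux, for example), where separately opened handles on the same file contend with each other even inside one process. The scratch file, the open_handle helper and the variable names are invented for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Scratch file standing in for a locked resource (removed at exit).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);

# Open a fresh handle on the scratch file; each handle can hold its own lock.
sub open_handle {
    open(my $fh, '+<', $path) or die "could not open $path: $!";
    return $fh;
}

my ($reader_a, $reader_b, $writer) = (open_handle(), open_handle(), open_handle());

# Two shared locks can coexist.
my $shared_a = flock($reader_a, LOCK_SH | LOCK_NB);
my $shared_b = flock($reader_b, LOCK_SH | LOCK_NB);

# An exclusive lock is refused while any shared lock is held.
# LOCK_NB makes flock return false instead of blocking.
my $exclusive_early = flock($writer, LOCK_EX | LOCK_NB);

# Once both readers release their locks, the exclusive lock is granted.
flock($_, LOCK_UN) for $reader_a, $reader_b;
my $exclusive_late = flock($writer, LOCK_EX | LOCK_NB);

# While the exclusive lock is held, a new shared lock is refused.
my $shared_blocked = flock($reader_a, LOCK_SH | LOCK_NB);

for my $result (
    [ 'second shared lock alongside the first', $shared_b ],
    [ 'exclusive lock while readers are active', $exclusive_early ],
    [ 'exclusive lock after readers release',    $exclusive_late ],
    [ 'shared lock while writer is active',      $shared_blocked ],
) {
    printf "%-42s %s\n", $result->[0], $result->[1] ? 'granted' : 'refused';
}
```

Like the locks described in this section, flock locks are advisory: they only constrain code that asks for them, which is the same caveat the WARNING above makes about unrelated processes.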

PUBLIC METHODS

  • init DIRECTORY

    Create and return a new Store::Directories instance in the given directory. Bookkeeping files and directory entries will be stored inside this directory. If a Store::Directories instance already exists in that directory, then this will simply adopt the one that's there.

  • path

    Get the absolute path to this Store's directory.

  • get_or_add KEY, {OPTIONS}

    Get the path to the directory referred to by KEY, creating it if it doesn't yet exist. Returns the absolute path to the directory. OPTIONS is a hashref that can contain the following options:

    • init (subroutine ref)

      A subroutine used to initialize the directory in the event that it gets created (if the directory already exists when get_or_add is called, this won't be called). It is called with the absolute path to the directory as the first argument and the key name as the second argument. An exclusive lock is active on the directory for the duration of the function. If the function dies, then the entire call to get_or_add will croak and the directory will not be created. If this isn't specified, an empty directory is created. (default: undef)

    • lock_sh (scalar ref)

      Create a shared lock on the directory, storing it in the value referenced by this option. This works like calling the lock_sh method, but eliminates the possible race condition where another process can get a lock on (or even remove) the directory between creating it and calling lock_sh. However, if the directory already exists, this may block until the lock can be obtained. (default: undef)

    • lock_ex (scalar ref)

      Just like the lock_sh option, but for an exclusive lock. If both options are specified, only the exclusive lock is created and the shared lock is ignored. (default: undef)

    Example:

        my $lock;
        my $dir = $store->get_or_add('foobar', {
            init    => sub {
                my $dir = shift;
                # Initialize directory
            },
            lock_sh => \$lock
        });

    NOTE: Keys matching the pattern /^__.*__$/ (that is, surrounded by double-underscores) are reserved by Store::Directories and cannot be used. Currently, the only key like this is __LISTING__, which is used internally to lock the list of directories (so that they can't be removed or added).

  • lock_sh KEY [NOBLOCK]

    Create and return a new shared lock for the given key. This asserts that no other process can modify the corresponding entry until this lock goes out-of-scope.

    This blocks until the lock can be obtained, so it will wait for any process that already has an exclusive lock on this key to release it before returning. If NOBLOCK is true, this will not block, but may return undef if the lock couldn't be obtained.

    This will croak if this process already has a lock (either kind) on this key, or if the key does not exist in the store.

  • lock_ex KEY [NOBLOCK]

    Create and return a new exclusive lock for the given key. This asserts that no other process can read or modify the corresponding entry until this lock goes out-of-scope.

    This blocks until the lock can be obtained, so it will wait for any processes that have locks on this key to release them before returning. If NOBLOCK is true, this will not block, but may return undef if the lock couldn't be obtained.

    This will croak if this process already has a lock (either kind) on this key, or if the key does not exist in the store.

  • remove KEY [SUB]

    Remove the directory with the given key from the store. You MUST have an exclusive lock already on the directory before calling this. SUB is a subroutine ref which, if specified, will be called immediately before deleting the directory. SUB is called with the path to the directory as the first argument and the key for the directory as the second argument.

    If an error occurs while removing the directory from disk (from SUB failing, or otherwise), then the directory will still be removed from the store's index and a warning will be given, as the directory still on disk may be in a degraded state.

  • get_locks KEY

    Returns a hashref listing all of the current locks for the directory with the given KEY. Each key in the hash is the PID of a process and each corresponding value is true/false indicating whether or not the lock is exclusive.

  • get_listing

    Returns a hashref listing all of the directories in the store. Each key in the hash is the key for that directory while the corresponding value is the absolute path to the directory.

  • get_in_dir KEY, SUB [INIT]

    Get a shared lock for the directory with key, KEY, then execute the subroutine reference, SUB (calling with the absolute path to the directory as the first argument and the key as the second argument). Returns whatever SUB returns. Essentially, this is just a convenient shortcut for something like this:

        # long form
        my $dir  = $store->get_or_add('foo');
        my $lock = $store->lock_sh('foo');
        my $val  = do_whatever($dir, 'foo');

        # shortcut
        my $val = $store->get_in_dir('foo', \&do_whatever);

    Naturally, your SUB subroutine shouldn't modify the contents of the directory or else you'll be violating the trust that Store::Directories (and other processes!) place in you.

    The optional INIT argument is a subroutine used to initialize the directory in the event it doesn't yet exist when this is called. (Same semantics as the init option to get_or_add).

  • run_in_dir KEY, SUB [INIT]

    Get an exclusive lock for the directory with key, KEY, then execute the subroutine reference, SUB (calling with the absolute path to the directory as the first argument and the key as the second argument). Returns whatever SUB returns. Essentially, this is just a convenient shortcut for something like this:

        # long form
        my $dir  = $store->get_or_add('foo');
        my $lock = $store->lock_ex('foo');
        my $val  = do_whatever($dir, 'foo');

        # shortcut
        my $val = $store->run_in_dir('foo', \&do_whatever);

    Unlike get_in_dir, your SUB subroutine is allowed to modify (or even delete!) the directory and its contents.

    The optional INIT argument is a subroutine used to initialize the directory in the event it doesn't yet exist when this is called. (Same semantics as the init option to get_or_add).

  • get_or_set KEY, GET, SET [INIT]

    A combination of get_in_dir and run_in_dir. GET and SET are subroutine references. For the directory with key, KEY, runs the GET subroutine under a shared lock and returns whatever it returns. But if GET returns undef, then it will call SET under an exclusive lock before trying GET again. (If it returns undef this time, then this method will just return undef).

    Both subroutines are called with the absolute path to the directory as the first argument, and the key as the second argument. If either of them dies, then this entire function will croak.

    This is useful when you have multiple processes that may want to perform some operation in the same directory, but you want to make sure that operation is only performed once. GET can be made to return undef if it detects the operation has not been done yet, while SET performs the operation.

    Be aware that GET may actually get called up to three times: first under the shared lock; then, if it returned undef, again immediately after upgrading to an exclusive lock (in case another process got to the exclusive lock first and already called SET for us); and, if that still returned undef, a third and final time after SET has run.

    The optional INIT argument is a subroutine used to initialize the directory in the event it doesn't yet exist when this is called. (Same semantics as the init option to get_or_add).
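
As a rough end-to-end sketch, the convenience methods above might be combined as follows. It assumes Store::Directories is installed and behaves as documented in this section. The temporary store location, the URL key, the file names (result.txt, count.txt) and the GET/SET helpers are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Store::Directories;

# A throwaway store in a temporary directory (the location is arbitrary).
my $store = Store::Directories->init(tempdir(CLEANUP => 1));

# Keys need not be valid filenames, so a URL can serve as a key.
my $key = 'https://example.com/report?id=1';

# Hypothetical GET: return the stored text, or undef if it is not built yet.
my $get = sub {
    my ($dir) = @_;
    open(my $fh, '<', "$dir/result.txt") or return undef;
    my $text = <$fh>;
    close $fh;
    return $text;
};

# Hypothetical SET: perform the one-time work and record the result.
my $set_calls = 0;
my $set = sub {
    my ($dir, $k) = @_;
    $set_calls++;
    open(my $fh, '>', "$dir/result.txt") or die "could not write result: $!";
    print {$fh} "built for $k";
    close $fh;
};

# The first call finds nothing and runs SET; the second is served by GET alone.
my $first  = $store->get_or_set($key, $get, $set);
my $second = $store->get_or_set($key, $get, $set);

# run_in_dir holds an exclusive lock, so the callback may write.
my $written = $store->run_in_dir('counter', sub {
    my ($dir) = @_;
    open(my $fh, '>', "$dir/count.txt") or die "could not write count: $!";
    print {$fh} 1;
    close $fh;
    return 1;
});

# get_in_dir holds a shared lock, so the callback only reads.
my $read = $store->get_in_dir('counter', sub {
    my ($dir) = @_;
    open(my $fh, '<', "$dir/count.txt") or die "could not read count: $!";
    my $n = <$fh>;
    close $fh;
    return $n;
});

# remove requires an exclusive lock; with NOBLOCK we get undef if the key is busy.
{
    my $lock = $store->lock_ex('counter', 1);
    if (defined $lock) {
        $store->remove('counter', sub {
            my ($dir, $k) = @_;
            print "removing '$k' from $dir\n";
        });
    }
}
my $removed = !exists $store->get_listing->{counter};

print "get_or_set returned '$second' after $set_calls SET call(s)\n";
print "counter read back as $read; removed: ", ($removed ? 'yes' : 'no'), "\n";
```

The example never requests a second lock on a key it already holds, since lock_sh and lock_ex are documented to croak in that case. Each convenience method takes and releases its own lock before the next call begins.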

AUTHOR

Cameron Tauxe <camerontauxe@gmail.com>

LICENSE AND COPYRIGHT

This software is copyright (c) 2020 by Cameron Tauxe.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.