
NAME

File::Transaction::Atomic - atomic change to a group of files

SYNOPSIS

  #
  # In this example, we wish to replace the word 'foo' with the
  # word 'bar' in several files, with no risk of ending up with
  # the replacement done in some files but not in others.
  #

  use File::Transaction::Atomic;

  my $ft = File::Transaction::Atomic->new;

  eval {
      foreach my $file (@list_of_file_names) {
          $ft->linewise_rewrite($file, sub {
               s#\bfoo\b#bar#g;
          });
      }
  };

  if ($@) {
      $ft->revert;
      die "update aborted: $@";
  }
  else {
      $ft->commit;
  }

DESCRIPTION

This class is a child of File::Transaction that reimplements the commit() method to give a truly atomic commit operation, at the cost of greater overhead and dependence on UNIX file system semantics.

During the preparation for the commit, each file is atomically replaced with a symbolic link that points to the same file under another name. The commit is only really atomic if those operations are deemed not to constitute changes.

This module's commit() method performs roughly ten times as many file and directory operations as that of File::Transaction, and the code here is much more complex (and hence more likely to contain serious bugs) than that found in File::Transaction.

You should only be using this module if the risk of a partial commit described in File::Transaction is unacceptable in your application.

METHODS

commit ( [WORKDIR] [,OLDEXT] [,TMPLNKEXT] )

Performs an atomic commit.

The commit() method needs to create a temporary work directory and populate it with subdirectories and symlinks, and WORKDIR gives the name of the directory to use. If the WORKDIR directory exists, the commit() method assumes that it was left over from a previous invocation of the File::Transaction::Atomic commit() method that failed part way through, and tidies up accordingly. The default workdir is .ftawork in the current directory.

The OLDEXT parameter sets the string to append to the name of each file to generate the name of a temporary additional hardlink to each old version. Any existing file of that name will be removed. The default OLDEXT is the TMPEXT value that was passed to the object's constructor with .old appended to it.

The TMPLNKEXT parameter sets the string to append to the name of each file to generate a temporary name for the symlink that will replace the live file during the preparation for the commit. Any existing file of that name will be removed. The default TMPLNKEXT is the TMPEXT value that was passed to the object's constructor with .lnk appended to it.

SECURITY CONCERNS

If the commit() method finds that the WORKDIR already exists, then it trusts the contents of that WORKDIR to the extent that a malicious WORKDIR could cause commit() to attempt to replace any file on the system with any other file. Thus it is a very bad idea to do something like:

   $fta->commit('/tmp/ftawork');

since that would allow any local user to set up a fake /tmp/ftawork and subvert the commit().

The default WORKDIR of .ftawork is only safe if no untrusted person can control the creation of files in your current working directory.
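One way to act on this advice is to check, before calling commit(), that the directory which will contain the WORKDIR is owned by you and is not writable by anyone else. The following is only a sketch: dir_is_private() is a hypothetical helper, not part of File::Transaction::Atomic, and it inspects only the directory itself, not its ancestors.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);
use File::Temp qw(tempdir);

# Hypothetical helper (not provided by this module): true if $dir is a
# real directory, owned by the effective user, that neither group nor
# other users can write to.
sub dir_is_private {
    my ($dir) = @_;
    my ($mode, $uid) = (lstat $dir)[2, 4];   # lstat: do not follow a symlink
    return 0 unless defined $mode;           # does not exist
    return 0 unless S_ISDIR($mode);          # a symlink or plain file is rejected
    return 0 unless $uid == $>;              # must belong to the effective user
    return 0 if $mode & (S_IWGRP | S_IWOTH); # no group/world write access
    return 1;
}

# Demonstration on a scratch directory (File::Temp creates it mode 0700).
my $dir = tempdir(CLEANUP => 1);
my $private_result = dir_is_private($dir);

chmod 0777, $dir or die "chmod $dir: $!";
my $open_result = dir_is_private($dir);

print "mode 0700: ", ($private_result ? "private" : "not private"), "\n";
print "mode 0777: ", ($open_result    ? "private" : "not private"), "\n";
```

In real use the check would be applied to the directory that is going to hold the WORKDIR (the current directory, for the default .ftawork), and the caller would refuse to commit if it returned false.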

HOW IT WORKS

The atomic commit works by first replacing each live file with a symlink to a symlink to the old version, and then atomically changing the target of another symlink with a rename() system call, so that all live files suddenly become symlinks to symlinks to their new versions.

This is best explained with an example: suppose the WORKDIR is /workdir and the files to be updated are /one/one and /two/two, with new versions /one/one.tmp and /two/two.tmp respectively.

  /one/one       # old version of /one/one
  /one/one.tmp   # new version of /one/one
  /two/two       # old version of /two/two
  /two/two.tmp   # new version of /two/two

First, we do lots of preparation without making any change to the files. We create the /workdir directory, with old and new subdirectories. We make /workdir/live a symbolic link to old, and we make /workdir/tobe a symbolic link to new. We populate the new directory with symbolic links to the new versions of the files. We create an extra hardlink to each of the old versions of the files, and populate the old directory with symbolic links to those old versions. Finally, alongside each live file we create a symbolic link, named with the TMPLNKEXT extension, that points into the /workdir/live path.

After all that, we have:

  /one/one           # old version of /one/one
  /one/one.tmp.old   # old version of /one/one
  /one/one.tmp       # new version of /one/one
  /one/one.tmp.lnk -> /workdir/live/1

  /two/two           # old version of /two/two
  /two/two.tmp.old   # old version of /two/two
  /two/two.tmp       # new version of /two/two
  /two/two.tmp.lnk -> /workdir/live/2

  /workdir
  /workdir/new
  /workdir/new/1 -> /one/one.tmp
  /workdir/new/2 -> /two/two.tmp
  /workdir/old
  /workdir/old/1 -> /one/one.tmp.old
  /workdir/old/2 -> /two/two.tmp.old
  /workdir/tobe -> new
  /workdir/live -> old

Now to start interfering with the files. We do:

  rename('/one/one.tmp.lnk', '/one/one');

so now we have:

  /one/one -> /workdir/live/1
  /one/one.tmp.old   # old version of /one/one
  /one/one.tmp       # new version of /one/one

  /two/two           # old version of /two/two
  /two/two.tmp.old   # old version of /two/two
  /two/two.tmp       # new version of /two/two
  /two/two.tmp.lnk -> /workdir/live/2

  /workdir
  /workdir/new
  /workdir/new/1 -> /one/one.tmp
  /workdir/new/2 -> /two/two.tmp
  /workdir/old
  /workdir/old/1 -> /one/one.tmp.old
  /workdir/old/2 -> /two/two.tmp.old
  /workdir/tobe -> new
  /workdir/live -> old

The file /one/one has the same contents before and after this rename, since it's now a symlink leading to /one/one.tmp.old by a roundabout route. If the Perl process dies at this point, we still have the old versions of both files in place so transaction semantics haven't been violated.

Next we do the same for the other file, giving us:

  /one/one -> /workdir/live/1
  /one/one.tmp.old   # old version of /one/one
  /one/one.tmp       # new version of /one/one

  /two/two -> /workdir/live/2
  /two/two.tmp.old   # old version of /two/two
  /two/two.tmp       # new version of /two/two

  /workdir
  /workdir/new
  /workdir/new/1 -> /one/one.tmp
  /workdir/new/2 -> /two/two.tmp
  /workdir/old
  /workdir/old/1 -> /one/one.tmp.old
  /workdir/old/2 -> /two/two.tmp.old
  /workdir/tobe -> new
  /workdir/live -> old

We still haven't changed either file. Now for the commit operation:

  rename('/workdir/tobe', '/workdir/live');

If that rename succeeds then we have:

  /one/one -> /workdir/live/1
  /one/one.tmp.old   # old version of /one/one
  /one/one.tmp       # new version of /one/one

  /two/two -> /workdir/live/2
  /two/two.tmp.old   # old version of /two/two
  /two/two.tmp       # new version of /two/two

  /workdir
  /workdir/new
  /workdir/new/1 -> /one/one.tmp
  /workdir/new/2 -> /two/two.tmp
  /workdir/old
  /workdir/old/1 -> /one/one.tmp.old
  /workdir/old/2 -> /two/two.tmp.old
  /workdir/live -> new

... so now /one/one is a roundabout symlink to /one/one.tmp and /two/two is a roundabout symlink to /two/two.tmp, and the transaction is committed. If the Perl process dies at this point, we have the new versions of both files in place so transaction semantics haven't been violated.

All that remains is to eliminate those symlink chains and tidy up a bit:

  rename('/one/one.tmp', '/one/one');
  rename('/two/two.tmp', '/two/two');

then delete the .tmp.old files and /workdir, and we're done.
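The whole walkthrough can be reproduced with core Perl alone. The sketch below follows the steps above in a scratch directory; it is an illustration of the technique, not the module's own code. It has no crash recovery, it uses the numeric link names from the example, and the helper names write_file() and read_file() are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Demo helpers (hypothetical names).
sub write_file {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} $text;
    close $fh or die "close $path: $!";
}

sub read_file {                     # follows symlinks, like any reader would
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    local $/;
    my $text = <$fh>;
    close $fh;
    return $text;
}

my $root = tempdir(CLEANUP => 1);
my $work = "$root/workdir";
my @live = ("$root/one", "$root/two");

# Starting point: old versions are live, new versions are staged as *.tmp.
for my $f (@live) {
    write_file($f,       'old');
    write_file("$f.tmp", 'new');
}

# Preparation: none of this changes what a reader of the live files sees.
mkdir($_, 0700) or die "mkdir $_: $!" for $work, "$work/old", "$work/new";
symlink('old', "$work/live") or die "symlink live: $!";
symlink('new', "$work/tobe") or die "symlink tobe: $!";

my $n = 0;
for my $f (@live) {
    $n++;
    link($f, "$f.tmp.old")                 or die "link $f: $!";
    symlink("$f.tmp.old", "$work/old/$n")  or die "symlink old/$n: $!";
    symlink("$f.tmp",     "$work/new/$n")  or die "symlink new/$n: $!";
    symlink("$work/live/$n", "$f.tmp.lnk") or die "symlink $f.tmp.lnk: $!";
}

# Replace each live file with its symlink chain; contents are still old.
rename("$_.tmp.lnk", $_) or die "rename $_.tmp.lnk: $!" for @live;
my @before = map { read_file($_) } @live;

# The commit: a single rename() retargets every chain at once.
rename("$work/tobe", "$work/live") or die "rename tobe: $!";
my @after = map { read_file($_) } @live;

# Tidy up: collapse the chains, then remove the scaffolding.
for my $f (@live) {
    rename("$f.tmp", $f) or die "rename $f.tmp: $!";
    unlink("$f.tmp.old") or die "unlink $f.tmp.old: $!";
}
unlink("$work/live", map { ("$work/old/$_", "$work/new/$_") } 1 .. $n);
rmdir($_) or die "rmdir $_: $!" for "$work/old", "$work/new", $work;

print "before commit: @before\n";   # before commit: old old
print "after commit:  @after\n";    # after commit:  new new
```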

COMPLICATIONS

If the Perl process is killed or a rename() fails while either /one/one or /two/two is a symlink into /workdir, then /workdir cannot be deleted or renamed without breaking the chains of symlinks.

The commit() method needs code to eliminate any such chains before deleting /workdir if it finds that /workdir already exists. That code is implemented in the _render_workdir_unused() private method, below.
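To give a feel for what eliminating a chain involves, here is one plausible approach for a single live file. It is a guess at the general idea and not a description of what _render_workdir_unused() actually does: collapse_chain() is a hypothetical helper that moves whatever file the chain finally resolves to back over the live name, and that simply removes a dangling chain.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Hypothetical helper: if $live is a symlink, replace it with the file the
# chain finally resolves to. Returns 1 if a chain was collapsed, else 0.
sub collapse_chain {
    my ($live) = @_;
    return 0 unless -l $live;               # already a plain file (or absent)

    if (!-e $live) {
        # Dangling chain: assume there never was an old version, so
        # dropping the symlink restores the original state.
        unlink($live) or die "unlink $live: $!";
        return 1;
    }

    my $target = abs_path($live);           # resolves every symlink in the chain
    die "cannot resolve $live\n" unless defined $target;
    rename($target, $live) or die "rename $target -> $live: $!";
    return 1;
}

# Recreate the state left behind by a crash just before the commit rename.
my $root = tempdir(CLEANUP => 1);
mkdir($_, 0700) or die "mkdir $_: $!" for "$root/workdir", "$root/workdir/old";

open my $fh, '>', "$root/one.tmp.old" or die "open: $!";
print {$fh} 'old';
close $fh or die "close: $!";

symlink("$root/one.tmp.old", "$root/workdir/old/1") or die "symlink: $!";
symlink('old',               "$root/workdir/live")  or die "symlink: $!";
symlink("$root/workdir/live/1", "$root/one")        or die "symlink: $!";

my $collapsed = collapse_chain("$root/one");
print "collapsed: $collapsed\n";
```

Once every live file has been treated this way, nothing outside the work directory depends on it any more and it can be removed safely.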

To make things a bit easier for _render_workdir_unused(), the commit method encodes the full paths to the old versions of the files into the names of the symlinks in the old and new directories. Slash characters are replaced with -s and minus characters are replaced with -m, so the /workdir contents in the example above would actually be:

  /workdir
  /workdir/new
  /workdir/new/-sone-sone -> /one/one.tmp
  /workdir/new/-stwo-stwo -> /two/two.tmp
  /workdir/old
  /workdir/old/-sone-sone -> /one/one.tmp.old
  /workdir/old/-stwo-stwo -> /two/two.tmp.old
  /workdir/tobe -> new
  /workdir/live -> old
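The encoding described above is simple to sketch. The two functions below are hypothetical stand-ins written from that description, not copies of the module's _encode_filename() and _decode_linkname(). Encoding the minus character as well as the slash is what makes the mapping reversible.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _encode_filename(): '/' becomes '-s' and
# '-' becomes '-m', in a single pass over the path.
sub encode_filename {
    my ($path) = @_;
    $path =~ s{([-/])}{ $1 eq '/' ? '-s' : '-m' }ge;
    return $path;
}

# Hypothetical stand-in for _decode_linkname(): the reverse mapping.
sub decode_linkname {
    my ($name) = @_;
    $name =~ s{-([sm])}{ $1 eq 's' ? '/' : '-' }ge;
    return $name;
}

print encode_filename('/one/one'), "\n";          # -sone-sone
print encode_filename('/etc/my-app/conf'), "\n";  # -setc-smy-mapp-sconf
print decode_linkname('-setc-smy-mapp-sconf'), "\n";
```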

The filenames passed in to the addfile() method may be either absolute or relative, as may the WORKDIR path. Since relative symbolic links are interpreted relative to the directory that holds the symlink rather than the working directory of the process accessing it, this can lead to complications. To simplify matters, this module uses absolute paths as the targets of all of the symlinks it creates other than the live and tobe symlinks.
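The difference is easy to see in a scratch directory. In this sketch (an illustration only, using File::Spec from core Perl rather than anything from this module) the same relative name is used twice as a symlink target, once as given and once after being made absolute.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);
mkdir("$root/$_") or die "mkdir $root/$_: $!" for qw(a b);

# The real file lives in directory a.
open my $fh, '>', "$root/a/data" or die "open: $!";
close $fh or die "close: $!";

# Pretend the caller supplied the relative name 'data' while working in a.
my $relative = 'data';
my $absolute = File::Spec->rel2abs($relative, "$root/a");

# Both symlinks are created in directory b.
symlink($relative, "$root/b/rel") or die "symlink: $!";  # resolves to b/data
symlink($absolute, "$root/b/abs") or die "symlink: $!";  # resolves to a/data

my $rel_ok = -e "$root/b/rel" ? 1 : 0;
my $abs_ok = -e "$root/b/abs" ? 1 : 0;
print "relative target resolves: $rel_ok\n";   # 0
print "absolute target resolves: $abs_ok\n";   # 1
```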

The old versions of the files may not exist. In this case, the commit() method installs a broken symlink as the live version of the file before the commit, and this becomes a working symlink to the new version at commit time.
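A small sketch of that case follows. It assumes, purely for illustration, that nothing is placed in the old directory for a file with no old version, so the chain dangles until the commit; the module may arrange this differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $root = tempdir(CLEANUP => 1);
my $work = "$root/workdir";
my $live = "$root/three";            # brand-new file: no old version exists

mkdir($_, 0700) or die "mkdir $_: $!" for $work, "$work/old", "$work/new";
symlink('old', "$work/live") or die "symlink: $!";
symlink('new', "$work/tobe") or die "symlink: $!";

# Stage the new version and link it into the new directory only.
open my $fh, '>', "$live.tmp" or die "open: $!";
print {$fh} 'new';
close $fh or die "close: $!";
symlink("$live.tmp", "$work/new/1") or die "symlink: $!";

# Install the chain as the live file. old/1 does not exist (our
# assumption), so for now the live name is a broken symlink.
symlink("$work/live/1", "$live.tmp.lnk") or die "symlink: $!";
rename("$live.tmp.lnk", $live)           or die "rename: $!";
my $dangling_before = (-l $live && !-e $live) ? 1 : 0;

# The commit rename makes the same symlink resolve to the new version.
rename("$work/tobe", "$work/live") or die "rename: $!";
my $exists_after = -e $live ? 1 : 0;

print "dangling before commit: $dangling_before\n";   # 1
print "resolves after commit:  $exists_after\n";      # 1
```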

PRIVATE METHODS

These methods should only be accessed from within this module.

_render_workdir_unused ( WORKDIR )

Identifies and eliminates any symbolic link chains that lead into the workdir WORKDIR and out again, so that WORKDIR can be safely removed.

_delete_workdir ( WORKDIR )

Deletes a workdir that has already been rendered unused.

_encode_filename ( FILENAME )

Encodes the full path to a file into a string that can be used as the name of a symbolic link.

_decode_linkname ( LINKNAME )

Reverses the encoding performed by _encode_filename().

_rename ( FROM, TO )

Performs a rename() system call, and dies on error.

Matching private wrappers perform the link() and symlink() system calls, and likewise die on error.

_mkdir ( DIRNAME )

Creates directory DIRNAME with mode 0700. Dies on error.
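Wrappers of this kind are short. The versions below are a sketch with hypothetical names (checked_rename() and friends), written from the descriptions above rather than taken from the module; the error message format in particular is an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical die-on-error wrappers around the built-in system calls.
sub checked_rename {
    my ($from, $to) = @_;
    rename($from, $to) or die "rename $from -> $to: $!\n";
    return;
}

sub checked_link {
    my ($from, $to) = @_;
    link($from, $to) or die "link $from -> $to: $!\n";
    return;
}

sub checked_symlink {
    my ($target, $name) = @_;
    symlink($target, $name) or die "symlink $name -> $target: $!\n";
    return;
}

sub checked_mkdir {
    my ($dir) = @_;
    mkdir($dir, 0700) or die "mkdir $dir: $!\n";   # the umask may clear bits
    return;
}

# Demonstration: a success, then a failure caught with eval.
my $root = tempdir(CLEANUP => 1);
checked_mkdir("$root/w");
checked_symlink("$root/w", "$root/w-link");

eval { checked_rename("$root/missing", "$root/elsewhere") };
my $err = $@;
print "caught: $err";
```

Dying on failure means the caller can wrap a long sequence of file operations in a single eval and treat any failure as fatal to the whole commit, instead of checking each return value.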

_abs_path ( FILENAME )

Returns an absolute path to the file FILENAME, which need not exist. The return value will be tainted if and only if FILENAME is tainted.

SEE ALSO

File::Transaction for a leaner and more portable implementation, lacking strict atomicity.

DBD::SQLite for a completely different approach to transaction semantics with atomic commit against filesystem backed data.

AUTHOR

Nick Cleaton <nick@cleaton.net>

COPYRIGHT

Copyright (C) 2002-2003 Nick Cleaton. All Rights Reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.