NAME

File::AtomicWrite - writes files atomically via rename()

SYNOPSIS

  use File::AtomicWrite ();

  # Standalone method: requires filename and
  # input data (filehandle or scalar ref)
  File::AtomicWrite->write_file(
    { file  => 'data.dat',
      input => $filehandle
    }
  );

  # how paranoid are you?
  File::AtomicWrite->write_file(
    { file     => '/etc/passwd',
      input    => \$scalarref,
      CHECKSUM => 1,
      min_size => 100
    }
  );

  # OO interface
  my $aw = File::AtomicWrite->new(
    { file     => 'name',
      min_size => 100,
      ...
    }
  );

  my $tmp_fh   = $aw->fh;
  my $tmp_file = $aw->filename;

  print $tmp_fh ...

  $aw->checksum($sha1_hexdigest);
  $aw->commit;

DESCRIPTION

This module offers atomic file writes via a temporary file created in the same directory (and therefore, probably the same partition) as the specified file. After data has been written to the temporary file, the rename call is used to replace the target file. The module optionally supports various sanity checks (min_size, CHECKSUM) that help ensure the data is written without errors.
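
The write-then-rename technique described above can be sketched with core modules only. This is a minimal illustration, not the module's actual implementation; the helper name sketch_atomic_write and the '.tmp.XXXXXXXXXX' template are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp ();

# Hypothetical helper (not part of File::AtomicWrite): write $data to a
# temporary file beside $target, then rename it over $target.
sub sketch_atomic_write {
    my ($target, $data) = @_;

    # Create the temporary file in the target's own directory so that
    # rename() is likely to stay on a single filesystem.
    my $tmp = File::Temp->new(
        DIR      => dirname($target),
        TEMPLATE => '.tmp.XXXXXXXXXX',
        UNLINK   => 1,    # remove the temporary file if we die before rename
    );

    print {$tmp} $data or die "write to $tmp failed: $!";
    close $tmp or die "close of $tmp failed: $!";

    rename $tmp->filename, $target
        or die "rename to $target failed: $!";

    # The temporary name no longer exists, so there is nothing to unlink.
    $tmp->unlink_on_destroy(0);
    return 1;
}
```

Readers of the target file see either the old contents or the new contents, never a partially written file, which is the property the module is built around.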

Should anything go awry, the module will die or croak. All calls should be wrapped in eval blocks:

  eval {
    File::AtomicWrite->write_file(...);
  };
  if ($@) {
    die "uh oh: $@";
  }

The module attempts to flush and sync the temporary filehandle prior to the rename call. This may cause portability problems. If so, please let the author know. Also notify the author if false positives from the close call are observed.
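
A flush-and-sync step of this kind might look like the following sketch, which uses the flush and sync methods from the core IO::Handle module. The helper name sketch_flush_sync_close is hypothetical, and the order of calls is an assumption about one reasonable approach rather than a copy of the module's code.

```perl
use strict;
use warnings;
use IO::Handle ();

# Hypothetical helper: push Perl's buffered data to the operating system,
# ask the operating system to write it to disk, then close the handle.
sub sketch_flush_sync_close {
    my ($fh) = @_;
    $fh->flush or die "flush failed: $!";
    # sync() wraps fsync(2) where the platform provides it; it may be
    # unsupported elsewhere, which is one source of portability problems.
    $fh->sync or die "sync failed: $!";
    close $fh or die "close failed: $!";
    return 1;
}
```

Checking the return value of close matters here, as some write errors are only reported at that point.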

CLASS METHODS

write_file options hash reference

Requires a hash reference that must contain both the input and file options. Performs the various required steps in a single method call. Only if all checks pass will the input data be moved to the file via rename. If not, the module will throw an error and attempt to clean up any temporary files created.

See "OPTIONS" for additional settings that can be passed to write_file.

write_file installs local signal handlers for INT, TERM, and __DIE__ to try to clean up any active temporary files if the process is killed or dies. If these are a problem, use the OO interface, and set up appropriate signal handlers for the application.

safe_level safe_level value

Method to customize the File::Temp module safe_level value. Consult the File::Temp documentation for more information on this option.

Can also be set via the safe_level option.

set_template File::Temp template

Method to customize the default File::Temp template used when creating temporary files. NOTE: if customized, the template must end with a sufficient number of X characters, as otherwise File::Temp will throw an error:

  template => "mytmp.X",          # Wrong
  template => "mytmp.XXXXXXXX",   # better

Can also be set via the template option.

new options hash reference

Takes all the same options as write_file, excepting the input option, and returns an object. Sanity checks are deferred until the commit method is called. See "OPTIONS" for additional settings that can be passed to new.

In the event a rollback is required, undef the File::AtomicWrite object. The object destructor should then unlink the temporary file. However, should the process receive a TERM, INT, or other signal that causes the script to exit, the temporary file will not be cleaned up (as observed during testing on several modern *BSD and Linux variants). A signal handler must be installed, which then allows the cleanup code to run:

  my $aw = File::AtomicWrite->new({file => 'somefile'});
  for my $sig_name (qw/INT TERM/) {
    $SIG{$sig_name} = sub { exit }
  }
  ...

Consult perlipc(1) for more information on signal handling, and the eg/cleanup-test program under this module distribution. A __DIE__ signal handler may also be necessary; consult the die perlfunc documentation for details.

Instances must not be reused; create a new instance instead of calling new again on an existing instance. Reuse may cause undefined behavior or other unexpected problems.

INSTANCE METHODS

fh

Returns the temporary filehandle.

filename

Returns the file name of the temporary file.

checksum SHA1 hexdigest

Takes a single argument that must contain the Digest::SHA1 hexdigest of the data written to the temporary file. Enables the CHECKSUM option.
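
One way to obtain such a hexdigest is to feed the digest while writing, so the data need not be read twice. The sketch below uses the core Digest::SHA module in SHA-1 mode instead of Digest::SHA1 (an assumption made so the example runs without extra modules; both produce the same SHA-1 hex string). The helper name sketch_write_with_digest is hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Hypothetical helper: write each chunk to $out and add the same bytes
# to a SHA-1 digest, returning the hexdigest of everything written.
sub sketch_write_with_digest {
    my ($out, @chunks) = @_;
    my $sha = Digest::SHA->new(1);    # 1 selects SHA-1
    for my $chunk (@chunks) {
        print {$out} $chunk or die "write failed: $!";
        $sha->add($chunk);
    }
    return $sha->hexdigest;
}
```

The returned string is what would be passed to the checksum method, with $out being the handle returned by fh. If an encoding layer is set on the handle, the bytes on disk may differ from the chunks digested here, so the sketch is only appropriate for byte data.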

commit

Call this method once finished with the temporary file. A number of sanity checks (if enabled via the appropriate "OPTIONS") will be performed. If these pass, the temporary file will be renamed to the real filename.

No subsequent use of the instance should be made after calling this method, as this would lead to undefined behavior (and probably many error messages).

OPTIONS

The write_file and new methods accept a number of options, supplied via a hash reference:

file => filename

Mandatory. A filename in the current working directory, or a path to the file that will (eventually) be created. By default, the temporary file will be written into the parent directory of the file path. This default can be changed by using the tmpdir option.

If the MKPATH option is true, the module will attempt to create any missing directories. If the MKPATH option is false or not set, the module will throw an error should any parent directories of the file not exist.

input => scalar ref or filehandle

Mandatory for the write_file method, illegal for the new method. Scalar reference, or otherwise some filehandle reference that can be looped over via <>. Supplies the data to be written to file.

safe_level => safe_level value

Optional means to set the File::Temp module safe_level value. Consult the File::Temp documentation for more information on this option.

This value can also be set via the safe_level class method.

template => File::Temp template

Template to supply to File::Temp. Defaults to a reasonable value if unset. NOTE: if customized, the template must end with a sufficient number of X characters, as otherwise File::Temp will throw an error.

Can also be set via the set_template class method.

min_size => size

Specify a minimum size (in bytes) that the data written must exceed. If not, the module throws an error.

mode => unix mode

Accepts a Unix mode for chmod to be applied to the file. As usual, the module throws an error if this fails. NOTE: depending on the source of the mode, oct may be required to convert it:

  my $orig_mode = (stat $source_file)[2] & 07777;
  ...->write_file({ ..., mode => $orig_mode });

  my $mode = '0644';
  ...->write_file({ ..., mode => oct($mode) });

The module does not change umask, nor is there a means to specify the permissions on directories created if MKPATH is set.
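
To illustrate why oct matters for the mode option: a mode read as a string (from a configuration file, say) numifies as decimal unless converted. The helper name sketch_parse_mode below is hypothetical and only shows one way to validate and convert such a string.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an octal mode string such as '0644' or '644'
# into the number that chmod (and the mode option) expects.
sub sketch_parse_mode {
    my ($string) = @_;
    die "not an octal mode: $string\n"
        unless $string =~ /\A[0-7]{3,4}\z/;
    return oct $string;
}

my $right = sketch_parse_mode('0644');    # 420 decimal, i.e. 0644 octal
my $wrong = '0644' + 0;                   # 644 decimal, i.e. 01204 octal

printf "%04o %04o\n", $right, $wrong;     # prints "0644 1204"
```

A mode taken from stat, as in the first example above, is already a number and needs no conversion.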

owner => unix ownership string

Accepts similar arguments to chown(1) to be applied via chown to the file. As usual, the module throws an error if this fails.

  ...->write_file({ ..., owner => '0'   });
  ...->write_file({ ..., owner => '0:0' });
  ...->write_file({ ..., owner => 'user:somegroup' });

tmpdir => directory

If set to a directory, the temporary file will be written to this directory instead of to the parent directory of the target file (the default). If the tmpdir is on a different partition than the parent directory for file, or if anything else goes awry, the module will throw an error, as rename(2) cannot operate across partition boundaries.

This option is advisable when writing files to include directories such as /etc/logrotate.d, as the programs that read include files from these directories may read even a temporary dot file while it is being written. To avoid this (slight but non-zero) risk, use the tmpdir option to write the configuration out in full under a different directory on the same partition.
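
Before choosing a tmpdir, a caller could compare the device numbers that stat reports for the two directories. This is a rough pre-flight sketch, not something the module is documented to do; the helper name sketch_same_device is hypothetical, and matching device numbers are probably necessary but not always sufficient (bind mounts, for instance, may still cause rename to fail).

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: true if both directories report the
# same device number (field 0 of stat).
sub sketch_same_device {
    my ($tmpdir, $target_dir) = @_;
    my @st_tmp = stat $tmpdir
        or die "cannot stat $tmpdir: $!";
    my @st_dst = stat $target_dir
        or die "cannot stat $target_dir: $!";
    return $st_tmp[0] == $st_dst[0];
}
```

For example, sketch_same_device('/etc/logrotate.tmp', '/etc/logrotate.d') could be checked before passing the first directory as tmpdir; the /etc/logrotate.tmp path is invented for illustration.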

checksum => sha1 hexdigest

If this option exists, and CHECKSUM is true, the module will not create a Digest::SHA1 hexdigest of the data being written out to disk, but instead will rely on the value passed by the caller.

CHECKSUM => true or false

If true, Digest::SHA1 will be used to checksum the data read back from the disk against the checksum derived from the data written out to the temporary file.

Use the checksum option (or checksum method) to supply a Digest::SHA1 hexdigest checksum. This will spare the module the task of computing the checksum on the data being written.

BINMODE => true or false

If true, binmode is set on the temporary filehandle prior to writing the input data to it. Default is not to set binmode.

binmode_layer => LAYER

Supply a LAYER argument to binmode. Enables BINMODE.

  # just binmode (binary data)
  ...->write_file({ ..., BINMODE => 1 });
  
  # custom binmode layer
  ...->write_file({ ..., binmode_layer => ':utf8' });

MKPATH => true or false

If true, attempt to create the parent directories of file should they not exist. If false (or unset), and the parent directory does not exist, the module throws an error. If the directories cannot be created, the module throws an error.

If true, this option will also attempt to create the tmpdir directory, if that option is set.
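
The behavior described for MKPATH resembles the following sketch built on File::Basename and File::Path, both listed under "SEE ALSO". Whether the module uses make_path internally is an assumption; the helper name sketch_ensure_parent is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Hypothetical helper: make sure the parent directory of $file exists,
# creating intermediate directories only when $mkpath is true.
sub sketch_ensure_parent {
    my ($file, $mkpath) = @_;
    my $parent = dirname($file);
    return $parent if -d $parent;

    die "parent directory $parent does not exist\n" unless $mkpath;

    make_path($parent);
    -d $parent or die "could not create $parent\n";
    return $parent;
}
```

As the mode documentation notes, directories created this way get permissions determined by the process umask, not by the mode option.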

BUGS

No known bugs.

Reporting Bugs

Newer versions of this module may be available from CPAN.

If the bug is in the latest version, send a report to the author. Patches that fix problems or add new features are welcome.

http://github.com/thrig/File-AtomicWrite

Known Issues

See perlport for various portability problems possible with the rename call. Consult rename(2) or equivalent for caveats. Note however that rename(2) is used heavily by common programs such as mv(1) and rsync.

File hard links created by ln(1) will be broken by this module, as this module has no way of knowing whether any other files link to the inode of the file being operated on:

  % touch afile
  % ln afile afilehardlink
  % ls -i afile*          
  3725607 afile         3725607 afilehardlink
  % perl -MFile::AtomicWrite -e \
    'File::AtomicWrite->write_file({file =>"afile",input=>\"foo"})' 
  % ls -i afile*
  3725622 afile         3725607 afilehardlink
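
A caller concerned about breaking hard links could check the link count before writing. The sketch below reads the nlink field from stat; the helper name sketch_link_count is hypothetical, and the link count may not be meaningful on every filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper: number of hard links to $file (field 3 of stat),
# or 0 if the file does not exist yet.
sub sketch_link_count {
    my ($file) = @_;
    my @st = stat $file;
    return @st ? $st[3] : 0;
}
```

A count greater than 1 means some other name shares the inode and will keep the old contents after the rename, as afilehardlink does in the transcript above.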

Union or bind mounts might also be a problem, if what is actually some other filesystem is present between the temporary and final file locations.

Some filesystems may also require a fsync call on a filehandle of the directory containing the file (see fsync(2) on RHEL, for example), to ensure that the directory data also reaches disk, in addition to the contents of the file.

SEE ALSO

Supporting modules:

File::Temp, File::Path, File::Basename, Digest::SHA1

Alternatives, depending on the need, include:

IO::Atomic, File::Transaction, File::Transaction::Atomic, Directory::Transactional

AUTHOR

Jeremy Mates, <jmates@cpan.org>

COPYRIGHT

Copyright 2009-2010,2012-2013 by Jeremy Mates.

This program is free software; you can redistribute it and/or modify it under the Artistic license.