NAME

File::LinearRaid - Treat multiple files as one large seamless file for reading and writing.

SYNOPSIS

  use File::LinearRaid;
  
  my $fh = File::LinearRaid->new( "+<",
      "data/datafile0" => 100_000,
      "data/datafile1" =>  50_000,
      "data/datafile2" => 125_000
  );

  ## this chunk of data actually crosses a physical file boundary
  seek $fh, 90_000, 0;
  read $fh, my $buffer, 20_000;
  
  ## replace that chunk with X's
  seek $fh, 90_000, 0;
  print $fh "X" x 20_000;

DESCRIPTION

This module provides a single-filehandle interface to multiple files, in much the same way that a linear RAID provides a single-device interface to multiple physical hard drives.

This module was written to provide random fixed-width record access to a series of files. For example, in the BitTorrent filesharing protocol, several files are shared as a single entity. The final sizes of the individual files are known, but the protocol only sends fixed-width chunks of data. These chunks are not aligned to file boundaries and can span several physical files, but they are only identified by their overall offset and not by the files they span.

This module was created to provide a layer of abstraction around this kind of storage. Instead of calculating possibly many file offsets, and dividing data into smaller pieces, a simple seek and read (or print) on the abstract filehandle will do the right thing, regardless of how the chunk spans the physical files:

  seek $fh, ($chunk_id * $chunk_size), 0;
  read $fh, my $buffer, $chunk_size;
  
  ## or if opened with mode "+<" or similar:
  
  seek $fh, ($chunk_id * $chunk_size), 0;
  print $fh $chunk;
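
To illustrate the bookkeeping this saves, here is a minimal sketch of the offset arithmetic a caller would otherwise do by hand. The helper split_range is hypothetical and not part of this module's API; it only assumes that the aggregate offset space is the concatenation of the specified file lengths, in order.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of File::LinearRaid): split a logical byte
# range into per-file pieces, given the specified length of each file.
# Returns a list of [ file_index, local_offset, byte_count ] triples.
sub split_range {
    my ($offset, $length, @sizes) = @_;
    my @pieces;
    my $start = 0;    # logical offset at which the current file begins
    for my $i (0 .. $#sizes) {
        my $end = $start + $sizes[$i];    # logical offset just past this file
        if ($length > 0 && $offset < $end) {
            my $local = $offset - $start;    # position inside this file
            my $take  = $end - $offset;      # bytes left in this file
            $take = $length if $take > $length;
            push @pieces, [ $i, $local, $take ];
            $offset += $take;
            $length -= $take;
        }
        $start = $end;
    }
    return @pieces;
}

# The read from the SYNOPSIS: 20_000 bytes at offset 90_000, over files
# with specified lengths of 100_000, 50_000 and 125_000 bytes.
for my $piece (split_range(90_000, 20_000, 100_000, 50_000, 125_000)) {
    printf "file %d: local offset %d, %d bytes\n", @$piece;
}
```

For the SYNOPSIS values, the range splits into the last 10_000 bytes of the first file and the first 10_000 bytes of the second. With the aggregate filehandle, none of this arithmetic appears in calling code.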

This module may also prove useful if your file system imposes a low limit on file sizes (2G, for example), yet you need to store and access a large amount of data through a single filehandle.

USAGE

new

  my $fh = File::LinearRaid->new( $mode, $path1 => $size1, ... )

Returns a new aggregate filehandle consisting of the listed paths, in the order given. Each physical file is opened using the given mode. Note: if $mode is > or +>, every file is truncated when it is opened (see open). Croaks if any file cannot be opened.

For each file, you must specify a maximum length. This need not be the current length of the file:

  • If a physical file is shorter than its specified length, the aggregate filehandle will behave as if the file were null-padded to that length. If no writes are made to the physical file, it will not be modified. If writes are made between the end of the physical file and its specified length, the space between the end of the physical file and the new data is filled out with nulls. The physical file is grown only as far as needed, so it may still be shorter than its specified length.

  • If a physical file is longer than its specified length, the portion of the physical file past that length will be ignored. The data past the specified length will be preserved, as long as opening the file with $mode doesn't truncate it first.
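
The read side of these two rules can be sketched in plain Perl. This is only an illustration of the described behavior, not the module's actual implementation; the helper read_as_if_padded is hypothetical, and it assumes the requested range lies entirely within the specified length.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (NOT part of File::LinearRaid): read $len bytes at
# $offset from a file that is treated as exactly $max bytes long.
# Assumes $offset + $len <= $max, so data past $max is never touched.
sub read_as_if_padded {
    my ($path, $max, $offset, $len) = @_;
    die "range exceeds specified length" if $offset + $len > $max;

    my $actual = -s $path;    # current physical size in bytes
    my $buf    = '';
    if ($offset < $actual) {
        open my $in, '<:raw', $path or die "Cannot open $path: $!";
        seek $in, $offset, 0 or die "Cannot seek in $path: $!";
        my $want = $actual - $offset;    # bytes physically available
        $want = $len if $want > $len;
        defined read($in, $buf, $want) or die "Cannot read $path: $!";
        close $in;
    }

    # Whatever the physical file could not supply reads back as nulls.
    $buf .= "\0" x ($len - length($buf));
    return $buf;
}

# A 6-byte physical file with a specified length of 10 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "abcdef";
close $tmp or die "Cannot close $path: $!";

my $data = read_as_if_padded($path, 10, 4, 6);    # "ef" plus four nulls
printf "%d bytes read, physical file is still %d bytes\n",
    length($data), -s $path;
```

Reading does not modify the short file; only a write past its current end would grow it, and then only as far as needed.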

append

  $fh->append( $path1 => $size1, ... )

Append new file(s) to the end of the aggregate filehandle, with the given size(s). Returns a true value if successful, otherwise croaks.

size

  $fh->size

Returns the current maximum size of the aggregate filehandle.

Filehandle operations

Currently, read, readline, print, getc, and open are implemented, so you should be able to use most file operations seamlessly. Writing to the aggregate filehandle past its total length is not supported: the final physical file will not be grown beyond its specified length. To make the aggregate filehandle larger, use append.

CAVEATS

  • May not play well with Unicode / wide characters.

  • Error checking is quite limited.

  • Formats are untested. I don't use them and don't know if they'll work.

  • Not rigorously tested with huge files. I don't have that much disk space! As long as the individual file limits are less than your OS size limit, and the size of the aggregate filehandle can be stored in an integer, everything should be fine. You might even be able to seek to a BigInt value on the aggregate filehandle, but this is not guaranteed to work.

AUTHOR

File::LinearRaid is written by Mike Rosulek <mike@mikero.com>. Feel free to contact me with comments, questions, patches, or whatever.

COPYRIGHT

Copyright (c) 2004 Mike Rosulek. All rights reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.