
NAME

DataStore::CAS::FS::DirCodec::Universal - Codec for saving all arbitrary fields of a DirEnt

VERSION

version 0.011000

SYNOPSIS

  require DataStore::CAS::FS::DirCodec::Universal;
  use Data::Dumper;
  
  # $cas is an existing DataStore::CAS instance
  my %metadata= ( foo => 1, bar => 42 );
  my @entries= ( { name => 'file1', type => 'file', ref => 'SHA1DIGESTVALUE', mtime => '1736354736' } );
  
  my $digest_hash= DataStore::CAS::FS::DirCodec->put( $cas, 'universal', \@entries, \%metadata );
  my $dir= DataStore::CAS::FS::DirCodec->load( $cas->get($digest_hash) );
  
  print Dumper( $dir->get_entry('file1') );

DESCRIPTION

This DirCodec can store any arbitrary metadata about a file. It uses JSON for its encoding, so programs in other languages and on other platforms should be able to read the files it writes easily, subject to the Unicode caveats below.

Unicode

JSON requires that all text be valid Unicode, but some filenames are sequences of bytes that are not a valid UTF-8 string. The high bytes (0x80 to 0xFF) of such names could be stored as the Unicode code points of the same value, but that would make them indistinguishable from names that really did contain those Unicode characters. Instead, I wrap values that are meant to be strings of octets in an instance of DataStore::CAS::Dir::InvalidUTF8, which is written into the JSON as

  { "*InvalidUTF8*": $bytes_as_codepoints }

Any attribute value that contains bytes >= 0x80 and does not have Perl's UTF-8 flag set is encoded this way, so that it decodes back to exactly the same octets.
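A minimal sketch of the wrapping rule just described, using the core JSON::PP module. The helper names wrap_octets and unwrap_octets are hypothetical and are not part of this distribution; the sketch handles a single scalar value rather than a whole DirEnt, and it uses a plain hash as the wrapper instead of a DataStore::CAS::Dir::InvalidUTF8 object.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helpers (not part of DataStore::CAS::FS) showing the
# "*InvalidUTF8*" wrapping rule for a single scalar value.
sub wrap_octets {
    my ($value) = @_;
    return $value if !defined($value) || ref($value);
    # Character strings and pure-ASCII byte strings are stored as-is.
    return $value if utf8::is_utf8($value) || $value !~ /[\x80-\xFF]/;
    # Otherwise each byte 0x80..0xFF is stored as the code point of the same value.
    return { '*InvalidUTF8*' => $value };
}

sub unwrap_octets {
    my ($value) = @_;
    return $value unless ref($value) eq 'HASH' && exists $value->{'*InvalidUTF8*'};
    my $bytes = $value->{'*InvalidUTF8*'};
    utf8::downgrade($bytes);    # back to a byte string; dies if a char exceeds 0xFF
    return $bytes;
}

my $json = JSON::PP->new->utf8->canonical;
my $name = "caf\xE9";                           # Latin-1 octets, not valid UTF-8
my $text = $json->encode({ name => wrap_octets($name) });
my $back = unwrap_octets($json->decode($text)->{name});
print $back eq $name ? "round-trip ok\n" : "mismatch\n";   # round-trip ok
```

Because the wrapper is an object with a reserved key, a reader can tell "these were raw bytes" apart from "this was a real Unicode string" even though both are stored as the same code points.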

However, because filenames are meant to be human-readable, a filename is decoded as a Unicode string whenever its octets are valid UTF-8, even if it arrived as octets rather than as a Unicode string.
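The filename rule can be sketched with the core Encode module: attempt a strict UTF-8 decode and fall back to the original octets if it fails. maybe_decode_name is a hypothetical helper for illustration, not this module's actual code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode octets as UTF-8 when they are valid UTF-8,
# otherwise return them unchanged as octets.
sub maybe_decode_name {
    my ($octets) = @_;
    return $octets if utf8::is_utf8($octets);   # already a character string
    my $chars = eval {
        # FB_CROAK makes invalid input die; LEAVE_SRC keeps $octets untouched.
        Encode::decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $chars ? $chars : $octets;
}

for my $name ("caf\xC3\xA9", "caf\xE9") {   # valid UTF-8, then Latin-1 octets
    my $result = maybe_decode_name($name);
    printf "%d octets -> %d chars, %s\n", length($name), length($result),
        utf8::is_utf8($result) ? 'decoded' : 'kept as octets';
}
# 5 octets -> 4 chars, decoded
# 4 octets -> 4 chars, kept as octets
```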

METHODS

encode

  my $serialized= $class->encode( \@entries, \%metadata );

Serialize the given entries into a scalar.

@entries is an array of DirEnt objects, or of hashrefs that mimic them.

%metadata is a hash of arbitrary metadata which you want saved along with the directory.

This "Universal" DirCodec serializes the data as a short one-line header followed by a string of JSON. JSON is not the most compact format around, but it has wide cross-platform support and can store any DirEnt attributes you might have, including nested structure within them.

The serialization places newlines so that it should be convenient to write custom code that inspects a directory's contents without decoding the whole thing with a JSON library.
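To illustrate why line-oriented output helps, here is a sketch of one possible layout: a header line, then JSON with each entry on its own line, so a script can pick entries out one line at a time. The header text, the "metadata"/"entries" key names, and the line layout are all assumptions made for this example and are not the codec's actual wire format; inspect real output before relying on any of it.

```perl
use strict;
use warnings;
use JSON::PP ();

# ASSUMED layout for illustration only: a made-up header line, then JSON
# with the metadata first and one directory entry per line.
my $HEADER = "HYPOTHETICAL_HEADER universal\n";

sub sketch_encode {
    my ($entries, $metadata) = @_;
    my $json  = JSON::PP->new->utf8->canonical;
    my @lines = map { $json->encode($_) }
                sort { $a->{name} cmp $b->{name} } @$entries;
    return $HEADER
         . '{"metadata":' . $json->encode($metadata) . ",\n"
         . qq{ "entries":[\n}
         . join(",\n", @lines)
         . "\n]}\n";
}

my $data = sketch_encode(
    [ { name => 'b.txt', type => 'file' }, { name => 'a.txt', type => 'file' } ],
    { foo => 1 },
);

# Pull out entry names line by line, decoding only one small object at a time.
my @names;
my $in_entries = 0;
for my $line (split /\n/, $data) {
    if ($line eq ' "entries":[') { $in_entries = 1; next }
    last if $line eq ']}';
    next unless $in_entries && length $line;
    (my $obj = $line) =~ s/,\z//;    # drop the comma separating entries
    push @names, JSON::PP->new->utf8->decode($obj)->{name};
}
print "@names\n";    # a.txt b.txt
```

Everything after the header line is still a single valid JSON document, so a full JSON decoder and a line-by-line scanner can both read it.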

If you add anything to the metadata, keep the data consistent so that two encodings of the same directory are byte-for-byte identical. Otherwise, a backup utility (for example) will waste disk space storing multiple copies of the same directory.
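One way to keep encodings consistent, sketched with JSON::PP: canonical mode sorts hash keys, so equal hashes always produce the same bytes regardless of insertion order. Whether this codec enables canonical mode itself is not stated here. The second half shows a pitfall that sorting cannot fix: the same mtime stored once as a string and once as a number encodes differently.

```perl
use strict;
use warnings;
use JSON::PP ();

# canonical(1) sorts hash keys, so equal data always yields identical bytes.
my $json = JSON::PP->new->utf8->canonical(1);

my %meta_a = ( foo => 1, bar => 42 );
my %meta_b;
$meta_b{bar} = 42;    # same content, inserted in the opposite order
$meta_b{foo} = 1;

my $enc_a = $json->encode(\%meta_a);
my $enc_b = $json->encode(\%meta_b);
print "$enc_a\n";                                        # {"bar":42,"foo":1}
print $enc_a eq $enc_b ? "identical\n" : "differ\n";     # identical

# Sorting keys cannot fix inconsistent value types: a string and a number
# holding the same digits are encoded differently.
my $as_string = $json->encode({ mtime => '1736354736' });
my $as_number = $json->encode({ mtime => 1736354736 });
print "$as_string\n";    # {"mtime":"1736354736"}
print "$as_number\n";    # {"mtime":1736354736}
```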

decode

  $dir= $class->decode( %params );

Reverses encode, creating a Dir object.

See DirCodec->load for details on %params.

AUTHOR

Michael Conrad <mconrad@intellitree.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Michael Conrad, and IntelliTree Solutions llc.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.