NAME

trimtrees - traverse directories, find identical files, replace with hard links

SYNOPSIS

 trimtrees.pl OPTIONS directory...

 OPTIONS:

  --maxlinks N            limit the number of links per file

DESCRIPTION

Traverse all directories named on the command line, compute MD5 checksums and find files with identical MD5 digests. If the digests are equal, do a real comparison of the file contents; if the files are really equal, replace the second of the two files with a hard link to the first one.
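The detection and linking step can be illustrated with a short Python sketch. This is not the trimtrees.pl implementation: the function names and the `.trimtrees-tmp` temporary-name suffix are hypothetical, and the sketch ignores ownership, permissions and timestamps. It treats an MD5 match only as a hint, confirms it with a full content comparison, and swaps in the hard link atomically via a temporary name plus `os.replace`.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def replace_with_link(master, dup):
    """Atomically replace `dup` with a hard link to `master`."""
    tmp = dup + ".trimtrees-tmp"  # hypothetical temporary-name scheme
    os.link(master, tmp)
    os.replace(tmp, dup)  # rename over the duplicate in one step


def trim(directories):
    """Hard-link identical regular files; return how many were replaced."""
    by_digest = defaultdict(list)  # MD5 digest -> distinct master paths
    replaced = 0
    for top in directories:
        for root, dirs, files in os.walk(top):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # only regular files are considered
                digest = md5_of(path)
                for master in by_digest[digest]:
                    if os.path.samefile(master, path):
                        break  # already the same inode, nothing to do
                    # Equal MD5 is only a hint: compare the real contents.
                    if filecmp.cmp(master, path, shallow=False):
                        replace_with_link(master, path)
                        replaced += 1
                        break
                else:
                    # No identical file seen yet (or an MD5 collision).
                    by_digest[digest].append(path)
    return replaced
```

Running `trim` a second time over the same tree finds every duplicate already sharing an inode and replaces nothing.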

Special care is taken to cope with "Too many links" error conditions. An inode that has been overbooked in this way is taken out of the pool and replaced with another one, so that the minimum number of files needed is kept on disk.
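One way to realise this pool rotation is sketched below in Python, under the assumption that the pool maps each content digest to one "master" path. The names and the temporary suffix are hypothetical; this is an illustration, not the script's code. When `link()` fails with `EMLINK` (the errno behind "Too many links"), the full inode is retired and the current file becomes the new master, so later duplicates link to it instead. The `link` parameter exists only so the behaviour can be exercised without actually hitting a file system's link limit.

```python
import errno
import os


def link_into_pool(pool, digest, path, link=os.link):
    """Replace `path` with a hard link to the pool's master for `digest`.

    `pool` maps digest -> master path. If the master's inode has reached
    the link limit (EMLINK), that inode leaves the pool and `path` becomes
    the new master. Returns True if `path` was replaced by a link.
    """
    master = pool.get(digest)
    if master is None:
        pool[digest] = path  # first file with this content
        return False
    tmp = path + ".trimtrees-tmp"  # hypothetical temporary-name scheme
    try:
        link(master, tmp)
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise
        pool[digest] = path  # retire the overbooked inode
        return False
    os.replace(tmp, path)
    return True
```

With this rotation, n identical files end up spread over roughly n divided by the link limit inodes, rounded up, rather than one inode per file once the limit is hit.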

The --maxlinks option can be used to reduce the link count of all files within a tree, thus preparing the tree for a subsequent call to cp -al. This operation can be thought of as the reverse of the normal trimtrees operation (--maxlinks=1 produces a tree without hard links).
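The effect of --maxlinks can be approximated with the Python sketch below. It is an assumption about how such a cap might work, not a description of the script: `cap_links` and the `.trimtrees-tmp` suffix are hypothetical. Any regular file whose inode has more than `maxlinks` links is detached by copying it to a fresh inode, and further overflowing names are linked to that copy until it, too, is full.

```python
import os
import shutil
import stat


def cap_links(top, maxlinks):
    """Ensure no regular file under `top` has more than `maxlinks` links.

    Returns the number of names that were moved to a different inode.
    """
    spill = {}  # original (dev, ino) -> path of the current overflow copy
    moved = 0
    for root, dirs, files in os.walk(top):
        dirs.sort()
        for name in sorted(files):
            path = os.path.join(root, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_nlink <= maxlinks:
                continue
            key = (st.st_dev, st.st_ino)
            tmp = path + ".trimtrees-tmp"  # hypothetical temporary name
            copy = spill.get(key)
            if copy is not None and os.stat(copy).st_nlink < maxlinks:
                os.link(copy, tmp)  # join the existing overflow inode
            else:
                shutil.copy2(path, tmp)  # start a fresh inode
                spill[key] = path  # after the rename, path is that inode
            os.replace(tmp, path)
            moved += 1
    return moved
```

With `maxlinks=1` every copy starts a new inode, which matches the "tree without hard links" case described above.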

SIGNALS

SIGINT is caught and the script stops as soon as the current file is finished.
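Stopping "as soon as the current file is finished" is the usual deferred-signal pattern: the handler only records the request and the main loop checks it between files, so a file is never left half replaced. A minimal Python sketch, with hypothetical names (`StopFlag`, `process_all`, `install_stop_flag`):

```python
import signal


class StopFlag:
    """Callable SIGINT handler that only records the request."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        self.requested = True  # no work is interrupted here


def install_stop_flag():
    """Install a StopFlag for SIGINT (must run in the main thread)."""
    stop = StopFlag()
    signal.signal(signal.SIGINT, stop)
    return stop


def process_all(paths, handle_one, stop):
    """Call handle_one(path) for each path; return how many were handled."""
    done = 0
    for path in paths:
        if stop.requested:
            break  # the previous file is complete; stop cleanly
        handle_one(path)
        done += 1
    return done
```

In a real run, `install_stop_flag()` would be called once at start-up and its result passed to `process_all`.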

RISKS

The whole idea of replacing identical files with hard links has inherent dangers. Once two files have turned into one inode, other processes may accidentally change both although they intend to alter only one. Please consider whether this can happen in your environment.
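The danger is easy to reproduce. In the Python sketch below (the file names are purely illustrative), a process appends to what it believes is only `a.conf`, and the change shows up in `b.conf` because both names refer to the same inode.

```python
import os
import tempfile


def shared_inode_demo():
    """Return b.conf's contents after writing in place through a.conf."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.conf")
        b = os.path.join(d, "b.conf")
        with open(a, "w") as fh:
            fh.write("original\n")
        os.link(a, b)  # the two names now share one inode
        with open(a, "a") as fh:  # a process "edits only a.conf" in place
            fh.write("changed\n")
        with open(b) as fh:
            return fh.read()
```

Programs that instead write a new file and rename it over the old name replace only that directory entry, so the other link keeps the old content; whether a given tool edits in place or replaces the file therefore decides which behaviour you get.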