NAME

Directory::Diff - recursively find differences between similar directories

SYNOPSIS

    use Directory::Diff 'directory_diff';
    use FindBin '$Bin';
    
    # Do a "diff" between "old_dir" and "new_dir"
    
    directory_diff ("$Bin/old_dir", "$Bin/new_dir",
                    {diff => \&diff,
                     dir1_only => \&old_only});
    
    # User-supplied callback for differing files
    
    sub diff
    {
        my ($data, $dir1, $dir2, $file) = @_;
        print "$dir1/$file is different from $dir2/$file.\n";
    }
    
    # User-supplied callback for files only in one of the directories
    
    sub old_only
    {
        my ($data, $dir1, $file) = @_;
        print "$dir1/$file is only in the old directory.\n";
    }
    

produces output

    /usr/home/ben/projects/directory-diff/examples/old_dir/old-file is only in the old directory.
    /usr/home/ben/projects/directory-diff/examples/old_dir/diff-file is different from /usr/home/ben/projects/directory-diff/examples/new_dir/diff-file.

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents Directory-Diff version 0.08 corresponding to git commit 80910e5ff3eaeb27ebbe432883b1798ab6dc6ebe released on Sat Nov 17 06:38:54 2018 +0900.

DESCRIPTION

Directory::Diff finds differences between two directories and all their subdirectories, recursively. If it finds a file with the same name in both directories, it uses File::Compare to find out whether they are different. It is callback-based: it reports what it finds by calling user-supplied routines, and takes no action where no callback is supplied.

FUNCTIONS

The main function of this module is "directory_diff". The other functions listed here are helper functions, but these can be exported on request.

directory_diff

     directory_diff ("dir1", "dir2",
                     {dir1_only => \&dir1_only,
                      diff => \&diff});

Given two directories dir1 and dir2, this calls back a user-supplied routine for each of three cases:

A file is only in the first directory

In this case a callback specified by dir1_only is called once

     &{$third_arg->{dir1_only}} ($third_arg->{data}, "dir1", $file);

for each file $file which is in dir1 but not in dir2, including files in subdirectories.

A file is only in the second directory

In this case a callback specified by dir2_only is called once

     &{$third_arg->{dir2_only}} ($third_arg->{data}, "dir2", $file);

for each file $file which is in dir2 but not in dir1, including files in subdirectories.

A file with the same name but different contents is in both directories

In this case a callback specified by diff is called once

     &{$third_arg->{diff}} ($third_arg->{data}, "dir1", "dir2", $file);

for each file $file which is in both dir1 and dir2 but whose contents differ, including files in subdirectories.

The first argument to each of the callback functions is specified by data. The second argument to dir1_only and dir2_only is the directory's name. The third argument is the file name, which includes the subdirectory part. The second and third arguments to diff are the two directories, and the fourth argument is the file name including the subdirectory part.

If the user does not supply a callback, no action is taken, even if a file is found.
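As an illustration of the callback arguments described above, here is a minimal sketch which collects the file names for each of the three cases into a hash passed through data. The temporary directories, the file names, the %report hash and the write_file helper are all made up for this example and are not part of the module.

```perl
use strict;
use warnings;
use Directory::Diff 'directory_diff';
use File::Temp 'tempdir';

# Two throwaway directories standing in for "dir1" and "dir2".
my $old = tempdir (CLEANUP => 1);
my $new = tempdir (CLEANUP => 1);
write_file ("$old/same.txt", "unchanged\n");
write_file ("$new/same.txt", "unchanged\n");
write_file ("$old/changed.txt", "version 1\n");
write_file ("$new/changed.txt", "version 2\n");
write_file ("$old/removed.txt", "gone\n");
write_file ("$new/added.txt", "fresh\n");

# A reference to this hash is given as "data", so each callback
# receives it as its first argument.
my %report = (dir1_only => [], dir2_only => [], diff => []);

directory_diff ($old, $new, {
    data => \%report,
    dir1_only => sub {
        my ($data, $dir1, $file) = @_;
        push @{$data->{dir1_only}}, $file;
    },
    dir2_only => sub {
        my ($data, $dir2, $file) = @_;
        push @{$data->{dir2_only}}, $file;
    },
    diff => sub {
        my ($data, $dir1, $dir2, $file) = @_;
        push @{$data->{diff}}, $file;
    },
});

for my $case (sort keys %report) {
    my @files = sort @{$report{$case}};
    print "$case: @files\n";
}

# Hypothetical helper: write $text to the file at $path.
sub write_file
{
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}
```

Passing the hash through data keeps the callbacks free of global variables. Closures over %report would work equally well.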

The routine does not return a meaningful value. It does not check the return values of the callbacks. Therefore if it is necessary to stop midway, the user must use something like eval { } and die.
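The eval { } and die approach might look something like the following sketch, which stops at the first differing file. The directories, the file names, the "stop:" message prefix and the write_file helper are invented for the example. Restoring the working directory afterwards is a precaution based on the note under DEPENDENCIES that the module changes directory. Whether it is actually needed after a die is an assumption.

```perl
use strict;
use warnings;
use Cwd 'getcwd';
use Directory::Diff 'directory_diff';
use File::Temp 'tempdir';

# Two throwaway directories in which both files differ.
my $old = tempdir (CLEANUP => 1);
my $new = tempdir (CLEANUP => 1);
for my $name (qw/a.txt b.txt/) {
    write_file ("$old/$name", "old contents\n");
    write_file ("$new/$name", "new contents\n");
}

my $start_dir = getcwd ();
my $seen = 0;

# The callback dies on the first difference. eval catches the
# exception, so the program itself carries on.
my $ok = eval {
    directory_diff ($old, $new, {
        diff => sub {
            my ($data, $dir1, $dir2, $file) = @_;
            $seen++;
            die "stop: first difference is $file\n";
        },
    });
    1;
};
my $error = $@;

# Precaution (assumption): go back to where we started.
chdir $start_dir or die "Cannot chdir to $start_dir: $!";

if (! $ok) {
    if ($error =~ /^stop: /) {
        print "Stopped early: $error";
    }
    else {
        # Not our own signal, so pass the error on.
        die $error;
    }
}

# Hypothetical helper: write $text to the file at $path.
sub write_file
{
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}
```

Giving the message a recognisable prefix lets the caller tell a deliberate stop apart from a genuine error raised inside the callback.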

A fourth argument, if set to any true value, causes directory_diff to print messages about what it finds and what it does.

ls_dir

     my %ls = ls_dir ("dir");

ls_dir makes a hash containing a true value for each file and directory which is found under the directory given as the first argument.

If a second argument with a true value is set, it prints debugging messages. For example

     my %ls = ls_dir ("dir", 1);
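To give an idea of the kind of hash involved, here is a rough sketch built on File::Find. It is not the module's own code. The function name list_tree is invented, and the key format (paths relative to the top directory, with a trailing slash on directories) is an assumption based on the get_only example in this document.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp 'tempdir';

# Hypothetical stand-in for ls_dir: returns a hash with a true value
# for every file and directory found under $top.
sub list_tree
{
    my ($top) = @_;
    my %ls;
    find ({
        no_chdir => 1,
        wanted => sub {
            my $path = $File::Find::name;
            # Skip the top directory itself.
            return if $path eq $top;
            my $rel = File::Spec->abs2rel ($path, $top);
            # Assumption: directories are keyed with a trailing slash.
            $rel .= '/' if -d $path;
            $ls{$rel} = 1;
        },
    }, $top);
    return %ls;
}

# Build a small tree: "file", "dir/" and "dir/file".
my $top = tempdir (CLEANUP => 1);
mkdir "$top/dir" or die "Cannot mkdir $top/dir: $!";
for my $name ("file", "dir/file") {
    open my $out, '>', "$top/$name" or die "Cannot write $top/$name: $!";
    print {$out} "contents\n";
    close $out or die "Cannot close $top/$name: $!";
}

my %ls = list_tree ($top);
print "$_\n" for sort keys %ls;
```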

get_only

     my %only = get_only (\%dir1, \%dir2);

Given two hashes containing true values for each file or directory under two directories, return a hash containing true values for the files and directories which are in the first directory hash but not in the second directory hash.

For example, if

     %dir1 = ("file" => 1, "dir/" => 1, "dir/file" => 1);

and

     %dir2 = ("dir/" => 1, "dir2/" => 1);

get_only returns

     %only = ("file" => 1, "dir/file" => 1);

A third parameter for debugging makes the module print messages on what is found if set to a true value, for example,

     my %only = get_only (\%dir1, \%dir2, 1);
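The result in the example above is what a plain set difference on the hash keys gives. The following sketch reproduces it without calling the module. It illustrates the behaviour described here and is not the module's implementation.

```perl
use strict;
use warnings;

my %dir1 = ("file" => 1, "dir/" => 1, "dir/file" => 1);
my %dir2 = ("dir/" => 1, "dir2/" => 1);

# Keep every key of %dir1 which has no true value in %dir2.
my %only = map { $_ => 1 } grep { ! $dir2{$_} } keys %dir1;

# Prints "dir/file, file".
print join (", ", sort keys %only), "\n";
```

Note that "dir2/" does not appear in the result, since it is only in the second hash, and "dir/" does not appear because it is in both.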

get_diff

     my %diff = get_diff ("dir1", \%dir1_ls, "dir2", \%dir2_ls);

Get a list of files which are in both dir1 and dir2, but which are different. This uses File::Compare to test the files for differences. It searches subdirectories. Usually the hashes %dir1_ls and %dir2_ls are those output by "ls_dir".
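The File::Compare test mentioned above can be tried on its own. In this sketch, the directories, the file names and the write_file helper are invented for the example. compare returns 0 if the files are equal, 1 if they differ, and -1 if there was an error. The loop is only an illustration of that test, not the code of get_diff.

```perl
use strict;
use warnings;
use File::Compare 'compare';
use File::Temp 'tempdir';

my $dir1 = tempdir (CLEANUP => 1);
my $dir2 = tempdir (CLEANUP => 1);
write_file ("$dir1/same", "abc\n");
write_file ("$dir2/same", "abc\n");
write_file ("$dir1/changed", "abc\n");
write_file ("$dir2/changed", "xyz\n");

# Record each file which exists in both directories and differs.
my %different;
for my $file (qw/same changed/) {
    my $result = compare ("$dir1/$file", "$dir2/$file");
    die "Cannot compare $file: $!" if $result < 0;
    $different{$file} = 1 if $result == 1;
}

# Prints "changed".
print join (", ", sort keys %different), "\n";

# Hypothetical helper: write $text to the file at $path.
sub write_file
{
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
}
```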

SEE ALSO

CPAN modules

File::DirCompare

This is similar to Directory::Diff. Unlike Directory::Diff, it does not descend into subdirectories which exist in one directory but not the other.

File::Dircmp

This mimics the output of the Unix diff command. Unlike Directory::Diff, it does not descend into subdirectories which exist in one directory but not the other.

Test::Dirs

Compare::Directory

DEPENDENCIES

This section lists Perl modules which this depends on, with a rationale for why they are used.

Carp

croak and carp are used to report errors.

File::Compare

File::Compare is used to check whether two identically-named files are different or not.

"getcwd" in Cwd

This is used to get the working directory of the module, since it changes directory to the directory where the diff is performed.

File::Copy

See Directory::Diff::Copy.

File::Path

See Directory::Diff::Copy.

CONTRIBUTORS

This section lists people who have contributed to the module.

Mohammad S. Anwar (MANWAR) contributed fixes for broken tests.

MOTIVATION

This section discusses why I wrote the module and what I use it for.

I wrote this module because `diff --recursive` stops when it finds a subdirectory which is in one directory and not the other, without descending into the subdirectory. For example, if there is a file dir1/subdir/file and dir2 has no subdir,

     diff -r dir1 dir2

will tell you "Only in dir1: subdir" but it won't tell you anything about the files under "subdir". Two Perl modules on CPAN, "File::Dircmp" and "File::DirCompare", also stop processing when a subdirectory exists in only one of the directories.

For my task, I needed to go down into the subdirectory and find all the files which were in all the subdirectories, so I wrote this.

I've been using this module for updating web sites with a lot of pages since 2009, to avoid repeatedly having to upload the entire site's-worth of pages for each small change. The way I use this is as follows. I keep a local copy of the uploaded web site in a directory like old-site, and then rebuild all the pages in another directory like new-site, then I use Directory::Diff::Copy to put the changed files into yet another directory, like changed-site-files. Once the changed files are copied, then I tar, gzip, and upload the directory of changed files, and untar it at the web host, thus replacing only files which have changed. I also delete the old-site directory and rename new-site to old-site at this point in preparation for the next upload.

I'm currently using this for almost all the static content for the following web sites: http://www.sljfaq.org, http://kanji.sljfaq.org, and http://www.lemoda.net. I put this module on github in about 2012 and on CPAN in 2016.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2009-2018 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.