App::dupfind::App - This is the application that gets run() by $bin/dupfind


version 0.172690


This app-in-a-module is run by the dupfind script itself. This module isn't meant to be the interface for the end-user (that's the purpose of the dupfind script which is bundled with this distribution). The reason the logic is packed up in this module is to allow for easier unit testing, and also to allow the application to be extended via inheritance if desired.

In short, you don't need to interact with this module directly. Install the App::dupfind distribution and invoke the dupfind executable as you would any other command-line application.


   use App::dupfind::App;




Accessor for run time options supplied by the user


Accessor to a collection of benchmarks that are gathered over the course of application execution


Accessor used to save and retrieve metrics information


Depending on user input, it will be an App::dupfind object or an App::dupfind::Threaded object (if threading was selected).



The object builder. Does some validation of user input.


Prints out a help message via Pod::Usage so that the POD documentation in whatever namespace calls the app gets used as the help message.

This is why $bin/dupfind is very short on code, but has a good POD page.


Private method that simplifies internal benchmarking. The application calls it frequently to time key steps in its workflow.


Takes the benchmark objects collected by the previous method, calculates the time difference between each start mark and its end mark, then stores the results for later display in $self->_run_summary.
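The start/end bookkeeping described above can be sketched with Perl's core Benchmark module. This is only an illustration, not the module's actual code: the %marks hash and the mark_start, mark_end and elapsed helpers are hypothetical names. Only Benchmark->new, timediff and timestr are real Benchmark calls.

```perl
use strict;
use warnings;
use Benchmark qw( :hireswallclock timediff timestr );

# Hypothetical store of named start/end marks (an assumption, not the
# module's real internal layout)
my %marks;

sub mark_start { my ( $name ) = @_; $marks{ $name }{start} = Benchmark->new; return }
sub mark_end   { my ( $name ) = @_; $marks{ $name }{end}   = Benchmark->new; return }

# Returns a human-readable elapsed time for one named step
sub elapsed
{
   my ( $name ) = @_;

   # timediff( $t1, $t0 ) yields a Benchmark object holding $t1 - $t0
   my $diff = timediff( $marks{ $name }{end}, $marks{ $name }{start} );

   return timestr( $diff );
}

mark_start( 'scanfs' );
my $sum = 0;
$sum += $_ for 1 .. 100_000;   # stand-in for the real work being timed
mark_end( 'scanfs' );

print 'TREE SCAN TIME: ', elapsed( 'scanfs' ), "\n";
```

The :hireswallclock tag asks Benchmark to use Time::HiRes when it is available, which would account for fractional "wallclock secs" figures like those in the summary.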


Displays a summary of findings and metrics after the main output is shown. The summary looks something like this (some output has been replaced with "..." due to space constraints):

   ** THREADS...............8
   ** RAM CACHE.............314572800 bytes
   ** CACHE HITS/MISSES.....21162392/14997229
   ** TOTAL SAME SIZE.......66058
   ** TOTAL ACTUAL DUPES....48760
         -- TIMES --
   ** TREE SCAN TIME........1.78502 wallclock secs  ...
   ** HARDLINK PRUNE TIME...0.438269 wallclock secs ...
   ** WEED-OUT TIME.........4.14188 wallclock secs  ...
   ** CRYPTO-HASHING TIME...6.25566 wallclock secs  ...
   ** DELETION deletions
   ** TOTAL RUN TIME........14.6825 wallclock secs  ...

Runs the application. Takes no arguments. Returns no values.


Scans the filesystem directory specified by the user and returns a data structure containing groupings of files that are the same size, which is the first step in identifying duplicates.
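A minimal sketch of grouping by size, assuming the core File::Find module does the directory walk. The group_by_size function and the shape of the hash it returns are illustrative assumptions, not the method's real return value.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns a hashref mapping a size in bytes to an
# arrayref of the file paths that have that size. Sizes seen only once
# are dropped, since a file with a unique size cannot have a duplicate.
sub group_by_size
{
   my ( $dir ) = @_;
   my %by_size;

   find(
      {
         no_chdir => 1,   # keep $_ as the full path
         wanted   => sub
         {
            return if -l $_;       # skip symlinks
            return unless -f $_;   # plain files only

            my $size = -s _;       # reuse the stat buffer filled by -f

            push @{ $by_size{ $size } }, $_;
         },
      },
      $dir
   );

   # discard every size that only one file has
   delete @by_size{ grep { @{ $by_size{ $_ } } < 2 } keys %by_size };

   return \%by_size;
}

my $groups = group_by_size( shift // '.' );

for my $size ( sort { $a <=> $b } keys %$groups )
{
   print "$size bytes: @{ $groups->{ $size } }\n";
}
```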


Examines the files returned from $self->scanfs and looks for hard links. If two or more hard links to the same file are found, they are sorted by filename and all but the first are discarded.
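One way to detect hard links is to compare device and inode numbers, as in the sketch below. This assumes a POSIX-style filesystem where stat() returns meaningful inode numbers. prune_hardlinks is a hypothetical name, and the real method may identify links differently.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of paths and returns them with the
# extra hard links removed. Paths are visited in sorted order, so the
# first filename of each set of links is the one that survives.
sub prune_hardlinks
{
   my @files = @_;
   my ( %seen, @kept );

   for my $file ( sort @files )
   {
      # stat() fields 0 and 1 are the device and inode numbers
      my ( $dev, $ino ) = stat $file or next;   # skip paths we cannot stat

      # the same device:inode pair means the same underlying file
      next if $seen{ "$dev:$ino" }++;

      push @kept, $file;
   }

   return @kept;
}

print "$_\n" for prune_hardlinks( @ARGV );
```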


Runs the weed-out pass(es) on the file groups returned by $self->prune, thereby eliminating as many non-duplicates as possible without having to resort to expensive file hashing (the calculation of file digests).
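The text above does not say how a weed-out pass compares files. As one plausible illustration, the sketch below regroups same-size files by their leading bytes, so files that differ early are eliminated without being read in full. Both the weed_by_leading_bytes function and the choice of leading bytes are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: takes an arrayref of same-size files and a byte
# count, and returns a list of arrayrefs, one per group of files whose
# first $len bytes are identical. Groups of one are dropped.
sub weed_by_leading_bytes
{
   my ( $files, $len ) = @_;
   my %by_head;

   for my $file ( @$files )
   {
      open my $fh, '<:raw', $file or next;   # skip unreadable files

      my $head = '';
      read $fh, $head, $len;
      close $fh;

      push @{ $by_head{ $head } }, $file;
   }

   # only groups with two or more members can still contain duplicates
   return grep { @$_ > 1 } values %by_head;
}

for my $group ( weed_by_leading_bytes( [ @ARGV ], 64 ) )
{
   print "still candidates: @$group\n";
}
```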


Calculates digests for the files returned from $self->weed. A simple caching mechanism helps avoid hashing the same file content more than once when one file is found to be a content match for another.

This is the final basis upon which file uniqueness is determined. After this step, we know with very-near-complete certainty which files are duplicates.

The certainty is limited by the strength of the underlying digest algorithm, which is currently xxhash (a fast, non-cryptographic hash). As with other digests, such as MD5, collisions are possible but extremely unlikely.
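The grouping step can be sketched as follows. Note the substitutions: the core Digest::MD5 module stands in for xxhash so that the example runs on a stock Perl, and the caching layer is left out because the text above does not describe how the cache is keyed. group_by_digest is a hypothetical name.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: takes a list of candidate files and returns a list
# of arrayrefs, one per set of files whose digests are identical.
sub group_by_digest
{
   my @files = @_;
   my %by_digest;

   for my $file ( @files )
   {
      open my $fh, '<:raw', $file or next;   # skip unreadable files

      # MD5 is a stand-in here; the application itself uses xxhash
      my $digest = Digest::MD5->new->addfile( $fh )->hexdigest;
      close $fh;

      push @{ $by_digest{ $digest } }, $file;
   }

   # a digest shared by two or more files marks a set of duplicates
   return grep { @$_ > 1 } values %by_digest;
}

print "duplicates: @$_\n" for group_by_digest( @ARGV );
```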


Runs a removal (deletion) sequence on file duplicates obtained by $self->digest, interactively prompting the user for confirmation of deletions. Interactive prompting does not happen if the user specified that prompting should be disabled.

Refer to the help documentation in the dupfind executable proper for an explanation of run-time command line options and switches.


Works like Perl's built-in say function, except:

  • Output goes to STDERR

  • It is a method, not a function. You will have to call it on the object, like $object->_stderr( 'foo' );

  • IT OUTPUTS NOTHING if the user passed in the "-q" or "--quiet" flag.
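The behaviour listed above can be sketched in a few lines. The My::Sketch class, its constructor, and the place where the quiet flag is stored are all hypothetical. Only the _stderr name and its described behaviour come from the text.

```perl
use strict;
use warnings;

package My::Sketch;   # hypothetical stand-in class, not App::dupfind::App

# Assumption: the options live in a plain hashref under the 'opts' key
sub new
{
   my ( $class, %opts ) = @_;

   return bless { opts => { %opts } }, $class;
}

# Prints its arguments plus a newline to STDERR, unless quiet mode is on
sub _stderr
{
   my ( $self, @msg ) = @_;

   return if $self->{opts}{quiet};

   print STDERR @msg, "\n";

   return;
}

package main;

My::Sketch->new->_stderr( 'shown on STDERR' );
My::Sketch->new( quiet => 1 )->_stderr( 'never shown' );
```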