
NAME

App::dupfind::App - This is the application that gets run() by $bin/dupfind

VERSION

version 0.172690

DESCRIPTION

This app-in-a-module is run by the dupfind script itself. This module isn't meant to be the interface for the end-user (that's the purpose of the dupfind script which is bundled with this distribution). The reason the logic is packed up in this module is to allow for easier unit testing, and also to allow the application to be extended via inheritance if desired.

So basically, you don't really need to worry about this. Just install the App::dupfind distribution and call the dupfind executable in the way you'd invoke any other app.

SYNOPSIS

   use App::dupfind::App;

   App::dupfind::App->run;

ATTRIBUTES

opts

Accessor for the run-time options supplied by the user.

benchmarks

Accessor for the collection of benchmarks gathered over the course of application execution.

metrics

Accessor used to save and retrieve metrics information.

deduper

Accessor for the object that does the deduplication work. Depending on user input, this is either an App::dupfind object or an App::dupfind::Threaded object (if threading was selected).

METHODS

BUILD

The object builder. Does some validation of user input.

_usage

Prints out a help message via Pod::Usage so that the POD documentation in whatever namespace calls the app gets used as the help message.

This is why $bin/dupfind is very short on code, but has a good POD page.

_bench_this

Private method used to make internal benchmarking easier, since the application makes frequent use of benchmarks to time key steps in its workflow.

_calculate_bench_times

Takes the benchmark objects collected by $self->_bench_this and calculates the time difference between their start and end marks, then tucks the results away for later display in $self->_run_summary.
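
The two benchmarking methods above can be illustrated with Perl's core Benchmark module. This is a minimal sketch, not the module's actual code: it assumes start and end marks are plain Benchmark objects stored by label, and the names bench_this, calculate_bench_times and %benchmarks are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw( timediff timestr );

# Hypothetical storage: label => { start => mark, end => mark }
my %benchmarks;

# Run a code block and record a start mark and an end mark around it.
sub bench_this {
    my ( $label, $code ) = @_;
    my $start  = Benchmark->new;
    my @result = $code->();
    my $end    = Benchmark->new;
    $benchmarks{$label} = { start => $start, end => $end };
    return @result;
}

# Convert every pair of marks into a printable elapsed-time string.
sub calculate_bench_times {
    my %times;
    for my $label ( keys %benchmarks ) {
        my $mark = $benchmarks{$label};
        $times{$label} = timestr( timediff( $mark->{end}, $mark->{start} ) );
    }
    return %times;
}

# Time a stand-in workload, then print it in the style of the run summary.
bench_this( scan => sub { my $n = 0; $n += $_ for 1 .. 100_000; $n } );
my %times = calculate_bench_times();
print "** TREE SCAN TIME........$times{scan}\n";
```

The strings produced by timestr contain the phrase "wallclock secs", the same phrase that appears in the timing lines of the run summary.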

_run_summary

Displays a summary of findings and metrics after the main output is shown. The summary looks something like this (some output has been replaced with "..." due to space constraints):

   ------------------------------
   ** THREADS...............8
   ** RAM CACHE.............314572800 bytes
   ** CACHE HITS/MISSES.....21162392/14997229
   ** TOTAL FILES SCANNED...73389
   ** TOTAL SAME SIZE.......66058
   ** TOTAL ACTUAL DUPES....48760
         -- TIMES --
   ** TREE SCAN TIME........1.78502 wallclock secs  ...
   ** HARDLINK PRUNE TIME...0.438269 wallclock secs ...
   ** WEED-OUT TIME.........4.14188 wallclock secs  ...
   ** CRYPTO-HASHING TIME...6.25566 wallclock secs  ...
   ** DELETION TIME.........no deletions
   ** TOTAL RUN TIME........14.6825 wallclock secs  ...

run

Runs the application. Takes no arguments. Returns no values.

scanfs

Scans the filesystem directory specified by the user and returns a data structure containing groupings of files that are the same size. Grouping by size is the first step in identifying duplicates.
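
Grouping by size can be sketched with the core File::Find module. This is an illustration under assumptions, not the implementation used by scanfs: the helper names make_file and group_by_size are hypothetical, and the shape of the returned structure (size mapped to a list of paths) is a guess.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp qw( tempdir );

# Hypothetical helper (not part of App::dupfind): write a small test file.
sub make_file {
    my ( $path, $content ) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Hypothetical helper: return { size => [ paths ] } for every size that is
# shared by at least two regular files under $dir.
sub group_by_size {
    my ($dir) = @_;
    my %by_size;
    File::Find::find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub {
                return if -l $_;        # skip symlinks
                return unless -f $_;    # regular files only
                my $size = ( stat _ )[7];    # reuse the stat done by -f
                push @{ $by_size{$size} }, $_;
            },
        },
        $dir,
    );

    # A file with a unique size cannot have a duplicate.
    delete @by_size{ grep { @{ $by_size{$_} } < 2 } keys %by_size };
    return \%by_size;
}

my $dir = tempdir( CLEANUP => 1 );
make_file( "$dir/a.txt", 'hello' );
make_file( "$dir/b.txt", 'world' );
make_file( "$dir/c.txt", 'hi' );

my $groups = group_by_size($dir);
printf "%d bytes: %d files\n", $_, scalar @{ $groups->{$_} }
    for sort keys %$groups;
```

Files of differing sizes can never be duplicates, so this pass is cheap: it needs only a stat call per file and reads no file content.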

prune

Examines the files returned from $self->scanfs and looks for hard links. If two or more hard links to the same file are found, they are sorted by filename and all but the first are discarded.
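
One common way to detect hard links is to compare the device and inode numbers reported by stat. The sketch below assumes a POSIX-like filesystem and is not taken from the module's source; prune_hardlinks is a hypothetical name.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper: keep only the first name (in sorted order) of each
# physical file, dropping additional hard links to it.
sub prune_hardlinks {
    my (@paths) = @_;
    my ( %seen, @kept );
    for my $path ( sort @paths ) {
        my @st = stat $path;
        next unless @st;                # skip paths that cannot be stat'ed
        my $id = "$st[0]:$st[1]";       # device + inode identify the file
        next if $seen{$id}++;           # a later hard link to a seen file
        push @kept, $path;
    }
    return @kept;
}

my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( b.txt c.txt )) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$fh} "same content\n";
    close $fh;
}

# a.txt and b.txt are now two names for one file; c.txt is a separate file.
link "$dir/b.txt", "$dir/a.txt" or die "Cannot create hard link: $!";

print "$_\n" for prune_hardlinks( map { "$dir/$_" } qw( a.txt b.txt c.txt ) );
```

Here a.txt and c.txt are kept. Although c.txt has identical content, it is a separate file on disk, so it remains a candidate for the later passes.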

weed

Runs the weed-out pass(es) on the file groups returned by $self->prune, thereby eliminating as many non-duplicates as possible without having to resort to expensive file hashing (the calculation of file digests).
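
This documentation does not say what a weed-out pass compares. As one plausible illustration, assuming a pass that compares only the leading bytes of same-size files, a sketch could look like this (make_file and weed_by_leading_bytes are hypothetical names):

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical helper (not part of App::dupfind): write a small test file.
sub make_file {
    my ( $path, $content ) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Hypothetical weed-out pass: regroup same-size files by their first $len
# bytes and drop any file whose leading bytes match no other file.
sub weed_by_leading_bytes {
    my ( $paths, $len ) = @_;
    my %by_sample;
    for my $path (@$paths) {
        open my $fh, '<:raw', $path or next;
        my $sample = '';
        read( $fh, $sample, $len );
        close $fh;
        push @{ $by_sample{$sample} }, $path;
    }
    return grep { @$_ > 1 } values %by_sample;
}

my $dir       = tempdir( CLEANUP => 1 );
my @same_size = (
    make_file( "$dir/a.bin", 'AAAA1111' ),
    make_file( "$dir/b.bin", 'AAAA2222' ),
    make_file( "$dir/c.bin", 'BBBB1111' ),
);

my @survivors = weed_by_leading_bytes( \@same_size, 4 );
print join( ', ', sort @$_ ), "\n" for @survivors;
```

In this example c.bin is eliminated after reading four bytes, while a.bin and b.bin survive even though they differ later on. A pass like this only rules files out; it never confirms a duplicate, which is why a digest step still follows.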

digest

Calculates digests for the files returned from $self->weed. A simple caching mechanism helps avoid hashing the same file content more than once when one file is found to be a content match for another.

This is the final basis upon which file uniqueness is determined. After this step, we know with very-near-complete certainty which files are duplicates.

The certainty is limited by the strength of the underlying digest algorithm, which is currently xxhash (a fast, non-cryptographic hash). As with other digests, such as MD5, collisions are possible but extremely unlikely.
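
Grouping files by digest can be sketched as follows. Two things here are assumptions rather than the module's behavior: the core Digest::MD5 module stands in for xxhash so that the example runs on a stock Perl, and the cache is a plain hash keyed by path, which may differ from the caching App::dupfind actually does. The names make_file, file_digest and group_by_digest are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 ();
use File::Temp qw( tempdir );

# Hypothetical helper (not part of App::dupfind): write a small test file.
sub make_file {
    my ( $path, $content ) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my %digest_cache;      # path => hex digest (assumed cache layout)
my $hash_calls = 0;    # counts how often a file was actually hashed

# Return the digest of a file, hashing it only on the first request.
sub file_digest {
    my ($path) = @_;
    return $digest_cache{$path} if exists $digest_cache{$path};
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    $hash_calls++;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest_cache{$path} = $digest;
}

# Group paths by digest and keep only groups with two or more members.
sub group_by_digest {
    my (@paths) = @_;
    my %by_digest;
    push @{ $by_digest{ file_digest($_) } }, $_ for @paths;
    return grep { @$_ > 1 } values %by_digest;
}

my $dir   = tempdir( CLEANUP => 1 );
my @files = (
    make_file( "$dir/a.txt", 'same bytes' ),
    make_file( "$dir/b.txt", 'same bytes' ),
    make_file( "$dir/c.txt", 'other data' ),
);

my @dupes = group_by_digest(@files);
group_by_digest(@files);    # second pass is served from the cache
print "duplicate groups: ", scalar @dupes, ", files hashed: $hash_calls\n";
```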

remove

Runs a removal (deletion) sequence on the duplicate files obtained from $self->digest, interactively prompting the user to confirm deletions unless the user specified that prompting should be disabled.

Refer to the help documentation in the dupfind executable proper for an explanation of run-time command line options and switches.

_stderr

Works like Perl's built-in say function, except:

  • Output goes to STDERR

  • It is a method, not a function, so you have to call it on an object, like $object->_stderr( 'foo' );

  • IT OUTPUTS NOTHING if the user passed in the "-q" or "--quiet" flag.
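
The behavior listed above can be sketched in a few lines. The package name Sketch::App, its constructor, and the quiet key in the options hash are all hypothetical; only the method name _stderr and the quiet-flag behavior come from this documentation.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; the real application object is built differently.
package Sketch::App {

    sub new {
        my ( $class, %opts ) = @_;
        return bless { opts => {%opts} }, $class;
    }

    # Accessor for the run-time options.
    sub opts { return $_[0]{opts} }

    # Like say(), but writes to STDERR and stays silent in quiet mode.
    sub _stderr {
        my ( $self, @msg ) = @_;
        return if $self->opts->{quiet};
        print STDERR @msg, "\n";
        return;
    }
}

Sketch::App->new( quiet => 0 )->_stderr('scanning...');    # printed to STDERR
Sketch::App->new( quiet => 1 )->_stderr('scanning...');    # prints nothing
```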