NAME

App::dupfind::Threaded::MapReduce::Weed - Map-reduce version of weed_dups, and the worker thread for it

VERSION

version 0.172690

DESCRIPTION

Overrides the weed_dups method from App::dupfind::Common and implements a worker thread routine that is invoked from it. In this threaded version of weed_dups, the set of same-size file groupings is mapped as a task and sent to the main map-reduce engine implemented in App::dupfind::Threaded::MapReduce. The outcome of that multithreaded map-reduce operation is a significantly smaller list of potential duplicates (or an empty list if none survive the weeding-out).

Please don't use this module by itself. It is for internal use only.

METHODS

weed_dups

Calls the map-reduce logic on the $size_dups hashref, providing a wrapped coderef that calls out to _weed_worker for every weeding algorithm the user has specified. The map-reduce engine then invokes each coderef on the same-size file groupings.

This overrides the weed_dups method in App::dupfind::Common.
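
The control flow described above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the module's actual code: a plain loop stands in for the map-reduce engine, and the names make_weeder, weed_dups_sketch, and %content are hypothetical, with "files" modeled as in-memory strings so that the example needs no disk access.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files", keyed by name (assumption for the sketch).
my %content = (
    'a.txt' => 'hello world',
    'b.txt' => 'hello world',
    'c.txt' => 'jello world',
    'd.txt' => 'abc',
    'e.txt' => 'abd',
);

# Build a weeder: a coderef that keeps only files sharing a key
# (as computed by $key_fn) with at least one other file in the group.
sub make_weeder {
    my ($key_fn) = @_;
    return sub {
        my @files = @_;
        my %bucket;
        push @{ $bucket{ $key_fn->( $content{$_} ) } }, $_ for @files;
        return map { @$_ } grep { @$_ > 1 } values %bucket;
    };
}

# Sequential stand-in for the map-reduce call: for each weeder, wrap it
# in a coderef and apply that coderef to every same-size grouping.
sub weed_dups_sketch {
    my ( $size_dups, @weeders ) = @_;
    for my $weeder (@weeders) {
        my $mapper = sub { my ($group) = @_; return [ $weeder->(@$group) ] };
        my %next;
        for my $size ( keys %$size_dups ) {
            my $survivors = $mapper->( $size_dups->{$size} );
            # A grouping with fewer than two files cannot contain dupes.
            $next{$size} = $survivors if @$survivors > 1;
        }
        $size_dups = \%next;
    }
    return $size_dups;
}

my $size_dups = {
    11 => [ 'a.txt', 'b.txt', 'c.txt' ],
    3  => [ 'd.txt', 'e.txt' ],
};

my $result = weed_dups_sketch(
    $size_dups,
    make_weeder( sub { substr $_[0], 0, 1 } ),    # compare first character
    make_weeder( sub { substr $_[0], -1 } ),      # compare last character
);
```

In this sketch the first pass drops c.txt from the 11-byte grouping, and the second pass eliminates the 3-byte grouping entirely, leaving only a.txt and b.txt as potential duplicates. In the real module the per-grouping work would be dispatched to worker threads rather than run in a loop.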

_weed_worker

Runs weed-out passes for same-size file groupings, using $weeder, where $weeder is a weed-out algorithm that tosses out non-dupes by use of more efficient means than hashing alone. The idea is to read as little as possible from the disk while searching out dupes, and to use file hashing (digests) as a last resort.
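
As an illustration of the "read as little as possible" idea, the following sketch runs a cheap pass that compares only the first few bytes of each file, then falls back to a full digest for the survivors. It is an assumption-laden example rather than one of the module's real weeding algorithms: weed_by_first_bytes and group_by_digest are hypothetical names, and the temporary files exist only to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Digest::MD5;

# Hypothetical cheap weeder: group files by their first $len bytes and
# return only the groups (array refs) that contain more than one file.
sub weed_by_first_bytes {
    my ( $len, @files ) = @_;
    my %bucket;
    for my $file (@files) {
        open my $fh, '<:raw', $file or next;    # skip unreadable files
        my $got = read $fh, my $buf, $len;
        close $fh;
        next unless defined $got;
        push @{ $bucket{$buf} }, $file;
    }
    return grep { @$_ > 1 } values %bucket;
}

# Last resort: read each remaining file in full and group by MD5 digest.
sub group_by_digest {
    my @files = @_;
    my %bucket;
    for my $file (@files) {
        open my $fh, '<:raw', $file or next;
        push @{ $bucket{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
        close $fh;
    }
    return grep { @$_ > 1 } values %bucket;
}

# Four same-size files: a and b are identical, c differs at the start,
# d differs only at the end.
my $dir  = tempdir( CLEANUP => 1 );
my %data = ( a => 'AAAA1111', b => 'AAAA1111', c => 'BBBB1111', d => 'AAAA2222' );
for my $name ( sort keys %data ) {
    open my $fh, '>:raw', "$dir/$name" or die "cannot write $dir/$name: $!";
    print {$fh} $data{$name};
    close $fh;
}

my @group       = map { "$dir/$_" } sort keys %data;
my @after_cheap = map { @$_ } weed_by_first_bytes( 4, @group );
my @dupes       = map { [ sort @$_ ] } group_by_digest(@after_cheap);
```

Here the cheap pass discards c after reading four bytes per file, so only a, b, and d are hashed in full; the digest pass then separates d and leaves a and b as the single duplicate group.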