Date::RetentionPolicy - Prune a list of dates down to the ones you want to keep
version 0.01
    my $rp= Date::RetentionPolicy->new(
      retain => [
        { interval => { hours => 6 }, history => { months => 3 } },
        { interval => { days  => 1 }, history => { months => 6 } },
        { interval => { days  => 7 }, history => { months => 9 } },
      ]
    );

    my $dates= [ '2018-01-01 03:23:00', '2018-01-01 09:45:00', ... ];
    my $pruned= $rp->prune($dates);
    for (@$pruned) {
      # delete the backup dated $_
      ...
    }
Often when making backups of a thing, you want to have more frequent snapshots for recent dates, but don't need that frequency further back in time, and want to delete some of the older ones to save space.
The problem of deciding which snapshots to delete is non-trivial because backups often don't complete on a timely schedule (despite being started on a schedule) or have discontinuities from production mishaps, and it would be bad if your script wiped out the only backup in an interval just because it didn't look like one of the "main" timestamps. Also it would be bad if the jitter from the time zone or time of day that you run the pruning process caused the script to round differently and delete the backups it had previously decided to keep.
This module uses an algorithm where you first define the intervals which should retain a backup, then assign the existing timestamps to those intervals (possibly reaching across the interval boundary a bit in order to preserve a nearby timestamp; see reach_factor) thus making an intelligent decision about which timestamps to keep.
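The idea can be sketched in a few lines of plain Perl. This is not the module's implementation: it works on epoch seconds with a single fixed-length interval, and the names sketch_prune, $reference, $interval, $count and $reach are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Simplified illustration only, not Date::RetentionPolicy's actual code.
# For each interval mark, keep the timestamp closest to the mark,
# provided it lies within $reach * $interval seconds of it.
sub sketch_prune {
    my ($times, $reference, $interval, $count, $reach) = @_;
    my %keep;
    for my $n (0 .. $count - 1) {
        my $mark = $reference - $n * $interval;
        my ($best, $best_dist);
        for my $t (@$times) {
            my $dist = abs($t - $mark);
            next if $dist > $reach * $interval;
            ($best, $best_dist) = ($t, $dist)
                if !defined $best_dist || $dist < $best_dist;
        }
        $keep{$best} = 1 if defined $best;
    }
    my @retained = grep {  $keep{$_} } @$times;
    my @pruned   = grep { !$keep{$_} } @$times;
    return (\@retained, \@pruned);
}

# Three hourly marks (10000, 6400, 2800) with a reach of half an interval.
my @times = (10_000, 9_900, 6_500, 3_000, 2_900);
my ($retained, $pruned) = sketch_prune(\@times, 10_000, 3_600, 3, 0.5);
print "retained: @$retained\n";
print "pruned: @$pruned\n";
```

Note that 6500 is retained even though it does not sit on the 6400 mark: it is the only timestamp within reach of that mark, which is the situation the paragraph above is concerned with.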
This module currently depends on DateTime, but I'm happy to accept patches to allow it to work with other Date classes.
retain
An arrayref of specifications for what to preserve. Each element should be a hashref containing history and interval. history specifies how far backward from "reference_date" to apply the intervals, and interval specifies the time difference between the backups that need to be preserved.
As an example, consider
    retain => [
      { interval => { days  => 1 }, history => { days  => 20 } },
      { interval => { hours => 1 }, history => { hours => 48 } },
    ]
This will attempt to preserve timestamps near the marks of "reference_date", an hour before that, an hour before that, and so on for the past 48 hours. It will also attempt to preserve "reference_date", a day before that, a day before that, and so on for the past 20 days.
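To make the marks concrete, here is a small sketch that lists them as offsets in seconds before "reference_date". It assumes fixed-length hours and days, and mark_offsets is a hypothetical helper, not part of the module. Whether the module counts the mark that falls exactly at the end of history is not stated here; the sketch includes it.

```perl
use strict;
use warnings;

# Illustration only: offsets (in seconds before reference_date) of the marks
# one rule aims for. mark_offsets is a hypothetical helper, and including the
# mark that falls exactly at the end of history is an assumption.
sub mark_offsets {
    my ($interval, $history) = @_;
    my @offsets;
    for (my $off = 0; $off <= $history; $off += $interval) {
        push @offsets, $off;
    }
    return @offsets;
}

my @hourly = mark_offsets(3_600,  48 * 3_600);   # hours => 1, history 48 hours
my @daily  = mark_offsets(86_400, 20 * 86_400);  # days => 1, history 20 days
printf "hourly: %d marks, daily: %d marks\n", scalar @hourly, scalar @daily;
```

In this sketch the daily marks at 0, 1 and 2 days coincide with hourly marks, so the most recent 48 hours are covered by both rules.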
There is another setting called "reach_factor" that determines how far from the desired timestamp the algorithm will look for something to preserve. The default reach_factor of 0.5 means that it will scan from half an interval back in time until half an interval forward in time looking for the closest timestamp to preserve. In some cases, you may want a narrower or wider search distance, and you can set reach_factor accordingly. You can also supply it as another hash key for a retain rule for per-rule customization.
retain => [ { interval => { days => 1 }, history => { days => 20 }, reach_factor => .75 } ]
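The effect of reach_factor on the search window is simple arithmetic, sketched below for a one-day interval. reach_window is a hypothetical helper written for this illustration, and times are plain seconds relative to the mark.

```perl
use strict;
use warnings;

# Illustration only: the search window around one interval mark.
# $mark and $interval are in seconds; reach_window is a hypothetical helper.
sub reach_window {
    my ($mark, $interval, $reach_factor) = @_;
    my $reach = $interval * $reach_factor;
    return ($mark - $reach, $mark + $reach);
}

my $day = 86_400;
my ($from, $to)       = reach_window(0, $day, 0.5);   # -43200 .. 43200
my ($wide_f, $wide_t) = reach_window(0, $day, 0.75);  # -64800 .. 64800
printf "0.5:  %d .. %d\n", $from, $to;
printf "0.75: %d .. %d\n", $wide_f, $wide_t;
```

With 0.5 the windows of neighbouring marks just touch; with 0.75 each window extends 18 hours either side of its mark, so windows of adjacent daily marks overlap by 12 hours.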
time_zone
The time zone in which date strings are parsed before they are converted to the epoch values used in the calculations. The default is 'floating'.
reach_factor
The multiplier for how far to look in each direction from an interval point, expressed as a fraction of the interval. See the discussion in "retain".
reference_date
The end-point from which all intervals will be calculated. There is no default, so that "reference_date_or_default" can pick up the current time each time it is called.
reference_date_or_default
Read-only. Returns (a clone of) "reference_date", or if it isn't set, returns the current date in the designated "time_zone" rounded up to the next day boundary.
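Rounding up to a day boundary means that every pruning run during the same day works from the same end-point. A minimal sketch of that rounding, using UTC epoch seconds rather than DateTime and the configured "time_zone", might look like this. next_day_boundary is a hypothetical helper, and how the module treats a time that is already exactly midnight is not covered here.

```perl
use strict;
use warnings;

# Illustration only, in UTC epoch seconds; the module itself works with
# DateTime and the configured time_zone. next_day_boundary is hypothetical.
sub next_day_boundary {
    my ($epoch) = @_;
    my $day = 86_400;
    return $epoch - ($epoch % $day) + $day;
}

# 2018-01-01 03:23:00 UTC and 2018-01-01 09:45:00 UTC
my $early = next_day_boundary(1_514_776_980);
my $late  = next_day_boundary(1_514_799_900);
print "$early $late\n";  # both 1514851200, i.e. 2018-01-02 00:00:00 UTC
```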
auto_sync
While walking backward through time intervals looking for backups, adjust the interval endpoint to be closer to whatever match it found. This might allow the algorithm to essentially adjust the reference_date to match whatever schedule your backups are running on. This is not enabled by default.
my $pruned_arrayref= $self->prune( \@times );
@times may be an array of epoch numbers, DateTime objects, or date strings in any format recognized by DateTime::Format::Flexible. Epochs are currently the most efficient type of argument since that's what the algorithm operates on.
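If the timestamps start out as strings in a fixed format, one option is to convert them to epochs up front. The sketch below uses the core Time::Local module and assumes 'YYYY-MM-DD hh:mm:ss' strings interpreted as UTC; to_epoch is a hypothetical helper, not something the module provides.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Illustration only: turn 'YYYY-MM-DD hh:mm:ss' strings into epoch seconds,
# treating them as UTC. to_epoch is a hypothetical helper.
sub to_epoch {
    my ($str) = @_;
    my ($y, $mon, $d, $h, $min, $s) =
        $str =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or die "unrecognized date: $str";
    # timegm expects a zero-based month
    return timegm($s, $min, $h, $d, $mon - 1, $y);
}

my @epochs = map { to_epoch($_) } '2018-01-01 03:23:00', '2018-01-01 09:45:00';
print "@epochs\n";
```

The resulting list can then be passed to prune as \@epochs.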
print $rp->visualize( \@list );
This method takes a list of timestamps, sorts and marks them for retention, and then returns printable text showing the retention intervals and which increment it decided to keep. The text is simple ASCII art and requires a monospace font to display correctly.
Michael Conrad <mconrad@intellitree.com>
This software is copyright (c) 2018 by IntelliTree Solutions llc.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install Date::RetentionPolicy, copy and paste the appropriate command into your terminal.
cpanm
cpanm Date::RetentionPolicy
CPAN shell
perl -MCPAN -e shell
install Date::RetentionPolicy