NAME

Date::RetentionPolicy - Prune a list of dates down to the ones you want to keep

VERSION

version 0.01

SYNOPSIS

  use Date::RetentionPolicy;
  
  my $rp= Date::RetentionPolicy->new(
    retain => [
      { interval => { hours => 6 }, history => { months => 3 } },
      { interval => { days  => 1 }, history => { months => 6 } },
      { interval => { days  => 7 }, history => { months => 9 } },
    ]
  );
  
  my $dates= [ '2018-01-01 03:23:00', '2018-01-01 09:45:00', ... ];
  my $pruned= $rp->prune($dates);
  for (@$pruned) {
    # delete the backup dated $_
    ...
  }

DESCRIPTION

Often when making backups of a thing, you want to have more frequent snapshots for recent dates, but don't need that frequency further back in time, and want to delete some of the older ones to save space.

The problem of deciding which snapshots to delete is non-trivial because backups often don't complete on a timely schedule (despite being started on a schedule), or have discontinuities from production mishaps, and it would be bad if your script wiped out the only backup in an interval just because it didn't look like one of the "main" timestamps. It would also be bad if jitter from the time zone, or from the time of day at which you run the pruning process, caused the script to round differently and delete backups it had previously decided to keep.

This module uses an algorithm where you first define the intervals that should each retain a backup, and the existing timestamps are then assigned to those intervals (possibly reaching a bit across an interval boundary in order to preserve a nearby timestamp; see "reach_factor"). The decision about which timestamps to keep is therefore based on the backups that actually exist, not on exact matches to a fixed schedule.
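The following is a simplified sketch of that idea in plain Perl, not the module's implementation. It assumes timestamps are epoch seconds and handles a single rule whose interval and history are fixed numbers of seconds (so calendar units such as months are not modeled). For each mark it keeps the closest timestamp within interval × reach on either side, and everything no mark claimed is returned as prunable. sketch_retained and sketch_prune are hypothetical names; with several rules, a full version would keep the union of every rule's kept set.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Date::RetentionPolicy's code. Timestamps are
# epoch seconds; one rule with a fixed interval and history, in seconds.
sub sketch_retained {
    my ($times, $ref, $interval, $history, $reach) = @_;
    my %keep;
    # Step backward from the reference point, one interval at a time.
    for (my $mark = $ref; $mark >= $ref - $history; $mark -= $interval) {
        my ($lo, $hi) = ($mark - $interval * $reach, $mark + $interval * $reach);
        my $best;
        # Keep the timestamp inside [$lo, $hi] that is closest to the mark.
        for my $t (@$times) {
            next if $t < $lo || $t > $hi;
            $best = $t if !defined $best || abs($t - $mark) < abs($best - $mark);
        }
        $keep{$best} = 1 if defined $best;
    }
    return [ sort { $a <=> $b } keys %keep ];
}

# Everything that no mark claimed is a candidate for deletion.
sub sketch_prune {
    my ($times, @rule) = @_;
    my %keep = map { $_ => 1 } @{ sketch_retained($times, @rule) };
    return [ grep { !$keep{$_} } @$times ];
}

# Six-hour marks over one day; two backups compete for the 21600 mark.
my @times  = (100, 3600, 21000, 22000, 43300, 50000, 64000, 86000);
my $pruned = sketch_prune(\@times, 86400, 6 * 3600, 86400, 0.5);
print "@$pruned\n";   # prints "3600 21000 50000"
```

Here 22000 beats 21000 for the 21600 mark because it is closer, and 100 beats 3600 for the mark at 0.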

DATES

This module currently depends on DateTime, but I'm happy to accept patches to allow it to work with other Date classes.

ATTRIBUTES

retain

An arrayref of specifications for what to preserve. Each element should be a hashref containing history and interval. history specifies how far backward from "reference_date" to apply the intervals, and interval specifies the time difference between the backups that need to be preserved.

As an example, consider

  retain => [
    { interval => { days => 1 }, history => { days => 20 } },
    { interval => { hours => 1 }, history => { hours => 48 } },
  ]

This will attempt to preserve timestamps near the marks of "reference_date", an hour before that, an hour before that, and so on for the past 48 hours. It will also attempt to preserve "reference_date", a day before that, a day before that, and so on for the past 20 days.
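To make that example concrete, this plain-Perl sketch lists the target marks the two rules would produce from a reference date of 2018-01-21 00:00:00 UTC. rule_marks is a hypothetical helper, not part of the module; it treats intervals as fixed numbers of seconds and assumes the far end of the history is included, which may differ from the module's exact boundary handling.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Hypothetical helper (not part of Date::RetentionPolicy): list the target
# marks for one rule by stepping back from the reference time. Intervals are
# fixed numbers of seconds here, and the far endpoint is assumed inclusive.
sub rule_marks {
    my ($ref, $interval, $history) = @_;
    my @marks;
    for (my $m = $ref; $m >= $ref - $history; $m -= $interval) {
        push @marks, $m;
    }
    return @marks;
}

my $ref    = timegm(0, 0, 0, 21, 0, 2018);   # 2018-01-21 00:00:00 UTC
my @hourly = rule_marks($ref, 3600, 48 * 3600);
my @daily  = rule_marks($ref, 86400, 20 * 86400);
printf "%d hourly marks, %d daily marks\n", scalar @hourly, scalar @daily;
print strftime('%Y-%m-%d %H:%M', gmtime($daily[-1])), "\n";   # 2018-01-01 00:00
```

Under these assumptions the hourly rule yields 49 marks and the daily rule 21, and the midnight marks of the 19th, 20th and 21st appear in both lists.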

There is another setting called "reach_factor" that determines how far from the desired timestamp the algorithm will look for something to preserve. The default reach_factor of 0.5 means that it will scan from half an interval back in time to half an interval forward in time, looking for the closest timestamp to preserve. In some cases you may want a narrower or wider search distance, and you can set reach_factor accordingly. You can also supply reach_factor as a key of an individual retain rule to customize it per rule:

  retain => [
    { interval => { days => 1 }, history => { days => 20 }, reach_factor => .75 }
  ]
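The window arithmetic can be sketched as follows, assuming interval lengths in seconds: the search extends interval × reach_factor on each side of a mark, and a per-rule reach_factor takes precedence over the global one. search_window and the interval_seconds key are hypothetical names for illustration only; real retain rules use the duration hashes shown above.

```perl
use strict;
use warnings;

use constant DAY => 86_400;

# Hypothetical helper: the search window around one mark. A per-rule
# reach_factor, when present, overrides the global default passed in.
sub search_window {
    my ($mark, $rule, $default_reach) = @_;
    my $reach = $rule->{reach_factor} // $default_reach;
    my $half  = $rule->{interval_seconds} * $reach;
    return ($mark - $half, $mark + $half);
}

# Daily rule, mark at day 10: compare the default 0.5 with a per-rule .75.
my @default = search_window(10 * DAY, { interval_seconds => DAY }, 0.5);
my @wider   = search_window(10 * DAY, { interval_seconds => DAY, reach_factor => .75 }, 0.5);
printf "%.2f to %.2f days\n", map { $_ / DAY } @default;   # 9.50 to 10.50 days
printf "%.2f to %.2f days\n", map { $_ / DAY } @wider;     # 9.25 to 10.75 days
```

With a reach_factor above 0.5, neighbouring windows overlap, so a single backup can sit in the search range of two marks.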

time_zone

When date strings are involved, parse them in this time zone before converting them to the epoch values used in the calculations. The default is 'floating'.

reach_factor

The multiplier for how far to look in each direction from an interval point. See discussion in "retain".

reference_date

The end-point from which all intervals will be calculated. There is no default, so that "reference_date_or_default" can pick up the current time each time it is called.

reference_date_or_default

Read-only. Return (a clone of) "reference_date", or if it isn't set, return the current date in the designated "time_zone" rounded up to the next day boundary.
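As an illustration of "rounded up to the next day boundary", here is a plain-Perl sketch on epoch seconds. It assumes UTC (or a floating zone with no DST transitions), and ceil_to_day is a hypothetical name. Since the module depends on DateTime, its real behavior in the configured "time_zone" may differ; in particular, whether an exact midnight stays put or moves to the following day is an assumption here (this sketch leaves it unchanged).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

use constant DAY => 86_400;

# Hypothetical sketch: round an epoch up to the next UTC day boundary.
# Assumes UTC (no DST); an exact midnight is returned unchanged (ceiling).
sub ceil_to_day {
    my ($epoch) = @_;
    my $rem = $epoch % DAY;
    return $rem ? $epoch - $rem + DAY : $epoch;
}

my $now = 1514777000;   # 2018-01-01 03:23:20 UTC
print strftime('%Y-%m-%d %H:%M:%S', gmtime(ceil_to_day($now))), "\n";
# prints 2018-01-02 00:00:00
```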

auto_sync

While walking backward through time intervals looking for backups, adjust the interval endpoint to be closer to whatever match it found. This might allow the algorithm to essentially adjust the reference_date to match whatever schedule your backups are running on. This is not enabled by default.
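A rough sketch of the auto_sync idea, assuming epoch-like numbers and a fixed interval: once a match is found near a mark, the next mark is measured back from the match rather than from the nominal mark. This simplest variant snaps fully onto the match; the description above only says the endpoint moves closer, so the module's actual adjustment may be partial. walk_marks and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of the auto_sync idea (not the module's code): after a
# match is found near a mark, the next mark is computed from the match
# itself, so the marks drift onto the backups' actual schedule.
sub walk_marks {
    my ($times, $ref, $interval, $count, $reach, $auto_sync) = @_;
    my @kept;
    my $mark = $ref;
    for (1 .. $count) {
        my $best;
        for my $t (@$times) {
            next if abs($t - $mark) > $interval * $reach;
            $best = $t if !defined $best || abs($t - $mark) < abs($best - $mark);
        }
        push @kept, $best if defined $best;
        # With auto_sync, step back from the match instead of the nominal mark.
        $mark = ($auto_sync && defined $best ? $best : $mark) - $interval;
    }
    return @kept;
}

my @t = (955, 845, 760, 640);   # backups drifting off a 100-unit schedule
print join(',', walk_marks(\@t, 1000, 100, 4, 0.5, 0)), "\n";   # 955,760
print join(',', walk_marks(\@t, 1000, 100, 4, 0.5, 1)), "\n";   # 955,845,760,640
```

With these jittered timestamps the fixed schedule finds nothing near two of its marks and keeps only 955 and 760, while the synced walk follows the backups and keeps all four.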

METHODS

prune

  my $pruned_arrayref= $rp->prune( \@times );

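@times may be an array of epoch numbers, DateTime objects, or date strings in any format recognized by DateTime::Format::Flexible. Epochs are currently the most efficient type of argument, since that's what the algorithm operates on. The returned arrayref holds the timestamps that were not selected for retention, i.e. the ones you can delete (as in the "SYNOPSIS").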

visualize

  print $rp->visualize( \@list );

This method takes a list of timestamps, sorts and marks them for retention, and then returns printable text showing the retention intervals and which timestamps it decided to keep. The text is simple ASCII art and requires a monospace font to display correctly.

AUTHOR

Michael Conrad <mconrad@intellitree.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by IntelliTree Solutions llc.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.