NAME

Data::Tubes::Plugin::Validator

DESCRIPTION

This module contains factory functions to generate tubes that ease validation of records.

Each factory function below has two names, one starting with validate_ and one without this prefix. The two are perfectly equivalent; the short version can be handier, e.g. when using tube or pipeline from Data::Tubes.

FUNCTIONS

admit

   $tube = admit(@validators); # OR
   $tube = admit(@validators, \%args);

Simple validator mainly aimed at testing regular expressions against text input. For this reason, the default input field from the record is raw and not structured as in "thoroughly", although you can override it.

The goal of this validator is to get rid of uninteresting parts quickly. For this reason, there is no on-the-fly collection of validation outcomes; see "thoroughly" if you need them.

The input record MUST be a hash reference except when input is set to undef. In this case it can be almost anything, although it SHOULD be a string if you plan to use regular expressions (anything that stringifies will do, anyway).

@validators can be either regular expressions or sub references. All of them MUST pass to get the input record out, otherwise you will get nothing (which means that the particular section of the pipeline will stop here for the specific record). You can invert this behaviour by setting option refuse to a true value.
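A minimal sketch of the behaviour described above, assuming the module is installed. The factory is called by its fully qualified name, so nothing is assumed about what the module exports; the sample records and regular expressions are made up for illustration.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

# Both regular expressions must match the "raw" field (the default input)
my $tube = Data::Tubes::Plugin::Validator::admit(qr{\A\w+=}, qr{\d+\z});

# Call the tube in list context: a rejected record yields nothing
my @kept    = $tube->({raw => 'answer=42'});
my @dropped = $tube->({raw => 'answer=unknown'});

print 'kept: ', scalar(@kept), ', dropped: ', scalar(@dropped), "\n";
```

The second record matches the first expression only, so it should not come out of the tube.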

Accepts the following options:

input

the name of the input field in the input record. Defaults to raw, which means that this validator is mainly aimed at filtering input records before they are parsed. You can set it to undef and the input record itself will be validated, not a subfield;

name

name of the tube, useful for debugging;

refuse

boolean flag to indicate that the test should be reversed, i.e. that all provided @validators MUST fail for the record to get through. If this is the case, you might be interested in using "validate_refuse", because it advertises your intentions a bit more clearly.
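The options above might be combined as in the following sketch. It assumes the module is installed and uses fully qualified calls; the tube name, the regular expressions and the sample inputs are hypothetical.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

# input => undef: the record itself (here a plain string) is tested
my $only_digits = Data::Tubes::Plugin::Validator::admit(
   qr{\A\d+\z},
   {input => undef, name => 'digits only'},
);
my @digits = $only_digits->('12345');
my @mixed  = $only_digits->('12a45');

# refuse => 1: the record goes on only if every validator fails
my $no_todo = Data::Tubes::Plugin::Validator::admit(
   qr{TODO},
   {refuse => 1},
);
my @clean = $no_todo->({raw => 'all done'});
my @todo  = $no_todo->({raw => 'TODO: finish'});

printf "digits=%d mixed=%d clean=%d todo=%d\n",
  scalar(@digits), scalar(@mixed), scalar(@clean), scalar(@todo);
```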

refuse

   $tube = refuse(@validators); # OR
   $tube = refuse(@validators, \%args);

This is the same as "validate_admit", except that the parameter refuse is unconditionally set to a true value. No, you cannot undo this by explicitly setting refuse to a false value, because it would not be sane.

refuse_comment

   $tube = refuse_comment(%args); # OR
   $tube = refuse_comment(\%args);

Thin wrapper around "validate_refuse" to eliminate comment lines, defined as any line that starts with optional spaces and whose first non-space character is the hash #.

refuse_comment_or_empty

   $tube = refuse_comment_or_empty(%args); # OR
   $tube = refuse_comment_or_empty(\%args);

Thin wrapper around "validate_refuse" to eliminate comment or empty lines, defined as any line that starts with optional spaces and whose first non-space character, if present, is the hash #.

refuse_empty

   $tube = refuse_empty(%args); # OR
   $tube = refuse_empty(\%args);

Thin wrapper around "validate_refuse" to eliminate empty lines, defined as any line that only contains optional whitespaces.
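As an illustration of the wrappers in this family, the sketch below filters a handful of made-up lines through refuse_comment_or_empty. It assumes the module is installed and that, like "validate_refuse", the wrapper reads the raw field by default.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

my $filter = Data::Tubes::Plugin::Validator::refuse_comment_or_empty();

my @lines = ('# a comment', '', '   ', '  # indented comment', 'key=value');

# Keep only the records that the tube lets through
my @kept = map { $filter->({raw => $_}) } @lines;

print "$_->{raw}\n" for @kept;
```

Only the last line is neither a comment nor empty, so it should be the only record left.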

thoroughly

   $tube = thoroughly(@validators); # OR
   $tube = thoroughly(@validators, \%args);

Validate a record according to the provided @validators.

Differently from other validators in this plugin:

  • the input record MUST be a hash reference;

  • the input to be validated is set via argument input, that defaults to structured (instead of raw). You might want to change this if you intend to use regular expression validators;

  • one field in the hash (according to factory argument output, set to validation by default) is set to the output of the validation operation.

Items in @validators can be sub references, regex references or array references, as explained below. An optional hash reference at the end can carry options, see below for their explanation.

A validator basically boils down to a sub reference that is called to perform the validation, or a regular expression. It can be either provided directly as an item in @validators, or embedded in an array reference, prefixed with a name and with optional additional parameters. Example:

   $tube = thoroughly(
      sub { $_[0]{foo} =~ /bar|baz/ }, # straight sub ref
      [
         'Number should be even',
         sub { $_[0]{number} % 2 == 0 },
      ],
      [
         'Name of something else',
         sub { ... },
         @parameters
      ],
   );

The validator function will be called in list context, like this:

   my @outcome = $validator->(
      $target,     # what "input" points to, or the whole record
      $record,     # the whole record, if necessary
      \%args,      # args passed to the factory
      @parameters, # anything after the sub in the array ref version
   );

The validator can:

  • return the empty list, in which case the validation is considered failed;

  • return a single value, that represents the outcome of the validation. Anything considered false by Perl means that the validation failed, otherwise the validation is considered a success;

  • return more values, the first representing the outcome of the validation as in the previous bullet, the following ones things that you want to track as the outcome of the validation (e.g. some explanation of what went wrong with the validation).
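The three kinds of return value can be sketched as follows, assuming the module is installed. Validator names and the record are made up; with the default options only failed validations end up in the validation key.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

my $tube = Data::Tubes::Plugin::Validator::thoroughly(
   # empty list: always a failure
   ['empty list' => sub { return }],
   # single value: true means success, false means failure
   ['single value' => sub { return $_[0]{n} > 0 }],
   # several values: outcome first, then details to keep track of
   ['with details' => sub {
      return 1 if $_[0]{n} < 100;
      return (0, 'n must be below 100', $_[0]{n});
   }],
);

my $record = $tube->({structured => {n => 250}});
my @failed = map { $_->[0] } @{$record->{validation}};

print "failed: @failed\n";
```

Here the second validator succeeds, so only the first and the third should be reported, the latter with its explanation and the offending value attached.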

If one of the validators throws an exception, this will not be trapped unless wrapper is set properly. See below if you want to catch exceptions and transform them into failed validation steps.

All validations are performed in the order provided in @validators, independently of whether they succeed or fail. This is by design, so that you can provide thorough feedback about what you think is wrong with the input data.

Validation outcomes are collected into an array of arrays that is eventually referenced by the record provided as output (which is the same as the input, only augmented). By default this array of arrays is referenced by key validation, but you can control the key via option output.

Normally, only failed validations are collected in the array, so that you can easily check if validation was successful at a later stage. You can decide to collect all outcomes via option keep_positives.

By default, if the validation collection procedure does not collect anything (i.e. all validations were successful and keep_positives is false), the output key is set to undef, so that you can check for validation errors very quickly instead of checking the number of items in the array. If you prefer to receive an empty array instead, you can set option keep_empty.
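The effect of keep_positives and keep_empty on a record that passes every check might look like this. It is a sketch that assumes the module is installed; the $thoroughly and $fresh names are local helpers introduced only for this example.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

# Local shorthand for the fully qualified factory
my $thoroughly = \&Data::Tubes::Plugin::Validator::thoroughly;

my @validators = (['is-even' => sub { $_[0]{number} % 2 == 0 }]);

# Build a fresh record for each tube, as tubes augment their input
my $fresh = sub { return {structured => {number => 12}} };

my $default   = $thoroughly->(@validators);
my $positives = $thoroughly->(@validators, {keep_positives => 1});
my $empty     = $thoroughly->(@validators, {keep_empty => 1});

my $r1 = $default->($fresh->());      # nothing collected: undef
my $r2 = $positives->($fresh->());    # the success is collected too
my $r3 = $empty->($fresh->());        # nothing collected: empty array

print defined($r1->{validation}) ? "defined\n" : "undef\n";
print scalar(@{$r2->{validation}}), " outcome(s) kept\n";
print scalar(@{$r3->{validation}}), " outcome(s) kept\n";
```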

You can wrap the call to all your validators via an optional wrapper sub reference. This means that the following call will be used instead:

   my @outcome = $wrapper->(
      $validator,  # the validation function
      $target,     # what "input" points to, or the whole record
      $record,     # the whole record, if necessary
      \%args,      # args passed to the factory
      @parameters, # anything after the sub in the array ref version
   );

In this case, your wrapper function will be responsible for calling $validator in the right way. You can use this e.g. to perform some adaptation of interface for either the input or the output of the validation sub. As a matter of fact, in this case $validator is not even required to be a sub reference.

In addition to setting wrapper to a sub reference, you can also set it to the string try. This will wrap the call to the validator in a try/catch using Try::Catch, which you are supposed to have installed independently.
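If you prefer not to depend on Try::Catch, a custom wrapper based on plain eval is one possible alternative. The sketch below assumes the module is installed; $trap is a hypothetical name and this is not how the module implements try internally.

```perl
use strict;
use warnings;
use Data::Tubes::Plugin::Validator;

# Hypothetical wrapper: traps exceptions and reports them as failures
my $trap = sub {
   my ($validator, @parameters) = @_;
   my @outcome;
   my $ok = eval { @outcome = $validator->(@parameters); 1 };
   return $ok ? @outcome : (0, $@);
};

my $tube = Data::Tubes::Plugin::Validator::thoroughly(
   ['is-even' => sub { ($_[0]{number} % 2 == 0) or die "odd\n" }],
   {wrapper => $trap},
);

my $record = $tube->({structured => {number => 13}});
my ($name, $outcome, $reason) = @{$record->{validation}[0]};

print "$name failed: $reason";
```

The wrapper receives the validator as its first argument, followed by the parameters that the validator would have received, and it is free to decide what to return on its behalf.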

Allowed arguments are:

input

the name of the input field in the record. Defaults to structured, on the assumption that you will want to perform validation after parsing, but you can of course set it to whatever you want. If you set it to undef, the whole input record will be considered the $target for the validation. Keep in mind that each validator will always also receive a reference to the $record as the second argument anyway;

keep_empty

if all validators succeed and keep_positives below is false, the validation process collects nothing. This option controls whether you get an empty array as output in this case or a false value, which makes the absence of validation errors quicker to check. Defaults to 0, i.e. a false value, meaning that you will receive an undefined value in output when all validations are successful;

keep_positives

validations that are successful are normally discarded, as you are assumed to be mostly interested in failures. If you want an account of all the validation steps, instead, you can set this flag to a true value. Defaults to 0, a false value, meaning that positive validations are discarded;

name

the name of the tube, useful when debugging. Defaults to validate with subs;

output

the name of the output field in the output record. Defaults to validation.

wrapper

a subroutine to wrap each call to a validator. In this case, thoroughly will call the wrapper instead, passing the validator as the first parameter, followed by the list of parameters it would have passed to the validator itself.

You can also pass the special value try, which is equivalent to setting the following wrapper subroutine:

   use Try::Catch;
   sub {
      my ($validator, @parameters) = @_;
      return try {
         $validator->(@parameters);
      }
      catch {
         (0, $_);
      };
   }

except that Try::Catch is loaded dynamically at runtime and no function is imported. This allows you to turn exceptions into failed validations (note that the first item in the expression inside the catch part is 0, i.e. a failed validation) where the exception itself is passed as an additional "reason" that is eventually collected in the outcome.

A few examples should be of help now.

First, an example with validators that all return a true or false value, hence there is nothing to trap:

   my $v = thoroughly(
      sub { $_[0]{foo} =~ /bar|baz/ },
      ['is-even' => sub { $_[0]{number} % 2 == 0 }],
      ['in-bounds' =>
       sub { $_[0]{number} >= 10 && $_[0]{number} <= 21}]
   );

   my $o1 = $v->({structured => {foo => 'bar', number => 12}});
   my $o2 = $v->({structured => {foo => 'bar', number => 13}});
   my $o3 = $v->({structured => {foo => 'hey', number => 3}});

In all cases the output record contains a new validation key, pointing to:

  • $o1 an undef value

  • $o2 an array reference like this:

       [ ['is-even', ''] ]

    because the test is-even fails returning an empty string

  • $o3 an array reference like this:

       [
          ['validator-0', 0],  # empty list transformed into "0"
          ['is-even', ''],     # empty string from validator
          ['in-bounds', '']    # empty string from validator
       ]

As you can see, in the case of the first test a name is automatically generated based on the index of the test in the list of validators.

Here's an example for trapping exceptions:

   my $v = thoroughly(
      sub { $_[0]{foo} =~ /bar|baz/ },
      ['is-even' =>
       sub { ($_[0]{number} % 2 == 0) or die "odd\n" }],
      ['in-bounds' =>
       sub { $_[0]{number} >= 10 or die "too low\n" }],
      {wrapper => 'try'},
   );

   my $o4 = $v->({structured => {foo => 'bar', number => 13}});
   my $o5 = $v->({structured => {foo => 'hey', number => 3}});

Again, you will get a validation key in each output record, like this:

$o4

only the is-even test fails in this case, so this is what you get:

   [ ['is-even', 0, "odd\n"] ]

$o5

all three tests fail, two with exception, leading to this:

   [
      ['validator-0', 0],             # as before
      ['is-even', 0, "odd\n"],        # exception to failure
      ['in-bounds', 0, "too low\n"]   # exception to failure
   ]

You hopefully get the idea at this point.

It's important to always remember the difference between the following validators:

   sub { ($_[0]{number} % 2 == 0) or die "odd\n" };
   sub { die "odd\n" if $_[0]{number} % 2 };

The second validator always fails: it either throws an exception, or returns a false value. This is not the case with the first one. Always remember to return a true value from your validators, like this:

   sub { die "odd\n" if $_[0]{number} % 2; 1 }

(Yes, this actually happened while writing the tests...)
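The pitfall can be reproduced in plain Perl without the module. In this sketch $wrong and $right are names chosen for the example, and both subs are called with an even number, so neither throws.

```perl
use strict;
use warnings;

my $wrong = sub { die "odd\n" if $_[0]{number} % 2 };
my $right = sub { die "odd\n" if $_[0]{number} % 2; 1 };

my $even = {number => 12};

# No exception here, yet $wrong evaluates to the false condition value
my $wrong_outcome = $wrong->($even) ? 'pass' : 'fail';
my $right_outcome = $right->($even) ? 'pass' : 'fail';

print "wrong: $wrong_outcome, right: $right_outcome\n";
```

When the statement modifier's condition is false, the sub returns the value of that condition, which is why the explicit trailing 1 is needed.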

validate_admit

Alias for "admit".

validate_refuse

Alias for "refuse".

validate_refuse_comment

Alias for "refuse_comment".

validate_refuse_comment_or_empty

Alias for "refuse_comment_or_empty".

validate_refuse_empty

Alias for "refuse_empty".

validate_thoroughly

Alias for "thoroughly".

validate_with_subs

Alias for "with_subs".

with_subs

   $tube = with_subs(@validators); # OR
   $tube = with_subs(@validators, \%args);

This function is DEPRECATED and currently aliased to "thoroughly". It used to do all that "thoroughly" does, except handling regular expression validators; now it supports them too, which is why a name change was necessary.

BUGS AND LIMITATIONS

Report bugs either through RT or GitHub (patches welcome).

AUTHOR

Flavio Poletti <polettix@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2016 by Flavio Poletti <polettix@cpan.org>

This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.