NAME

Data::RuledValidator - data validator with rule

DESCRIPTION

Data::RuledValidator is a data validator. It is driven by rules that non-programmers can read, so the rule file can double as a specification.

WHAT FOR ?

One programmer said:

 specification is in code, so documentation is not needed.

Another programmer said:

 code is specification, so if I write specification, it is against DRY.

These are excuses. Such programmers may dislike writing documents, may not be good at it, and/or may think validation is a trivial task. But if the specification itself drives the program, so that no validation code has to be written, they will start writing specifications. And in the end, we do need a specification.

SYNOPSIS

You can use this module without a rule file.

 BEGIN{
   $ENV{REQUEST_METHOD} = "GET";
   $ENV{QUERY_STRING} = "page=index&i=9&k=aaaaa&v=bbbb";
 }
 
 use Data::RuledValidator;

 use CGI;
 
 my $v = Data::RuledValidator->new(obj => CGI->new, method => "param");
 print $v->by_sentence("age is num", "name is word", "nickname is word", "required = age,name,nickname");  # return 1 if valid

This checks the parameters of the CGI object: age must be a number, name and nickname must be words, and age, name and nickname are all required.

The next example uses the following rules in the file "validator.rule":

 ;;GLOBAL

 ID_KEY  page
 
 # $cgi->param('age') is num
 age      is num
 # $cgi->param('name') is word
 name     is word
 # $cgi->param('nickname') is word
 nickname is word
 
 # the following rules are applied when $cgi->param('page') is 'index'
 ;;index
 # requires $cgi->param('age'), $cgi->param('name') and $cgi->param('nickname')
 required = age, name, nickname

And the code is (the environment variables are the same as in the first example):

 my $v = Data::RuledValidator->new(obj => CGI->new, method => "param", rule => "validator.rule");
 print $v->by_rule; # return 1 if valid

This is nearly the same as the first example. The value to the right of ID_KEY, "page", is the name of the parameter whose value selects the rule group to use.

 my $q = CGI->new;
 $id = $q->param("page");

Now $id is "index" (see the environment variables set in the BEGIN block above), so the rules in the "index" group are used. The value is fetched with the object and method given to new. The "index" group is as follows:

 ;;index
 required = age, name, nickname

The global rules are applied as well.

 age      is num
 name     is word
 nickname is word

So it is the same as the first example: age must be a number, name and nickname must be words, and age, name and nickname are all required.

RuledValidator GENERAL IDEA

  • Object

    The Object holds the data you want to check, and has a Method that returns Value(s) from that data.

  • Key

    Basically, the Key is the key that is passed to the Object's Method.

  • Value(s)

    Value(s) are what the Object's Method returns when it is passed the Key.

  • Operator

    The Operator is the operator used to check the Value(s).

  • Condition

    The Condition is what the Operator uses to judge whether the Value(s) are valid.
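
The five terms can be pictured with a few lines of plain Perl. This is only an illustrative sketch: the My::Request class and its param method are hypothetical stand-ins for a CGI-like object, and the check is written by hand instead of going through Data::RuledValidator.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI-like Object (not part of Data::RuledValidator).
package My::Request;
sub new   { my ($class, %data) = @_; return bless {%data}, $class }
# Method: takes a Key and returns the Value(s) stored under it.
sub param { my ($self, $key) = @_; return @{ $self->{$key} || [] } }

package main;

my $obj    = My::Request->new(age => [42], name => ['ktat']);  # Object
my $method = 'param';                                          # Method
my $key    = 'age';                                            # Key
my @values = $obj->$method($key);                              # Value(s)

# Operator "is" with Condition "num": no value may contain a non-digit.
my $is_num = (grep { !/^\d+$/ } @values) ? 0 : 1;
print "age is num: $is_num\n";
```

The rule sentence "age is num" expresses the same check without any hand-written code.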

USING OPTION

When loading Data::RuledValidator, you can pass options.

import_error

This defines the behavior when a plugin cannot be imported correctly.

 use Data::RuledValidator import_error => 0;

If the value is 0, nothing is done. This is the default.

 use Data::RuledValidator import_error => 1;

If the value is 1, a warning is issued.

 use Data::RuledValidator import_error => 2;

If the value is 2, the module dies.

plugin

You can specify which plugins you want to load.

 use Data::RuledValidator plugin => [qw/Email/];

If you don't specify any plugins, all plugins will be loaded.

filter

You can specify which filter plugins you want to load.

 use Data::RuledValidator filter => [qw/XXX/];

If you don't specify any filter plugins, all filter plugins will be loaded.

CONSTRUCTOR

new
 my $v = Data::RuledValidator->new(
                obj    => $obj,
                method => $method,
                rule   => $rule_file_location,
          );

$obj is the Object that holds the values you want to check. $method is the Method of $obj that returns the Value(s) you want to check. $rule_file_location is the location of the rule file.

 my $v = Data::RuledValidator->new(obj => $obj, method => $method);

If you use "by_sentence", or call "by_rule" with an argument, there is no need to specify rule here.

You can pass an array ref as method. For example, if $c is the object and $c->res->param is the way to get values, pass [qw/res param/] as method.

If you need a different object and/or method to identify the group name, pass id_obj and id_method:

 my $v = Data::RuledValidator->new(obj => $obj, method => $method, id_obj => $id_obj, id_method => $id_method);

For validation, $obj->$method is used. For identifying the group name, $id_obj->$id_method is used (when id_method is omitted, method is used).
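
How an array ref given as method could be resolved can be sketched as follows. The classes My::Context and My::Res and the helper fetch_value are hypothetical names made up for this illustration; the module's internal implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical classes: $c->res returns an object whose param() holds the values.
package My::Res;
sub new   { return bless { name => 'ktat' }, shift }
sub param { my ($self, $key) = @_; return $self->{$key} }

package My::Context;
sub new { return bless { res => My::Res->new }, shift }
sub res { return $_[0]{res} }

package main;

# Walk every method but the last to reach the object that holds the values,
# then call the last method with the key.
sub fetch_value {
    my ($obj, $method, $key) = @_;
    my @chain = ref($method) eq 'ARRAY' ? @$method : ($method);
    my $last  = pop @chain;
    for my $step (@chain) {
        $obj = $obj->$step;
    }
    return $obj->$last($key);
}

my $c = My::Context->new;
# Same as $c->res->param('name')
print fetch_value($c, [qw/res param/], 'name'), "\n";
```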

CONSTRUCTOR OPTION

rule
 rule => rule_file_location

Explained above.

filter_replace

Data::RuledValidator has a filter feature. With this option you decide whether the values held by the object are replaced with the filtered values.

This option can take 3 kinds of value.

 filter_replace => 0

The original values are not replaced with the filtered values.

 filter_replace => 1
 filter_replace => []

The filtered values replace the original ones. Whether to use 1 or [] depends on how the object's method sets a value.

 1  ...  $q->param(key, @value);
 [] ...  $q->param(key, [ @value ]);

rule_path
 rule_path => '/path/to/rule_dir/'

You can specify the path of the directory that contains the rule files.

auto_reset

By default, the reset method is called automatically when by_rule or by_sentence is called.

If you want to change this behavior, set it to 0.

 auto_reset => 0

You can also change the value with the auto_reset method.

key_method
 key_method => 'param'

key_method is the method of obj that returns the list of keys, like param of the CGI module. If you don't specify it, the value you specified as method is used. If you want to disable it, set 0 or an empty value as follows.

 key_method => 0
 key_method => ''

This is for the "filter * with ..." sentence described in "FILTERS": when filter_replace is true, that sentence applies the filter to the values of all keys returned by key_method. When you disable it (key_method => 0), the filter is applied only to the keys that appear in the rule.

METHOD for VALIDATION

by_sentence
 $v->by_sentence("i is number", "k is word", ...);

The arguments are rules. You can pass multiple sentences. It returns the $v object.

by_rule
 $v->by_rule();
 $v->by_rule($rule_file);
 $v->by_rule($rule_file, $group_name);

If $rule_file is omitted, the file specified in new is used. It returns the $v object.

result
 $v->result;

The result of the validation check. It returns the following structure.

 {
   'i_is'    => 0,
   'v_is'    => 0,
   'z_match' => 1,
 }

This means

 key 'i' is invalid.
 key 'v' is invalid.
 key 'z' is valid.

You can also get this result as follows:

 %result = @$v;

valid
 $v->valid;

The overall result of the validation check. The returned value is 1 or 0.

You can also get this result as follows:

 $result = $v;

failure
 $v->failure;

The values given to the checks that failed; some or all of them are wrong. It returns, for example, the following structure.

 {
   'i_is'    => ['x', 'y', 'z'],
   'v_is'    => ['x@x.jp'],
   'z_match' => [0123, 1234],
 }

If you want only the wrong values, use the wrong method.

missing

Keys that are included in the rule but were not supplied by the object. You can get such keys/aliases as follows:

 my $missing_arrayref = $v->missing;

$missing_arrayref looks like the following:

 ['key', 'alias']

wrong

This is not implemented yet.

 $v->wrong;

It returns only the wrong values.

 {
   'i_is'    => ['x', 'y', 'z'],
   'v_is'    => ['x@x.jp'],
   'z_match' => [0123, 1234],
 }

All of them are wrong values.

reset
 $v->reset();

The result of the validation check is reset. This is called internally when by_sentence or by_rule is called.

OTHER METHOD

required_alias_name
 $v->required_alias_name

This is the special alias name used to specify required keys.

list_plugins
 $v->list_plugins;

Lists all plugins.

filter_replace
 $v->filter_replace;

Gets/sets the filter_replace option of new. The value is 0, 1 or [].

See "CONSTRUCTOR OPTION".

rule_path
 $v->rule_path

Gets/sets the rule_path option of new.

See "CONSTRUCTOR OPTION".

auto_reset
 $v->auto_reset;

Gets/sets the auto_reset option of new. The value is 0 or 1.

See "CONSTRUCTOR OPTION".

RULE SYNTAX

Rule Syntax is very simple.

ID_KEY Key

The right-hand value is the key passed to Object->Method. The value returned by Object->Method(Key) is used to identify GROUP_NAME.

 ID_KEY page

ID_METHOD method, method ...

Note: this is used only when you need a different method to identify GROUP_NAME.

The right-hand value is the method (or methods) called on the Object. The value returned by Object->Method(Key), or by Object->Method when Key is omitted, is used to identify GROUP_NAME.

 ID_METHOD request action

This can also be defined in the constructor, new.

;GROUP_NAME

A line starting with ; begins a group, and the group ends at the line before the next line starting with ';'. If the value of Object->Method(ID_KEY) equals GROUP_NAME, this group's validation rules are used.

 ;index

You can also write it as follows.

 ;;;;index

You can repeat ';' any number of times.

;r;^GROUP_NAME$

This also begins a group. If the value of Object->Method(ID_KEY) matches the regexp ^GROUP_NAME$, this group's validation rules are used.

 ;r;^.*_confirm$

You can also write it as follows.

 ;;r;;^.*_confirm$

You can repeat ';' any number of times.

;path;/path/to/where

It is the same as ;r;^/path/to/where/?$.

Note: this requires ID_KEY to be 'ENV_PATH_INFO'.

You can also write it as follows.

 ;;path;;/path/to/where

You can repeat ';' any number of times.
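
The three kinds of group header can be sketched as a small parser and matcher. This is an assumption-laden illustration, not the module's code: parse_group_header and group_matches are hypothetical helpers, any run of ';' is treated as a single separator, and a regexp that itself contains ';' is not handled. The special name GLOBAL is also left out.

```perl
use strict;
use warnings;

# Turn a group header line into [type, pattern]; returns nothing for other lines.
sub parse_group_header {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;
    return unless $line =~ /^;/;
    my ($kind, $pattern) = grep { length } split /;+/, $line;
    return unless defined $kind;
    return ['name', $kind] unless defined $pattern;      # ;GROUP_NAME
    return ['re', qr/$pattern/] if $kind eq 'r';          # ;r;REGEXP
    if ($kind eq 'path') {                                # ;path;/some/path
        my $re = '^' . quotemeta($pattern) . '/?$';
        return ['re', qr/$re/];
    }
    return;
}

# Does the value found through ID_KEY select this group?
sub group_matches {
    my ($header, $id) = @_;
    my ($type, $pattern) = @$header;
    return $type eq 'name' ? ($id eq $pattern ? 1 : 0)
                           : ($id =~ $pattern ? 1 : 0);
}

my $header = parse_group_header(';;r;;^.*_confirm$');
print group_matches($header, 'order_confirm'), "\n";
```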

;GLOBAL

This also begins a group, but 'GLOBAL' is a special name. The rules in this group are inherited by all groups.

 ;GLOBAL
 
 i is number
 w is word

If you write global rules at the top of the rule file, there is no need to write ;GLOBAL; they are parsed as GLOBAL.

 # The top of file
 
 i is number
 w is word

They are regarded as global rules.

#

A line starting with # is a comment.

 # This is comment

sentence
 i is number

A sentence has at least 3 parts.

 Key Operator Condition

In the example, 'i' is the Key, 'is' is the Operator and 'number' is the Condition.

This means:

 return $obj->$method('i') =~/^\d+$/ + 0;

In some cases, an Operator can take multiple Conditions. This depends on the Operator's implementation.

For example, the Operator 'match' can take multiple Conditions.

 i match ^[a-z]+$,^[0-9]+$

When i matches either the former or the latter, it is valid.

Note:

You CANNOT use the same key with the same operator more than once.

 i is number
 i is word

alias = sentence

The sentence is the same as above. 'alias =' affects the result data structure.

The first example is the normal version.

Rule:

 i is number
 p is word
 z match ^\d{3}$

Result Data Structure:

 {
   'i_is'    => 0,
   'p_is'    => 0,
   'z_match' => 1,
 }

The next example uses aliases.

 id       = i is number
 password = p is word
 zip      = z match ^\d{3}$

Result Data Structure:

 {
   'id_is'        => 0,
   'password_is'  => 0,
   'zip_match'    => 1,
 }
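
How a sentence maps to a key of the result structure can be sketched like this. The helpers parse_sentence and result_key are hypothetical and only mirror the examples above (alias or key, an underscore, then the operator); the special "required = ..." line would have to be recognized before this parser is applied.

```perl
use strict;
use warnings;

# Split "alias = Key Operator Condition" (alias optional) into its parts.
sub parse_sentence {
    my ($sentence) = @_;
    my ($alias, $key, $operator, $condition) =
        $sentence =~ /^\s*(?:(\S+)\s*=\s+)?(\S+)\s+(\S+)\s+(.+?)\s*$/
        or return;
    return { alias => $alias, key => $key, operator => $operator, condition => $condition };
}

# The result key is "<alias or key>_<operator>".
sub result_key {
    my ($parsed) = @_;
    my $name = defined $parsed->{alias} ? $parsed->{alias} : $parsed->{key};
    return $name . '_' . $parsed->{operator};
}

print result_key(parse_sentence('zip      = z match ^\d{3}$')), "\n";
print result_key(parse_sentence('z match ^\d{3}$')), "\n";
```
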
Special alias name for required values
 required = name, id, password

The alias name "required" is special, and the syntax after it is slightly special too.

This sentence means that the keys/aliases name, id and password are required.

You can change the name "required" with the required_alias_name method.

Note: if a key has an alias and the key name is not used elsewhere, you cannot use the key name here.

For example:

 foo is alpha
 alias = var is 'value'
 
 # It doesn't work correctly because alias is used instead of key name 'var'
 required = foo, var

You should write it as follows:

 foo is alpha
 alias = var is 'value'
 
 # It works correctly because alias is used
 required = foo, alias

But the following works correctly:

 foo is alpha
 alias = foo eq 'value'
 
 # It works correctly because key name 'foo' is used elsewhere.
 required = foo

Override Global Rule

You can override global rules.

 ;GLOBAL
 
 ID_KEY page
 
 i is number
 w is word
 
 ;index
 
 i is word
 w is number

If you want to delete some GLOBAL rules in the 'index' group:

 ;index
 
 w is n/a
 w match ^[e-z]+$

If you want to delete all GLOBAL rules in the 'index' group:

 ;index

 GLOBAL is n/a
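
The inheritance described above can be sketched as a merge of two hashes. Everything here is an assumption made for illustration: rules are stored as { "key operator" => condition }, effective_rules is a hypothetical helper, and "key is n/a" is taken to drop every inherited rule for that key, which the module may handle differently.

```perl
use strict;
use warnings;

sub effective_rules {
    my ($global, $group) = @_;
    my %own  = %$group;
    my %rule = %$global;

    # Pass 1: "n/a" sentences remove inherited rules.
    for my $name (keys %own) {
        next unless $name =~ /^(\S+) is$/ && $own{$name} eq 'n/a';
        my $key = $1;
        delete $own{$name};
        if ($key eq 'GLOBAL') {          # "GLOBAL is n/a": inherit nothing
            %rule = ();
            next;
        }
        delete @rule{ grep { /^\Q$key\E / } keys %rule };
    }

    # Pass 2: the group's own rules win over inherited ones.
    %rule = (%rule, %own);
    return \%rule;
}

my $global = { 'i is' => 'number', 'w is' => 'word' };
my $index  = effective_rules($global, { 'w is' => 'n/a', 'w match' => '^[e-z]+$' });
print join(', ', map { "$_ $index->{$_}" } sort keys %$index), "\n";
```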

FILTERS

Data::RuledValidator has a filtering feature. There are two ways to filter values.

filter Key, ... with FilterName, ...
 filter tel_number with no_dash
 tel_number is num
 tel_number length 9

The position of this declaration does not matter, so the following means the same as the above.

 tel_number is num
 tel_number length 9
 filter tel_number with no_dash

Filters are also inherited from GLOBAL. If you want to ignore a GLOBAL filter, do as follows:

 filter tel_number with n/a

If you want to ignore GLOBAL filters on all keys, do as follows (not yet implemented):

 filter * with n/a

Keys Operator Condition with FilterName, ...

This is a temporary filter.

 tel1 = tel_number is num with no_dash
 tel2 = tel_number is num

tel1 checks the filtered tel_number, but tel2 checks the unfiltered tel_number.

But in the following case, tel2 is filtered too.

 filter tel_number with no_dash
 tel1 = tel_number is num with no_dash
 tel2 = tel_number is num

If you want to ignore "filter tel_number with no_dash", use no_filter as the temporary filter.

 filter tel_number with no_dash
 tel1 = tel_number is num with no_filter
 tel2 = tel_number is num

If a temporary filter is defined, it takes priority over "filter ... with ...".
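
The priority between the two kinds of filter can be sketched as follows. The %FILTER table and apply_filters are hypothetical; only the names no_dash and no_filter come from the rules above, and the body of no_dash (removing '-') is an assumption about what that filter does.

```perl
use strict;
use warnings;

# Hypothetical filter table for this sketch.
my %FILTER = (
    no_dash => sub { my ($v) = @_; $v =~ tr/-//d; return $v },
);

# $declared:  filters from "filter Key with ..." (array ref, may be empty)
# $temporary: filters written on the sentence itself (undef if absent)
sub apply_filters {
    my ($value, $declared, $temporary) = @_;
    # A temporary filter, when present, replaces the declared ones.
    my @names = $temporary ? @$temporary : @$declared;
    return $value if grep { $_ eq 'no_filter' } @names;
    for my $name (@names) {
        $value = $FILTER{$name}->($value);
    }
    return $value;
}

print apply_filters('03-1234-5678', ['no_dash'], undef), "\n";          # declared filter applies
print apply_filters('03-1234-5678', ['no_dash'], ['no_filter']), "\n";  # temporary no_filter wins
```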

See also Data::RuledValidator::Filter

OPERATORS

is
 key is mail
 key is word
 key is num

'is' is a somewhat special operator. It can disable GLOBAL entirely, or only for particular keys.

 ;;GLOBAL
 i is num
 k is value

 ;;index
 v is word

In this rule, 'index' inherits GLOBAL. If you do not want to use GLOBAL:

 ;;index
 GLOBAL is n/a
 v is word

If you do not want to use the key 'k' in index:

 ;;index
 k is n/a
 v is word

This inherits 'i', but doesn't inherit 'k'.

isnt

It is the opposite of 'is'. However, 'n/a' has no use as its condition.

of

This is a little different from the others. The left word is not a key but a number or 'all', and an alias is required.

 all = all of x,y,z

This requires all of the keys x, y and z. The values of these keys need not be valid; it is enough that the keys exist.

If you need only 2 of these keys, you can write:

 2ofxyz = 2 of x,y,z

This requires 2 of the keys x, y and z.

If you want valid values, use of-valid instead of 'of'.
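
The counting behind 'of' can be sketched as below. check_of is a hypothetical helper, a key is assumed to "exist" when it has at least one value, and "2 of" is read here as "at least 2"; the module may instead require exactly that number.

```perl
use strict;
use warnings;

# $need is a number or 'all'; $given maps key => array ref of values.
sub check_of {
    my ($need, $given, @keys) = @_;
    # Count the keys that have at least one value.
    my $present  = grep { @{ $given->{$_} || [] } } @keys;
    my $required = $need eq 'all' ? scalar @keys : $need;
    return $present >= $required ? 1 : 0;
}

my %given = (x => ['a'], y => [], z => ['c']);
print check_of(2,     \%given, qw/x y z/), "\n";   # two keys are present
print check_of('all', \%given, qw/x y z/), "\n";   # y is missing
```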

of-valid

This is like 'of'.

 all = all of-valid x,y,z

This requires all of the keys x, y and z, and the values of these keys must be valid.

If you need only 2 of these keys, you can write:

 2ofvalidxyz = 2 of-valid x,y,z

This requires 2 of the keys x, y and z to have valid values.

If you want valid values, use of-valid instead of 'of'.

in

If the value is one of the listed words, it is OK.

 key in Perl, Python, Ruby, PHP ...

This is an "or" condition. If the value equals one of them, it is OK.

match

The condition is a regular expression.

 key match ^[a-z]{2}\d{5}$

If you want multiple regular expressions:

 key match ^[a-z]{2}\d{5}$, ^\d{5}[a-z]{1}\d{5}$, ...

This is an "or" condition. If the value matches one of them, it is OK.

re

It is the same as 'match'.

has
 key has 3

This means key has 3 values.

If you want fewer or more values than a given number, you can write:

 key has < 4
 key has > 4

eq (= equal)
 key eq STRING

If key's value is the same as STRING, it is valid.

You can use the following special strings.

 key eq [key_name]
 key eq {data_key_name}

[key_name] is the result of $obj->$method(key_name). For the case where the user has to input a password twice, you can write the following rule.

 password eq [password2]

This rule means, for example:

 $cgi->param('password') eq $cgi->param('password2');

{data_key_name} is the result of $data->{data_key_name}. Use it when you need to check a value against data from a database.

 my $db_data = ....;
 if($cgi->param('key') ne $db_data){
    # wrong!
 }

In such a case, you can write the following.

Rule:

 key eq {db_data}

Code:

 my $db_data = ...;
 $v->by_rule({db_data => $db_data});

ne (= not_equal)
 key ne STRING

If key's value is NOT the same as STRING, it is valid. You can use the same special strings as described for "eq" above.
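
How the special strings accepted by eq and ne could be resolved is sketched below. resolve_operand is a hypothetical helper; $fetch stands in for $obj->$method and $data for the hash ref passed to by_rule.

```perl
use strict;
use warnings;

# $fetch: code ref returning the value for a key (stands in for $obj->$method($key)).
# $data:  hash ref supplied by the caller, e.g. { db_data => ... }.
sub resolve_operand {
    my ($operand, $fetch, $data) = @_;
    return $fetch->($1) if $operand =~ /^\[(.+)\]$/;   # [key_name]
    return $data->{$1}  if $operand =~ /^\{(.+)\}$/;   # {data_key_name}
    return $operand;                                   # plain STRING
}

my %param = (password => 'secret', password2 => 'secret');
my $fetch = sub { $param{ $_[0] } };

# "password eq [password2]"
my $same = $param{password} eq resolve_operand('[password2]', $fetch, {}) ? 1 : 0;
print "password eq [password2]: $same\n";
```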

length #,#
 words length 0, 10

If the length of words is from 0 to 10, it is valid. The first number is the minimum length and the second is the maximum length.

You can also write only one value.

 words length 5

This means the length of words must be less than 6 (that is, at most 5).

Note: use this instead of '>= ~ #', '<= ~ #' and 'between ~ #, #'.
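
The two forms of length can be sketched as one small check. length_ok is a hypothetical helper; a single number is treated as the maximum with a minimum of 0, following the description above.

```perl
use strict;
use warnings;

# length_ok($value, $max) or length_ok($value, $min, $max)
sub length_ok {
    my ($value, @bounds) = @_;
    my ($min, $max) = @bounds == 1 ? (0, $bounds[0]) : @bounds;
    my $len = length $value;
    return ($min <= $len && $len <= $max) ? 1 : 0;
}

print length_ok('abcde',  5), "\n";      # "words length 5": 5 characters fit
print length_ok('abcdef', 5), "\n";      # 6 characters do not
print length_ok('abc', 0, 10), "\n";     # "words length 0, 10"
```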

>, >=
 key > 4

If key's value is greater than the number 4, it is valid. You can use '>=', too.

If you want to check the length of the value, put '~' before the number as follows.

 key > ~ 4

Note: use length instead of '>= ~ #'.

<, <=
 key < 5

If key's value is less than the number 5, it is valid. You can use '<=', too.

If you want to check the length of the value, put '~' before the number as follows.

 key < ~ 4

Note: use length instead of '<= ~ #'.

between #,#
 key between 3,5

If key's value is in the range, it is valid.

If you want to check the length of the value, put '~' before the number as follows.

 key between ~ 4,10

Note: use length instead of 'between ~ #, #'.

HOW TO ADD OPERATOR

This module has 2 kinds of operators.

normal operator

This is used in a sentence.

 Key Operator Condition
     ~~~~~~~~

For example: is, are, match ...

"v is word" returns a structure like the following:

 {
   v_is => 1,
   v_valid => 1,
 }

condition operator

This is used in a sentence only when the Operator is 'is/are/isnt/arent'.

 Key Operator Condition
     (is/are) ~~~~~~~~~
   (isnt/arent)

This is an operator used for checking Value(s). The Operator must be 'is' or 'are' (these are the same), or 'isnt' or 'arent' (these are the same).

For example: num, alpha, alphanum, word ...

You can add these operators with 2 class methods.

add_operator
 Data::RuledValidator->add_operator(name => $code);

$code should be code that makes and returns a closure. For example:

 Data::RuledValidator->add_operator(
   'is'     =>
   sub {
     my($key, $c) = @_;
     my $sub = Data::RuledValidator->_cond_op($c) || '';
     unless($sub){
       if($c eq 'n/a'){
         return $c;
       }else{
         Carp::croak("$c is not defined. you can use; " . join ", ", Data::RuledValidator->_cond_op);
       }
     }
     return sub {my($self, $v) = @_; $v = shift @$v; return($sub->($self, $v) + 0)};
   },
 );

$key and $c are the Key and the Condition. They are passed to $code, which may use them however it likes. In the example, $c is used to get a code ref (Data::RuledValidator->_cond_op($c)).

 return sub {my($self, $v) = @_; $v = shift @$v; return($sub->($self, $v) + 0)};

This is the code that returns the closure. The closure is given 5 values.

 $self, $values, $alias, $obj, $method

 $self   = Data::RuledValidator object
 $values = Value(s). array ref
 $alias  = alias of Key
 $obj    = object given in new
 $method = method given in new

In the example, only the first 2 values are used.
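
The two-stage pattern (a factory called once per rule, returning a closure called per validation) can be sketched without the module. The %MK_CLOSURE registry below is a hypothetical stand-in for what add_operator fills, and the 'match' factory is written only for this illustration.

```perl
use strict;
use warnings;

my %MK_CLOSURE;   # operator name => factory (hypothetical registry for this sketch)

$MK_CLOSURE{match} = sub {
    my ($key, $condition) = @_;          # called once, when the rule is parsed
    my $re = qr/$condition/;
    return sub {                         # called for each validation
        my ($self, $values) = @_;        # the first 2 of the 5 values
        return (grep { $_ =~ $re } @$values) ? 1 : 0;
    };
};

# "zip match ^\d{3}$" becomes a closure that is reused for every check.
my $check = $MK_CLOSURE{match}->('zip', '^\d{3}$');
print $check->(undef, ['123']), "\n";
print $check->(undef, ['12a']), "\n";
```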

add_condition
 Data::RuledValidator->add_condition(name => $code);

$code should be a code ref. For example:

 __PACKAGE__->add_condition(
   'mail' => sub { my($self, $v) = @_; return Email::Valid->address($v) ? 1 : 0 },
 );

PLUGIN

Data::RuledValidator is built from plugins (since version 0.02).

How to create plugins

It's very easy. The names of plugin modules start with 'Data::RuledValidator::Plugin::'.

For example:

 package Data::RuledValidator::Plugin::Email;
 
 use Email::Valid;
 use Email::Valid::Loose;
 
 Data::RuledValidator->add_condition
   (
    'mail' =>
    sub{
      my($self, $v) = @_;
      return Email::Valid->address($v) ? 1 : ()
    },
    'mail_loose' =>
    sub{
      my($self, $v) = @_;
      return Email::Valid::Loose->address($v) ? 1 : ()
    },
   );
 
 1;

That's all. If you want to add a normal operator, use the add_operator class method.

OVERLOADING

 $valid = $validator_object;  # the same as $validator_object->valid;
 %valid = @$validator_object; # the same as %{$validator_object->result};
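
This kind of behavior can be obtained with Perl's overload pragma. The sketch below uses a hypothetical My::Validator class and assumes that the truth value and the array dereference are the overloaded operations; which operations Data::RuledValidator actually overloads is not stated here.

```perl
use strict;
use warnings;

package My::Validator;   # hypothetical class, not Data::RuledValidator itself
use overload
    'bool'   => sub { $_[0]->valid },              # truth value follows valid()
    '@{}'    => sub { [ %{ $_[0]->result } ] },    # @$obj flattens result()
    fallback => 1;

sub new    { my ($class, %result) = @_; return bless { result => {%result} }, $class }
sub result { return $_[0]{result} }
sub valid  { my ($self) = @_; return (grep { !$_ } values %{ $self->{result} }) ? 0 : 1 }

package main;

my $v = My::Validator->new(i_is => 1, z_match => 1);
my %result = @$v;                  # same data as %{ $v->result }
print "valid\n" if $v;             # same test as $v->valid
```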

INTERNAL CLASS DATA

This is just a memo.

%RULE

All rules for all objects (each object may have a different rule file).

structure:

 rule_name =>
  {
    _regex_group      => [],
    # Group names can be regexps. This exists so that there is no need
    # to check whether each rule key is a regexp or not.
    id_key           => [],
    # The rule has a key which identifies the group name. This hash is {RULE_NAME => key_name}
    # Why an array ref?
    # For uniqueness, several keys can be set as id_key (like SQL unique)
    coded_rule       => [],
    # a collection of closures
    time             => $time
    # (stat 'rule_file')[9]
  }

%COND_OP

The keys are condition operator names. The values are code refs (the condition operators).

%MK_CLOSURE

 { operator => sub{coderef which creates closure} }

%REQUIRED

 { required_key => undef, required_key2 => undef }

NOTE

Currently, once a rule is parsed, it is converted to code (a collection of closures) and stored as class data.

If you use this under plain CGI, performance is not good. Under mod_perl, it is a good fit.

I have some possible solutions:

  • store the code in a Storable file.

  • store the code in shared memory.

TODO

can take 2 keys for id_key

More tests

I have to do more testing.

More documents

I have to write more documents.

multiple rule files

AUTHOR

Ktat, <ktat@cpan.org>

COPYRIGHT

Copyright 2006-2007 by Ktat

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html