The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Data::Validate::Perl - validates in-memory perl data using a specification

SYNOPSIS

Continue reading only when you want to generate Parse::Yapp grammar from the specification file and patch it, or understand how it works internally, else please look into dvp_gen_parser command-line utility documentation instead.

    use Data::Validate::Perl qw/gen_yp_rules/;

    my $yapp_grammar = gen_yp_rules($spec_file);

EXPORTS

gen_yp_rules

SUBROUTINES

gen_yp_rules

This function contains the main logic to parse the data specification and translate it to Parse::Yapp grammar. Returns the grammar string on success.

DESCRIPTION

In order to understand internal of this module, working knowledge of parsing, especially Yacc is required. Stop and grab a book on topic if you are unsure what this is.

A common parsing mechanism applies state machine onto a string, such as regular expression. This part is easy to follow. In this module a Yacc state machine is used, the target is not plain text but a in-memory data structure - a tree made up by several perl scalar/array/hash items.

The process to validate a data structure like that is a tree traversal. The biggest challenge is how to put these 2 things together.

The best way to figure a solution is, imagine each step to perform a depth-first iteration on a tree. Each move can be abstracted as a 'token'. This is the key idea behind.

To elaborate, think how to validate a simple perl hash like below:

   my %hash = (key1 => value1, key2 => value2, key3 => value3);

To iterate the hash key/value pairs, use a cursor to describe the following states:

   1. initial state: place the cursor onto hash itself;
   2. 1st state: move cursor to key1;
   3. 2nd state: move cursor to value1;
   4. 3rd state: move cursor to key2;
   5. 4th state: move cursor to value2;
   6. 5th state: move cursor to key3;
   7. 6th state: move cursor to value3;

A draft Yacc grammar written as:

   root_of_hash: key1 value1 | key2 value2 | key3 value3

The state machine needs token to decide which sub-rule to walk into. Looking onto the key1/2/3, the corresponding token can simply be the value of themselves. That is:

   root_of_hash: 'key1' value1 | 'key2' value2 | 'key3' value3

Note the quotes, they mark key1/2/3 as tokens. Next move to the hash value. When the cursor points to a value, I do not care about the actual value, instead I just want to hint the state machine that it is a value. It requires another token to accept the state. How about a plain text token - 'TEXT'. Finally the grammar to be:

   root_of_hash: 'key1' 'TEXT' | 'key2' 'TEXT' | 'key3' 'TEXT'

How to apply the generated state machine to the hash validation then? Each time the parser cannot determine which is next state, it asks the lexer for a token. The simplest form of a lexer is just a function to return the corresponding tokens for each state. At this point, you might be able to guess how it works:

   1. state machine initialized, it wants to move to next state, so it asks lexer;
   2. the lexer holds the hash itself, it calls keys function, returns the first key as token, put the key returned into its memory;
   3. by the time state machine got key1, it moves the cursor onto 'key1', then asks lexer again;
   4. the lexer checks its memory and figures it returned 'key1' just now, time to return its vlaue, as the state machine has no interest on the actual value, it returns 'TEXT';
   5. state machine finished the iteration of key1/value1 pair, asks for another token;
   6. lexer returns 'key2' and keeps it in its own memory;
   7. state machine steps into the sub-rule 'key2' 'TEXT';
   ...

The state loop is fairly straightforward. Parsing isn't that difficult, huh :-)

To iterate a nested tree full of scalar/array/hash, other tokens are introduced:

   1. '[' ']' indicates start/end state of array traversal;
   2. '{' '}' indicates start/end state of hash traversal;
   3. to meet special need, certain rule actions are defined to set some state flags, which influence the decision that the lexer returns the value as 'TEXT', or the actual value string itself;

The state maintenance in lexer is made up by a stack, the stack simulates a depth-first traversal:

   1. when meets array, iterates array items one by one, if any item is another array or hash, push current array onto the stack together with an index marking where we are in this array. Iterates that item recursively;
   2. similar strategy is applied to hash;

The left piece is a DSL to describe the tree structure. By the time you read here, I am fairly confident you are able to figure it out yourself by exercising various pieces of this module, below is a small leaf-note:

   1. gen_yp_rules function handles translation from data structure spec to corresponding Yacc grammar;
   2. bottom section of this module contains the Lexer function and other routines L<Parse::Yapp> requires to work (browse the module source to read);
   3. the command-line utility C<dvp_gen_parser> reads the spec file, calls gen_yp_rules to generate grammar, fits it into a file and calls C<yapp> to create the parser module;

Wish you like this little article and enjoy playing with this module.

SEE ALSO

   * L<Parse::Yapp>

AUTHOR

Dongxu Ma, <dongxu at cpan.org>

BUGS

Please report any bugs or feature requests to bug-data-validate-perl at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Validate-Perl. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Data::Validate::Perl

You can also look for information at:

LICENSE AND COPYRIGHT

Copyright 2014 Dongxu Ma.

This program is free software; you can redistribute it and/or modify it under the terms of the the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.