NAME

Data::Schema - Validate nested data structures with nested structure

VERSION

Version 0.11

SYNOPSIS

    # OO interface
    use Data::Schema;
    my $validator = Data::Schema->new();
    my $schema = [array => {min_len=>2, max_len=>4}];
    my $data = [1, 2, 3];
    my $res = $validator->validate($data, $schema);
    print "valid!" if $res->{success}; # prints 'valid!'

    # procedural interface
    use Data::Schema;
    my $sch = ["hash",
               {keys =>
                    {name => "str",
                     age  => ["int", {required=>1, min=>18}]
                    }
                }
              ];
    my $r;
    $r = ds_validate({name=>"Lucy", age=>18}, $sch); # success
    $r = ds_validate({name=>"Lucy"         }, $sch); # fail: missing age
    $r = ds_validate({name=>"Lucy", age=>16}, $sch); # fail: underage

DESCRIPTION

Data::Schema (DS) is a schema system for data validation. It lets you write schemas as data structures, ranging from very simple (a scalar) to complex (nested hashes/arrays) depending on how complex you want your validation to be.

Writing schemas as data structures has several advantages. First, schemas are more portable across languages (e.g. using YAML to share schemas between Perl, Python, PHP, Ruby). Second, you can validate a schema using the schema system itself. Third, it is easy to generate code, help messages (e.g. the so-called "usage" for a function or command-line script), etc. from the schema.

Potential applications of DS: validating configuration, function parameters, command-line arguments, etc.

To get started, see Data::Schema::Manual::Tutorial.

FUNCTIONS

ds_validate($data, $schema)

Non-OO wrapper for validate(). Exported by default. See validate() method.

ATTRIBUTES

config

Configuration object. See Data::Schema::Config.

METHODS

merge_attr_hashes($attr_hashes)

Merge several attribute hashes, if any of them can be merged (i.e. they contain a merge prefix in their keys). Used by DST::Base and DST::Schema. As a DS user, you normally won't need this.

init_validation_state()

Initialize validation state. Used internally by validate(). As a DS user, you normally won't need this.

save_validation_state()

Save the validation state (position in data, position in schema, number of errors, etc.) onto a stack, so that you can use the validator to validate new data with a new schema, even in the middle of validating another data/schema pair. Used internally by validate() and DST::Schema. As a DS user, you normally won't need this.

See also: restore_validation_state().

restore_validation_state()

Restore the most recently saved validation state from the stack. Used internally by validate() and DST::Schema. As a DS user, you normally won't need this.

See also: save_validation_state().
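The save/restore pair behaves like a stack of states. The following is a minimal sketch of that idea only, not the module's own code: the class My::StateDemo, its save_state()/restore_state() methods, and the fields inside the state hash are all hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

package My::StateDemo;    # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    my $self = {
        state => { errors => [], data_pos => [] },   # current validation state
        saved => [],                                 # stack of suspended states
    };
    return bless $self, $class;
}

# Push the current state onto the stack and start with a fresh one.
sub save_state {
    my ($self) = @_;
    push @{ $self->{saved} }, $self->{state};
    $self->{state} = { errors => [], data_pos => [] };
    return;
}

# Pop the most recently saved state and make it current again.
sub restore_state {
    my ($self) = @_;
    die "no saved state to restore" unless @{ $self->{saved} };
    $self->{state} = pop @{ $self->{saved} };
    return;
}

package main;

my $v = My::StateDemo->new;
push @{ $v->{state}{errors} }, "outer error";

$v->save_state;                                  # begin a nested validation
push @{ $v->{state}{errors} }, "inner error";
my $inner_errors = scalar @{ $v->{state}{errors} };
$v->restore_state;                               # resume the outer validation

print "inner: $inner_errors, outer: ", scalar @{ $v->{state}{errors} }, "\n";
# prints "inner: 1, outer: 1"
```

The nested validation sees only its own errors, and the outer validation continues with its error list untouched.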

log_error($message)

Add an error during the validation process. The error is not added if there are already too many errors (the too_many_errors attribute is true). Used by type handlers. As a DS user, you normally won't need this.

log_warning($message)

Add a warning during the validation process. The warning is not added if there are already too many warnings (the too_many_warnings attribute is true). Used by type handlers. As a DS user, you normally won't need this.

check_type_name($name)

Checks whether $name is a valid type name. Returns true if valid, false if invalid. By default it requires that type name starts with a lowercase letter and contains only lowercase letters, numbers, and underscores. Maximum length is 64.

You can override this method if you want stricter/looser type name criteria.
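The default rule described above can be written as a single regular expression. This is a sketch of the stated rule only, not the module's actual implementation; the function name is_valid_type_name() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default rule: a lowercase letter first,
# then only lowercase letters, digits, and underscores, 64 characters at most.
sub is_valid_type_name {
    my ($name) = @_;
    return 0 unless defined $name;
    # One leading [a-z] plus up to 63 further characters gives a maximum of 64.
    return $name =~ /\A[a-z][a-z0-9_]{0,63}\z/ ? 1 : 0;
}

for my $name ("int", "my_type2", "Int", "2fast", "a" x 65) {
    printf "%-12.12s %s\n", $name,
        is_valid_type_name($name) ? "ok" : "rejected";
}
```

A subclass that overrides check_type_name() with looser criteria could, for example, allow uppercase letters by widening the character classes.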

register_type($name, $class|$obj)

Register a new type, along with a class name ($class) or the actual object ($obj) to handle the type. If $class is given, the class will be require'd and instantiated later, when needed, by get_type_handler().

Any object can become a type handler, as long as it has:

  • a validator() rw property to store/set validator object;

  • handle_type() method to handle type checking;

  • zero or more handle_attr_*() methods to handle attribute checking.

See Data::Schema::Manual::TypeHandler for more details on writing a type handler.

register_plugin($class|$obj)

Register a new plugin. Accepts a plugin object or a class name. If $class is given, the class will be require'd (if not already loaded) and instantiated.

Any object can become a plugin; you don't need to subclass anything, as long as it has:

  • a validator() rw property to store/set validator object;

  • zero or more handle_*() methods to handle some events/hooks.

See Data::Schema::Manual::Plugin for more details on writing a plugin.

call_handler($name, [@args])

Try the handle_*() method of each registered plugin until one returns 0 or 1. If a plugin returns -1 (decline), continue to the next plugin. Returns the status of the last plugin invoked, or -1 if there is no handler to invoke.
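The dispatch rule can be sketched in a few lines. This is an illustration under assumptions, not the module's code: the function call_handler_sketch(), the two plugin classes, and the arguments passed to the hook are hypothetical, and the plugins omit the validator() property for brevity.

```perl
use strict;
use warnings;

# Hypothetical plugin that always declines.
package My::Plugin::Decliner;
sub new { my ($class) = @_; return bless {}, $class }
sub handle_unknown_type { return -1 }

# Hypothetical plugin that gives a definite answer (0 or 1).
package My::Plugin::Loader;
sub new { my ($class) = @_; return bless {}, $class }
sub handle_unknown_type {
    my ($self, $name) = @_;
    return $name eq 'dice_throw' ? 1 : 0;
}

package main;

# Try handle_$name on each plugin in order; stop at the first 0 or 1.
sub call_handler_sketch {
    my ($plugins, $name, @args) = @_;
    my $method = "handle_$name";
    my $status = -1;                           # nothing invoked yet
    for my $plugin (@$plugins) {
        next unless $plugin->can($method);     # plugin has no such hook
        $status = $plugin->$method(@args);
        last if $status == 0 || $status == 1;  # definite answer ends the chain
    }
    return $status;
}

my @plugins = (My::Plugin::Decliner->new, My::Plugin::Loader->new);
print call_handler_sketch(\@plugins, 'unknown_type', 'dice_throw'), "\n"; # prints 1
print call_handler_sketch(\@plugins, 'no_such_hook'), "\n";               # prints -1
```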

get_type_handler($name)

Try to get the type handler for a given type. If the registered handler is a class name rather than an object, instantiate it first. If the type is not found, invoke handle_unknown_type() in plugins to give them a chance to load the type. If the type is still not found, return undef.
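The lazy instantiation step might look roughly like the sketch below. Everything here is hypothetical (the %registry hash, get_handler_sketch(), and the My::Type::Int class); the real method also require's the class and consults plugins for unknown types, which this sketch leaves out.

```perl
use strict;
use warnings;

package My::Type::Int;    # hypothetical handler class
sub new { my ($class) = @_; return bless {}, $class }

package main;

# Hypothetical registry: type name => class name or ready-made object.
my %registry = (int => 'My::Type::Int');

sub get_handler_sketch {
    my ($name) = @_;
    my $handler = $registry{$name};
    return undef unless defined $handler;      # unknown type
    unless (ref $handler) {
        # Still a class name: instantiate once and cache the object.
        $handler = $registry{$name} = $handler->new;
    }
    return $handler;
}

my $first  = get_handler_sketch('int');
my $second = get_handler_sketch('int');
print ref($first), "\n";                                  # prints My::Type::Int
print $first == $second ? "same object\n" : "different objects\n";
```

Because the object replaces the class name in the registry, later lookups return the same handler instance.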

normalize_schema($schema)

Normalize a schema into the third form (hash form), {type=>..., attr_hashes=>..., def=>...}, and do some sanity checks on it. Returns an error message string on failure.
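To illustrate what normalization means, here is a sketch that converts the scalar form ("int") and the array form (["int", {...}]) into the hash form. It is not the module's implementation. I assume here that the array form is a type name followed by attribute hashes and that the empty defaults are [] and {}; normalize_sketch() is a hypothetical name.

```perl
use strict;
use warnings;

# Returns a hash-form schema on success, or an error message string on failure.
sub normalize_sketch {
    my ($schema) = @_;
    return "schema is undefined" unless defined $schema;

    # First form: a plain type name.
    if (!ref $schema) {
        return { type => $schema, attr_hashes => [], def => {} };
    }

    # Second form: [type, attr_hash, ...] (assumed layout).
    if (ref($schema) eq 'ARRAY') {
        my ($type, @attr_hashes) = @$schema;
        return "array form needs a type name"
            unless defined $type && !ref $type;
        for my $ah (@attr_hashes) {
            return "attribute hash expected" unless ref($ah) eq 'HASH';
        }
        return { type => $type, attr_hashes => [@attr_hashes], def => {} };
    }

    # Third form: already a hash; fill in missing parts.
    if (ref($schema) eq 'HASH') {
        return "hash form needs a 'type' key" unless defined $schema->{type};
        return {
            type        => $schema->{type},
            attr_hashes => $schema->{attr_hashes} || [],
            def         => $schema->{def}         || {},
        };
    }

    return "unsupported schema form";
}

my $n = normalize_sketch([array => { min_len => 2, max_len => 4 }]);
print "$n->{type}: ", scalar @{ $n->{attr_hashes} }, " attribute hash(es)\n";
# prints "array: 1 attribute hash(es)"
```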

register_schema_as_type($schema, $name)

Register a schema as a new type. $schema is a normalized schema. Returns {success=>(0 or 1), error=>...}. Fails if a type named $name is already defined, or if $schema cannot be parsed. May register more than one type if the schema contains other type definitions (the hash form of a schema can define types).

validate($data[, $schema])

Validate a data structure. $schema must be given unless you have already supplied one via the schema attribute.

Returns {success=>0 or 1, errors=>[...], warnings=>[...]}. The 'success' key will be set to 1 if the data validates, otherwise 'errors' and 'warnings' will be filled with the details.

errors_as_array

Return formatted errors in an array of strings.

warnings_as_array

Return formatted warnings in an array of strings.

COMPARISON WITH OTHER DATA VALIDATION MODULES

There are already a lot of data validation modules on CPAN. However, most of them do not validate nested data structures. Many seem to focus only on "forms" (usually represented as a shallow hash in Perl).

Of the rest, which do validate nested data, either I am not really fond of the syntax, or the validator/schema system is not simple or flexible enough for my taste. For example, other data validation modules might require you to always write:

 { type => "int" }

even when all you want is to validate an int with no extra requirements. With DS you can just write:

 "int"

Another design consideration for DS is that I want to maximize the reusability of my schemas. DS therefore allows you to define schemas in terms of other schemas. External schemas can be "require"-d from Perl variables or loaded from YAML files.

DS is still in an early phase of development, but I am already starting to use it in production. I am quite content with the current syntax, but that doesn't mean it won't change in the future. DS can already do decent validation: there are several basic types, each with a decent set of attributes. But some "standard" features present in other modules are still absent from DS: handling of default values and filters. These will be added in future releases along with other planned features like variable substitution, etc.

PERFORMANCE NOTES

Because of the way the code is written and structured (e.g. it uses Moose, and validation involves a relatively high number of method calls), DS is probably slower than other data validation modules. The code has not yet been profiled or optimized.

To give a rough picture, here's how DS 0.03 fares on my Athlon 64 X2 5000+ (which I think is still a fairly decent box in 2009). Perl 5.10.0, Moose 0.72.

1. Using the simplest case:

 $validator->validate(1, "int")

the speed is around 14,000 validations per second.

2. Using the dice throws example (see DSM::Tutorial):

 $validator->validate([1,2,3,4,5,6,[1,1],[1,2],[1,3],[1,4]], $schema)

the speed is around 150/sec.

3. Using the dice throws example, but moving all subschemas to a hash and using DSP::LoadSchema::Hash to load it, the speed is around 190/sec.

4. Using a fairly complex schema, XXX.

With this kind of performance you might want to reconsider using DS inside functions that are called very frequently (like hundreds or thousands of times per second). But I think DS should be fine for CGI applications or for command line argument checking and I will not be focusing on performance for the time being.
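To get comparable figures on your own machine, a rough measurement loop like the one below may help. It is only a sketch: measure_rate() is a hypothetical helper, and the code reference is a stand-in regex check so that the example runs without DS installed. Replace it with something like sub { $validator->validate(1, "int") } to measure DS itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code $iterations times and return the rate in calls per second,
# or undef if the run was too fast for the clock to measure.
sub measure_rate {
    my ($code, $iterations) = @_;
    my $start = time;
    $code->() for 1 .. $iterations;
    my $elapsed = time - $start;
    return $elapsed > 0 ? $iterations / $elapsed : undef;
}

# Stand-in workload (an assumption, not a DS call): check that a value
# looks like an integer.
my $value    = 42;
my $stand_in = sub { my $ok = ($value =~ /\A-?\d+\z/) ? 1 : 0; return $ok };

my $rate = measure_rate($stand_in, 200_000);
printf "%.0f calls/sec\n", $rate if defined $rate;
```

As a point of reference, 150 validations per second corresponds to roughly 6.7 ms per validation, and 14,000 per second to roughly 0.07 ms.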

Some tips on performance:

1. move subschemas out;

2. keep schema simple;

3. write heavy-duty validation logic in Perl (e.g. using new type handler and/or type attribute).

SEE ALSO

Data::Schema::Manual::Tutorial, Data::Schema::Manual::Schema, Data::Schema::Manual::TypeHandler, Data::Schema::Manual::Plugin

Some other data validation modules on CPAN: Data::FormValidator, Data::Rx, Kwalify.

AUTHOR

Steven Haryanto, <steven at masterweb.net>

BUGS

Please report any bugs or feature requests to bug-data-schema at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Schema. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Data::Schema

ACKNOWLEDGEMENTS

COPYRIGHT & LICENSE

Copyright 2009 Steven Haryanto, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.