NAME

Data::Type - robust and extensible data- and valuetype system

SYNOPSIS

  use Data::Type qw(:is +ALL);

  is STD::EMAIL or warn;

  warn if isnt STD::CREDITCARD( 'MASTERCARD', 'VISA' );

  try
  {
    valid( '9999-12-31 23:59:59', DB::DATETIME );
  }
  catch Data::Type::Exception with
  {
    my $e = shift;   # the exception object is the first argument

    print $e->to_string;
  };

DESCRIPTION

A lot of CPAN modules have a common purpose: reporting whether data has certain "characteristics". Email::Valid is an illustrious example: it reports whether a string has the characteristics of an email address, and its address() method does so by returning 'yes' or 'no'. Another module, another behaviour: Business::ISSN tests for the characteristics of an International Standard Serial Number and does so via an is_valid method returning true or false. And so on and so on. Data::Type was created with modularity, introspectability and usability in mind.

The resulting key concepts are:

This module relies, as far as is plausible, on CPAN modules doing the job in the backend. For instance, Regexp::Common does a lot of the regular expression testing, Email::Valid takes care of the EMAIL type, and Date::Parse does the groundwork for the DATE type.

DOCUMENTATION

You will find a gentle introduction in Data::Type::Docs. It also guides you through the rest of the documentation. Advanced users should keep reading here.

SUPPORTED TYPES

All types are grouped, so each one belongs to a collection. A collection is identified by a short id. All its members live in a namespace prefixed with that id (uppercased).

Standard Collection ('STD')

This is a heterogeneous collection of datatypes which is loaded by default. It contains various types backed by CPAN modules (e.g. business, creditcard, email, markup, regexps, etc.) and some everyday things. See Data::Type::Collection::Std.

W3C/XML-Schema Collection ('W3C')

A nearly 1-to-1 use of the XML::Schema datatypes. It is nearly complete and works off the shelf. Please visit the XML Schema homepage http://www.w3.org/TR/xmlschema-2/ for detailed documentation. See Data::Type::Collection::W3C.

Database Collection ('DB')

Common database table types (VARCHAR, TINYTEXT, TIMESTAMP, etc.). See Data::Type::Collection::DB.

Biological Collection ('BIO')

Everything that is related to biological matters (DNA, RNA, etc.). See Data::Type::Collection::Bio.

Chemistry Collection ('CHEM')

Everything that is related to chemical matters (Atoms, etc.). See Data::Type::Collection::Chem.

Perl5 Collection ('PERL')

Reserved and undecided. See Data::Type::Collection::Perl.

Perl6 Apocalypse Collection ('PERL6')

Placeholder for the Apocalypse and Synopsis 6 suggested datatypes for perl6. See Data::Type::Collection::Perl6.

[Note] ALL is an alias for all available collections at once.

[Note] Please consider the same constraints as for CPAN namespaces when using or suggesting a new ID. A short discussion on the http://sf.net/projects/datatype mailing list is rewarded with gratefulness and respect.

API

FUNCTIONS

valid( $value, @types )

Verifies a value against one or more types or facets.

This function throws a Data::Type::Exception on failure.

  try
  {
    valid( 'muenalan<haaar..harr>cpan.org', STD::EMAIL );
  }
  catch Data::Type::Exception with
  {
    my $e = shift;   # the exception object is the first argument

    dump( $e );
  };

is( $type )

  $scalar = is( $value, $type );
  $scalar = is( $type );            # $_ is used as $value

Returns true or false instead of throwing exceptions. This is for the exception haters. For reporting, the exceptions are stored in the array reference $Data::Type::err.

  is( 'muenalan<haaar..harr>cpan.org', STD::EMAIL ) or die dump($Data::Type::err);

[Note] dump() is part of Data::Dump. You can use any dumping routine or format a string with printf, of course.
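
The relationship between a throwing valid() and a boolean is() can be sketched in plain Perl. This is only an illustration of the pattern, not Data::Type's implementation: the packages Sketch and Sketch::Exception, the functions check_valid and check_is, the variable $Sketch::err and the deliberately naive email pattern are all assumptions made for this example.

```perl
use strict;
use warnings;

package Sketch::Exception;    # hypothetical stand-in for Data::Type::Exception

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub to_string {
    my ($self) = @_;
    return "expected $self->{expected}, got '$self->{value}'";
}

package Sketch;

our $err = [];                # hypothetical counterpart of $Data::Type::err

# Throwing variant: dies with an exception object when the test fails.
sub check_valid {
    my ( $value, $type ) = @_;
    $type->{test}->($value)
        or die Sketch::Exception->new( expected => $type->{name}, value => $value );
    return 1;
}

# Boolean variant: traps the exception and stores it for later reporting.
sub check_is {
    my ( $value, $type ) = @_;
    $err = [];
    return 1 if eval { check_valid( $value, $type ) };
    push @$err, $@;
    return 0;
}

package main;

# A toy "email" type; real validation is far more involved.
my $EMAIL = { name => 'EMAIL', test => sub { $_[0] =~ /\A[^\@\s]+\@[^\@\s]+\z/ } };

for my $candidate ( 'muenalan@cpan.org', 'muenalan<haaar..harr>cpan.org' ) {
    if ( Sketch::check_is( $candidate, $EMAIL ) ) {
        print "ok: $candidate\n";
    }
    else {
        print 'failed: ', $_->to_string, "\n" for @$Sketch::err;
    }
}
```

The boolean variant resets the error list on every call, so only the failures of the most recent check are available for reporting.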

If the first argument is a type ($dt), $_ is used as the value. This allows syntactic sugar like:

  foreach( @nucleotide_samples )
  {
    email_to( $SETI ) unless is BIO::DNA;      # Sends "Non-terrestrial genome found. Suspected sequence '$_'."
  }

[Note] Don't take that example too seriously. It could also have been simple RNA. Better would have been unless is (BIO::DNA, BIO::RNA).
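
How a checker can fall back to $_ when only a type is given can be sketched as follows. The function looks_like and the two toy "types" are hypothetical and exist only for this illustration. They are plain regular expressions, not the BIO collection.

```perl
use strict;
use warnings;

# Hypothetical helper: with one argument, test $_; with two, test the first.
sub looks_like {
    my $type  = pop;
    my $value = @_ ? shift : $_;
    return $value =~ $type->{pattern} ? 1 : 0;
}

# Toy "types": only upper-case nucleotide letters, nothing more sophisticated.
my $DNA = { pattern => qr/\A[ACGT]+\z/ };
my $RNA = { pattern => qr/\A[ACGU]+\z/ };

my @samples = qw(GATTACA GAUUACA HELLO);

for (@samples) {
    next if looks_like($DNA) or looks_like($RNA);    # both calls read $_
    print "Suspicious sequence '$_'\n";
}
```

Taking the type from the end of the argument list keeps both calling styles in a single function.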

isnt( $type )

  $scalar = isnt( $value, $type );
  $scalar = isnt( $type );          # $_ is used as $value

A negation of "is( $type )", or better, an idiom for "not is". These are all semantically identical constructs:

   die if isnt STD::EMAIL;

   die if not is STD::EMAIL;

   die unless is STD::EMAIL;

[Note] die if is not STD::EMAIL would be wrong (even if it is the most natural form). STD::EMAIL is not a package, but the function STD::EMAIL(). So a less ambiguous form would be

 die unless is STD::EMAIL();

because it cautions one not to confuse package vs. function names.

summary( $value, @types )

  $scalar = summary( $value, @types );
  @entries = summary( $value, @types );    # list context

In scalar context, returns the textual representation of the facet set. This gives you a clue about how the type verification process is driven. You can use it to prompt a web user to correct invalid form fields.

 print summary( $cc , STD::CREDITCARD( 'VISA' ) );

[Note] A real $dt->test is employed to collect the required information. Therefore the $value argument is required, because it dictates the executed code.

In list context summary returns an array of Data::Type::Entry objects.

 print $_->expected for summary( $cc , STD::CREDITCARD( 'VISA' ) );
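
The context-dependent return value can be sketched with wantarray. Everything here is hypothetical: facet_summary, the hash-based entries and the three toy facets are assumptions for illustration and do not reflect how Data::Type::Entry or STD::CREDITCARD actually work (a real credit card check involves far more than this).

```perl
use strict;
use warnings;

# Hypothetical facets: each pairs a description with a test.
my @facets = (
    { expected => 'only digits',     test => sub { $_[0] =~ /\A\d+\z/ } },
    { expected => '16 characters',   test => sub { length( $_[0] ) == 16 } },
    { expected => 'leading digit 4', test => sub { $_[0] =~ /\A4/ } },
);

# List context: one entry per facet. Scalar context: a printable report.
sub facet_summary {
    my ( $value, @checks ) = @_;
    my @entries = map {
        +{ expected => $_->{expected}, passed => $_->{test}->($value) ? 1 : 0 }
    } @checks;
    return @entries if wantarray;
    return join '',
        map { sprintf "%-4s %s\n", $_->{passed} ? 'ok' : 'FAIL', $_->{expected} } @entries;
}

my $cc = '4111111111111111';
print scalar facet_summary( $cc, @facets );                        # textual report
print $_->{expected}, "\n" for facet_summary( 'abc', @facets );    # one line per entry
```

Because every facet is actually run against the value, the report reflects the checks that were really executed, which is why the value cannot be omitted.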

CLASS METHODS

The method interface is thoroughly described in Data::Type::Docs::RFC.

Data::Type->set_locale( 'id' )

If there is an implemented locale package under Data::Type::L18N::<id>, you can switch to that language with this method. Only text that may be shown to an end user is seriously exposed to localization. Developers must live with English.

[Note] Visit the "LOCALIZATION" section below for extensive information.

LOCALIZATION

All localization is done via Locale::Maketext. The package Data::Type::L18N is the base class, while Data::Type::L18N::<id> is a concrete implementation.
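
A base class with one subclass per language is the usual Locale::Maketext layout. The following sketch shows that layout in a single file. The Sketch::L10N packages and the message text are assumptions for illustration. They are not the contents of Data::Type::L18N.

```perl
use strict;
use warnings;

# Hypothetical base class, playing the role of a project-wide L10N class.
package Sketch::L10N {
    use parent 'Locale::Maketext';
}

# English: _AUTO lets every key stand for itself.
package Sketch::L10N::en {
    use parent -norequire, 'Sketch::L10N';
    our %Lexicon = ( _AUTO => 1 );
}

# German: explicit translations, with [_1] as a placeholder.
package Sketch::L10N::de {
    use parent -norequire, 'Sketch::L10N';
    our %Lexicon = (
        'Value [_1] is not a valid email address' =>
            'Der Wert [_1] ist keine korrekte E-Mail-Adresse',
    );
}

for my $lang (qw(en de)) {
    my $lh = Sketch::L10N->get_handle($lang)
        or die "No locale package for '$lang'";
    print $lh->maketext( 'Value [_1] is not a valid email address', 'foo' ), "\n";
}
```

Switching language then amounts to asking for a different handle, while the calling code keeps using the same message keys.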

LOCALES

$Data::Type::L18N::de

German. Not very complete.

$Data::Type::L18N::eng

Complete English dictionary.

To set your favorite locale at runtime, use the set_locale method of Data::Type (the locale must, of course, be implemented).

  use Data::Type qw(:all +DB);

  Data::Type->set_locale( 'de' );  # switch to German texts

  ...

Visit the "LOCALIZATION" section in Data::Type::Docs::Howto for more on adding your own language.

[Note] Localization is only used for texts that will be shown to the user via the summary() function or an exception. This should help when developing, for example, web applications with Data::Type, where you simply forward problems to the user in the correct language.

EXPORT

No functions are exported by default, but the STD collection is always imported.

FUNCTIONS

is, isnt, valid, dvalid, catalog, toc, summary, try and with.

Exporter sets are:

':all' [qw(is isnt valid dvalid catalog toc summary try with)]

':valid' or ':is' [qw(is isnt valid dvalid)]

':try' [qw(try with)]

DATATYPES

You can control which datatypes are exported with the following parameter.

+<uppercased collection id> (e.g. BIO, DB, ... )

The STD collection is always loaded (and currently you cannot unload it). Currently the following collections are available: DB, BIO, PERL, PERL6 (see above). The special collection ALL is a synonym for all available collections.

Example:

 use Data::Type qw(:all +BIO);  # ..export the BIO collection

 use Data::Type qw(:all +DB);   # ..the DB collection

 use Data::Type qw(:all +ALL);  # ..and all available collections

[Note] Data::Type pollutes namespaces en masse, but mitigates this by restricting itself to UPPERCASED namespaces. These are generally reserved and therefore hopefully not often used. If you have conflicts with legacy code, use the export options below.

OPTIONS

MASTER PREFIX

With this option you change the default datatype aliases. If you use this option, all aliases are prefixed with the given string. The option is identified by a starting "<" and an ending ">". Take care not to produce invalid package or function names (spaces etc.). So if you want to stop namespace pollution and have all datatypes sent to a single namespace (e.g. "<any::>"), invoke Data::Type like this:

  use Data::Type qw(:all <dt::> +BIO +DB);

  die unless is dt::STD::EMAIL;

All later code accessing datatypes must then use this prefix. It does not need to be a namespace: "<__>" would be absolutely valid, because the aliases are created via a string fed to "eval" (see perlfunc). So this is valid:

  use Data::Type qw(:all <__>);

  die unless is __STD::EMAIL;

[Note] Generally, all datatypes are dispatched via an AUTOLOAD routine (see "Autoloading" in perlsub) in the Data::Type::Proxy namespace. Via runtime code generation, an alias subroutine is created that hops to the original call.

  sub DB::ENUM { Data::Type::Proxy::db_enum( @_ ) };

In this example any use of DB::ENUM gets redirected to the Data::Type::Object::db_enum interface (don't call it directly!).
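
The combination of an AUTOLOAD dispatcher and aliases generated with a string eval can be sketched in core Perl. This is a hedged illustration of the mechanism only: Sketch::Proxy, make_alias and the returned hash records are assumptions and do not mirror the internals of Data::Type::Proxy.

```perl
use strict;
use warnings;

package Sketch::Proxy;    # hypothetical stand-in for the proxy namespace

our $AUTOLOAD;

# Catches every undefined function in this package and builds a type record.
sub AUTOLOAD {
    ( my $name = $AUTOLOAD ) =~ s/.*:://;
    return if $name eq 'DESTROY';
    return { type => $name, args => [@_] };
}

package main;

# Generates "sub <prefix><alias> { Sketch::Proxy::<target>( @_ ) }" at runtime.
sub make_alias {
    my ( $prefix, $alias, $target ) = @_;
    my $code = "sub ${prefix}${alias} { Sketch::Proxy::${target}( \@_ ) } 1;";
    eval $code or die "Cannot create alias: $@";
    return "${prefix}${alias}";
}

make_alias( '',     'DB::ENUM',   'db_enum' );      # default naming
make_alias( 'dt::', 'STD::EMAIL', 'std_email' );    # with a master prefix
make_alias( '__',   'STD_EMAIL',  'std_email' );    # prefix plus underscore style

my $enum  = DB::ENUM( 'red', 'green' );
my $email = dt::STD::EMAIL();
my $plain = __STD_EMAIL();

print "$enum->{type}: @{ $enum->{args} }\n";
print "$email->{type}, $plain->{type}\n";
```

Since the alias name is simply interpolated into the generated source, any prefix that yields a legal subroutine name works, whether or not it contains a package separator.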

UNDERSCORE

A single occurrence of _ within the import parameters activates UNDERSCORE namespace resolution. That is, instead of using the COLLECTION::TYPE scheme for the datatypes, the '::' part is replaced with an '_' (underscore). In terms of namespace pollution, this is a sterile solution.

So if you want everything within Data::Type::, write:

  use Data::Type qw(:all _ <Data::Type::> +ALL);

  die unless is Data::Type::STD_EMAIL();  # default was STD::EMAIL

Unless a MASTER_PREFIX is defined, UNDERSCORE will export the types into the caller package:

  use Data::Type qw(:all _ +ALL);

  die unless is STD_EMAIL();  # default was STD::EMAIL

If MASTER_PREFIX is defined, UNDERSCORE will export the types into Data::Type::. This can be somewhat confusing. Use explicit package names within the MASTER_PREFIX to circumvent this ambiguous style.

  package main;

    use Data::Type qw(:all _ <main::TYPE_> +ALL);

    die unless is TYPE_STD_EMAIL();  # default was STD::EMAIL

Had I not included main:: in the MASTER_PREFIX, the types would have been exported into Data::Type::, remember:

  use Data::Type qw(:all _ <TYPE_> +ALL);

  die unless is Data::Type::TYPE_STD_EMAIL();  # default was STD::EMAIL

DEBUG

Increases the debug level by one. Place it multiple times for increased verbosity.

  use Data::Type qw(:all DEBUG++ DEBUG++);

would yield debug level 2. To decrease the debug level by one:

  use Data::Type qw(:all DEBUG++ +BIO DEBUG--);

would turn debuglevel up during import process of the BIO collection and then back to default.
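
The different kinds of import parameters described in this section can be told apart by their shape alone. The sketch below classifies such a list. parse_import_args and the option hash it returns are hypothetical, and the function only records the final debug level rather than applying it while collections are imported.

```perl
use strict;
use warnings;

# Hypothetical parser for an import list; it only classifies the arguments.
sub parse_import_args {
    my %opt = ( tags => [], collections => ['STD'], prefix => '', underscore => 0, debug => 0 );
    for my $arg (@_) {
        if    ( $arg =~ /\A:(\w+)\z/ )  { push @{ $opt{tags} },        $1 }
        elsif ( $arg =~ /\A\+(\w+)\z/ ) { push @{ $opt{collections} }, $1 }
        elsif ( $arg =~ /\A<(.*)>\z/ )  { $opt{prefix} = $1 }
        elsif ( $arg eq '_' )           { $opt{underscore} = 1 }
        elsif ( $arg eq 'DEBUG++' )     { $opt{debug}++ }
        elsif ( $arg eq 'DEBUG--' )     { $opt{debug}-- }
        else                            { die "Unknown import parameter '$arg'" }
    }
    return \%opt;
}

my $opt = parse_import_args(qw(:all _ <dt::> +BIO +DB DEBUG++ DEBUG++));
printf "tags=%s collections=%s prefix=%s underscore=%d debug=%d\n",
    join( ',', @{ $opt->{tags} } ),
    join( ',', @{ $opt->{collections} } ),
    @$opt{qw(prefix underscore debug)};
```

STD is placed in the collection list up front to mirror the statement that it is always loaded.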

PREREQUISITES

General

Class::Maker (0.05.17), Regexp::Box (0.01), Error (0.15), IO::Extended (0.06), Tie::ListKeyedHash (0.41), Data::Iter (0), Class::Multimethods (1.70), Attribute::Util (0.01), DBI (1.30), Text::TabularDisplay (1.18), String::ExpandEscapes (0.01), XML::LibXSLT (1.53)

Additionally required

The following modules are eval'ed at runtime if required. Data::Type delays loading them until a datatype actually uses them. This has pros and cons. One may notice a small delay the first time a datatype is used.

If you install this module via CPAN, all modules below are also required and should be installed if you have set up CPAN correctly. Even if you never intend to use some of the datatypes, they are strictly required. But this shouldn't hurt too much.

Locale::Language (2.21): used by STD::LANGCODE, STD::LANGNAME
Business::CreditCard (0.27): used by STD::CREDITCARD
Email::Valid (0.15): used by STD::EMAIL
Business::UPC (0.04): used by STD::UPC
HTML::Lint (1.26): used by STD::HTML
Business::CINS (1.13): used by STD::CINS
Date::Parse (2.27): used by DB::DATE, STD::DATE
Net::IPv6Addr (0.2): used by STD::IP
Business::ISSN (0.90): used by STD::ISSN
Regexp::Common (2.113): used by STD::INT, STD::IP, STD::QUOTED, STD::REAL, STD::URI, STD::ZIP
X500::DN (0.28): used by STD::X500::DN
Locale::SubCountry (0): used by STD::COUNTRYCODE, STD::COUNTRYNAME, STD::REGIONCODE, STD::REGIONNAME
XML::Schema (0.07): used by W3C::ANYURI, W3C::BASE64BINARY, W3C::BOOLEAN, W3C::BYTE, W3C::DATE, W3C::DATETIME, W3C::DECIMAL, W3C::DOUBLE, W3C::DURATION, W3C::ENTITIES, W3C::ENTITY, W3C::FLOAT, W3C::GDAY, W3C::GMONTH, W3C::GMONTHDAY, W3C::GYEAR, W3C::GYEARMONTH, W3C::HEXBINARY, W3C::ID, W3C::IDREF, W3C::IDREFS, W3C::INT, W3C::INTEGER, W3C::LANGUAGE, W3C::LONG, W3C::NAME, W3C::NCNAME, W3C::NEGATIVEINTEGER, W3C::NMTOKEN, W3C::NMTOKENS, W3C::NONNEGATIVEINTEGER, W3C::NONPOSITIVEINTEGER, W3C::NORMALIZEDSTRING, W3C::NOTATION, W3C::POSITIVEINTEGER, W3C::QNAME, W3C::SHORT, W3C::STRING, W3C::TIME, W3C::TOKEN, W3C::UNSIGNEDBYTE, W3C::UNSIGNEDINT, W3C::UNSIGNEDLONG, W3C::UNSIGNEDSHORT
XML::Parser (2.34): used by STD::XML
Pod::Find (0.24): used by STD::POD
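
Delayed loading of a backend module, as described at the start of this section, is commonly written with require inside an eval. The sketch below shows that general idiom. load_on_demand and is_sum_positive are hypothetical helpers, and List::Util merely stands in for a real backend because it ships with Perl.

```perl
use strict;
use warnings;

my %loaded;    # remembers which modules were already pulled in

# Hypothetical helper: loads a module the first time a type asks for it.
sub load_on_demand {
    my ($module) = @_;
    return 1 if $loaded{$module};
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    eval { require $file; 1 }
        or die "Type backend $module is not available: $@";
    return $loaded{$module} = 1;
}

# Toy type test whose backend is only loaded when the test first runs.
sub is_sum_positive {
    load_on_demand('List::Util');
    return List::Util::sum(@_) > 0 ? 1 : 0;
}

print is_sum_positive( 1, 2, -1 ) ? "positive\n" : "not positive\n";
```

The cost of loading is paid once, on first use, which matches the small first-use delay mentioned above.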

EXAMPLES

You can find typical uses in Data::Type::Docs::Howto and some scripts may reside in t/ and contrib/ of this distribution.

CONTACT

SourceForge http://sf.net/projects/datatype is hosting a project dedicated to this module. I also enjoy receiving your comments, suggestions and reports via http://rt.cpan.org or http://testers.cpan.org.

AUTHOR

Murat Uenalan, <muenalan@cpan.org>

SEE ALSO

All the basics are described in Data::Type::Docs. It also guides you through the rest of the documentation.

Data::Type::Docs::FAQ, Data::Type::Docs::FOP, Data::Type::Docs::Howto, Data::Type::Docs::RFC, Data::Type::Facet, Data::Type::Filter, Data::Type::Query, Data::Type::Collection::Std

And these CPAN modules:

Data::Types, String::Checker, Regexp::Common, Data::FormValidator, HTML::FormValidator, CGI::FormMagick::Validator, CGI::Validate, Email::Valid::Loose, Embperl::Form::Validate, Attribute::Types, String::Pattern, Class::Tangram, WWW::Form

W3C XML Schema datatypes

http://www.w3.org/TR/xmlschema-2/

Synopsis 6 by Damian Conway, Allison Randal

http://www.perl.com/pub/a/2003/04/09/synopsis.html?page=3
