The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DBIx::DataModel - Classes and UML-style Associations on top of DBI

SYNOPSIS

in file "MySchema.pm"

Declare the schema

  use DBIx::DataModel;
  DBIx::DataModel->Schema('MySchema'); # MySchema is now a Perl package

Declare the tables with (Perl name, DB name, primary key column(s)). Each table then becomes a Perl package.

  MySchema->Table(qw/Employee   Employee   emp_id/);
  MySchema->Table(qw/Department Department dpt_id/);
  MySchema->Table(qw/Activity   Activity   act_id/);

Declare associations in UML style ( [table1 role1 multiplicity1 join1], [table2...]).

  MySchema->Association([qw/Activity   activities * emp_id/],
                        [qw/Employee   employee   1 emp_id/]);
  MySchema->Association([qw/Activity   activities * dpt_id/],
                        [qw/Department department 1 dpt_id/]);

Declare a n-to-n association, on top of the linking table

  MySchema->Association([qw/Department departments * activities department/]);
                        [qw/Employee   employees   * activities employee/]);

Declare "column types" with some handlers ..

  # date conversion between database (yyyy-mm-dd) and user (dd.mm.yyyy)
  MySchema->ColumnType(Date => 
     fromDB   => sub {$_[0] =~ s/(\d\d\d\d)-(\d\d)-(\d\d)/$3.$2.$1/},
     toDB     => sub {$_[0] =~ s/(\d\d)\.(\d\d)\.(\d\d\d\d)/$3-$2-$1/},
     validate => sub {$_[0] =~ m/(\d\d)\.(\d\d)\.(\d\d\d\d)/});
  
  # 'percent' conversion between database (0.8) and user (80)
  MySchema->ColumnType(Percent => 
     fromDB   => sub {$_[0] *= 100 if $_[0]},
     toDB     => sub {$_[0] /= 100 if $_[0]},
     validate => sub {$_[0] =~ /1?\d?\d/});

.. and apply these "column types" to some of our columns

  Employee->ColumnType(Date    => qw/d_birth/);
  Activity->ColumnType(Date    => qw/d_begin d_end/);
  Activity->ColumnType(Percent => qw/activity_rate/);

Declare a column that will be filled automatically at each update

  MySchema->AutoUpdateColumns(last_modif => 
    sub{$ENV{REMOTE_USER}.", ".scalar(localtime)});

For details that could not be expressed in a declarative way, just add a new method into the table class (but in that case, Schema and Table declarations should be in a BEGIN block, so that the table class is defined before you start adding methods to it).

  package Activity; 
  
  sub activePeriod {
    my $self = shift;
    $self->{d_end} ? "from $self->{d_begin} to $self->{d_end}"
                   : "since $self->{d_begin}";
  }

Declare how to automatically expand objects into data trees

  Activity->AutoExpand(qw/employee department/);

in file "myClient.pl"

  use MySchema;

Search employees whose name starts with 'D' (select API is taken from SQL::Abstract)

  my $empl_D = Employee->select({lastname => {-like => 'D%'}});

idem, but we just want a subset of the columns, and order by age.

  my $empl_F = Employee->select(-columns => [qw/firstname lastname d_birth/],
                                -where   => {lastname => {-like => 'F%'}}
                                -orderBy => 'd_birth');

Print some info from employees. Because of the 'fromDB' handler associated with column type 'date', column 'd_birth' has been automatically converted to display format.

  foreach my $emp (@$empl_D) {
    print "$emp->{firstname} $emp->{lastname}, born $emp->{d_birth}\n";
  }

Same thing, but using method calls instead of direct access to the hashref (must enable AUTOLOAD in the table or the whole schema)

  Employee->Autoload(1); # or MySchema->Autoload(1)
  foreach my $emp (@$empl_D) {
    printf "%s %s, born %s\n", $emp->firstname, $emp->lastname, $emp->d_birth;
  }

Follow the joins through role methods

  foreach my $act (@{$emp->activities}) {
    printf "working for %s from $act->{d_begin} to $act->{d_end}", 
      $act->department->name;
  }

Role methods can take arguments too, like select()

  my $recentAct  = $dpt->activities({d_begin => {'>=' => '2005-01-01'}});
  my @recentEmpl = map {$_->employee([qw/firstname lastname/])} @$recentAct;

Export the data : external helper modules usually expect a full data tree (instead of calling methods dynamically), so we need to expand the objects :

  $_->expand('activities') foreach @$empl_D;
  my $export = {employees => $empl_D};
  use Data::Dumper; print Dumper   ($export); # export as PerlDump
  use XML::Simple;  print XMLout   ($export); # export as XML
  use JSON;         print objToJson($export); # export as Javascript

Select associated tables directly from a database join, in one single SQL statement (instead of iterating through role methods).

  my $lst = MySchema->ViewFromRoles(qw/Employee activities department/)
                    ->select([qw/lastname dept_name d_begin/], 
                             {d_begin => {'>=' => '2000-01-01'}});

DESCRIPTION

Introduction

DBIx::DataModel is a wrapper framework for building Perl abstractions (classes, objects and datastructures) that interact with relational database management systems (RDBMS). Of course the ubiquitous DBI module is used as a basic layer for communicating with databases; on top of that, DBIx::DataModel provides facilities for generating SQL queries, joining tables automatically, navigating through the results, converting values, and building complex datastructures so that other modules can conveniently exploit the data.

There are many other CPAN modules offering similar features, like Class::DBI, DBIx::Class, Alzabo, Tangram, Rose::DB::Object, just to name a few well-known alternatives. Module frameworks in this family are called object-relational mappings (ORMs). The mere fact that they are so numerous demonstrates that there is more than one way to do it, and therefore it is quite unlikely that any ORM would ever cover all possible needs.

A brief discussion of the design space and of other Perl ORMs is provided in section "SEE ALSO" of this manual; for the moment, we will concentrate on introducing the main concepts and features of DBIx::DataModel.

Index to the documentation

Although the basic principles are quite simple, there are many details to discuss, so the documentation is quite long. In an attempt to accomodate for different needs of readers, it has been structured as follows :

  • The "DESIGN PRINCIPLES" section covers the main distinctive features of DBIx::DataModel; it is mainly of interest if you are comparing various ORMs.

  • The "QUICKSTART" section summarizes the main steps to get started with the framework.

  • The "METHOD REFERENCE" section is a complete reference to all methods, divided according to usage steps: creating a schema, populating it with table and associations, parameterizing the framework, and finally data retrieval and manipulation methods.

  • The "OTHER CONSIDERATIONS" section discusses how this framework interacts with its context (Perl namespaces, DBI layer, etc.).

  • The "INTERNALS" section documents the internal structure of the framework, for programmers who might be interested in extending it.

  • The "SEE ALSO" section briefly discusses other ORMs.

  • The "TO DO" section lists some features that hopefully will be implemented in a future release.

DISCLAIMER: although already in production in our organization, this code is still in beta, so the API may slightly change in future versions.

DESIGN PRINCIPLES

This section covers the main motivating principles for proposing yet another ORM. Read it if you are currently evaluating whether DBIx::DataModel is suitable for your context. Skip it and jump to the "QUICKSTART" section if you want to directly start using the framework.

Help lower-level layers, do not hide them

DBIx::DataModel provides abstractions that help client applications to automate some common tasks; however, access to lower-level layers remains open, for cases where detailed operations are needed :

  • The generated classes contain methods that can return polymorphic results. By default, the return value is an object or a list of objects corresponding to data rows; however, these methods can also return a handle to the underlying DBI statement, or even just the generated SQL code. Hence, the client code can take control whenever any fine tuning is needed.

  • Data rows exploit the dual nature of Perl objects : on one hand they can be seen as objects, with methods to walk through the data and access related rows from other tables, but on the other hand they can also be seen as hashrefs, with usual Perl idioms for extracting keys, values or slices of data. This dual nature is important for passing data to external helper modules, such as XML generators, Perl dumps, javascript JSON, templates of the Template Toolkit, etc. Such modules need to walk on the data tree, exploring keys, values and subtrees; so they cannot work if data columns are implemented as object-oriented methods (because there is no simple way to ask for all available methods, and even if you get there, it is not possible to distinguish which of those methods encapsulate relevant data).

Let the database do the work

Use RDBMS tools to create the schema

Besides basic SQL data definition statements, RDBMS often come with their own helper tools for creating or modifying a database schema (interactive editors for tables, columns, datatypes, etc.). Therefore DBIx::DataModel provides no support in this area, and assumes that the database schema is pre-existent.

To talk to the database, the framework only needs to know a bare minimum about the schema, namely the table names, primary keys, and UML associations; but no details are required about column names or their datatypes.

Let the RDBMS check data integrity

Most RDBMS have facilities for checking or ensuring integrity rules : foreign key constraints, restricted ranges for values, cascaded deletes, etc. DBIx::DataModel can also do some validation tasks, by setting up column types with a validate handler; however, it is better advised to exploit data integrity checks within the RDBMS whenever possible.

Exploit database projections through variable-size objects

Often in ORMs, columns in the table are in 1-to-1 correspondance with attributes in the class; so any transfer between database and memory systematically includes all the columns, both for selects and for updates. Of course this has the advantage of simplicity for the programmer. However, it may be very inefficient if the client program only wants to read two columns from a very_big_table.

Furthermore, unexpected concurrency problems may occur : in a scenario such as

  client1                            client2                       
  =======                            =======                      
  my $obj = MyTable->fetch($key);    my $obj = MyTable->fetch($key);
  $obj->set(column1 => $val1);       $obj->set(column2 => $val2); 
  $obj->update;                      $obj->update;                

the final state of the row should theoretically be consistent for any concurrent execution of client1 and client2. However, if the ORM layer blindly updates all columns, instead of just the changed columns, then the final value of column1 or column2 is unpredictable.

To diminish the efficiency problem, some ORMs offer the possibility to partition columns into several column groups. The ORM layer then transparently fetches the appropriate groups in several steps, depending on which columns are requested from the client. However, this might be another source of inefficiency, if the client frequently needs one column from the first group and one from the second group.

With DBIx::DataModel, the client code has very precise control over which columns to transfer, because these can be specified separately at each method call. Whenever efficiency is not an issue, one can be lazy and specify nothing, in which case the SELECT columns will default to "*". Actually, the schema does not know about column names, except for primary and foreign keys, and therefore would be unable to transparently decide which columns to retrieve. Consequently, objects from a given class may be of variable size :

  my $objs_A = MyTable->select(-columns => [qw/c1 c2/], 
                               -where   => {name => {-like => "A%"}};
  
  my $objs_B = MyTable->select(-columns => [qw/c3 c4 c5/], 
                               -where   => {name => {-like => "B%"}};
  
  my $objs_C = MyTable->select(# nothing specified : defaults to '*'
                               -where   => {name => {-like => "C%"}};

Therefore the programmer has much more freedom and control, but of course also more responsability : in this example, attempts to access column c1 in members of @$objs_B would yield an error.

Exploit database products (joins) through multiple inheritance

ORMs often have difficulties to exploit database joins, because joins contain columns from several tables at once. If tables are mapped to classes, and rows are mapped to objects of those classes, then what should be the class of a joined row ? Three approaches can be taken

  • ignore database joins altogether : all joins are performed within the ORM, on the client side. This is of course the simplest way, but also the less efficient, because many database queries are needed in order to gather all the data.

  • ask a join from the database, then perform some reverse engineering to split each resulting row into several objects (partitioning the columns).

  • create on the fly a new subclass that inherits from all joined tables : data rows then simply become objects of that new subclass. This is the approach taken by DBIx::DataModel.

High-level declarative statements

Relationships expressed as UML associations

Relationships are expressed in a syntax designed to closely reflect how they would be pictured in a Unified Modelling Language (UML) diagram. The general form is :

  MySchema->Association([$class1, $role1, $multiplicity1, @columns1], 
                        [$class2, $role2, $multiplicity2, @columns2]);

yielding for example the following declaration

  MySchema->Association([qw/Employee   employee   1    emp_id/],
                        [qw/Activity   activities 0..* emp_id/]);

which corresponds to UML diagram

  +----------+                         +------------+
  |          | 1                  0..* |            |
  | Employee +-------------------------+ Activities |
  |          | employee     activities |            |
  +----------+                         +------------+

This states that there is an association between classes Employee and Activity, with the corresponding role names (roles are used to navigate through the association in both directions), and with the corresponding multiplicities (here an activity corresponds to exactly one employee, while an employee may have many activities).

In the UML specification, role names and multiplicities are optional (as a matter of fact, many UML diagrams use association names, or even anonymous associations, instead of role names). Here, both role names and multiplicities are mandatory, because they are needed for code generation.

The association declaration is bidirectional, so it will simultaneously add features in both participating classes.

The last items in that declaration, namely emp_id on both lines, define the names of columns to join in order to navigate across the association. Admitedly, this is an implementation detail, that would not be pictured in a conceptual UML diagram; however we need it here so that the framework can later generate the appropriate SQL join statements.

Role names declared in the association are used for a number of purposes : implementing methods for direct navigation, implementing methods for inserting new members into owner objects, and implementing multi-step navigation paths through several assocations, such as in :

   $myDepartment->selectFromRoles(qw/activities employee spouse/)
                ->(-columns => \@someColumns,
                   -where   => \%someCriteria);

Information known by the schema about the associations will be used to automatically generate the appropriate database joins. The kinds of joins (INNER JOIN, LEFT OUTER JOIN) are inferred from the multiplicities declared in the association. These can also be explicitly overridden by writing

   ...->selectFromRoles(qw/activities <=> employee <=> spouse/) # inner joins

   ...->selectFromRoles(qw/activities  => employee  => spouse/) # left joins

If referential integrity rules are declared within the RDBMS, then there is some overlap with what is declared here on the Perl side. However, it would not be possible to automatically deduce all association information from database metadata, because the database does not know about role names and multiplicities. Therefore DBIx::DataModel has no "loader" facility to automatically generate a schema.

ColumnTypes

A DBIx::DataModel schema can declare some column types : these are collections of handlers (callback functions) for performing tasks such as data validation or transformation. Handlers are then attached to specific columns belonging to that column type.

The handler concept is generic and can be exploited by client programs according to the application domain. However, some handler names have a special meaning within the framework : for example, handlers named fromDB or toDB are automatically called when transfering data from or to the database. Take for example the "Percent" column type shown in the Synopsis :

  # 'percent' conversion between database (0.8) and user (80)
  MySchema->ColumnType(Percent => 
     fromDB   => sub {$_[0] *= 100 if $_[0]},
     toDB     => sub {$_[0] /= 100 if $_[0]},
     validate => sub {$_[0] =~ /1?\d?\d/});

Note that this notion of "type" is independent from the actual datatypes defined within the database (integer, varchar, etc.). From the Perl side, these are all seen as scalar values. So a column type as defined here is just a way to specify some operations, programmed in Perl, that can be performed on the scalar values.

Autoexpand : build a data tree

An autoexpand declaration specifies how a given class will automatically call its own methods to gather data from associated classes, and build a complete data tree in memory. This is especially useful in cases that UML calls aggregations or compositions (various kinds of part-of relationships). Once the composite data is gathered in memory, it can be passed to external modules that need to walk through the tree, for example for XML or Json exports.

Autoload on demand

The default mechanism to access columns within a row is the hashref API:

  do_something_with($my_row->{column_name});

However, a method call API can be turned on, which would then yield:

  do_something_with($my_row->column_name());

Views within the ORM

A schema can contain View declarations, which are abstractions of SQL statements. This is exactly the same idea as database views, except that they are implemented within the ORM, not within the database. Such views can join several tables, or can specify WHERE clauses to filter the data. ORM views are useful to implement application-specific or short-lived requests, that would not be worth registering persistently within the database model. They can also be useful if you have no administration rights in the database. Of course it is also possible to access database views, because the ORM sees them as ordinary tables.

Extended SQL::Abstract API

Every method involving a SELECT in the database (either when searching rows from a table or collection of tables, or when following associations from an existing row) accepts an number of optional parameters that closely correspond to SQL clauses. The programming interface reuses what is defined in the excellent SQL::Abstract module, with some extensions. Therefore it is possible for example to specify

  • which columns to retrieve

  • which restriction criteria to apply (WHERE clause)

  • how to order the results

  • whether or not to retrieve distinct rows

  • etc.

All these parameters are specified at the statement level, and therefore may vary between subsequent calls to the same class. This is in contrast with many other ORMs where the set of columns or the ordering criteria are specified at schema definition time. As already stated above, DBIx::DataModel gives more freedom to client programs, but also more responsability.

Minimize dependencies for easy installation

DBIx::DataModel only depends on DBI and SQL::Abstract, so it should be very easy to install even without help of tools like ppm, cpan or cpanp.

QUICKSTART

This section will show the main steps to get started with DBIx::DataModel. The goal here is conciseness, not completeness; a full reference will be given in the next sections. The tutorial is a gentle expansion of the examples given in the SYNOPSIS, namely a small human resources management system.

Basic assumptions

Before starting with DBIx::DataModel, you should have installed CPAN modules DBI and SQL::Abstract. You also need a database management system with a DBD driver.

Use your database modeling tool to create some tables for employees, departments, activities (an employee working in a department from a start date to an end date), and employee skills. If you have no modeling tool, you can also feed something like the following SQL code to the database

  CREATE TABLE t_employee (
    emp_id     INTEGER AUTO_INCREMENT PRIMARY KEY,
    lastname   TEXT    NOT NULL,
    firstname  TEXT,
    d_birth    DATE 
  ); 
  CREATE TABLE t_department (
    dpt_code   VARCHAR(5) PRIMARY KEY,
    dpt_name   TEXT    NOT NULL 
  );
  CREATE TABLE t_activity (
    act_id     INTEGER AUTO_INCREMENT PRIMARY KEY,
    emp_id     INTEGER NOT NULL REFERENCES t_employee(emp_id),
    dpt_code   VARCHAR(5) NOT NULL REFERENCES t_department(dpt_code),
    d_begin    DATE    NOT NULL,
    d_end      DATE
  );
  CREATE TABLE t_skill (
    skill_code VARCHAR(2) PRIMARY KEY,
    skill_name TEXT    NOT NULL 
  );
  CREATE TABLE t_employee_skill (
    emp_id         INTEGER NOT NULL REFERENCES t_employee(emp_id),
    skill_code     VARCHAR(2)  NOT NULL REFERENCES t_skill(skill_code),
    CONSTRAINT PRIMARY KEY (emp_id, skill_code)
  );

As can be seen from this SQL, we assume that the primary keys for t_employee and t_activity are generated automatically by the RDBMS. Primary keys for other tables are character codes and therefore should be supplied by the client program.

Declare schema and tables

DBIx::DataModel needs to acquire some knowledge about the datamodel. So we first declare a schema :

  use DBIx::DataModel;
  DBIx::DataModel->Schema('HR');

Here we have chosen a simple acronym HR as the schema name, but it could as well have been something like Human::Resources.

The schema now is a Perl class, so we invoke its Table method to declare the first table within the schema :

  HR->Table(qw/HR::Employee      t_employee        emp_id/);

This creates a new Perl class named HR::Employee. It could as well have been simply named Employee, or any other legal Perl package name; here the use of the schema name as a prefix is just a design choice, not an obligation. The second argument t_employee is the database table, and the third argument emp_id is the primary key. So far nothing is declared about other columns in the table.

Other tables are declared in a similar fashion :

  HR->Table(qw/HR::Department    t_department      dpt_code/);
  HR->Table(qw/HR::Activity      t_activity        act_id/);
  HR->Table(qw/HR::Skill         t_skill           skill_code/);
  HR->Table(qw/HR::EmployeeSkill t_employee_skill  emp_id  skill_code/);

This last declaration has 4 arguments because the primary key ranges over 2 columns.

Declare column types

RDBMS will usually require that dates be in ISO format of shape yyyy-mm-dd. Let's assume our users are European and want to see and enter dates of shape dd.mm.yyyy. Insert of converting back and forth within the client code, it's easier to do it at the ORM level. So we define conversion routines within a "Date" column type

  HR->ColumnType(Date => 
     fromDB => sub {$_[0] =~ s/(\d\d\d\d)-(\d\d)-(\d\d)/$3.$2.$1/   if $_[0]},
     toDB   => sub {$_[0] =~ s/(\d\d)\.(\d\d)\.(\d\d\d\d)/$3-$2-$1/ if $_[0]},
     validate => sub {$_[0] =~ m/\d\d\.\d\d\.\d\d\d\d/});

and then apply this type to the appropriate columns

  HR::Employee->ColumnType(Date => qw/d_birth/);
  HR::Activity->ColumnType(Date => qw/d_begin d_end/);

Here we just perform scalar conversions; another design choice could be to "inflate" the data to some kind of Perl objects.

Observe that ColumnType is overloaded : when invoked on a schema, it defines a column type; when invoked on a class, it applies the column type to some columns.

Declare associations

Now we will declare a binary association between employees and activities:

  HR->Association([qw/HR::Employee   employee    1 emp_id/],
                  [qw/HR::Activity   activities  * emp_id/]);

The Association method takes two references to lists of arguments; in each of them, we find : class name, role name, multiplicity, and names of columns participating in the join. Since associations are symmetric, you could as well supply the two lists in the reverse order.

The declaration should be read crosswise, like when reading a UML class diagram : here we are stating that an employee may be associated with several activities; therefore the HR::Employee class will contain an activities method which returns an arrayref. Conversely, an activity is associated with exactly one employee, so the HR::Activity will contain an employee method which returns a single instance of HR::Employee.

The second association is defined in a similar way :

  HR->Association([qw/HR::Department department  1 dpt_code/],
                  [qw/HR::Activity   activities  * dpt_code/]);

Now comes the association between employees and skills, which is a many-to-many association. This happens in two steps: first we declare as usual the associations with the linking table :

  HR->Association([qw/HR::Employee      employee   1 emp_id/],
                  [qw/HR::EmployeeSkill emp_skills * emp_id/]);

  HR->Association([qw/HR::Skill         skill      1 skill_code/],
                  [qw/HR::EmployeeSkill emp_skills * skill_code/]);

Then we declare the many-to-many association:

  HR->Association([qw/HR::Employee  employees   * emp_skills employee/],
                  [qw/HR::Skill     skills      * emp_skills skill   /]);

This looks almost exactly like the previous declarations, except that the last arguments are no longer column names, but rather role names: these are the sequences of roles to follow in order to implement the association. This example is just an appetizer; more explanations are provided in the reference section.

Use the schema

To use the schema, we first need to supply it with a database connection :

  my $dbh = DBI->connect(...); # parameters according to your RDBMS
  HR->dbh($dbh);               # give $dbh handle to the schema

Now we can start populating the database:

  my ($bach_id, $berlioz_id, $monteverdi_id) = 
    HR::Employee->insert({firstname => "Johann",  lastname => "Bach"},
                         {firstname => "Hector",  lastname => "Berlioz"},
                         {firstname => "Claudio", lastname => "Monteverdi"});

Observe that several rows can be created at once (of course you get the same result by calling insert() several times). According to our earlier assumptions, keys are generated automatically within the database, so they need not be supplied here. The return value of the method is the list of generated ids (provided that your database driver supports DBI's last_insert_id method).

Similarly, we create some departments and skills (here with explicit primary keys) :

  HR::Department->insert({dpt_code => "CPT",  dpt_name => "Counterpoint"},
                         {dpt_code => "ORCH", dpt_name => "Orchestration"});

  HR::Skills->insert({skill_code => "VL",  skill_name => "Violin"},
                     {skill_code => "KB",  skill_name => "Keyboard"},
                     {skill_code => "GT",  skill_name => "Guitar"},

To perform updates, there is either a class method or an object method. Here is an example with the class method :

  HR::Employee->update($bach_id => {firstname => "Johann Sebastian"});

Associations have their own insert methods, named insert_into_* :

  my $bach = HR::Employee->fetch($bach_id);
  
  $bach->insert_into_activities({d_begin => '01.01.1695',
                                 d_end   => '18.07.1750',
                                 dpt_code => 'CPT'});
  
  $bach->insert_into_emp_skills({skill_code => 'VL'},
                                {skill_code => 'KB'});

The fetch() method above retrieves one single record from its primary key. To get several records, use the the select() method :

  my $all_employees = HR::Employee->select; 
  foreach my $emp (@$all_employees) {
    do_something_with($emp);
  }

or maybe we want something more specific :

  my @columns  = qw/firstname lastname/;
  my %criteria = (lastname => {-like => 'B%'});
  my $some_employees 
     = HR::Employee->select(-columns => \@columns,
                            -where   => \%criteria,
                            -orderBy => 'd_birth');

From a given object, role methods allow us to get associated objects :

  foreach my $emp (@$all_employees) {
    print "$emp->{firstname} $emp->{lastname} ";
    my @skills = map {$_->{skill_name}  }} @{$emp->skills};
    print " has skills ", join(", ", @skills) if @skills;
  }

Passing arguments to role methods, we can restrict to specific columns or specific rows, exactly like the select() method :

  my @columns = qw/d_begin d_end/;
  my %criteria = (d_end => undef);
  my $current_activities = $someEmp->activities(-columns => \@columns,
                                                -where   => \%criteria);

And it is possible to join on several roles at once:

  my $result = $someEmp->selectFromRoles(qw/activities department/)
                       ->(-columns => \@columns,
                          -where   => \%criteria);

This concludes this short tutorial. More examples are given in the reference section below.

METHODS REFERENCE

General convention

Method names starting with an uppercase letter are meant to be compile-time class methods. These methods will typically be called when loading a module like 'MySchema.pm', and therefore will be executed during the BEGIN phase of the Perl compiler. They instruct the compiler to create classes, methods and datastructures for representing the elements of a database schema.

Method names starting with a lowercase letter are meant to be usual run-time methods, either for classes or for instances.

Creating a schema

Schema

  DBIx::DataModel->Schema($schemaName, %options)

Creates a new Perl class of name $schemaName that represents a database schema. That class inherits from DBIx::DataModel::Schema. Possible options are :

dbh => $dbh

Connects the schema to a DBI database handle. This can also be set or reset later via the "dbh" method.

sqlDialect => $dialect

SQL has no standard syntax for performing joins, so if your database wants a particular syntax you will need to declare it. Current builtin dialects are either 'MsAccess', 'BasisODBC' or 'Default' (contributions to enrich this list are welcome). Otherwise $dialect can also be a hashref in which you supply the following information :

innerJoin

a string in sprintf format, with placeholders for the left table, the right table and the join criteria. Default is %s INNER JOIN %s ON %s. If your database does not support inner joins, set this to undef and the generated SQL will be in the form T1, T2, ... Tn WHERE ... AND ... .

leftJoin

a string for left outer joins. Default is %s LEFT OUTER JOIN %s ON %s.

joinAssociativity

either left or right

For backwards compatibility with the previous API, you can specify the dbh as second argument, instead of a named option, i.e. Schema($schemaName, $dbh) instead of Schema($schemaName, dbh => $dbh).

Populating a schema

Table

  MySchema->Table($pckName, $dbTable, @primKey);

Creates a new Perl class of name $pckName that represents a database table. That class inherits from DBIx::DataModel::Table. $dbTable should contain the name of the table in the database. @primKey should contain the name of the column (or names of columns) holding the primary key for that table. This info will be used for interpreting arguments to the "fetch" method, and for filling WHERE clauses in the SQL generated by the "update" method.

View

  MySchema->View($viewName, $columns, $dbTables, \%where, @parentTables);

Creates a new Perl class of name $viewName that represents a SQL SELECT request of shape

  SELECT $columns FROM $dbTables [ WHERE %where ]

Therefore arguments $columns and $dbTables should be strings; the optional \%where argument should be a hashref, as explained below.

View() is seldom called explicitly from client code; it is mainly useful internally for implementing other methods like "ViewFromRoles" or "selectFromRoles". However, it could also be used to build queries with specific SQL clauses like for example

  MySchema->View(MyView =>
     "DISTINCT column1 AS c1, t2.column2 AS c2",
     "Table1 AS t1 LEFT OUTER JOIN Table2 AS t2 ON t1.fk=t2.pk",
     {c1 => 'foo', c2 => {-like => 'bar%'}},
     qw/Table1 Table2/);

The class generated by View() has a select() method, which will

  • select records from the database according to the criteria of the view, merged with the criteria of the request;

  • apply the 'fromDB' handlers of the parent tables to those records;

  • bless the results into objects of $viewName.

See SQL::Abstract and the select() method for a complete description of what to put in the \%where argument. For the moment, just consider the following example:

  my $lst = MyView->select({c3 => 22});

This would generate the SQL statement:

  SELECT DISTINCT column1 AS c1, t2.column2 AS c2
  FROM  Table1 AS t1 
        LEFT OUTER JOIN Table2 AS t2 ON t1.fk=t2.pk
  WHERE (c1 = 'foo' AND c2 LIKE 'bar%' AND c3 = 22)

The \%where argument can of course be undef.

The optional list of @parentTables contains names of Perl table classes from which the view will also inherit. If the SQL code in $dbTables is a join between several tables, then it is a good idea to mention these tables in @parentTables, so that their role methods become available to instances of MyView. Be careful about table names : the SQL code in $dbTables should contain database table names, whereas the members of @parentTables should be Perl table classes (might be the same, but not necessarily).

Perl views as defined here have nothing to do with views declared in the database itself. Perl views are totally unknown to the database, they are just abstractions of SQL statements. If you need to access database views, just use the Table declaration, like for a regular table.

Association

  MySchema->Association([$class1, $role1, $multiplicity1, @columns1], 
                        [$class2, $role2, $multiplicity2, @columns2]);

Declares an association between two tables (or even two instances of the same table), in a UML-like fashion. Each side of the association specifies its table, the "rolename" of of this table in the association, the multiplicity, and the name of the column or list of columns that technically implement the association as a database join. Multiplicities should be written in the UML form '0..*', '1..*', '0..1', etc. (minimum .. maximum number of occurrences); this will influence how role methods and views are implemented, as explained below. Multiplicity '*' is a shortcut for '0..*', and multiplicity '1' is a shortcut for '1..1'. Role names should be chosen so as to avoid conflicts with column names in the same table.

Roles as additional methods in table classes

As a result of the association declaration, the Perl class $table1 will get an additional method named $role2 for accessing the associated object(s) in $table2; that method normally returns an arrayref, unless $multiplicity2 has maximum '1' (in that case the return value is a single object ref). Of course, $table2 conversely gets a method named $role1.

To understand why tables and roles are crossed, look at the UML picture :

  +--------+                     +--------+
  |        | *              0..1 |        |
  | Table1 +---------------------+ Table2 |
  |        | role1         role2 |        |
  +--------+                     +--------+

so from an object of Table1, you need a method role2 to access the associated object of Table2. In your diagrams, be careful to get the role names correctly according to the UML specification. Sometimes we see diagrams where role names are on the wrong side, mainly because modelers have a background in Entity-Relationship or Merise methods, where it is the other way around.

Role methods perform joins within Perl (as opposed to joins directly performed within the database). That is, given a declaration

  MySchema->Association([qw/Employee   employee   1    emp_id/],
                        [qw/Activity   activities 0..* emp_id/]);

we can call

  my activities = $anEmployee->activities

which will implicitly perform a

  SELECT * FROM Activity WHERE emp_id = $anEmployee->{emp_id}

The role method can also accept additional parameters in SQL::Abstract format, exactly like the select() method. So for example

  my $activities = $anEmployee->activities(-columns => [qw/act_name salary/],
                                           -where   => {isActive => 'Y'});

would perform the following SQL request :

  SELECT act_name, salary FROM Activity WHERE 
    emp_id = $anEmployee->{emp_id} AND
    isActive = 'Y'

If the role method is called without any parameters, and if that role was previously expanded (see "expand" method), i.e. if the object hash contains an entry $obj->{$role}, then this data is reused instead of calling the database again.

To specify a unidirectional association, just supply 0 or an empty string (or even the string "0" or '""' or "none") to one of the role names. In that case the corresponding role method is not generated.

Methods insert_into_...

When a role has multiplicity '*', another method named insert_into_... is also installed, that will create new objects of the associated class, taking care of the linking automatically :

  $anEmployee->insert_into_activities({d_begin => $today, dpt_id  => $dpt});

This is equivalent to

  Activity->insert({d_begin => $today, dpt_id  => $dpt, 
                    emp_id  => $anEmployee->{emp_id}});

Many-to-many associations

UML conceptual models may contain associations where both roles have multiplicity '*' (so-called many-to-many associations). However, when it comes to actual database implementation, such associations need an intermediate linking table to collect couples of identifiers from both tables.

DBIx::DataModel supports many-to-many associations as a kind of syntactic sugar, translated into low-level associations with the linking table. The linking table needs to be declared first :

  MySchema->Table(qw/link_table link_table prim_key1 prim_key2/);

  MySchema->Association([qw/table1     role1  0..1  prim_key1/],
                        [qw/link_table links    *   prim_key1/]);

  MySchema->Association([qw/table2     role2  0..1  prim_key2/],
                        [qw/link_table links    *   prim_key2/]);

This describes a diagram like this :

  +--------+               +-------+                 +--------+
  |        | 0..1        * | Link  | *          0..1 |        |
  | Table1 +---------------+  --   +-----------------+ Table2 |
  |        | role1  linksA | Table | linksB    role2 |        |
  +--------+               +-------+                 +--------+

Then we can declare the many-to-many association, very much like ordinary associations, except that the last items in the argument lists are names of roles to follow, instead of names of columns to join. In the diagram above, we must follow roles linksA and role2 in order to obtain the rows of Table2 related to an instance of Table1; so we write

  MySchema->Association([qw/table1  roles1  *  linksB role1/],
                        [qw/table2  roles2  *  linksA role2/]);

which describes a diagram like this :

              +--------+                    +--------+
              |        | *                * |        |
              | Table1 +--------------------+ Table2 |
              |        | roles1      roles2 |        |
              +--------+                    +--------+

The declaration has created a new method roles2 in Table1; that method is implemented by following roles linksA and role2. So for an object obj1 of Table1, the call

  my $obj2_arrayref = $obj1->roles2();

will generate the following SQL :

  SELECT * FROM link_table INNER JOIN table2
            ON link_table.prim_key2=table2.prim_key2
    WHERE link_table.prim_key1 = $obj->{prim_key1}

Observe that roles2() returns rows from a join, so these rows will belong both to Table2 and to Link_Table.

Caveat: currently, many-to-many associations do not have an automatic insert_into_* method.

Following multiple associations

In the previous section we were following two roles at once in order to implement a many-to-many association. More generally, it may be useful to follow several roles at once, joining the tables in a single SQL query. This can be done through the following methods :

  • "ViewFromRoles" : create a new View that selects from several tables, filling the joins automatically

  • "selectFromRoles" : from a given object, follow a list of roles to get information from associated tables.

  • "MethodFromRoles" : add a new method in a table, that will follow a list of roles (shortcut for repeated calls to selectFromRoles).

ViewFromRoles

  my $view = MySchema->ViewFromRoles($table, $role1, $role2, ..);

Creates a View class, starting from a given table class and following one or several associations through their role names. It calls the "View" method, with a collection of parameters automatically inferred from the associations. So for example

  MySchema->ViewFromRoles(qw/Department activities employee/);

is equivalent to

  my $sql = <<_EOSQL_
  Department 
    LEFT OUTER JOIN Activity ON Department.dpt_id = Activity.dpt_id
    LEFT OUTER JOIN Employee ON Activity.emp_id   = Employee.emp_id
  _EOSQL_
  
  MySchema->View(DepartmentActivitiesEmployee => '*', $sql, 
                 qw/Department Activity Employee/);

For each pair of tables, the kind of join is chosen according to the multiplicity declared with the role : if the minimum multiplicity is 0, the join will be LEFT OUTER JOIN; otherwise it will be a usual inner join (exception : after a first left join, all remaining tables are also connected through additional left joins). The default kind of join chosen by this rule may be overriden by inserting pseudo-roles in the list, namely '<=>' or INNER for inner joins and '=>' or LEFT for left joins. So for example

  MySchema->ViewFromRoles(qw/Department <=> activities <=> employee/);

becomes equivalent to

  my $sql = <<_EOSQL_
  Department 
    INNER JOIN Activity ON Department.dpt_id = Activity.dpt_id
    INNER JOIN Employee ON Activity.emp_id   = Employee.emp_id
  _EOSQL_

All tables participating in a ViewFromRoles are stacked, and further roles are found by walking up the stack. So in

  ..->ViewFromRoles(qw/FirstTable role1 role2 role3/)

we must find a role1 in FirstTable, from which we know what will be the Table2. Then, we must find a role2 in in Table2, or otherwise in FirstTable, in order to know Table3. In turn, role3 must be found either in Table3, or in Table2, or in FirstTable, etc.

The resulting view name will be composed by concatenating the table and the capitalized role names. If join kinds were explicitly set, these also belong to the view name, like Department_INNER_Activities. Since such names might be long and uncomfortable to use, the view name is also returned as result of the method call, so that the client code can store it in a variable and use it as an alias.

The main purpose of ViewFromRoles is to gain efficiency in interacting with the database. If we write

  foreach my $dpt (@{Department->select}) {
    foreach my $act ($dpt->activities) {
      my $emp = $act->employee;
      printf "%s works in %s since %s\n", 
         $emp->{lastname}, $dpt->{dpt_name}, $act->{d_begin};
    }
  }

many database calls are generated behind the scene. Instead we could write

  my $view = MySchema->ViewFromRoles(qw/Department activities employee/);
  foreach my $row (@{$view->select}) {
    printf "%s works in %s since %s\n", 
      $row->{lastname}, $row->{dpt_name}, $row->{d_begin};
  }

which generates one single call to the database.

Caveat : ViewFromRoles does not know about SQL table aliases. Therefore, if the role list contains several occurrences of the same table (for example through self-referential associations), the generated SQL will be incorrect because of ambiguous table names.

selectFromRoles

  my $lst = $obj->selectFromRoles(qw/role1 role2 .../)
                ->(-columns => [...], -where => {...}, -orderBy=>[...]);

Starting from a given object, returns a reference to a function that selects a collection of data rows from associated tables, performing the appropriate joins. Internally this is implemented throught the "ViewFromRoles" method, with an additional join criteria to constrain on the primary key(s) of $obj. The returned function takes the same arguments as the "select" method. So for example if $emp->{emp_id} == 987, then

  $emp->selectFromRoles(qw/activities department/)->({d_end => undef})

will generate

  SELECT * FROM Activity INNER JOIN Department 
                         ON Activity.dpt_id = Department.dpt_id
           WHERE  emp_id = 987 AND d_end IS NULL

MethodFromRoles

  TableOrView->MethodFromRoles($meth_name => qw/role1 role2 .../);

Inserts into the class a new method named $meth_name, that will automatically call "selectFromRoles" and pass arguments to the resulting function. This is useful for joining several tables at once, so for example with

  Department->MethodFromRoles(employees => qw/activities employee/);

we can then write

  my $empl_ref = $someDept->employees(-where   => {gender => 'F'},
                                      -columns => [qw/firstname lastname]);

This method is used internally to implement many-to-many associations; so if you have only two roles to follow, you would probably be better off by defining the association, which is a more abstract notion. Direct calls to MethodFromRoles are still useful if you want to follow three or more roles at once.

Schema or Table parameterization

DefaultColumns

  MyTable->DefaultColumns($columns);

Sets the default value for the -columns argument to select(). If nothing else is stated, the default value for all tables is '*'.

ColumnHandlers

  MyTable->ColumnHandlers($columnName => handlerName1 => coderef1,
                                         handlerName2 => coderef2, ...);

Associates some handlers to a given column in the current table class. Then, when you call $obj->applyColumnHandler('someHandler'), each column having a handler of the corresponding name will execute the associated code. This can be useful for all sorts of data manipulation :

  • converting dates between internal database format and user presentation format

  • converting empty strings into null values

  • inflating scalar values into objects

  • column data validation

Handlers receive the column value as usual through $_[0]. If the value is to be modified (for example for scalar conversions or while inflating values into objects), the result should be put back into $_[0]. In addition to the column value, other info is passed to the handler :

  sub myHandler {
    my ($columnValue, $obj, $columnName, $handlerName) = @;
    my $newVal = $obj->computeNewVal($columnValue, ...);
    $columnValue = $newVal; # WRONG : will be a no-op
    $_[0] = $newVal;        # OK    : value is converted
  }

The second argument $obj is the object from where $columnValue was taken -- most probably an instance of a Table or View class. Other arguments $columnName and $handlerName are obvious.

Handler names fromDB and toDB have a special meaning : they are called automatically just after reading data from the database, or just before writing into the database. Handler name validate is used by the method "hasInvalidColumns".

ColumnType

  MySchema->ColumnType(type_name => handler_name1 => coderef1,
                                    handler_name2 => coderef2, ...);

Declares a collection of column handlers under name type_name.

  MyTable->ColumnType(type_name => qw/column1 column2 .../);

Retrieves all column handlers defined under type_name in the schema and calls "ColumnHandlers" to register those handlers to column1, column2, etc.

Autoload

  MySchema->Autoload(1); # turn on AUTOLOAD 
  MySchema->Autoload(0); # turn it off
  MyClass ->Autoload(1); # turn it on, just for one class

If AUTOLOAD is turned on (default is off), then columns have implicit read accessors through AUTOLOAD. So instead of $record->{column} you can write $record->column.

Of course this is a bit slower than generating all accessors explicitly at compile time (through Class::Accessor or something similar), but the advantage is that you don't need to know all column names in advance. This is how we support variable column lists (two instances of the same Table do not necessarily hold the same set of columns, it all depends on what you chose when doing the SELECT).

Caveat : AUTOLOAD is a global feature, so MySchema->Autoload(1) actually turns it on for all schemas.

AutoUpdateColumns

  MySchema->AutoUpdateColumns( columnName1 => sub{...}, ... );
  MyTable ->AutoUpdateColumns( columnName1 => sub{...}, ... );

Declares handler code that will automatically fill column names columnName1, etc. at each update, either for a single table, or (if declared at the Schema level), for every table. For example, each record could remember who did the last modification with something like

  MySchema->AutoUpdateColumns( last_modif => 
    sub{$ENV{REMOTE_USER} . ", " . localtime}
  );

The handler code will be called as

  $handler->($record, $table)

so that it can know something about its calling context. In most cases, however, the handler will not need these parameters, because it just returns global information such as current user or current date/time.

NoUpdateColumns

  MySchema->NoUpdateColumns(@columns);
  MyTable ->NoUpdateColumns(@columns);

Defines an array of column names that will be excluded from INSERT/UPDATE statements. This is useful for example when some column are set up automatically by the database (like automatic time stamps or user identification). It can also be useful if you want to temporarily add information to memory objects, without passing it back to the database.

NoUpdate columns can be set for a whole Schema, or for a specific Table class.

AutoExpand

  Table->AutoExpand(qw/role1 role2 .../)

Generates an autoExpand method for the class, that will autoexpand on the roles listed (i.e. will call the appropriate method and store the result in a local slot within the object). In other words, the object knows how to expand itself, fetching information from associated tables, in order to build a data tree in memory.

Be careful to avoid loops when specify autoexpands, otherwise you will generate an infinite tree and break your program. For example this would be problematic :

  Employee->Autoexpand(qw/activities/);
  Activity->Autoexpand(qw/employee/);

Schema runtime properties or parameterization methods

dbh

  Schema->dbh(DBI::connect(...) ); # set
  my $dbh = Schema->dbh;           # get

Returns or sets the handle to a DBI database handle (see DBI). This handle is schema-specific. DBIx::DataModel expects the handle to be opened with RaiseError => 1 (see "Transactions and error handling").

schema

  $schema = TableOrView->schema;
  $schema = $myObj->schema;

Returns the name of the schema class for the current object or current Table or View class.

db_table

  $db_table = TableOrView->db_table;
  $db_table = $obj->db_table;

Returns the database table name registered via Schema->Table(..) or collection of joined tables registered via Schema->View(..).

debug

  Schema->debug( 1 );            # will warn for each SQL statement
  Schema->debug( $debugObject ); # will call $debugObject->debug($sql)
  Schema->debug( 0 );            # turn off debugging

Debug mode is useful for seeing SQL statements generated by DBIx::DataModel. Enabling debugging with a $debugObject will typically be useful in conjunction with something like Log::Log4perl or Log::Dispatch.

There is also another way to see the SQL code :

  my ($sql, @bind) = $myClassOrObj->$someSelectMethod(-columns  => \@columns,
                                                      -where    => \%criteria,
                                                      -resultAs => 'sql');

classData

  my $val = $someClass ->classData->{someKey};
  my $val = $someObject->classData->{someKey};

Returns a ref to a hash for storing class-specific data. Each subclass has its own hashref, so class data is NOT propagated along the inheritance tree. Class data should be mainly for reading; if you write into it, make sure you know what you are doing.

primKey

  my @primKey = Table->primKey;

Returns the list of primary keys registered via Schema->Table(..).

noUpdateColumns

  my @cols = MySchema->noUpdateColumns;

Returns the array of column names declared as noUpdate through "NoUpdateColumns".

  my @cols = $obj->noUpdateColumns;

Returns the array of column names declared as noUpdate, either in the Schema or in the Table class of the invocant.

autoUpdateColumns

  my @cols = MySchema->autoUpdateColumns;

Returns the array of column names declared as autoUpdate through "AutoUpdateColumns".

  my @cols = $obj->autoUpdateColumns;

Returns the array of column names declared as autoUpdate, either in the Schema or in the Table class of the invocant.

selectImplicitlyFor

  MySchema->selectImplicitlyFor('read only'); # will be added to SQL selects
  MySchema->selectImplicitlyFor('');          # turn it off

Gets or sets a default value for the -for argument to select(). Here it is set at the Schema level, so it will be applied to all tables.

  TableOrView->selectImplicitlyFor('read only');
  TableOrView->selectImplicitlyFor('');         

Same thing, but at a Table or View level.

  my $string = $obj->selectImplicitlyFor;   

Retrieves whatever whas set in the table or in the schema.

tables

   my @tables = MySchema->tables;

Returns an array of names of Table subclasses declared in this schema.

views

   my @views = MySchema->views;

Returns an array of names of View subclasses declared in this schema.

keepLasth

  MySchema->keepLasth(1); # schema will keep a reference to the last DBI handle
  MySchema->keepLasth(0); # turn it off

If true, the schema will keep a copy of the last generated SQL statement handle (either select, insert or update). The handle may then be accessed through $obj->schema->lasth. This may be useful if you need to interact with the handle for driver-specific operations. However, it also means that the handle is not DESTROYed immediately, which might result in resources being locked until the next statement. Therefore keepLasth is off by default.

lasth

  my $sth = MySchema->lasth;

Returns the last DBI statement handle created by this module, if keepLasth is turned on.

Data retrieval and manipulation methods

doTransaction

  my $coderef = sub {Table1->insert(...); Table2->update(...); ...};
  MySchema->doTransaction($coderef);

Evaluates the code within a transaction. In case of failure, the transaction is rolled back, and an exception is raised with the error message and the status of the rollback (because the rollback itself may also fail).

Usually the coderef passed as argument will be a closure that may refer to variables local to the environment where it was created.

fetch

  my $record = MyTable->fetch(@keyValues, \%options);

Searches the single record whose primary key is @keyValues. Returns undef if none is found. The optional \%options argument can specify things like -for, -preExec, -postExec (see the select() method for an explanation). Of course it makes no sense to specify -where in the options.

select

  my $records = TableOrView->select(\@columns, \%where, \@order);
  
  my $records = TableOrView->select(-columns  => \@columns, 
                                    # OR : -distinct => \@columns,
                                    -where    => \%where, 
                                    -groupBy  => \@groupings,
                                    -having   => \%criteria,
                                    -orderBy  => \@order,
                                    -for      => 'read only',
                                    -preExec  => \&preExec_callback,
                                    -postExec => \&preExec_callback,
                                    -resultAs => 'rows' || 'sth' || 
                                                 'sql'  || 'iterator');
  
  my $wholeTable = Table->select();

Applies a SQL SELECT to the associated table (or view), and returns a result as specified by the -resultAs argument (see below). Arguments are all optional and may be passed either by name or by position (but you cannnot combine both positional and named arguments in a single call).

The API is mostly borrowed from SQL::Abstract :

  • the first argument \@columns is a reference to a list of SQL column specifications (i.e. column names, function or grouping operators, "AS" clauses, etc.). Actually, the argument can also be a string instead of an arrayref, like for example "c1 AS foobar, MAX(c2) AS m_c2, COUNT(c3) as n_c3". If omitted, \@columns takes the default, which is usually '*', unless modified through DefaultColumns().

    No verification is done on the list of retrieved \@columns, so it is OK if the list does not contain the primary or foreign keys --- but then later attempts to perform joins or updates will obviously fail.

  • the second argument \%where is a reference to a hash or array of criteria that will be translated into SQL clauses. In most cases, this will just be something like {col1 => 'val1', col2 => 'val2'}; see SQL::Abstract::select for detailed description of the structure of that hash or array. It can also be a plain SQL string like "col1 IN (3, 5, 7, 11) OR col2 IS NOT NULL".

  • the third argument \@order is a reference to a list of columns for sorting. Again it can also be a plain SQL string like "col1 DESC, col3, col2 DESC".

If using named arguments, more options are available :

-distinct => \@columns

behaves like the -columns arguments, except that keyword DISTINCT will be included in the generated SQL.

-groupBy => "string" or -groupBy => \@array

adds a GROUP BY clause in the SQL statement. Grouping columns are specified either by a plain string or by an array of strings.

-having => "string" or -having => \%criteria

adds a HAVING clause in the SQL statement (only makes sense together with a GROUP BY clause). This is like a -where clause, except that the criteria are applied after grouping has occured.

-for => $clause

specifies an additional clause to be added at the end of the SQL statement, like -for => 'read only' or -for => 'update'.

-preExec => \&preExec_callback, -postExec => \&postExec_callback

hooks for specifying callback functions to be called on the DBI statement handle, just before or just after invoking execute(). So the sequence will be more or less like this:

  $sth = $dbh->prepare($sql_statement);
  $preExec_callback->($sth)  if $preExec_callback;
  $sth->execute(@bind_values);
  $postExec_callback->($sth) if $postExec_callback;

This is mostly useful if you need to call driver-specific functions at those stages.

-resultAs => $result_kind

specifies what kind of result will be produced. Possible result kinds are :

rows

The result will be a ref to an array of rows, blessed into objects of the class. This is the default result kind. If there are no data rows, a ref to an empty list is returned.

iterator

The result is a simple iterator object with a unique method: next, that fetches the next datarow and blesses it into the appropriate object. The method returns undef when there is no more data to fetch. So a typical usage pattern is :

  my $iterator = $class->select(-where    => \%criteria, 
                                -resultAs => 'iterator');
  while (my $row = $iterator->next) {
    do_something_with($row);
  }
sth or statement

The result will be an executed DBI statement handle. Then it is up to the caller to retrieve data rows using the DBI API. If needed, these rows can then be blessed into appropriate objects through blessFromDB().

sql

In scalar context, the result will just be the generated SQL statement. In list context, it will be ($sql, @bind), i.e. the SQL statement together with the bind values.

insert

  my @ids = MyTable->insert({col1 => $val1, col2 => $val2, ...}, {...});

Applies the toDB handlers, removes the noUpdate columns, and then inserts the new records into the database. Because of the handlers, this operation may modify the argument data, so it is not safe to access $val1, $val2, etc. after the call.

The return value is the list of ids collected by calling DBI's last_insert_id() after each record insertion. This may or may not be meaningful, depending on your database driver --- see DBI's documentation. Caveat: currently we only pass undef arguments to last_insert_id(), which is good enough for MySQL, but not for some other databases.

update

  MyTable->update({col1 => $val1, ...});
  MyTable->update(@primKey, {col1 => $val1, ...});
  $obj->update;

This is both a class and an instance method. It applies the toDB handlers, removes the noUpdate columns, and then updates the database for the given record.

When called as a class method, the columns and values to update are supplied as a hashref. The second syntax with @primKey is an alternate way to supply the values for the primary key (it may be more convenient because you don't need to repeat the name of primary key columns). So if emp_id is the primary key of table Employee, then the following are equivalent :

  Employee->update({emp_id => $eid, address => $newAddr, phone => $newPhone});
  Employee->update($eid => {address => $newAddr, phone => $newPhone});

When called as an instance method, i.e.

  $someEmployee->update;

the columns and values to update are taken from the object in memory (ignoring all non-scalar values). After the update, the memory for that object is emptied (to prevent any confusion, because the 'toDB' handlers might have changed the values). So to continue working with the same record, you must fetch it again from the database (or clone it yourself before calling update).

In either case, you have no control over the 'where' clause of the SQL update statement, that will be based exclusively on primary key columns. So if you need to simultaneously update several records with a SQL request like

  UPDATE myTable SET col='newVal' WHERE otherCol like 'criteria%'

then you should generate the SQL yourself and pass it directly to DBI->do($sql).

The update method only updates the columns received as arguments : it knows nothing about other columns that may sit in the database table. Therefore if you have two concurrent clients doing

  (client1)  MyTable->update($id, {c1 => $v1, c2 => $v2});
  (client2)  MyTable->update($id, {c3 => $v3, c4 => $v4, c5 => $v5});

the final state of record $id in the database will reflect changes from both clients.

delete

  MyTable->delete({column1 => value1, ...});
  MyTable->delete(@primKey);
  $record->delete;

This is both a class and an instance method. It deletes a record from the database.

When called as a class method, the primary key of the record to delete is supplied either as a hashref, or directly as a list of values. Note that MyTable->delete(11, 22) does not mean "delete records with keys 11 and 22", but rather "delete record having primary key (11, 22)"; in other words, you only delete one record at a time. In order to simultaneously delete several records, you must generate the SQL yourself and go directly to the DBI level.

When called as an instance method, the primary key is taken from object columns in memory. After the delete, the memory for that object is destroyed.

applyColumnHandler

  TableOrView->applyColumnHandler($handlerName, \@objects);
  $myObject  ->applyColumnHandler($handlerName);

Inspects the target object or list of objects; for every column that exists in the object, checks whether a handler named $handlerName was declared for that column (see method "ColumnHandlers"), and if so, calls the handler. By this definition, if a column is absent in an object, then the handler for that column is not called, even though it was declared in the class.

The results of handler calls are collected into a hashref, with an entry for each column name. The value of each entry depends on how applyColumnHandlers was called : if it was called as an instance method, then the result is something of shape

  {columnName1 => resultValue1, columnName2 => resultValue2, ... }

if it was called as a class method (i.e. if \@objects is defined), then the result is something of shape

  {columnName1 => [resultValue1forObject1, resultValue1forObject2, ...], 
   columnName2 => [resultValue2forObject1, resultValue2forObject2, ...], 
   ... }

If columnName is not present in the target object(s), then the corresponding result value is undef.

hasInvalidColumns

  my $invalid_columns = $obj->hasInvalidColumns;
  if ($invalid_columns) {
    print "wrong data in columns ", join(", ", @$invalid_columns);
  }
  else {
   print "all columns OK";
  }

Applies the 'validate' handler to all existent columns. Returns a ref to the list of invalid columns, or undef if there are none.

Note that this is validation at the column level, not at the record level. As a result, your validation handlers can check if an existent column is empty, but cannot check if a column is missing (because in that case the handler would not be called).

Your 'validate' handlers, defined through "ColumnHandlers", should return 0 or an empty string whenever the column value is invalid. Never return undef, because we would no longer be able to distinguish between an invalid existent column and a missing column.

expand

  $obj->expand($role [, @args] )

Executes the method $role to follow an Association, stores the result in the object itself under $obj->{$role}, and returns that result. This is typically used to expand an object into a tree datastructure. Optional @args are passed to $obj->$role(@args), for example for specifying -where, -columns or -orderBy options.

After the expansion, further calls to $obj->$role (without any arguments) will reuse that same expanded result instead of calling again the database. This caching improves efficiency, but also introduces the risk of side-effects across your code : after

  $obj->expand(someRole => (-columns => [qw/just some columns/],
                            -where   => [someField => 'restriction']))

further calls to $obj->someRole() will just return a dataset restricted according to the above criteria, instead of a full join. To prevent that effect, you would need to delete $obj->{someRole}, or to call the role with arguments : $obj->someRole('*').

autoExpand

  $record->autoExpand( $recurse );

Asks the object to expand itself with some objects in foreign tables. Does nothing by default, should be redefined in subclasses, most probably through the "AutoExpand" method (with capital 'A'). If the optional argument $recurse is true, then autoExpand is recursively called on the expanded objects.

blessFromDB

  MyTable->blessFromDB($record);
  MyView ->blessFromDB($record);

Blesses $record into an object of the class, and applies the fromDB column handlers.

preselectWhere

  my $meth = TableOrView->preselectWhere({col1 => $val1, ...}, $multiplicity);
  
  # .. later
  my $result = $meth->(-where =>{otherCol => 'otherVal'}, -columns => \@cols);

Returns a reference to a function that will select data from MyTable, just like the select() method, but where some additional selection criteria are "preselected". The preselection criteria are specified in SQL::Abstract format. This method is mainly for internal use; you only want to learn about it if you intend to write your own role methods; an example is shown in section "Self-referential associations" below.

If the optional $multiplicity argument contains 1 or '0..1', then the function behaves like fetch() rather than select()|/select: that is, the result of $meth->(...) will be a single recordref, rather than an arrayref of records.

OTHER CONSIDERATIONS

Namespaces, classes, methods

DBIx::DataModel automatically generates Perl classes for Schemas, Tables, Views, Associations. Before doing so, it checks that no Perl package of the same name already exists. A similar check is performed before adding role methods into classes.

The client code can insert additional methods into the generated classes : just switch to the package and define your code. However, because of the security checks just mentioned, DBIx::DataModel must create the package before you start adding methods to it, and therefore the declarations should be inside a BEGIN block :

  BEGIN { # make sure these declarations are immediately executed
  
    DBIx::DataModel->Schema('MySchema'); 
    MySchema->Table(Activity   => Activity   => qw/act_id/);
    ...
  }
  
  # now we can safely add new methods
  
  package Activity; 
  
  sub activePeriod {
    my $self = shift;
    $self->{d_end} ? "from $self->{d_begin} to $self->{d_end}"
                   : "since $self->{d_begin}";
  }
  
  package main;                 # switch back to the 'main' package

See perlmod for an explanation of BEGIN blocks.

Interaction with the DBI layer

Transactions and error handling

DBIx::DataModel follows the recommendations of DBI for transactions : it expects the database handle to be opened with RaiseError => 1 and therefore does not check itself for DBI errors ; it is up to the client code to catch the exceptions and deal with errors.

As explained in "Transactions" in DBI, AutoCommit should be set off for databases that support transactions; then atomic operations are enclosed in an eval, followed by either $dbh->commit() (in case of success) or $dbh->rollback() (in case of failure). The doTransaction() does all this for you automatically.

Calling DBI directly

Maybe you will encounter situations where you need to generate SQL yourself (for example because of clauses specific to your RDBMS), or to interact directly with the DBI layer. This can be encapsulated in additional methods incorporated into the classes generated by DBIx::DataModel. In those methods, you may want to call blessFromDB() so that the rows returned by DBI may be seen as objects from your client program. Here is an example :

  package MyTable; # switch to namespace 'MyTable'

  sub fancyMethod {

    # call the DBI API
    my $hash = $dbh->selectall_hashref($fancySQL, @keyFields);

    # bless results into objects of MyTable
    MyTable->blessFromDB($_) foreach values %$hash;

    return $hash;
  }

Self-referential associations

Associations can be self-referential, i.e. describing tree structures :

  MySchema->Association([qw/OrganisationalUnit parent   1 ou_id/],
                        [qw/OrganisationalUnit children * parent_ou_id/],

However, when there are several self-referential associations, we might get into problems : consider

  MySchema->Association([qw/Person mother   1 pers_id/],
                        [qw/Person children * mother_id/]);
  MySchema->Association([qw/Person father   1 pers_id/],
                        [qw/Person children * father_id/]); # BUG

This does not work because there are two definitions of the "children" role name in the same class "Person". One solution is to distinguish these roles, and then write by hand a general "children" role :

  MySchema->Association([qw/Person mother         1 pers_id/],
                        [qw/Person motherChildren * mother_id/]);
  MySchema->Association([qw/Person father         1 pers_id/],
                        [qw/Person fatherChildren * father_id/]);
  
  package Person;
  sub children {
    my $self = shift;
    my $id = $self->{pers_id};
    my $sql = "SELECT * FROM Person WHERE mother_id = $id OR father_id = $id";
    my $children = $self->dbh->selectall_arrayref($sql, {Slice => {}});
    Person->blessFromDB($_) foreach @$children;
    return $children;
  }

Alternatively, since rolenames motherChildren and fatherChildren are most probably useless, we might just specify unidirectional associations :

  MySchema->Association([qw/Person mother  1 pers_id/],
                        [qw/Person none    * mother_id/]);
  MySchema->Association([qw/Person father  1 pers_id/],
                        [qw/Person none    * father_id/]);

And here is a more sophisticated way to define the "children" method, that will accept additional "where" criteria, like every regular method.

  package Person;
  sub children {
    my $self        = shift; # remaining args in @_ will be passed to select()
    my $id          = $self->{pers_id};
    my $select_func = Person->preselectWhere([mother_id => $id, 
                                              father_id => $id]);
    return $select_func->(@_);
  }

This definition forces the join on mother_id or father_id, while leaving open the possibility for the caller to specify additional criteria. For example, all female children of a person (either father or mother) can now be retrieved through

  $person->children({gender => 'F'})

Observe that mother_id and father_id are inside an arrayref instead of a hashref, so that SQL::Abstract will generate an SQL 'OR'.

INTERNALS

This section documents some details that normally should not be relevant to clients; you only want to read about them if you intend to extend the framework.

Class hierarchy

The following picture shows the hierarchy of implementation classes :

             +-----------------------+
             | DBIx::DataModel::Base |
             +-----------------------+
                   /                \
                  /                  \
   +-------------------------+   +--------------------------------+
   | DBIx::DataModel::Schema |   | DBIx::DataModel::AbstractTable |
   +-------------------------+   +--------------------------------+
                                       /                   \
                                      /                     \
                       +------------------------+  +-----------------------+
                       | DBIx::DataModel::Table |  | DBIx::DataModel::View |
                       +------------------------+  +-----------------------+


    +-----------------+    +---------------------------+
    | DBIx::DataModel |    | DBIx::DataModel::Iterator |
    +-----------------+    +---------------------------+

The "Schema" method creates a subclass of DBIx::DataModel::Schema. The "Table" method creates a subclass of DBIx::DataModel::Table. The "View" method and its related clients ("ViewFromRoles", "selectFromRoles", etc.) use multiple inheritance : views inherit first from DBIx::DataModel::View, but also from the supplied list of parent tables. As a result, instances of such views can exploit all role methods of their parent tables. The entry class DBIx::DataModel is just a façade interface to DBIx::DataModel::Schema. The helper class DBIx::DataModel::Iterator implements iterators returned by the "select" method.

Private methods

_setClassData

  DBIx::DataModel::Base->_setClassData($subclass, $data_ref);

_createPackage

  DBIx::DataModel::Schema->_createPackage($pckName, $isa_arrayref);

Creates a new Perl package of name $pckName that inherits from @$isa_arrayref. Raises an exception if the package name already exists.

_defineMethod

  DBIx::DataModel::Schema->_defineMethod($pckName, $methName, $coderef);

Defines a new method in package $pckName, bound to $coderef; or undefines a method if $coderef is undef. Raises an exception if the method name already exists in that package.

SEE ALSO

Some alternative modules in this area are Class::DBI, DBIx::Class, Alzabo, Tangram, Rose::DB::Object, Data::ObjectDriver, ORM, SPOPS, Class::PObject, , DBIx::RecordSet, DBIx::SQLEngine,DBIx::Record, , and a lot more in the DBIx::* namespace, all with different approaches. For various reasons, none of these did fit nicely in my context, so I decided to write DBIx:DataModel. Of course there might be also many reasons why DBIx:DataModel will not fit in your context, so just do your own shopping. A good place to start would be the general discussion on RDBMS - Perl mappings at http://poop.sourceforge.net. There are also some pointers in the Perl 5 Enterprise Environment website at http://www.officevision.com/pub/p5ee/.

For discussions about DBIx::DataModel, please use the CPAN::Forum site at http://www.cpanforum.com/dist/DBIx-DataModel.

TO DO

  - autoloader to get tables and associations from $dbh->table_info, etc.
  - 'hasInvalidColumns' : should be called automatically before insert/update ?
  - 'validate' record handler (not only column handlers)
  - 'normalize' handler : for ex. transform empty string into null
  - walk through WHERE queries and apply 'toDB' handler (not obvious!)
  - add the UML notions of Aggregation and Composition (would mean 
    additional methods for adding and removing parts of an aggregate;
    automatic deletion of composite parts)
  - decide what to do with multiple inheritance of role methods in Views;
    use NEXT ?
  - implement table aliases
  - pass proper parameters to last_insert_id(..) in insert()
  - maybe it is not a good idea to modify data in place when 
    performing inserts or updates; should perhaps clone the arguments.
  - more extensive and more organized testing
  - optional caching for fetch() within lookup tables
  - 

AUTHOR

Laurent Dami, <laurent.dami AT etat geneve ch>

COPYRIGHT AND LICENSE

Copyright 2006 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 2292:

Non-ASCII character seen before =encoding in 'façade'. Assuming CP1252