The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DBIx::DataModel::Doc::Design - Architecture and design principles

DOCUMENTATION CONTEXT

WARNING : Version 2 of DBIx::DataModel is a major refactoring from versions 1.*, with a number of incompatible changes in the API (classes renamed, arguments renamed or reorganized, etc. -- see DBIx::DataModel::Doc::Delta_2.0). The documentation below has not been fully updated yet to reflect these changes.

This chapter is part of the DBIx::DataModel manual.

This chapter covers the basic architecture of DBIx::DataModel, and the main motivating principles for proposing yet another ORM. Read it if you are currently evaluating whether DBIx::DataModel is suitable for your context, or if you want to globally understand how it works. Skip it and jump to the QUICKSTART chapter if you want to directly start using the framework.

GENERAL ARCHITECTURE

Schema definitions centralized in one single module

Table and association definitions for DBIx::DataModel are centralized in one single file; this is in contrast with most other Perl ORM frameworks, where each table has its own .pm file, and association declarations are scattered among such files.

Nevertheless, DBIx::DataModel does create a Perl package for each table; but such packages are directly added into the interpreter's symbol table, whenever a Table() declarations is met : hence there is no need for a specific file for each table. However, the client code can add its own methods into these packages, using Perl's package directive to switch into the appropriate namespace.

Classes

The following picture shows the main classes in DBIx::DataModel class hierarchy :

         FRAMEWORK CLASSES
         *****************

   +=================+               +============================+
   | DBIx::DataModel |               | DBIx::DataModel::Statement |
   +=================+               +============================+

   +=========================+   +=========================+
   | DBIx::DataModel::Schema |   | DBIx::DataModel::Source |
   +==,======================+   +===,=================.===+
      |                             /                   \
      |                            /                     \
      |     +=====================^==+  +=================^=====+
      |     | DBIx::DataModel::Table |  | DBIx::DataModel::Join |
      |     +=================,======+  +====================,==+
      |                       |                             /
  ====|=======================|============================/=======
      |  APPLICATION CLASSES  |                           /
      |  *******************  |                          /
      |                       |                         /
  +===^======+     +==========^========+               /
  | MySchema |     | MySchema::Table_n |==+           /
  +==========+     +==+=========.======+  |==+       /
                      +==+=======\==.=====+  |      /
                         +========\==\==.====+     /
                                   \  \  \        /
                                  +=^==^==^======^========+
                                  | MySchema::AutoJoin::* +==+
                                  +==+====================+  |==+
                                     +==+====================+  |
                                        +=======================+

The top half of the picture represents basic classes distributed with DBIx::DataModel. The bottom half represents application-specific subclasses, built through class definition methods (see below). Most objects created during the lifetime of the application will be either instances of those application-specific subclasses, or instances of the DBIx::DataModel::Statement class.

The entry class DBIx::DataModel is just a façade interface to DBIx::DataModel::Schema.

Subclasses of DBIx::DataModel::Schema are created by methods Schema() or define_schema() in DBIx::DataModel; in most cases only one such subclass will be needed, unless the application talks to several databases.

Subclasses of DBIx::DataModel::Source::Table represent tables in the database and are created by methods Table() or define_table(). Calls to Association() or define_association() automatically add new methods into these table classes, so that from a row of one table it is possible to reach related rows in associated tables.

Subclasses of DBIx::DataModel::Source::Join represent database queries that join several tables. They are created indirectly through calls to the define_join() method. Join subclasses use multiple inheritance : they inherit first from DBIx::DataModel::Source::Join, but also from tables involved in the database join. As a result, instances of such joins can invoke all methods of their parent tables.

The helper class DBIx::DataModel::Statement encapsulates a single database request.

Instances

A schema instance holds a connexion to a database. Most often there is only one schema instance (single-schema mode); but if necessary the application can switch to to multi-schema mode with several schema instances.

Data rows retrieved from the database are encapsulated as instances of the application-specific Table and Join subclasses. They possess methods for navigating through the associations in the database and retrieve related rows, and methods to modify the data.

A request to the database is encapsulated as an instance of DBIx::DataModel::Statement. This instance has methods for preparing the SQL query, binding parameters to it, executing the query, and retrieving the resulting data rows. Statement instances are usually short-lived and confined to specific internal parts of the application, while data rows (instances of tables or joins) are usually transmitted to the presentation layers of the application, in order to use the data within reports, forms, etc.

Data rows know from which source they were created, because they are blessed into corresponding table or join classses. They also know from which schema they were queried, either implicitly (in single-schema mode), or explicitly with a special __schema field (in multi-schema mode); but they do not know from which statement they were queried.

The following picture shows relationships between classes and instances :

             FRAMEWORK CLASSES       +============================+
             *****************       | DBIx::DataModel::Statement |
                                     +========================,===+
                                                              |
  ============================================================|=====
             APPLICATION CLASSES                              |
             *******************                              |
                                                              |
  +==========+     +===================+                      |
  | MySchema |     | MySchema::Table_n |=+                    |
  +==========+     +==+================+ |=+                  |
       |              +==+===============+ |                  |
       |                 +===,=============+                  |
       |                     |                                |
       |                     |    +=====================+     |
       |                     |    | MySchema::AutoJoin  +=+   |
       |                     |    +==+==================+ |   |
       |                     |       +=,==================+   |
       |                     |         |                      |
  =====|=====================|=========|======================|=====
       |     INSTANCES       |         |                      |
       |     =========       |         |                      |
   +--------+            +-----+    +-----+  next()  +-----------+
   | schema |            | row |    | row |<==<==<===| statement |
   +--------+            +-----+    +-----+          +-----------+

Meta-classes

Each application subclass has a metaclass, i.e. an instance of a class in the DBIx::DataModel::Meta namespace. This meta-object is accessible from the class through the metadm() class method. Conversely, metaclasses have a class() method to retrieve the application subclass to which they are bound.

Metaclasses hold information about application classes, and implement some of their methods. In most cases, this is totally transparent to end users; however, users can interact with metaclasses to get some information about the available tables, associations, etc., or even to change their behaviour. The picture below shows relationships between application classes and the meta-objects to which they are related (classes start with capital letters, instances start with lowercase letters).

          FRAMEWORK CLASSES
          *****************
  +==============+  +=============+  +============+  +============+
  | Meta::Schema |  | Meta::Table |  | Meta::Join |  | Meta::Path |
  +====,=========+  +==========,==+  +=====,======+  +,===========+
       |                       |           |          |  +===================+
       |                       |           |          |  | Meta::Association |
       |                       |           |          |  +=======,===========+
   ====|=======================|===========|==========|==========|==
       |  APPLICATION CLASSES  |           |          |          |
       |  AND META-OBJECTS     |           |          |          |
       |  *******************  |           |          |          |
  +----^------+   +========+   |           |          |  +-------^----------+
  |meta_schema|---|MySchema|   |           |          |  | meta_association |
  +---------x-+   +========+   |           |          |  +----x-------------+
             \_________________|___________|__________|_____ /
                          \    |           |          |     /
   +=================+   +-x---^------+    |     +----^----x-+
   | MySchema::Table |---| meta_table |----|-----| meta_path |
   +=================+   +----------x-+    |     +x----------+
                                     \     |     /
                                      \    |    /
         +=======================+   +-x---^---x-+
         | MySchema::AutoJoin::* +---| meta_join |
         +=======================+   +-----------+

Polymorphic methods

Some methods like join() or select() are heavily polymorphic, in the sense that they can be applied to various kinds of objects, with various kinds of arguments, and can return various kinds of results. Polymorphism in this way is not exactly common object-oriented practice, but it has been intentionally designed as such, in a attempt to "do the right thing" in different situations, while hiding inner details from users. This is similar in sprit to the the "do what I mean" (DWIM) principle of Perl design, where similar constructs may mean different things depending on the context.

Polymorphic join()

The join() method, when applied to a Schema, calls define_join() to define or retrieve a join subclass; then it creates a fresh statement related to that join datasource and to the schema. From that statement, data can then be retrieved through the select().

When applied to a Table or Join subclass, join() is a class method that creates a statement to query one or several associated tables. The statement contains a condition to restrict the results according to the primary key of the initial table. That statement cannot be executed yet, because the values of the primary key are not known until we have an instance of the initial table; but the statement can already be prepared. Later on, we can bind the statement to an instance of the initial table, and then execute it.

When applied to a data row (to an instance of a Table or Join subclass), join() is an instance method that works like the class method above (i.e. it creates a statement), but in addition values of the current object are bound to the appropriated placeholders within the query. This is illustrated below in the section about statements.

So in all of these cases, join() is meant to produce a statement from which one can later select in order to retrieve data rows.

Polymorphic select()

The select() method is defined in the DBIx::DataModel::Statement class. It performs many different things, depending on what kind of result is requested through the -result_as parameter, and also depending on the current state of the statement. More details about statements are provided below in the "STATEMENT OBJECTS" section.

Another select() method is defined on data sources (i.e. subclasses of DBIx::DataModel::Source), but only works when in single-schema mode : it uses the singleton() method to get the current schema, creates a statement on that schema and that datasource, and then delegates the select() call to that statement.

ASSOCIATIONS

Definition syntax

Front-end method : Association()

Associations are expressed in a syntax designed to closely reflect how they would be pictured in a Unified Modelling Language (UML) diagram. The general form, using the front-end Association() method, is :

  $schema->Association([$class1, $role1, $multiplicity1, @columns1],
                       [$class2, $role2, $multiplicity2, @columns2]);

Let's consider a particular example :

  $schema->Association([qw/Department department 1 /],
                       [qw/Activity   activities * /]);

which corresponds to UML diagram

  +------------+                         +------------+
  |            | 1                  0..* |            |
  | Department +-------------------------+ Activities |
  |            | department   activities |            |
  +------------+                         +------------+

This states that there is an association between classes MySchema::Department and MySchema::Activity, with the corresponding role names (roles are used to navigate through the association in both directions), and with the corresponding multiplicities (here an activity corresponds to exactly one employee, while an employee may have many activities).

In the UML specification, role names and multiplicities are optional, and indeed are often omitted, because they are taken as implicit from the context. Here, both role names and multiplicities are mandatory, because they are needed for driving code generation.

The association declaration is bidirectional, so it will simultaneously add features in both participating classes.

In order to generate the appropriate SQL join statements, the framework needs to know the join column names on both sides; these can be either given explicitly in the declaration, or they are guessed from the primary key of the table with multiplicity 1.

Role names declared in the association are used for a number of purposes : implementing methods for direct navigation; implementing methods for inserting new members into owner objects; and implementing multi-step navigation paths through several assocations, such as in :

   $department->join(qw/activities employee spouse/)
              ->select(-columns => \@some_columns,
                       -where   => \%some_criteria);

Information known by the meta-schema about the associations will be used to automatically generate the appropriate database joins. The kinds of joins (INNER JOIN, LEFT OUTER JOIN) are inferred from the multiplicities declared in the association. These can also be explicitly overridden by writing

   ...->join(qw/activities <=> employee <=> spouse/) # inner joins

   ...->join(qw/activities  => employee  => spouse/) # left joins

If referential integrity rules are declared within the RDBMS, then there is some overlap with what is declared here on the Perl side. However, it would not be possible to automatically deduce all association information from database metadata, because the database does not know about role names and multiplicities. A partial schema can be automatically generated using DBIx::DataModel::Schema::Generator, but it usually needs some manual additions to be really useful.

Back-end method : define_association()

Associations can also be declared through the back-end method define_association(). The principle is exactly the same, but the back-end method uses named parameters instead of positional parameters : therefore it is more verbose, but also more explicit and more flexible.

With define_association(), our example above would be written

  my $meta_schema = $schema->metadm;
  $meta_schema->define_association(
    name => 'Department_activity', # or whatever name you prefer
    kind => 'Association',
    A    => {
      table        => $meta_schema->table('Department'),
      role         => 'department',
      multiplicity => [1, 1],
      join_cols    => [qw/dpt_id/],
    },
    B    => {
      table        => $meta_schema->table('Activity'),
      role         => 'activities',
      multiplicity => [0, '*'],
      join_cols    => [qw/dpt_id/],
    },
  );

Technical consequences of an association definition

This section describes what happens when a new association is defined.

New instance of Meta::Association

A new instance of DBIx::DataModel::Meta::Association is created, mainly for providing support for reflection queries (clients asking what are are the tables and associations in the current schema).

New instances of Meta::Path

Two instances of DBIx::DataModel::Meta::Path are created and registered into their corresponding meta_table objects. Such paths will be queried for resolving joins, i.e. a method call like

  $schema->join(qw/Table path1 path2 .../)

walks through the paths to decide which tables and which join conditions are involved.

New path methods within table classes

As a result of the association declaration, the Perl class $table1 will get an additional method named $role2 for accessing the associated object(s) in $table2; that method normally returns an arrayref, unless $multiplicity2 has maximum '1' (in that case the return value is a single object ref). Of course, $table2 conversely gets a method named $role1.

To understand why tables and roles are crossed, look at the UML picture :

  +--------+                     +--------+
  |        | *              0..1 |        |
  | Table1 +---------------------+ Table2 |
  |        | role1         role2 |        |
  +--------+                     +--------+

so from an object of Table1, you need a method role2 to access the associated object of Table2. In your diagrams, be careful to get the role names correctly according to the UML specification. Sometimes we see UML diagrams where role names are on the wrong side, mainly because modelers have a background in Entity-Relationship or Merise methods, where it is the other way around.

Role methods perform joins within Perl (as opposed to joins directly performed within the database). That is, given a declaration

  $schema->Association([qw/Employee   employee   1   /],
                       [qw/Activity   activities 0..*/]);

we can call

  my activities = $an_employee->activities

which will implicitly perform a

  SELECT * FROM Activity WHERE emp_id = $an_employee->{emp_id}

The role method can also accept additional parameters in SQL::Abstract::More format, exactly like the select() method. So for example

  my $activities = $an_employee->activities(-columns => [qw/act_name salary/],
                                            -where   => {is_active => 'Y'});

would perform the following SQL request :

  SELECT act_name, salary FROM Activity WHERE
    emp_id = $anEmployee->{emp_id} AND
    is_active = 'Y'

If the role method is called without any parameters, and if that role was previously expanded (see expand() method), i.e. if the object hash contains an entry $obj->{$role}, then this data is reused instead of calling the database again (this is a primitive form of caching). To force a new call to the database, supply some parameters :

  $emp->expand('activities');  # stores result in $emp->{activities}
  $list = $emp->activities;    # returns cached $emp->{activities}
  $list = $emp->activities(-columns => '*');  # default columns, but
                                              # forces a new call to the DB

Sometimes associations are unidirectional (it does not make sense to traverse the association in both directions). In such cases, use an anonymous role, expressed by an empty string, or by strings "0", '""', "--" or "none" : then the corresponding path method is not generated.

New insert_into$path method

When a role has multiplicity '*', another method named insert_into_... is also installed, that will create new objects of the associated class, taking care of the linking automatically :

  $an_employee->insert_into_activities({d_begin => $today,
                                        dpt_id  => $dpt});

This is equivalent to

  $schema->table('Activity')->insert({d_begin => $today,
                                      dpt_id  => $dpt,
                                      emp_id  => $an_employee->{emp_id}});

Many-to-many associations

UML conceptual models may contain associations where both roles have multiplicity '*' (so-called many-to-many associations). However, when it comes to actual database implementation, such associations need an intermediate linking table to collect couples of identifiers from both tables.

DBIx::DataModel supports many-to-many associations as a kind of syntactic sugar, translated into low-level associations with the linking table. The linking table needs to be declared first :

  $schema->Table(qw/LinkTable link_table prim_key1 prim_key2/);

  $schema->Association([qw/Table1     role1       0..1/],
                       [qw/LinkTable  link_table  *   /]);

  $schema->Association([qw/Table2     role2       0..1/],
                       [qw/LinkTable  link_table  *   /]);

This describes a diagram like this :

  +--------+                   +-------+                   +--------+
  |        | 0..1            * | Link  | *            0..1 |        |
  | Table1 +-------------------+  --   +-------------------+ Table2 |
  |        | role1  link_table | Table | link_table  role2 |        |
  +--------+                   +-------+                   +--------+

Then we can declare the many-to-many association, very much like ordinary associations, except that the last items in the argument lists are names of paths to follow, instead of names of columns to join. In the diagram above, we must follow paths link_table and role2 in order to obtain the rows of Table2 related to an instance of Table1; so we write

  $schema->Association([qw/Table1  roles1  *  link_table role1/],
                       [qw/Table2  roles2  *  link_table role2/]);

which describes a diagram like this :

              +--------+                    +--------+
              |        | *                * |        |
              | Table1 +--------------------+ Table2 |
              |        | roles1      roles2 |        |
              +--------+                    +--------+

The declaration has created a new method roles2 in Table1; that method is implemented by following paths linksA and role2. So for an object obj1 of Table1, the call

  my $obj2_arrayref = $obj1->roles2();

will generate the following SQL :

  SELECT * FROM link_table INNER JOIN table2
            ON link_table.prim_key2=table2.prim_key2
    WHERE link_table.prim_key1 = $obj->{prim_key1}

Observe that roles2() returns rows from a join, so these rows will belong both to Table2 and to Link_Table.

Many-to-many associations do not have an automatic insert_into_* method : you must explicitly insert into the link table.

In the previous section we were following two roles at once in order to implement a many-to-many association. More generally, it may be useful to follow several roles at once, joining the tables in a single SQL query. This can be done through the following methods :

  • Schema::join() : create a new Join that selects from several tables, filling the joins automatically

  • Table::join() : from a given row object, follow a list of paths to get information from associated tables.

  • Table::join() : from a given class, follow a list of roles to prepare getting information from associated tables; the result is a Statement object that can be bound to specific members of the initial table and then can be selected.

  • define_navigation_method() : add a new method in a table, that will follow a list of roles .

STATEMENT OBJECTS

Purpose

A statement object encapsulates a request to the database. It starts by assembling various pieces of information : the datasource to query, the database connection, the various clauses for generating the SQL request. Once everything is ready, the database request is issued, and the results are collected.

Often, the client code does not even see that a statement object is doing the work : for example in a query like

  my $rows = $schema->table($name)->select(-columns => \@columns
                                           -where   => \%condition);

the result is directly retrieved as an arrayref of rows, and the intermediate statement object for constructing these rows is created and destroyed automatically.

However, in some situations discussed below, it is useful to explicitly interact with the statement object, for controlling various steps through the its lifecycle.

Lifecycle

Principle

The statement object goes through a sequence of states before delivering data rows. Some methods are only available in a given state. At any time, the status() method tells which is the current state of the statement object.

Status values

NEW

The statement has just been created, and already knows its datasource (an instance of DBIx::DataModel::Meta::Source) and its schema (an instance of DBIx::DataModel::Schema). While in this state, the statement can accumulate parameters (-columns to retrieve, -where clauses that will end up in the generated SQL, etc.) through one or several calls to the refine() method. Early parameter binding may also occur through the bind() method.

Statements are rarely created by a direct call to the new() method; instead, they are created indirectly, by calling methods table() or join() on a schema; see "Creating statements" in DBIx::DataModel::Doc::Reference.

SQLIZED

The sqlize() method gathers all parameters accumulated so far within the new statement, generates the SQL, and bumps the statement into SQLIZED state.

At this point, it is no longer possible to call the refine() method to add new clauses. However, it is still possible to call the bind() method to bind values to the placeholders.

PREPARED

The prepare() method calls "prepare" in DBI to get a DBI sth handle, and bumps the statement into PREPARED state.

EXECUTED

The execute() method calls "execute" in DBI to execute the statement on the database side, and bumps the statement into EXECUTED state.

Ath this point, the statement is ready to extract data rows.

State diagram

  +--------+     select()      +--------------------+
  | source |------------------>|    NEW_statement   |<----<-------,
  +--------+                   +--------------------+     |       |
                    sqlize()      |    |   |    refine()  |       |
                   ,----<---------'    |   '------>-------'       |
                   |                   |                 bind(..) |
                   |                   '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| SQLIZED_statement  |<----<-------,
                               +--------------------+             |
                    prepare()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| PREPARED_statement |<----<-------,
                               +--------------------+             |
                    execute()     |    |                 bind(..) |
                   ,----<---------'    '------------------>-------'
                   |
                   |           +--------------------+
                   '---->----->| EXECUTED_statement |<----<-------,
                               +--------------------+     |       |
                                  |    |   |     bind(..) |       |
                                  |    |   '------>-------'       |
                                  |    |                execute() |
                                  |    '------------------>-------'
  +-------------+  next()/all()   |
  | data row(s) |-------<---------'
  +-------------+

Difference between views and statements

Both views and statements encapsulate SQL SELECT queries, so some clarification is of order.

A view is a subclass of DBIx::DataModel::View, and therefore also a subclass of DBIx::DataModel::Source. Data rows retrieved from that source become instances of the view. The view usually encapsulates a database join, and the table classes corresponding to the joined tables are also parent classes for the view (multiple inheritance). This means that instances of the view inherit all parent methods for manipulating columns, navigating through associations, etc. The view may include a WHERE clause to restrict the database query, but this is very unfrequent : the main purpose of a view is to encapsulate a join.

By contrast, a statement is an instance of DBIx::DataModel::Statement, most frequently without any subclassing, that represents a particular request to a particular data source (a datasource is a subclass of DBIx::DataModel::Source, i.e. either a table or a view), and technically encapsulates a reference to a DBI statement handle (sth). The statement object goes through a lifecycle with following steps : assembling query clauses, generating the SQL, binding values, preparing the database statement, executing the database query, retrieving the results and blessing them into appropriate classes. Statements also have pagination methods to walk through the results in chunks of several data rows.

With respect to the generated SQL, we could say in short that a view represents the FROM clause of the SQL, while a statement represents all other clauses (list of columns, WHERE, ORDER BY, GROUP BY, etc.).

Stepwise building of the SQL query

Principle

A statement object can accumulate requirements in several steps, before generating the actual database query. Therefore it is a collaborative platform where various independent software components can contribute to various parts of the final SQL.

Example

  # create a statement with initial conditions on the department
  my $statement = $department->join(qw/activities employee/);

  # add a date condition (from config file or CGI params or whatever)
  my $date = get_initial_date_from_some_external_source();
  $statement->refine(-where => {d_begin => {">" => $date}});

  # now issue the SQL query
  my $rows = $statement->select(-columns => [qw/d_begin lastname firstname/]);

This code generates the following SQL :

  SELECT d_begin, lastname, firstname
  FROM   activity INNER JOIN employee
                  ON activity.emp_id=employee.emp_id
  WHERE  dpt_id  = $departement->{dpt_id}
    AND  d_begin > $date

Behind the scene, the join method first created a view representing the database join between activity and employee; then it created a statement object that would query that view with an initial condition on dpt_id. The refine call added a second condition on d_begin. Finally the select method specified which columns to retrieve.

Stepwise parameter binding through named placeholders

DBIx::DataModel::Statement objects have their own mechanism of placeholders and parameter binding. Of course this gets ultimately translated into usual placeholders at the DBI and database layers; but an additional layer was needed here in order to allow for stepwise building of SQL conditions as just demonstrated above.

Stepwise binding of values to placeholders requires named placeholders, as opposed to usual positional placeholders. Named placeholders are usually expressed with a question mark followed by a name, like in

  $statement->refine(-where => {col1 => '?foo',
                                col2 => '?bar',
                                col3 => '?foo'});

Values are bound to these named parameters (either before or after the refine step) through the "bind" method :

  $statement->bind(foo => 123, bar => 456);

If the question mark prefix ? is inconvenient, another placeholder prefix may be specified as an option to the schema creation method.

COLUMN HANDLERS

GENERAL DESIGN PRINCIPLES

Material in the previous sections presented the general architecture of DBIx::DataModel; this should be enough to easily follow the QUICKSTART chapter, or investigate more details in the REFERENCE chapter.

Now we will discuss the motivation for some design features of DBIx::DataModel, in order to explain not only how it works, but also why it was designed that way. This section can be safely skipped, unless you are interested in comparing various ORMs.

Help lower-level layers, do not hide them

DBIx::DataModel provides abstractions that help client applications to automate some common tasks; however, access to lower-level layers remains open, for cases where detailed operations are needed :

  • The generated classes contain methods that can return polymorphic results. By default, the return value is an object or a list of objects corresponding to data rows; however, these methods can also return a handle to the underlying DBI statement, or even just the generated SQL code. Hence, the client code can take control whenever any fine tuning is needed.

  • Data rows exploit the dual nature of Perl objects : on one hand they can be seen as objects, with methods to walk through the data and access related rows from other tables, but on the other hand they can also be seen as hashrefs, with usual Perl idioms for extracting keys, values or slices of data. This dual nature is important for passing data to external helper modules, such as XML generators, Perl dumps, javascript JSON, templates of the Template Toolkit, etc. Such modules need to walk on the data tree, exploring keys, values and subtrees; so they cannot work if data columns are implemented as object-oriented methods (because there is no simple way to ask for all available methods, and even if you get there, it is not possible to distinguish which of those methods encapsulate relevant data).

Let the database do the work

Use RDBMS tools to create the schema

Besides basic SQL data definition statements, RDBMS often come with their own helper tools for creating or modifying a database schema (interactive editors for tables, columns, datatypes, etc.). Therefore DBIx::DataModel provides no support in this area, and assumes that the database schema is pre-existent.

To talk to the database, the framework only needs to know a bare minimum about the schema, namely the table names, primary keys, and UML associations; but no details are required about column names or their datatypes.

Let the RDBMS check data integrity

Most RDBMS have facilities for checking or ensuring integrity rules : foreign key constraints, restricted ranges for values, cascaded deletes, etc. DBIx::DataModel can also do some validation tasks, by setting up column types with a validate handler; however, it is better advised to exploit data integrity checks within the RDBMS whenever possible.

Exploit database projections through variable-size objects

Often in ORMs, columns in the table are in 1-to-1 correspondence with attributes in the class; so any transfer between database and memory systematically includes all the columns, both for selects and for updates. Of course this has the advantage of simplicity for the programmer. However, it may be very inefficient if the client program only wants to read two columns from a very_big_table.

Furthermore, unexpected concurrency problems may occur : in a scenario such as

  client1                            client2
  =======                            =======
  my $obj = My::Table->fetch($key);  my $obj = My::Table->fetch($key);
  $obj->set(column1 => $val1);       $obj->set(column2 => $val2);
  $obj->update;                      $obj->update;

the final state of the row should theoretically be consistent for any concurrent execution of client1 and client2. However, if the ORM layer blindly updates all columns, instead of just the changed columns, then the final value of column1 or column2 is unpredictable.

To diminish the efficiency problem, some ORMs offer the possibility to partition columns into several column groups. The ORM layer then transparently fetches the appropriate groups in several steps, depending on which columns are requested from the client. However, this might be another source of inefficiency, if the client frequently needs one column from the first group and one from the second group.

With DBIx::DataModel, the client code has precise control over which columns to transfer, because these can be specified separately at each method call. Whenever efficiency is not an issue, one can be lazy and specify nothing, in which case the SELECT columns will default to "*". Actually, the schema does not know about column names, except for primary and foreign keys, and therefore would be unable to transparently decide which columns to retrieve. Consequently, objects from a given class may be of variable size :

  my $objs_A = My::Table->select(-columns => [qw/c1 c2/],
                                 -where   => {name => {-like => "A%"}};

  my $objs_B = My::Table->select(-columns => [qw/c3 c4 c5/],
                                 -where   => {name => {-like => "B%"}};

  my $objs_C = My::Table->select(# nothing specified : defaults to '*'
                                 -where   => {name => {-like => "C%"}};

Therefore the programmer has much more freedom and control, but of course also more responsability : in this example, attempts to access column c1 in members of @$objs_B would yield an error.

Exploit database products (joins) through multiple inheritance

ORMs often have difficulties to exploit database joins, because joins contain columns from several tables at once. If tables are mapped to classes, and rows are mapped to objects of those classes, then what should be the class of a joined row ? Three approaches can be taken

  • ignore database joins altogether : all joins are performed within the ORM, on the client side. This is of course the simplest way, but also the less efficient, because many database queries are needed in order to gather all the data.

  • ask a join from the database, then perform some reverse engineering to split each resulting row into several objects (partitioning the columns).

  • create on the fly a new subclass that inherits from all joined tables : data rows then simply become objects of that new subclass. This is the approach taken by DBIx::DataModel.

High-level declarative specifications

UML compositions for handling data trees

Compositions are specific kinds of associations, pictured in UML with a black diamond on the side of the composite class; in DBIx::DataModel, those are expressed by calling the schemas's Composition method instead of Association. As a result, the composite class will be able to perform cascaded insertions and deletions on data trees (for example from structured data received through an external XML or JSON file, and inserted into the database in a single method call).

The reverse is also true : the composite class is able to automatically call its own methods to gather data from associated classes and build a complete data tree in memory. This is declared through the AutoExpand method and is useful for passing structured data to external modules, like for example XML or JSON exports.

ColumnTypes

A DBIx::DataModel schema can declare some column types : these are collections of handlers (callback functions) for performing tasks such as data validation or transformation. Handlers are then attached to specific columns belonging to that column type.

The handler concept is generic and can be exploited by client programs according to the application domain. However, some handler names have a special meaning within the framework : for example, handlers named fromDB or toDB are automatically called when transfering data from or to the database. Take for example the "Percent" column type shown in the Synopsis :

  # 'percent' conversion between database (0.8) and user (80)
  $schema->ColumnType(Percent =>
     fromDB   => sub {$_[0] *= 100 if $_[0]},
     toDB     => sub {$_[0] /= 100 if $_[0]},
     validate => sub {$_[0] =~ /1?\d?\d/});

Note that this notion of "type" is independent from the actual datatypes defined within the database (integer, varchar, etc.). From the Perl side, these are all seen as scalar values. So a column type as defined here is just a way to specify some operations, programmed in Perl, that can be performed on the scalar values.

Views within the ORM

define_table() declarations usually map directly to database tables; but it is also possible to map to an SQL query, possibly with a predefined where clause :

  $schema->metadm->define_table(
    class       => 'View_example',
    db_name     => 'Foo INNER JOIN Bar ON Foo.fk=Bar.pk',
    where       => {col => $special_filter},
    primary_key => [qw/some_foo_col some_bar_col/],
  );

This is exactly the same idea as database views, except that they are implemented within the ORM, not within the database. Such views can join several tables, or can specify WHERE clauses to filter the data. ORM views are useful to implement application-specific or short-lived requests, that would not be worth registering persistently within the database model. They can also be useful if you have no administration rights in the database. Of course it is also possible to access database views, because the ORM sees them as ordinary tables.

Extended SQL::Abstract API

Every method involving a SELECT in the database (either when searching rows from a table or collection of tables, or when following associations from an existing row) accepts an number of optional parameters that closely correspond to SQL clauses. The programming interface reuses what is defined in the excellent SQL::Abstract module, with some extensions. Therefore it is possible for example to specify

  • which columns to retrieve

  • which restriction criteria to apply (WHERE clause)

  • how to order the results

  • whether or not to retrieve distinct rows

  • etc.

All these parameters are specified at the statement level, and therefore may vary between subsequent calls to the same class. This is in contrast with some other ORMs where the set of columns or the ordering criteria are specified at schema definition time. As already stated above, DBIx::DataModel gives more freedom to client programs, but also more responsability.

Efficient interaction with the DBI layer

Great care has been taken to interact with the database in the most efficient way, and to leave an open access to DBI fine-tuning options. In particular :

  • At the DBI level, the fastest way to get a large number of data rows from DBI is to retrieve each row into the same memory location, using the bind_columns method. DBIx::DataModel can exploit this feature through a fast statement :

      my $statement = My::Table->select(-columns  => ...,
                                        -where    => ...,
                                        -resultAs => 'fast_statement');
      while (my $row = $statement->next) {
        work_with($row);
      }

    This code creates a single memory location for storing a data row; at each call to the next method, memory values are updated from the database, and the same location is returned.

    While being very fast, this approach also has some limitations : for example it is not possible to put such rows into an array (because the array would merely contain multiple references to the last row). So fast statements are not activated by default; regular statements create a fresh hashref for each row.

  • The client code can have fine control on statement preparation and execution, which is useful for writing efficient loops. For example, instead of writing

      my $list = My::Table->select(...);
      foreach my $obj (@$list) {
        my $related_rows = $obj->join(qw/role1 role2/)->select;
        ...
      }

    we can prepare a statement before the loop, and then just execute that statement at each iteration :

      my $statement = My::Table->join(qw/role1 role2/)->prepare;
      my $list = My::Table->select(...);
      foreach my $obj (@$list) {
        my $related_rows = $statement->execute($obj)->all;
        ...
      }
  • DBI has a prepare_cached method, that works like prepare except that the statement handle returned is stored in a hash associated with the $dbh. This can be exploited from DBIx::DataModel by stating

      $schema->dbi_prepare_method('prepare_cached');

DEPENDENCIES

DBIx::DataModel depends on

DBI
SQL::Abstract::More

DESIGN FAQ

Here are answers to some design choices, in the form of a FAQ.

Global state in class variables ? Isn't that bad design ?

The philosophy of DBIx::DataModel is that a record is nothing more than a blessed hashref, where hash keys are column names and hash values are column values. So all information about schemas, relationships, connections, etc is indeed within a classData hashref associated with each class. This class data is a global resource, and like with other global resources in Perl (for example STDIN, %ENV, or special variables $/, $,, etc.), several clients can interact through the global state.

When used with care, interaction of several components through a global resource can be quite handy : for example STDIN does not need to be explicitly passed to every component, it is always available; furthermore, STDIN can be redirected at one specific place and all collaborating components will immediately change behaviour accordingly. However, this must be done with care, because there is also a risk of undesired "action at distance" --- maybe the other components wanted to continue reading from the real STDIN, not the redirected one !

To avoid undesired interactions through the global state, Perl offers the local construct, also known as dynamic scoping (see perlsub). Using that construct, a component can temporarily redirect STDIN, for the duration of one specific computation, and then automatically restore it to its previous state.

DBIx::DataModel uses a similar approach. The database handle is stored in the schema class, and can be changed dynamically, which will immediately affect all classes and objects related to that schema. However, state modifications in schemas can be limited to a specific scope through the localizeState method. Furthermore, the doTransaction method takes an optional $dbh argument to localize the transaction within a specific database handle. With these methods, sane management of the global state is quite easy, and since nested transactions are supported, it is perfectly possible to program transactions with cross-database operations (copying objects from one database to another, or simultaneously performing the same insert or delete in several databases).

Actually, localization is only needed for Schema subclasses, that indeed hold a mutable global state (database connection, current debug mode, current value of selectImplicitlyFor, etc.). For Table and View subclasses, all information in classData is set by "compile-time methods" (methods starting with an uppercase letter) and then stays immutable : joins, primary keys, etc.; so there is no potential conflict.

Why no accessor methods for columns ?

The philosophy of DBIx::DataModel is that a record is nothing more than a blessed hashref, where hash keys are column names and hash values are column values. So data access uses goes through the hashref API : this allows you to exploit all common Perl idioms, like

  # inspect hash keys
  my @column_names = keys @$row;      
  
  # remove leading spaces in all columns
  s/^\s+// foreach values @$row;      
  
  # print a slice
  print @{$row}{qw/col1 col2 col3/};  
  
  # swap values
  ($row->{col1}, $row->{col2}) = ($row->{col2}, $row->{col1}); 
  
  # other way to swap values
  @{$row}{qw/col1 col2/} = @{$row}{qw/col2 col1/};             

Serialization

DBIx::DataModel includes support for the standard Storable serialization / deserialization methods freeze and thaw : so records and record trees can be written into files or sent to other processes. Dynamic subclasses for database joins are re-created on the fly during deserialization through thaw. However, there is no support for serializing database connections (this would be hazardous, and also insecure because serialization data would contain database passwords). Therefore the process performing deserialization is responsible for opening the database connection by its own means, before calling the thaw method.

TO DO

Here are some points that hopefully will be improved in a future release DBIx::DataModel :

  - 'hasInvalidColumns' : should be called automatically before insert/update ?
  - 'validate' record handler (not only column handlers)
  - 'normalize' handler : for ex. transform empty string into null
  - walk through WHERE queries and apply 'toDB' handler (not obvious!)
  - more extensive and more organized testing
  - add PKEYS keyword in -columns, will be automatically replaced by
    names of primary key columns of the touched tables
  - design API for easy dynamic association of objects without dealing
    with the keys
  - remove spouse example from doc (because can't have same table
    twice in roles)
  - quoting
  - dbiPrepareMethod as argument to select()
  - pre/post callbacks: support arrays of handlers, refine(..) adds to the
    array
  - refine(-orderBy => ..) should add to the ordering
  - reflection methods (list of roles, etc.)
  - update with subtrees (insert/update on dependent records. Quid: delete?)
  - auto-unjoin (API for partioning columns into subobjects).
  - inheritance between tables: a) support for Postgres; b) support for DB views
  - support for DBI TraceLevel
  - execute() : check for unbound "dbidm placeholders"
  - support DISTINCT ON ...
  - support find_or_create, update_or_create
  - doc : Schema has only tables, pk, fk (no cols, triggers, stored procs, domains)
  - doc : SQL-oriented (like Fey)
  - copy idea from DBIC:Storage:DBI:MultiColumnIn

  - Storable
     - hooks for YAML ?
     - think about STORABLE_attach

  - doc : explain Table inheritance
     $schema->Table(..., {parents => ...});
     # ADD TESTS for table inheritance


  - declare RT bug for Params::Validate : doc is confused about
    validate_with / validation_options


  - savepoints

  - allow tables without primary key

  - various policies for guessing foreign keys : e.g proc.id = attr.proc_id

  - Statement::refine() should be more intelligent
      - combine (-where, -orderBy, callbacks)
      - reject overrides when not possible
      - should add something like a 'reset' method ??

  - explain how to design families of tables with an intermediate superclass

  - add attribute 'db_schema_name' in Schema->new(). Then one can have
       my $dev  = Schema->new(dbh => ..., db_schema_name => 'DEV');
       my $prod = Schema->new(dbh => ..., db_schema_name => 'PROD');
    and that name is prepended to table names

  - fix all doc refernces to ::View or ::Table (add Source::)

  - doc glossary, introduce the 'path' term

  - think API for defining column types : just in new() or allow
    to do it in later calls to a "setter" method ?

  - find a way to rename _singleInsert > _single_insert, without breaking
    compatibility


  Other column handlers :
    - validate
    - normalize

- various policies for guessing foreign keys : e.g proc.id = attr.proc_id

- Statement::refine() should be more intelligent - combine (-where, -orderBy, callbacks) - reject overrides when not possible - should add something like a 'reset' method ??

- declare RT bug for Params::Validate : doc is confused about validate_with / validation_options

- doc : explain Table inheritance $schema->Table(..., {parent_tables => ...}); # ADD TESTS for table inheritance - Storable - hooks for YAML ? - think about STORABLE_attach

 - think about reuse of statements : either this should be forbidden,
   or we need a way to do it cleanly, e.g

      my $statement = $schema->table($name);
      my $result1 = $statement->select(...);
      $statement->clear;
      my $result2 = $statement->select(...);

- think about freeze/thaw in multi-schema mode

  - BEWARE : sql generation occurs later, problem if \@columns array has changed

1 POD Error

The following errors were encountered while parsing the POD:

Around line 125:

Non-ASCII character seen before =encoding in 'façade'. Assuming CP1252