DBIx::DataStore is designed to abstract away the oft-repeated parts of DBI and to simplify the way you issue SQL statements to your database(s).


As a fairly contrived example, below is a chunk of code using this module in a relatively simple way. It starts by instantiating a new DBIx::DataStore object connected to the "commerce" data store. It then issues a single SELECT query, with the optional paginating feature turned on and set to the first page of results. It then gets a Data::Page object through the pager() method and loops over the first page's worth of results from the database to print them.

    use DBIx::DataStore ( config => 'yaml' );

    my $db = DBIx::DataStore->new('commerce');

    my $results = $db->do({ page => 1, per_page => 15 }, q{
        select p.name, p.price, c.name as category
        from products p
            join product_categories pc on (pc.product_id = p.id)
            join categories c on (c.id = pc.category_id)
        where c.id in ???
            and p.price between ? and ?
        order by p.price desc, p.name asc
    }, [2,3,5], 17, 23);

    my $pager = $results->pager;

    while ($results->next) {
        print sprintf("%s was found in category %s for \$%.2f.\n",
            @{$results}{qw( name category price )});
    }

    print sprintf("\nShowing %d to %d of %d total results.\n",
        $pager->first, $pager->last, $pager->total_entries);

And here is what the output from that code might look like.

    Golden Apple was found in category Food for $22.24.
    Mermaid Statue was found in category Artwork for $17.76.
    Norton's Epaulets was found in category Clothing for $17.76.

    Showing 1 to 3 of 3 total results.


This is the legacy release of DBIx::DataStore and is unlikely to see future releases beyond critical bug fixes. It is being published primarily to ease the maintenance of existing installations.

Future versions of this module will attempt to maintain as much backwards compatibility as possible, but there is no guarantee that every feature or method will carry over unchanged from the user's perspective. If you build something around this module, it is recommended that you pin to pre-1.0 versions. A future release which breaks compatibility with what is presented here will begin with a new major version.

This code has been in heavy production use at multiple companies for almost fifteen years and is considered fairly (though not perfectly) stable. You are welcome to make use of it, in the form presented here, in your own projects. Significant feature requests for this version will likely be given low priority, and development of new applications or libraries with it is not strongly encouraged.

Critical security and bug fix requests will be reviewed.


In using DBIx::DataStore, there are three main types of objects with which you'll generally interact.

  • Database objects

    These objects manage the connections to your database servers (either one or two servers at all times, depending on whether you have any reader databases configured) and are used to issue all commands to your database backend.

  • Result Set objects

    Every time you issue a read-oriented query through the do() method of a database object, a new result set object is created. These objects are what you use to access the results of your query. Unlike the normal method of accessing a row's data using DBI methods directly, with DBIx::DataStore result set objects, the current row's data is accessed through the result set object itself.

    Result set objects also contain a single result row object at any given time (or no result row object if you haven't yet called the next() method on a result set object). You don't actually directly interact with a distinct object for each row -- row methods are issued through the result set object to act on the currently visible row, and will simply fall through to the row object.

  • Pager objects

    These objects are used only when you request one by calling the pager() method on a result set object. They are normal Data::Page objects, so refer to the documentation for that module for further details. Please note that you can only use pager objects on a result set when you passed in at least one of page or per_page arguments to the do() method. Without either of those arguments, your query will be performed in unpaged mode and you will trigger an error if you attempt to call the pager() method on your result set. You will also get an error if you explicitly turned paging support off when loading DBIx::DataStore.


DBIx::DataStore allows a number of options to be specified at the time you import the module into your code. These options, and their effects, are described here. Arguments are passed as a hash (not a reference) to the module as part of the use statement. For example, to load DBIx::DataStore with the default options, except for debugging which we'll set to "5", do:

    use DBIx::DataStore ( debug => 5 );

Below is a description of each option that can be fiddled with on module import. The name in parentheses at the start of each subsection is the key name to use in the hash passed to DBIx::DataStore during use.

Debugging (debug)

Accepts any integer value. Non-integer values, or any number of zero or lower, will turn off debugging. Any positive integer will turn on debugging, with higher numbers producing more debugging output. Typically, a debugging level of 1 will only produce non-fatal-but-concerning debug messages, analogous to an INFO level. Level 2 generally adds messages that are warning-only in nature, but not serious issues. Level 3 introduces submodule and method entry messages into the output. Levels 4 and above are more or less reserved for "here" statements to trace detailed code execution.

Debugging level 5 is (or at least should be) the highest number that actually introduces any changes in the output. This level should not trigger any additional calls to the internal logger, so in terms of the number of statements it is functionally equivalent to level 4. However, unlike lower levels of output, it causes a full stack trace to be produced for every single call to the logger. As such, this debugging level is only recommended for tracking down really nasty bugs or for general use by the clinically insane.

Calls to the internal logger are handled by a foldable constant, so there should be no performance penalty at all when debugging is turned off -- the Perl compiler should remove those calls from the code entirely.

Configuration Loader (config)

DBIx::DataStore can use multiple configuration formats. Right now support only exists for YAML, but if you'd rather use INI files or on-disk Storable serialized data structures (and if a DBIx::DataStore::Config submodule has been written to support it) you're more than welcome to change that. This is done by passing in the config argument when loading DBIx::DataStore.

You can also indicate that none of the configuration loader submodules should be used by not passing in a config argument at all. If you do this, you will be expected to pass in an appropriate configuration data structure (details on that later in this document) to the constructor.

Note that if you do use a configuration loader, it reads its configuration files and does the processing work immediately when DBIx::DataStore is imported, then caches the parsed configuration data. Thus, you shouldn't have to worry about performance in web-based applications if you have a facility to pre-load this module (such as mod_perl in Apache) when you start the web server.

Home Directory Configurations (use_home)

This option goes hand in hand with the config option, and indicates to any relevant configuration loaders that they should also look inside the current user's home directory for configuration files. This is turned off by default because of the extra modules that are loaded (File::HomeDir and all of its dependencies) as well as the extra CPU time and stat calls necessary to do these checks.

Result Set Paginating (paging)

By default, Data::Page is automatically imported for use by the pager() method on result sets. In situations where you have no need for paging of your result sets and wish to avoid the extra time and memory spent on that code, you can explicitly disable it. Note that if you do so and then try to call the pager method on a result set, you will trigger a fatal error.

You can also set this option to "auto" which allows you to call pager() without dying, but won't load Data::Page and its dependencies until the first time you need it. This load-on-demand can be bad in some cases, though, even if it seems like a good idea. In single-process code that may or may not ever need to page something, setting this to auto would make sense. In a situation like mod_perl in Apache, it is advised against.

With load-on-demand under mod_perl, you end up loading it only in the single Apache process that first needs it. If more than one process needs it, more than one copy is loaded. If those processes are eventually killed (through settings like maximum keepalive requests) and it's needed again, it has to be loaded all over again. Instead, preloading it in the main Apache process creates a single copy available to every child Apache process for the lifetime of that Apache run.
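As a sketch of the preloading approach (the startup.pl path and PerlRequire arrangement here are illustrative, not prescribed by this module):

    # startup.pl, loaded once in the parent Apache process via
    # "PerlRequire /path/to/startup.pl" in httpd.conf
    use DBIx::DataStore ( config => 'yaml' );  # paging is on by default, so
                                               # Data::Page is loaded here, once

Every forked child then shares the already-compiled copy of Data::Page instead of loading its own.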


General methods

The following methods are your primary interface to database objects. Typically you will only call the new() method once in your applications, but unless you have very simple database needs you will almost certainly make many calls to the do() method.


The constructor method actually supports multiple distinct syntaxes. The first is the old syntax from the SQL::Wrapper module (the immediate predecessor to DBIx::DataStore and never widely released). This syntax is deprecated and will some day be removed, so it is not discussed here (look at the code if you really must know what it is).

There are three main forms of the currently-supported constructor syntax. The first of these is to simply specify the name of the data store to which you want to connect and optionally an alternate schema list:

    my $db = DBIx::DataStore->new($datastore);
    my $db = DBIx::DataStore->new($datastore, @schemas);

This should be a single scalar value containing a string that matches the name of one of the datastores defined in your configuration (whether it be YAML or any of the other configuration loaders supported).

The second form allows more control over specific parts of a datastore's configuration and connection parameters:

    my $db = DBIx::DataStore->new({ store => $datastore, ... });

This version allows for overriding not just the schemas, but which reader should be used, changing the default settings for statement preparation, statement caching and so on.

TODO: Go into more detail on how exactly to set these extra options.
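As an illustrative sketch only (aside from "store", the option names shown here are assumptions, since the extra options are not yet documented above):

    my $db = DBIx::DataStore->new({
        store   => 'commerce',
        reader  => 'reader2',              # assumed key name for reader selection
        schemas => ['myschema', 'public'], # assumed key name for search path override
    });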

The last is the simplest, to pass in no arguments at all to the constructor. One of three things will happen. First, DBIx::DataStore will get a list of all the package names from the caller's stack, and starting with the bottom, working its way up to the very top of the stack, will look for any datastore which matches one of those package names with the regular expression in its "packages" variable. The first match to succeed will cause that datastore to be used for the connection.

If no matches were found, then a datastore is looked for which has the "is_default" flag set to a true value. If there is one, then that datastore will be used. If that check fails, then an error is produced indicating that there was no suitable choice for a default datastore connection.

do(\%options, $query, @binds)

This method requires at least one argument ($query) but can in some cases be called with far more than that. The leading options hash reference is optional and is used primarily to enable automated pagination of query results. If passed in, it should contain at least one of "page" (which defaults to 1 if not specified) or "per_page" (which defaults to 25). If neither is passed in, automatic pagination will not be available for the returned result set.

If you do take advantage of the paginating options to this method, do not include any sort of row limit clause in your actual query (such as "LIMIT x OFFSET y", "ROWS x TO y" or whatever the flavor is for your particular database server). This method will add that clause to your query as appropriate.

The next argument (which would be the first one if you choose to leave pagination disabled) is required and must contain the SQL statement you would like issued to your database server. Regular scalar placeholders are allowed, as well as a non-DBI placeholder that can be used with arrays and hashes (in specific contexts). Refer to the "PLACEHOLDERS" section of this document for details.

All arguments after the query will be used as your bind variables when executing the SQL statement. They must be in the same order as the actual placeholders within your query and you must provide the exact same number of bind variables as you did placeholders. Failure to do so will result in a fatal error.

This method, when successful, returns a DBIx::DataStore result set object (see "RESULT SET METHODS" for details on what you can do with these).

The following methods are used when dealing with transactions. If your database server does not support transactions, these will be useless to you. You'll probably also receive fatal errors if you try to use them.


This method starts a new transaction block in your current database session. Please note that not all databases support nested transactions (and even those that do may have limits on how deeply they can be nested). You will receive an error if you attempt to call this method multiple times (with no intervening commit() or rollback() calls) when using a database that does not support nested transactions. You may want to look into whether savepoints will suffice for your needs in those cases.

As soon as you open a transaction, all queries issued to your database through the do() method will be sent to your primary server. No queries will be issued to your reader database until you either commit or rollback the transaction.
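A minimal sketch of a typical transaction block, using the begin(), commit() and rollback() methods described in this section (the table and column names are made up for illustration):

    $db->begin;

    my $ok = eval {
        $db->do(q{ update accounts set balance = balance - ? where id = ? }, 100, $from_id);
        $db->do(q{ update accounts set balance = balance + ? where id = ? }, 100, $to_id);
        1;
    };

    if ($ok) {
        $db->commit;
    } else {
        $db->rollback;
    }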


This method ends the current transaction block in your database session. If you are using a database server which supports nested transactions, you may need to call this method as many times as you called begin() (or you may not -- at least some versions of Oracle, for instance, default to committing all your open transactions unless you set a server/session variable).


This method takes one optional argument. Called with no arguments, it rolls back all of the changes you've made to your database within the current transaction block. If you are using a database server that supports savepoints, you may also pass in a single argument with the name of an already defined savepoint (it must have been defined within the current transaction block, and not have already been cleared out by another rollback) to undo all of the changes made after the savepoint was created. Passing in an invalid savepoint name will generate a fatal error.


Calling this method on a database server which supports savepoints will create a new savepoint at the current point of your open transaction with the name you provide. This method can only be called when you have an open transaction block. Attempts to call it outside of a transaction will trigger a fatal error. It is also your responsibility to make sure you use a unique name for each of your savepoints if you require more than one.
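A sketch of savepoint usage follows. The name of the savepoint-creating method shown here is an assumption (labeled in the comment); rollback() accepting a savepoint name is documented above, and the query text is made up for illustration:

    $db->begin;
    $db->do(q{ insert into audit_log (msg) values (?) }, 'starting risky work');

    $db->savepoint('before_risky');   # assumed method name

    eval {
        $db->do(q{ delete from widgets where id = ? }, $widget_id);
        1;
    } or $db->rollback('before_risky');  # undo only the work after the savepoint

    $db->commit;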


This method returns true if your database object is currently in an open transaction.

Convenience methods


This method returns a list of the tables, excluding any views.


This method returns a list of the database (schema) names available on the primary database server.


This method returns a hash reference containing the following information about the primary database server: name (always "primary"), driver, host and database.


This method returns a hash reference containing the following information about the currently selected reader database: name, driver, host and database.


This method is a pass-through for DBI's last_insert_id() function. All the same caveats apply to this method. But just in case you aren't familiar with them, basically consider this method unreliable on many database servers. It should only be used with care, and only if you know your underlying RDBMS's DBD driver will do The Right Thing.


This method, when called, attempts to issue a very simple SQL statement (generally "select 1") against both the primary and reader database servers (or primary only if no reader has been selected). A true value will be returned if the statements were successful.


If called with no arguments, returns a list of the schemas currently in the search path for the primary server connection. If called with a list of scalar arguments, sets the connection's search path to those schemas (in the order they were provided).


Returns a list of hash references, detailing the database servers defined in the YAML config file. All servers defined are returned, regardless of whether they have, or will, respond to connections.

Within each hash reference the following key/value pairs are provided: name (as defined in the configuration file), driver, host and database. The first hash reference in the returned list will always be the server defined as the primary, followed by the readers sorted by their names.


This method returns a list of the table names that are present within the currently selected database (schema) on the primary server. The list returned will also include views (use base_tables() if you don't want the views).


This method will return a list of the views defined within your current schema.


Every call to the do() method on a database object which contains a read-oriented SQL query returns a result set object. These objects can then be used to access the data contained within the database query's results.

Hash and Array accessors

Each time you retrieve a record (aka "result row") from a query's result set, the values for each column in that record can be transparently accessed through hash keys (where the keys are the column names as defined by the original query) or array indices (in the order the columns were defined by the query). Both methods of accessing the record's values are available at all times (unlike the standard DBI methods where you have to choose up front between using fetchrow_array[ref]() or fetchrow_hashref()). Thus, something like the following is perfectly acceptable:

    my $result = $db->do(q{
        select id, name from users order by name asc
    });

    while ($result->next) {
        print sprintf("ID %d: %s\n", $result->[0], $result->{'name'});
    }
Retrieves the next row of results from the result set. The row's data is then directly accessible through the result set object itself (see "Hash and Array accessors"). This method also returns a reference to the result set object, making the following two snippets of code effectively identical (though the second is unnecessarily verbose):

    while ($result->next) {
        print $result->{'some_col_name'};
    }

    while (my $row = $result->next) {
        print $row->{'some_col_name'};
    }
The return value will be undef when there are no more rows to retrieve from the database.


Similar to a next() call, in that it moves to the next row in the result set (or returns an undefined value when all rows have been read already). However, this method returns a stand-alone hash reference containing as keys the column names from the query, and as values the contents of the current row of the result set.
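For example, to detach each row from the result set as a plain hash reference (a minimal sketch, reusing a users table for illustration):

    my $result = $db->do(q{ select id, name from users });

    while (my $row = $result->next_hashref) {
        printf("User %d is named %s\n", $row->{'id'}, $row->{'name'});
    }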


Returns the exact same data structure as next_hashref(), except that it does not move to the next row in the result set first. You get a hash representation of the current row from the results, not the next row.


This method retrieves all rows from the database at once and returns a list of result set row objects, each one containing a single row from the result set. It is functionally equivalent to the following:

    my (@rows);
    while (my $row = $result->next) {
        push(@rows, $row);
    }
Please keep in mind that, internally, that is effectively what this method does itself (though slightly more efficiently). So all() won't actually return everything if you've already called next() one or more times on your result set. You will only get the rows you have not yet retrieved via other means -- which also means that only the first call to this method will actually do anything.

It is recommended you don't use this method if you will actually be operating on each row's data in your Perl code. You're better off using next() and doing whatever processing you need to on each row. However, this method is useful if you're passing the results of your SQL query directly into another module like Template Toolkit and don't actually need to do any processing on the data before handing it off.
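A sketch of that hand-off to Template Toolkit (Template->new and process() are standard TT2 API; the template filename and variable names are made up):

    use Template;

    my $result = $db->do(q{ select id, name from users order by name asc });

    my $tt = Template->new;
    $tt->process('user_list.tt', { users => [ $result->all ] })
        or die $tt->error;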


This method returns a list of the column names from your SQL query, in the same order they were returned by the database.


This method will return the number of rows that were either returned by the database (in the case of read-oriented queries) or the number of the rows that were affected by your query (in the case of updates, inserts, etc.).

If you used pagination in your call to the do() method, the number returned by this method is the number of rows your select query would have returned without pagination. This makes it very simple to paginate your query results while still being able to display to the end user of your application how many total results there are.

There is a catch to this feature, though. Your database server must support subqueries against derived tables for this method to succeed. If you aren't sure whether your database supports this feature (most of them which support subqueries do), you can try to run the following SQL query (change "some_table" to a table name that actually exists first):

    select count(*) from (select * from some_table) derived

Some database servers, such as Oracle, don't allow you to give a name to a derived table in a SQL query like the one above. But if you're running Oracle, this method is properly supported anyway (for what it's worth, the only change to that query above to have it work on Oracle is to omit the word "derived" at the end).


This method will return a Data::Page object which you can use for easier paginating of your database query results. You cannot call this method on a result set object which was created from a call to do() that lacked the optional pagination options. Attempting to do so will generate a fatal error.


In addition to the standard scalar-value placeholder available through standard DBI calls, this module adds another type of placeholder which can be used for inserting entire arrays or hashes into your queries. There are limitations on how and where they can be used, though. Both types of placeholders are written as a series of three question marks.

These additional forms for placeholders are optional in your queries. You are not forced to always use array placeholders for your IN clauses, nor are you forced to use the hash placeholders for your UPDATEs and INSERTs. You are more than welcome to use the regular style placeholders if you really prefer them (or have some other reason to not use the more convenient forms added by this module). You can also mix and match the styles within a single query, having one IN specified with single-value placeholders and another IN with an array placeholder.
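For instance, mixing an array placeholder with regular scalar placeholders in one query might look like this sketch (table and column names are illustrative):

    my $result = $db->do(q{
        select *
        from products
        where category_id in ???
            and price between ? and ?
    }, \@category_ids, $min_price, $max_price);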

  • Array placeholders

    These allow you to pass in a list of values to be used in an IN (...) clause, without you having to know or care how many elements are in the array. They cannot be used anywhere else but as part of an IN clause (although placing parentheses around the placeholder is optional -- they will be added if you don't include them).


        my $result = $db->do(q{
            select * from users where id in (???)
        }, \@userid_list);

    If your list of user IDs contained the values "1", "2" and "3" this would have achieved the same exact effect as you writing out the do() call more verbosely as:

        my $result = $db->do(q{
            select * from users where id in (?,?,?)
        }, 1, 2, 3);

    But then, you would have needed to know exactly how many elements were going to be in the list of IDs you wanted to match against and would have had to write that many single placeholders in the query yourself. Because the values in your list remain as real placeholders under the hood of this module, you can still take advantage of statement caching and the like from the underlying DBI methods (assuming you have that turned on), and the values you pass into the query will be safely escaped as usual.

  • Hash placeholders for UPDATE

    These can be used to fill in the SET portion of your UPDATE statement. Each key/value pair will be turned into "column = ?" with the values of your hash remaining as placeholders so the same advantages of array placeholders apply here as well.


        my $num_rows_updated = $db->do(q{
            update users set ??? where id = ?
        }, { name => $new_name }, $user_id);

    Writing out the normal placeholder(s) yourself would work too, but would get pretty annoying if you're updating many columns at once.

  • Hash placeholders for INSERT

    The second place in which hash placeholders can be used is for INSERT statements. Multiple record inserts are also supported, just put all the hash references containing each record's data into an array reference.

    There is one slight gotcha with using hash placeholders in INSERT statements. You cannot specify the column name list inside your SQL query. The do() method will fill that in for you when it processes the placeholder. The values keyword in your query is optional, but if present, the placeholder must come after it, not before.


        my $num_rows_inserted = $db->do(q{
            insert into users values ???
        }, { name => $name, email => $email, ... });
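    Multi-record inserts, mentioned above, would then look something like this sketch (column names and values are illustrative):

        my $num_rows_inserted = $db->do(q{
            insert into users values ???
        }, [
            { name => 'Alice', email => 'alice@example.com' },
            { name => 'Bob',   email => 'bob@example.com'   },
        ]);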


Exceptions to selection of secondary servers for read-only statements

If indicated by the "reader" argument to new(), a secondary server may be used for distributing the load of read-only statements. However, no statements within a transaction will be issued to any server other than the primary, regardless of the "reader" setting. Also, if only a single server (the primary) is defined in the config, but "__random" is indicated in new()'s arguments, it will have no effect (all statements will be issued through the primary) and no errors will be reported. Thus, it is safe to use "__random" even if you have no secondary databases (and can save you time updating your code later if you add more database servers down the road).


Database server configuration may currently be done through either a YAML file or by passing in an equivalent data structure to your new() call. Other file formats would be possible with an appropriate config loader, but YAML is the only one currently supported.

You can override where the configuration file is located when calling the new() method, or you can pass in a scalar containing raw, unprocessed YAML, or even pass in a hash reference which contains a data structure identical to what YAML would have returned itself if you need to bypass the YAML parsing for any reason. The latter is particularly useful if your application already has its own configuration files and you wish to embed the DBIx::DataStore config data within them.

Your configuration must contain at least one "primary" server definition, and may contain any number of "reader" server definitions (or none at all if you only have a single server). Each server definition in the config must contain the following: DBD driver name, host address, database name, username and password. You may optionally include a list of key/value pairs for each server under the heading "dbd_opts" which will be passed directly through to any connection setup to that server. For details on what settings are available, check the documentation for DBI and for any relevant DBD::* modules you will be using.

For some database servers, depending on your configuration, you may also need to specify a list of schemas for your connection's search path. This is currently only handled for PostgreSQL connections, and is only necessary if you will be accessing tables, functions, etc. that exist outside the default "public" schema. If unspecified, you will only be able to access objects in the default schema, unless you provide fully qualified identifiers (and assuming you have appropriate permissions to do so). If specified, you must list all schemas you want in your search path, including the "public" schema. Any number of schemas may be listed, and they will be added to your search path in the same order you specify in the configuration. For all non-PostgreSQL servers, the schemas option will be ignored if specified.

In addition to the definition of individual servers, there are a few top-level configuration settings.

  • default_reader

    Defines which reader database to use. Valid values are: the name of one of your defined reader databases, "__random" (which will, as the name implies, choose a random reader database from the list), "primary" and "none". The last two have the same effect as not defining a default reader at all. Without a specific reader named, or "__random" to choose one randomly, no reader database will be used and all queries will be issued to the primary server.

  • reader_failover

    Valid values are one or zero (defaults to zero). With this option turned on, every query issued to a reader database will be preceded by a ping() call. If the ping fails, then a new reader database will be selected (for the current and all future queries issued to the reader). A warning will be printed each time a new reader database needs to be selected due to a failed ping, but no error will be issued unless a new reader cannot be selected.

    Please note that each time a new reader database needs to be selected, all readers will be considered, even if they had failed before. This is done in the event that a previously unresponsive reader becomes available again. You can turn this off (and only consider readers that have not failed before) by turning on the flag_bad_readers option.

  • flag_bad_readers

    If both this option and reader_failover are turned on, then a reader database will be taken out of consideration for all future reconnection attempts if it has failed a ping attempt at any point within the current process.

  • cache_connections

    With this option turned on, new database connections will be created through DBI's connect_cached() method instead of the normal connect() method. This allows for basic connection pooling. For the full details, check DBI's documentation. Basically, if you make multiple calls to connect_cached() with the exact same arguments (including the extra connection parameters like RaiseError, AutoCommit, etc. -- not just the DSN, username and password) you will get back the same database connection handle each time, instead of brand new handles. The exception is an existing cached database handle that still has an active statement handle on it: it will not be returned; instead it will be removed from the cache and a fully new connection to the database will be established, cached and returned.

  • cache_statements

    When this option is turned on, statement preparation in DBI will use prepare_cached() instead of prepare(). For some databases this can provide measurable performance improvements if you issue the same query (this includes the placeholders, but not the values being used within those placeholders) repeatedly. Not all databases' DBD modules show much or any difference in performance between prepare() and prepare_cached(), but preparation caching is generally very safe.

YAML Configuration Example

    default_reader: __random
    reader_failover: 1
    flag_bad_readers: 0
    cache_connections: 0
    cache_statements: 1
    primary:
        driver: Pg
        db: mydatabase
        host: db-1
        user: username
        pass: password
        schemas:
            - myschema
            - public
        dbd_opts:
            AutoCommit: 0
    readers:
        reader1:
            driver: Pg
            db: mydatabase
            host: db-2
            user: username
            schemas:
                - myschema
                - public
        reader2:
            driver: Pg
            db: mydatabase
            port: 8306
            user: username
            schemas:
                - myschema
                - public

Explicit Hashref Configuration Example

    my $config = {
        default_reader      => '__random',
        reader_failover     => 1,
        flag_bad_readers    => 0,
        cache_connections   => 0,
        cache_statements    => 1,
        primary => {
            driver  => 'Pg',
            db      => 'mydatabase',
            host    => 'db-1',
            user    => 'username',
            pass    => 'password',
            schemas => ['myschema','public'],
            dbd_opts => {
                AutoCommit => 0,
            },
        },
        readers => {
            reader1 => {
                driver  => 'Pg',
                db      => 'mydatabase',
                host    => 'db-2',
                user    => 'username',
                schemas => ['myschema','public'],
            },
            reader2 => {
                driver  => 'Pg',
                db      => 'mydatabase',
                host    => '',
                port    => 8306,
                user    => 'username',
                schemas => ['myschema','public'],
            },
        },
    };

    my $db = DBIx::DataStore->new({ config => $config });

Configuring Database Passwords

Because DBIx::DataStore uses the normal DBI/DBD layers underneath, all the usual methods of locating and presenting database credentials to the appropriate database server are available. This includes methods such as the .pgpass file for PostgreSQL and equivalents for other RDBMSes. If your DBIx::DataStore configuration does not include a pass attribute for a given database host, these alternate methods will be used as long as they are properly configured.
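For example, a PostgreSQL .pgpass entry covering the primary server from the configuration examples above would look like the following, using the standard host:port:database:user:password format (the port shown assumes the PostgreSQL default):

    db-1:5432:mydatabase:username:password

With that entry in place (and the file's permissions set to 0600, as libpq requires), the "pass" attribute can be omitted from the DBIx::DataStore configuration.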


Data::Page, DBI, YAML::Syck


Jon Sime <>, Buddy Burden <>


This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.