
NAME

DBIx::Admin::BackupRestore - Backup all tables in a database to XML, and restore them

Synopsis

        use DBIx::Admin::BackupRestore;

        # Backup.

        open(my $fh, '>', $file_name) || die("Can't open(> $file_name): $!");
        print $fh DBIx::Admin::BackupRestore -> new(dbh => $dbh) -> backup('db_name');
        close $fh;

        # Restore.

        DBIx::Admin::BackupRestore -> new(dbh => $dbh) -> restore($file_name);

Description

DBIx::Admin::BackupRestore is a pure Perl module.

It exports all data - except nulls - in all tables from one database to one or more XML files.

Actually, not all tables: table names which match /^BIN\$.+\$./ are discarded. These are Oracle's recycle bin tables.

Then these files can be imported into another database, possibly under a different database server, using methods restore() or restore_in_order().

Note: Importing into Oracle does not handle sequences at all.

In the output, all table names and column names containing spaces have those spaces converted to underscores. This is A Really Good Idea.

Also, the case of the output table and column names is governed by the database handle attribute FetchHashKeyName.

Warning: This module is designed on the assumption you have a stand-alone script which creates an appropriate set of empty tables on the destination database server. You run that script, and then run this module in 'restore' mode.

Such a stand-alone script is trivial to write: get the output of method get_column_names() and feed it into the constructor of DBIx::Admin::CreateTrivialSchema. Of course, you would only use this feature as a crude way of dumping the data into a database for quick inspection before processing the XML properly.

This module is used daily to transfer a MySQL database under MS Windows to a Postgres database under GNU/Linux.

Similar modules are discussed below.

See also: http://savage.net.au/Ron/html/msaccess2rdbms.html

Distributions

This module is available both as a Unix-style distro (*.tgz) and an ActiveState-style distro (*.ppd). The latter is shipped in a *.zip file.

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing each type of distro.

Constructor and initialization

new(...) returns an object of type DBIx::Admin::BackupRestore.

This is the class's constructor.

Usage: DBIx::Admin::BackupRestore -> new().

This method takes a set of parameters. Only the dbh parameter is mandatory.

For each parameter you wish to use, call new as new(param_1 => value_1, ...).

clean

The default value is 0.

If new is called as new(clean => 1), the backup phase deletes any characters outside the range 20 .. 7E (hex).
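
In other words, the cleaning amounts to something like this (a sketch of the effect, not the module's actual code):

        # Delete every character outside printable ASCII (0x20 .. 0x7E).
        $value =~ tr/\x20-\x7E//cd;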

The restore phase ignores this parameter.

This parameter is optional.

croak_on_error

This parameter takes one of these values: 0 or 1.

The default value is 1, for backwards compatibility.

During backup(), the call to $sth -> execute() is wrapped in eval{}; if an error occurs and croak_on_error is 1, we call Carp::croak.

If croak_on_error is 0, we continue. Further, if verbose is 1, the error is printed to STDERR.
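
For example, to press on past per-table errors while logging them:

        my $xml = DBIx::Admin::BackupRestore -> new
        (
            dbh            => $dbh,
            croak_on_error => 0,
            verbose        => 1,
        ) -> backup('db_name');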

This parameter is optional.

dbh

This is a database handle.

This parameter is mandatory when calling methods backup() and restore*(), but is not required when calling method split(), since the latter is just a file-to-file operation.

dbi_catalog, dbi_schema, dbi_table, dbi_type

These 4 parameters are passed to DBI's table_info() method, to get a list of table names.

The default values suit MySQL:

        dbi_catalog = undef
        dbi_schema  = undef
        dbi_table   = '%'
        dbi_type    = 'TABLE'

For Oracle, use:

        dbi_catalog = undef
        dbi_schema  = uc $user
        dbi_table   = '%'
        dbi_type    = 'TABLE'

That is, for Oracle you would call this module's constructor like so:

        $user = 'The user name used in the call to DBI -> connect(...)';
        new(dbh => $dbh, dbi_schema => uc $user);

For Postgres use:

        dbi_catalog = undef
        dbi_schema  = 'public'
        dbi_table   = '%'
        dbi_type    = 'TABLE'

That is, for Postgres you would call this module's constructor like so:

        new(dbh => $dbh, dbi_schema => 'public');

fiddle_timestamp

This parameter takes one of these values: 0, 1 or 2, or any of those values + 128.

The 128 means the top (left-most) bit in the byte value of this parameter is set.

The default value is 1.

If the value of this parameter is 0, then restore() does not fiddle the value of fields whose names match /timestamp/.

If the value of the parameter is 1, then restore() fiddles the value of fields whose names match /timestamp/ in this manner:

        All values are assumed to be of the form /^YYYYMMDD/ (fake reg exps are nice!).
        Hours, minutes and seconds, if present, are ignored.
        Timestamps undergo either 1 or 2 transformations.
        Firstly, if the value matches /^0000/, convert it to 19700101.
        Then, all values are converted to YYYY-MM-DD 00:00:00.
        Eg: This - 00000000 - is converted to 1970-01-01 00:00:00
        and today - 20050415 - is converted to 2005-04-15 00:00:00.
        You would use this option when transferring data from MySQL's 'timestamp' type
        to Postgres' 'timestamp' type, and MySQL output values match /^(\d{8})/.

If the value of the parameter is 2, then restore() fiddles the value of fields whose names match /timestamp/ in this manner:

        Timestamps undergo either 0 or 1 transformations.
        If the value matches /^0000/, hours, minutes and seconds, if present, are ignored.
        If the value matches /^0000/, convert it to 1970-01-01 00:00:00.
        Values not matching that pattern are not converted.
        Eg: This - 0000-00-00 00:00:00 - is converted to 1970-01-01 00:00:00
        and today - 2005-04-15 09:34:00 - is not converted.
        You would use this option when transferring data from MySQL's 'datetime' type
        to Postgres' 'timestamp' type, and some MySQL output values match /0000-00-00 00:00:00/
        and some values are real dates, such as 2005-04-15 09:34:00.

If the top bit is set, another fiddle takes place, after any of the above have occurred:

The timestamp is checked against 1970-01-01 00:00:00, and if they match, the timestamp is changed to 1970-01-01 00:00:01. This extra second means the timestamp is now valid under the strict option for MySQL V 5, whereas 1970-01-01 00:00:00 is invalid.
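
For example, to apply the second type of fiddle and also have zero timestamps bumped to 1970-01-01 00:00:01 for MySQL V 5's strict mode:

        # 2 + 128 = 130. The 2 converts /^0000/ values to 1970-01-01 00:00:00,
        # and the 128 (top bit) then bumps that to 1970-01-01 00:00:01.
        my $restorer = DBIx::Admin::BackupRestore -> new(dbh => $dbh, fiddle_timestamp => 2 + 128);
        $restorer -> restore($file_name);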

This parameter is optional.

odbc

This parameter takes one of these values: 0 or 1.

The default value is 0.

During backup, if odbc is 1 we use the simplified call $dbh -> tables() to get the list of table names. This list includes what MS Access calls Queries, which are possibly equivalent to views. MS Access does not support the syntax used in the non-ODBC situation: $dbh -> tables('%', '%', '%', 'table').
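
For example, to back up an MS Access database via DBD::ODBC (the DSN name here is hypothetical):

        my $dbh = DBI -> connect('dbi:ODBC:access_dsn', '', '');
        print DBIx::Admin::BackupRestore -> new(dbh => $dbh, odbc => 1) -> backup('access_db');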

This parameter is optional.

rename_columns

This parameter takes a hash ref.

You specify a hash ref in the form:

        rename_columns => {'old name' => 'new name', ...}.

For example, 'order' is a reserved word under MySQL, so you might use:

        rename_columns => {order => 'orders'}.

The option affects all tables.

The database handle attribute FetchHashKeyName affects this option. Renaming takes place after the effect of FetchHashKeyName.

This parameter is optional.

rename_tables

This parameter takes a hash ref.

You specify a hash ref in the form:

        rename_tables => {'old name' => 'new name', ...}.

The database handle attribute FetchHashKeyName affects this option. Renaming takes place after the effect of FetchHashKeyName.

This parameter is optional.

skip_schema

The default value is [].

If new is called as new(skip_schema => ['some_schema_name']), the restore phase does not restore any tables in the named schema.

Here, 'schema' is defined to be the prefix on a table name, and to be separated from the table name by a '.'.

Note: You would normally use these options together to port data from Postgres to MySQL:

        new(skip_schema => ['information_schema', 'pg_catalog'], transform_tablenames => 1);

skip_tables

The default value is [].

If new is called as new(skip_tables => ['some_table_name', ...]), the restore phase does not restore the tables named in the call to new().

This option is designed to work with CGI scripts using the module CGI::Session.

This way, the CGI script can run with the current CGI::Session data, and stale CGI::Session data is not restored from the XML file.
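
For example, assuming your session data lives in a table called 'sessions' (the table name here is hypothetical):

        my $table_names = DBIx::Admin::BackupRestore -> new
        (
            dbh         => $dbh,
            skip_tables => ['sessions'],
        ) -> restore($file_name);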

See examples/backup-db.pl for a list of MS Access tables names which you are unlikely to want to transfer to an RDBMS.

This parameter is optional.

transform_tablenames

The default value is 0.

The only other value currently recognized by this option is 1.

Calling new(transform_tablenames => 1) chops the schema, up to and including the first '.', off table names. Thus a table exported from Postgres as 'public.service' can be renamed 'service' when being imported into another database, e.g. MySQL.

Here, 'schema' is defined to be the prefix on a table name, and to be separated from the table name by a '.'.

Note: You would normally use these options together to port data from Postgres to MySQL:

        new(skip_schema => ['information_schema', 'pg_catalog'], transform_tablenames => 1);

This parameter is optional.

verbose

The default value is 0.

If new is called as new(verbose => 1), the backup and restore phases both print the names of the tables to STDERR.

When beginning to use this module, you are strongly encouraged to use the verbose option as a progress monitor.

This parameter is optional.

Method: backup($database_name)

Returns a potentially-huge string of XML.

You would normally write this straight to disk.

The database name is passed in here to help decorate the XML.

As of version 1.06, the XML tags are in lower case.

Method restore() will read a file containing upper or lower case tags, whereas method restore_in_order() handles lower-case tags only.

Method: get_column_names

This returns a hash ref, where the keys are table names, possibly transformed according to the database handle attribute FetchHashKeyName, and the values are array refs of column names, also converted according to FetchHashKeyName.

Note: All spaces in table names are converted to underscores.

Further, these column names are sorted, and all spaces in column names are converted to underscores.

This hashref is acceptable to the module DBIx::Admin::CreateTrivialSchema :-).
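
Putting this together with the stand-alone script suggested in the Description, a sketch (it is an assumption here that backup() must be called first, to gather the column data):

        my $backup = DBIx::Admin::BackupRestore -> new(dbh => $dbh);
        my $xml    = $backup -> backup('db_name');
        my $names  = $backup -> get_column_names(); # {table_name => [column, ...], ...}.

        # Feed $names into the constructor of DBIx::Admin::CreateTrivialSchema
        # to create matching empty tables on the destination database.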

Method: restore($file_name)

Returns an array ref of imported table names. They are sorted by name.

Opens and reads the given file, presumably one output by a previous call to backup().

The data read in is used to populate database tables. Use method split() to output to disk files.

Method: restore_in_order($file_name, [array ref of table names])

Returns nothing.

Opens and reads the given file, presumably one output by a previous call to backup().

The data read in is used to populate database tables. Use method split() to output to disk files.

Restores the tables in the order given in the array ref parameter.

This allows you to define a column with a clause such as 'references foreign_table (foreign_column)', and to populate the foreign_table before the dependent table.
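
For example (the table names here are hypothetical):

        # Populate parent tables before the tables whose columns reference them.
        DBIx::Admin::BackupRestore -> new(dbh => $dbh)
         -> restore_in_order($file_name, ['customers', 'orders', 'order_items']);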

And no, mutually-dependent and self-referential tables are still not catered for.

And yes, it does read the file once per table. Luckily, XML::Records is fast.

But if this seems like too much overhead, see method split().

Method: split($file_name)

Returns an array ref of the table names processed. They are sorted by name.

Opens and reads the given file, presumably one output by a previous call to backup().

Each table not being skipped is output to a separate disk file, with headers and footers the same as output by method backup().

This means each file can be input to methods restore() and restore_in_order().

The tables' schema names and table names are used to construct the file names, together with an extension of '.xml'.
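
A sketch (per the notes under the dbh parameter above, no database handle is needed, since split() is a file-to-file operation):

        my $table_names = DBIx::Admin::BackupRestore -> new() -> split('all-tables.xml');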

See examples/split-xml.pl and all-tables.xml for a demo.

Lastly, method split() uses lower-case XML tags.

Example code

See the examples/ directory in the distro.

There are 2 demo programs:

backup-db.pl
restore-db.pl

FAQ

Why do I get 'duplicate key' errors after restoring?

Most likely because:

You are using Postgres or equivalent
You created a sequence

Eg: create sequence t_seq.

You created a table with the primary key referring to the sequence

Eg: create table t (t_id integer primary key default nextval('t_seq'), ...).

You populated the table

Let's say with 10 records, so the sequence is now at 10.

And the primary key field now contains the values 1 .. 10.

You exported the table with this module

Note: The export file contains the values 1 .. 10 in the primary key field.

You recreated the sequence

So the sequence is now at 1.

You recreated the table
You imported the data with this module

Note: Since the import file contains the values 1 .. 10 in the primary key field, these values are used to populate the table, and the sequence's nextval() is never called.

So the sequence is still at 1.

You tried to insert a record, which triggered a call to nextval()

But this call returns 1 (or perhaps 2), which is already in the table.

Hence the error about 'duplicate key'.
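
One way to recover (using standard Postgres SQL via DBI, not a feature of this module) is to resynchronize the sequence with the imported data after restoring:

        # Advance t_seq past the highest imported primary key value.
        $dbh -> do("select setval('t_seq', (select max(t_id) from t))");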

Related Modules

On CPAN I can see 4 modules which obviously offer similar features - there may be others.

DBIx::Copy
DBIx::Dump
DBIx::Migrate
DBIx::XML_RDB

Of these, DBIx::XML_RDB is the only one I have experimented with. My thanks to Matt Sergeant for that module.

I have effectively extended his module to automatically handle all tables, and to handle importing too.

Author

DBIx::Admin::BackupRestore was written by Ron Savage <ron@savage.net.au> in 2004.

Home page: http://savage.net.au/index.html

Copyright

Australian copyright (c) 2004, Ron Savage. All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html