The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DBIx::Compare - Compare database content

SYNOPSIS

        use DBIx::Compare;

        my $oDB_Comparison = db_comparison->new($dbh1,$dbh2);
        $oDB_Comparison->verbose;
        $oDB_Comparison->compare;
        $oDB_Comparison->deep_compare;
        $oDB_Comparison->deep_compare(@aTable_Names);   

DESCRIPTION

DBIx::Compare takes two database handles and performs comparisons of their table content.

DATABASE CONNECT DESCRIPTORS

The database name is required for some operations. Unfortunately the variety of possible syntax for a connection description makes extraction of this difficult. This problem is worked around by placing some restrictions on the syntax:

These connection descriptions are allowed:

        "...[database|dbname]=mydb;..." or
        "mydb:..." or
        "mydb;..." 

i.e using the "database" or "dbname" keyword or else specifying the database name as the first field.

Other variants will result in the error: DBIx::Compare ERROR; Cannot extract database name from connection string: ...";

COMPARISON METHODS

deep_compare, deep_compare(@aTables)

When called without any arguments, this method performs a row-by-row comparison on any table that passes the rapid comparison test (see "compare"). Returns true if the tables are identical, false/undef if a difference was found. In verbose mode, reports differences found as per "compare", together with the table name and row number of any differences found by the row-by-row comparison.

When passed a list of table names, deep_compare is forced to perform the row-by-row comparison of each table, instead of only analysing those tables that pass the rapid comparison test. This can be useful to track down where the differences actually are.

All differences can also be returned using the "get_differences" method.

Rapid (low-level) comparisons

compare

Performs a low level comparison. Calls the methods compare_table_lists, compare_table_fields, compare_row_counts and (if available) compare_table_stats. Returns true if no differences are found, otherwise returns undef.

compare_table_lists

Simple comparison of the table names. Returns true if no differences are found, otherwise returns undef. An array ref of tables unique to each database:host can be recovered with get_differences(), using the hash key 'Tables unique to [db name:host]'

compare_table_fields

Simple comparison of each table's field names. Returns true if no differences are found, otherwise returns undef. An array ref of fields unique to each database:host can be recovered with get_differences(), using the hash key 'Fields unique to [db name:host.table]'

compare_row_counts

Comparison of the row counts from each table. Can pass a table name, or will compare all tables. Returns true if no differences are found, otherwise returns undef. An array ref of tables with different row counts can be recovered with get_differences(), using the hash key 'Row count'.

compare_table_stats

Aggregate (mathematical) comparisons of each table field. For numeric fields, compares the average, minimum, maximum and standard deviation of all values. For string fields, performs these comparisons on the length (in bytes) of each string. For date/time fields, performs these comparisons on the numeric value of the date/time (when possible). Clearly, the value of these comparisons will vary hugely - but where there is enough variety in the table content, this can be informative of any differences.

Returns true if no differences are found, or if the function is not supported for a particular database driver, or if there is no DBIx::Compare:: plug-in for that driver. Otherwise returns undef. An array ref of tables with different row counts can be recovered with get_differences(), using the hash key 'Bad fields in table [db name:host.table]'.

The SQL statements behind this method are provided by plug-in modules to DBIx::Compare, since the relevant SQL functions vary depending on the dialect. If a plug-in for your DBMS is not found, its easy enough to create one.

OTHER METHODS

new($dbh1,$dbh2)

You must pass two database handles at initialisation, and each database must be the same type.

verbose

Generates verbose output. Default is not verbose.

get_primary_keys($table,$dbh)

Returns the primary keys (in key order) for the given table/database, either as a list or as a comma separated string.

get_differences

Returns a hashref of differences between the two databases, where keys are the source of the difference, and values are an array ref of the differences found (see comparison methods above for details).

get_tables

Returns a table list. Returns a 2D list of tables in list context, or just a list of tables in database1 in scalar context;

        my @aList = $oDB_Comparison->get_tables;        # returns (['table1','table2',etc],['table1','table2',etc])
        my $aList = $oDB_Comparison->get_tables;        # returns ['table1','table2',etc]
common_tables

Returns a list of tables common to both databases. Recursively cals compare_table_lists() if not already called.

similar_tables

Returns a list of tables common to both databases and with identical row counts. Recursively cals compare_table_lists() and compare_row_counts() if not already called.

field_list($table)

Returns a list of fields for the particular table that are common in both databases.

AUTHOR

Christopher Jones, Gynaecological Cancer Research Laboratories, UCL EGA Institute for Women's Health, University College London.

c.jones@ucl.ac.uk

With some enhancements and bug fixes from;

Mark Kirkwood, Catalyst IT Limited, New Zealand.

mark.kirkwood@gmail.com

COPYRIGHT AND LICENSE

Copyright 2008 by Christopher Jones, University College London

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.