The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DataWarehouse::Dimension - a data warehouse dimension meta-information

VERSION

Version 0.01

SYNOPSIS

    use DataWarehouse::Dimension;

    # NOTE: "table" and "attributes" were not implemented yet

    my $dimension = DataWarehouse::Dimension->new(
        dbh   => $dbh,
        name  => 'product',
        table => 'product',
        attributes => [
            {
                name => 'id',
                PRIMARY_KEY => 1,
            },
            {
                name => 'product_id',
                NATURAL_KEY => 1,
            },
            {
                name => 'name',
                KEEP_HISTORY => 1,
                AGGREGATION_POINT => 1,
            },
            {
                name => 'brand'
                KEEP_HISTORY => 1,
                AGGREGATION_POINT => 1,
            },
            {
                name => 'price'
                KEEP_HISTORY => 1,
            },
        ],
    );

DESCRIPTION

Every dimensional model is made of Facts and Dimensions.

Facts are the measurements; for example, sales.

But the facts, per se, provide incomplete information: what is that amount related to? Dimensions are give context to the facts. For instance: "sales by country" or "sales by product", or "sales by product, by day"

In relational databases, facts and dimensions will be stored in separate tables; the result is the star schema:

         +-----+                       +-------------+
         ! day !-----             -----! salesperson !
         +-----+     \ +-------+ /     +-------------+
                       ! sales !
      +---------+    / +-------+ \     +----------+
      ! product !----             -----! customer !
      +---------+                      +----------+

PRIMARY KEY

Every dimension table must have a primary key that is different from the primary key used in the source systems.

The primary key should be meaningless; we call it a "surrogate key".

NATURAL KEYS

You should also store the "natural key", used to identify the records in the source systems: for example, product_id, customer_id, or salesperson_id. You can identify the natural keys with NATURAL_KEY => 1.

SLOWLY CHANGING DIMENSIONS

One of the major goals of the data warehouse is to preserve history. For instance: a product could be priced at $259 in one year, and have a price drop to $189 in the next year.

You don't want to overwrite the price in the product dimension, because that would cause you to loose information -- the old price was correct in the past.

The strategy to deal with this, is to create a new record with the updated information. That's why the data warehouse must use a surrogate key and should identify the natural keys.

You can identify the attributes for which you want to preserve history with (KEEP_HISTORY => 1).

If you don't identify an attribute with <KEEP_HISTORY>, we'll assume that you don't want to preserve history for that information.

Finally, if you think that one attribute will change too often, you should consider storing it in a separate dimension.

AGGREGATION POINTS

Some attributes are typically used as aggregation points, in many queries.

For instance: "month" and "year" are typical aggregate points in the "day" dimension; "brand" and "type" are typical aggregation points in the "product" dimension.

When you create a dimension, you can indicate that an attribute is an aggregation point; this information may be used to generate aggregate tables.

SEE ALSO

AUTHOR

Nelson Ferraz, <nferraz at gmail.com>

BUGS

Please report any bugs or feature requests to bug-dw at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DataWarehouse. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc DataWarehouse::Dimension

You can also look for information at:

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2010 Nelson Ferraz.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.