DataWarehouse::Dimension - a data warehouse dimension meta-information
Version 0.01
use DataWarehouse::Dimension; # NOTE: "table" and "attributes" were not implemented yet my $dimension = DataWarehouse::Dimension->new( dbh => $dbh, name => 'product', table => 'product', attributes => [ { name => 'id', PRIMARY_KEY => 1, }, { name => 'product_id', NATURAL_KEY => 1, }, { name => 'name', KEEP_HISTORY => 1, AGGREGATION_POINT => 1, }, { name => 'brand' KEEP_HISTORY => 1, AGGREGATION_POINT => 1, }, { name => 'price' KEEP_HISTORY => 1, }, ], );
Every dimensional model is made of Facts and Dimensions.
Facts are the measurements; for example, sales.
But the facts, per se, provide incomplete information: what is that amount related to? Dimensions are give context to the facts. For instance: "sales by country" or "sales by product", or "sales by product, by day"
In relational databases, facts and dimensions will be stored in separate tables; the result is the star schema:
+-----+ +-------------+ ! day !----- -----! salesperson ! +-----+ \ +-------+ / +-------------+ ! sales ! +---------+ / +-------+ \ +----------+ ! product !---- -----! customer ! +---------+ +----------+
Every dimension table must have a primary key that is different from the primary key used in the source systems.
The primary key should be meaningless; we call it a "surrogate key".
You should also store the "natural key", used to identify the records in the source systems: for example, product_id, customer_id, or salesperson_id. You can identify the natural keys with NATURAL_KEY => 1.
NATURAL_KEY => 1
One of the major goals of the data warehouse is to preserve history. For instance: a product could be priced at $259 in one year, and have a price drop to $189 in the next year.
You don't want to overwrite the price in the product dimension, because that would cause you to loose information -- the old price was correct in the past.
The strategy to deal with this, is to create a new record with the updated information. That's why the data warehouse must use a surrogate key and should identify the natural keys.
You can identify the attributes for which you want to preserve history with (KEEP_HISTORY => 1).
KEEP_HISTORY => 1
If you don't identify an attribute with <KEEP_HISTORY>, we'll assume that you don't want to preserve history for that information.
<KEEP_HISTORY
Finally, if you think that one attribute will change too often, you should consider storing it in a separate dimension.
Some attributes are typically used as aggregation points, in many queries.
For instance: "month" and "year" are typical aggregate points in the "day" dimension; "brand" and "type" are typical aggregation points in the "product" dimension.
When you create a dimension, you can indicate that an attribute is an aggregation point; this information may be used to generate aggregate tables.
DataWarehouse::Fact
Nelson Ferraz, <nferraz at gmail.com>
<nferraz at gmail.com>
Please report any bugs or feature requests to bug-dw at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DataWarehouse. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
bug-dw at rt.cpan.org
You can find documentation for this module with the perldoc command.
perldoc DataWarehouse::Dimension
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=DataWarehouse
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/DataWarehouse
CPAN Ratings
http://cpanratings.perl.org/d/DataWarehouse
Search CPAN
http://search.cpan.org/dist/DataWarehouse/
Copyright 2010 Nelson Ferraz.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.
To install DataWarehouse::ETL, copy and paste the appropriate command in to your terminal.
cpanm
cpanm DataWarehouse::ETL
CPAN shell
perl -MCPAN -e shell install DataWarehouse::ETL
For more information on module installation, please visit the detailed CPAN module installation guide.