ETL::Pipeline::Output - Role for ETL::Pipeline output destinations
use Moose;
with 'ETL::Pipeline::Output';

sub open {
    # Add code to open the output destination...
}

sub write {
    # Add code to save your data here...
}

sub close {
    # Add code to close the destination...
}
An output destination fulfills the load part of ETL. This is where the data ends up. These are the outputs of the process.
A destination can be anything: a database, a file, or any other data store. Destinations are customized to your environment, and you will probably only have a few.
ETL::Pipeline interacts with the output destination in 3 stages...

open - connect to the destination

write - save each individual record

close - disconnect from the destination
This role sets the requirements for these 3 methods. It should be consumed by all output destination classes. ETL::Pipeline relies on the destination having this role.
ETL::Pipeline provides a couple of generic output destinations as examples or for very simple uses. The real value of ETL::Pipeline comes from adding your own, business specific, destinations...
To create your own output destination, define a Moose class that consumes the ETL::Pipeline::Output role...

use Moose;
with 'ETL::Pipeline::Output';
The new destination is ready to use, like this...
$etl->output( 'YourNewDestination' );
You can leave off the leading ETL::Pipeline::Output::.
When ETL::Pipeline calls "open" or "close", it passes the ETL::Pipeline object as the only parameter. When ETL::Pipeline calls "write", it passes two parameters - the ETL::Pipeline object and the record. The record is a Perl hash.
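Putting those pieces together, a minimal destination might look like the sketch below. The class name, the path attribute, and the tab-separated output format are all illustrative assumptions, not part of the ETL::Pipeline distribution.

```perl
package ETL::Pipeline::Output::TabFile;    # hypothetical example class

use Moose;

with 'ETL::Pipeline::Output';

has 'path' => (is => 'ro', isa => 'Str', required => 1);
has '_fh'  => (is => 'rw');

sub open {
    my ($self, $etl) = @_;
    # CORE::open avoids any confusion with this class's own "open" method.
    CORE::open( my $fh, '>', $self->path )
        or die 'Cannot open ' . $self->path . ": $!";
    $self->_fh( $fh );
}

sub write {
    my ($self, $etl, $record) = @_;
    # The record arrives as a Perl hash (a hash reference here).
    print { $self->_fh } join( "\t",
        map { $record->{$_} // '' } sort keys %$record
    ), "\n";
}

sub close {
    my ($self, $etl) = @_;
    CORE::close( $self->_fh );
}

no Moose;
1;
```

With the class in place, the pipeline would select it as `$etl->output( 'TabFile', path => 'out.txt' );`.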
ETL::Pipeline comes with a couple of generic output destinations...
ETL::Pipeline::Output::Hash

Stores records in a Perl hash. Useful for loading support files and tying them together later.
ETL::Pipeline::Output::Perl

Executes a subroutine against the record. Useful for debugging data issues.
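For example, the subroutine destination can dump each record while you track down a data issue. This is a sketch; the `code` attribute name is an assumption - check ETL::Pipeline::Output::Perl for the actual interface.

```perl
use Data::Dumper;

# Print every record as it is written, for debugging.
$etl->output( 'Perl', code => sub {
    my ($etl, $record) = @_;
    print Dumper( $record );
} );
```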
My work involves a small number of destinations that rarely change and a greater number of sources that do change. So I designed ETL::Pipeline to minimize time writing new input sources. The trade off was slightly more complex output destinations.
ETL::Pipeline version 3 is not compatible with output destinations from older versions. You will need to rewrite your custom output destinations. These methods from the old interface no longer exist...

configure

finish

write_record

set

new_record
close

Shut down the output destination. This method may close files, disconnect from the database, or anything else required to cleanly terminate the output.
close receives one parameter - the ETL::Pipeline object.
The output destination is closed after the input source, at the end of the ETL process.
open

Prepare the output destination for use. It can open files, make database connections, or anything else required to access the destination.
open receives one parameter - the ETL::Pipeline object.
The output destination is opened before the input source, at the beginning of the ETL process.
write

Send a single record to the destination. The ETL process calls this method in a loop. It receives two parameters - the ETL::Pipeline object, and the current record as a Perl hash.
If your code encounters an error, write can call "error" in ETL::Pipeline with the error message. "error" in ETL::Pipeline automatically includes the record count with the error message. You should add any other troubleshooting information such as file names or key fields.
sub write {
    my ($self, $etl, $record) = @_;

    my $id = $record->{ID};
    $etl->error( "Error message here for id $id" );
}
For fatal errors, I recommend using the croak command from Carp.
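A sketch of that pattern is below. The "connect_to_database" helper is hypothetical, included only to show where croak fits; croak reports the error from the caller's point of view, which points users at their pipeline script rather than your destination class.

```perl
use Carp;

sub open {
    my ($self, $etl) = @_;

    # "connect_to_database" is a hypothetical helper used for illustration.
    $self->connect_to_database
        or croak 'Unable to connect to the output destination';
}
```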
croak
ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Output::Hash, ETL::Pipeline::Output::Perl, ETL::Pipeline::Output::UnitTest
Robert Wohlfarth <robert.j.wohlfarth@vumc.org>
Copyright 2021 (c) Vanderbilt University
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install ETL::Pipeline, copy and paste the appropriate command into your terminal.
cpanm
cpanm ETL::Pipeline
CPAN shell
perl -MCPAN -e shell
install ETL::Pipeline
For more information on module installation, please visit the detailed CPAN module installation guide.