NAME

ETL::Pipeline::Output - Role for ETL::Pipeline output destinations

SYNOPSIS

  use Moose;
  with 'ETL::Pipeline::Output';

  sub open {
    # Add code to open the output destination
    ...
  }
  sub write {
    # Add code to save your data here
    ...
  }
  sub close {
    # Add code to close the destination
    ...
  }

DESCRIPTION

An output destination fulfills the load part of ETL. This is where the data ends up. These are the outputs of the process.

A destination can be anything - a database, a file, or something else entirely. Destinations are customized to your environment, and you will probably only have a few.

ETL::Pipeline interacts with the output destination in 3 stages...

1. Open - connect to the database, open the file, whatever setup is appropriate for your destination.
2. Write - called once per record. This is the part that actually performs the output.
3. Close - finished processing and cleanly shut down the destination.

This role sets the requirements for these 3 methods. It should be consumed by all output destination classes. ETL::Pipeline relies on the destination having this role.
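The three stages can be sketched as a calling pattern. This is a simplified illustration, not the actual ETL::Pipeline internals; the My::Output::Trace class and the undef stand-in for the pipeline object are hypothetical:

```perl
# Simplified sketch of the open/write/close cycle. NOT the real
# ETL::Pipeline internals - just the calling pattern a destination sees.
package My::Output::Trace;
use strict;
use warnings;

my @calls;
sub new   { return bless {}, shift }
sub open  { push @calls, 'open'  }
sub write { push @calls, 'write' }
sub close { push @calls, 'close' }
sub calls { return @calls }

package main;
my $etl    = undef;    # stands in for the ETL::Pipeline object
my $output = My::Output::Trace->new;

$output->open( $etl );                                      # stage 1: once
$output->write( $etl, $_ ) for ( { a => 1 }, { a => 2 } );  # stage 2: per record
$output->close( $etl );                                     # stage 3: once

print join( ' ', My::Output::Trace::calls() ), "\n";        # open write write close
```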

How do I create an output destination?

ETL::Pipeline provides a couple of generic output destinations as examples or for very simple uses. The real value of ETL::Pipeline comes from adding your own, business specific, destinations...

1. Start a new Perl module. I recommend putting it in the ETL::Pipeline::Output namespace. ETL::Pipeline will pick it up automatically.
2. Make your module a Moose class - use Moose;.
3. Consume this role - with 'ETL::Pipeline::Output';.
4. Write the "open", "close", and "write" methods.
5. Add any attributes for your class.
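As a sketch of the finished product, here is a hypothetical in-memory destination. To keep the sketch runnable without ETL::Pipeline installed, it uses a plain Perl package; a real destination would follow steps 2 and 3 above (use Moose; and with 'ETL::Pipeline::Output';) instead of the hand-rolled constructor:

```perl
# Hypothetical in-memory destination. A real destination would consume
# the ETL::Pipeline::Output role via Moose, as described above.
package My::Output::Memory;
use strict;
use warnings;

sub new { my $class = shift; return bless { records => [] }, $class }

# "open" - called once, before the input source starts.
sub open {
    my ($self, $etl) = @_;
    $self->{records} = [];    # reset storage
}

# "write" - called once per record; $record is a hash reference.
sub write {
    my ($self, $etl, $record) = @_;
    push @{ $self->{records} }, { %$record };    # store a shallow copy
}

# "close" - called once, after the input source finishes.
sub close {
    my ($self, $etl) = @_;
    # Nothing to release for an in-memory store.
}

package main;
my $out = My::Output::Memory->new;
$out->open( undef );    # undef stands in for the ETL::Pipeline object
$out->write( undef, { ID => 1, Name => 'Alice' } );
$out->close( undef );
print scalar @{ $out->{records} }, "\n";    # 1
```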

The new destination is ready to use, like this...

  $etl->output( 'YourNewDestination' );

You can leave off the leading ETL::Pipeline::Output::.

When ETL::Pipeline calls "open" or "close", it passes the ETL::Pipeline object as the only parameter. When ETL::Pipeline calls "write", it passes two parameters - the ETL::Pipeline object and the record. The record is a Perl hash reference.

Example destinations

ETL::Pipeline comes with a couple of generic output destinations...

ETL::Pipeline::Output::Hash

Stores records in a Perl hash. Useful for loading support files and tying them together later.

ETL::Pipeline::Output::Perl

Executes a subroutine against the record. Useful for debugging data issues.

Why this way?

My work involves a small number of destinations that rarely change and a greater number of sources that do change. So I designed ETL::Pipeline to minimize the time spent writing new input sources. The trade-off was slightly more complex output destinations.

Upgrading from older versions

ETL::Pipeline version 3 is not compatible with output destinations from older versions. You will need to rewrite your custom output destinations.

Change the configure to "open".
Change finish to "close".
Change write_record to "write".
Remove set and new_record. All records are Perl hashes.
Adjust attributes as necessary.
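A hypothetical skeleton showing where the renamed methods end up in a version 3 destination (method bodies omitted; a real class would use Moose and consume the role):

```perl
# Hypothetical skeleton mapping the version 2 names onto version 3.
package My::Output::Upgraded;
use strict;
use warnings;

sub new { return bless {}, shift }

# Version 2 called this "configure".
sub open  { my ($self, $etl) = @_; }

# Version 2 called this "write_record". "set" and "new_record" are
# gone - each record now arrives as a plain Perl hash reference.
sub write { my ($self, $etl, $record) = @_; }

# Version 2 called this "finish".
sub close { my ($self, $etl) = @_; }

package main;
print My::Output::Upgraded->can('write') ? "ok\n" : "not ok\n";    # ok
```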

METHODS & ATTRIBUTES

close

Shut down the output destination. This method may close files, disconnect from the database, or anything else required to cleanly terminate the output.

close receives one parameter - the ETL::Pipeline object.

The output destination is closed after the input source, at the end of the ETL process.

open

Prepare the output destination for use. It can open files, make database connections, or anything else required to access the destination.

open receives one parameter - the ETL::Pipeline object.

The output destination is opened before the input source, at the beginning of the ETL process.

write

Send a single record to the destination. The ETL process calls this method in a loop. It receives two parameters - the ETL::Pipeline object, and the current record as a Perl hash reference.

If your code encounters an error, write can call "error" in ETL::Pipeline with the error message. "error" in ETL::Pipeline automatically includes the record count with the error message. You should add any other troubleshooting information such as file names or key fields.

  sub write {
    my ($self, $etl, $record) = @_;

    # "error" automatically appends the current record count.
    my $id = $record->{ID};
    $etl->error( "Error message here for id $id" );
  }

For fatal errors, I recommend using the croak function from the Carp module.
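For example, here is a sketch of a fatal error raised from write with croak, which reports the error from the caller's perspective. The class name and the _connected helper are hypothetical:

```perl
# Sketch: raising a fatal error from "write" with croak. The class and
# its "_connected" helper are hypothetical, for illustration only.
package My::Output::Db;
use strict;
use warnings;
use Carp;

sub new        { return bless { connected => 0 }, shift }
sub _connected { return $_[0]->{connected} }

sub write {
    my ($self, $etl, $record) = @_;
    croak 'Lost the database connection' unless $self->_connected;
    # ... send $record to the database here ...
}

package main;
my $db = My::Output::Db->new;
eval { $db->write( undef, { ID => 1 } ) };
print $@ =~ /Lost the database/ ? "croaked\n" : "no error\n";    # croaked
```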

SEE ALSO

ETL::Pipeline, ETL::Pipeline::Input, ETL::Pipeline::Output::Hash, ETL::Pipeline::Output::Perl, ETL::Pipeline::Output::UnitTest

AUTHOR

Robert Wohlfarth <robert.j.wohlfarth@vumc.org>

LICENSE

Copyright 2021 (c) Vanderbilt University

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.