NAME

DS - Data Stream module

SYNOPSIS

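  use FindBin qw($Bin);
  use IO::Handle;
  use DS::Importer::TabFile;
  use DS::Transformer::TabStreamWriter;
  use DS::Target::Sink;
  
  # Read rows from price_index.csv, located next to this script
  my $importer = DS::Importer::TabFile->new( "$Bin/price_index.csv" );
  # Write rows, including a header, to STDOUT
  my $printer  = DS::Transformer::TabStreamWriter->new(
      IO::Handle->new_from_fd(fileno(STDOUT), 'w')
  );
  $printer->include_header;
  
  # Build the chain: importer -> printer -> sink
  $importer->attach_target( $printer );
  $printer->attach_target( DS::Target::Sink->new );
  
  # Run the chain
  $importer->execute();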
  

DESCRIPTION

This package provides a framework for writing data processing components that work on typed streams. A typed stream in DS is a stream of hash references in which every hash reference obeys the constraints contained in a type specification.

BASIC CONCEPTS

The DSlib package draws upon a handful of concepts that are introduced here.

Base classes

The base classes in DSlib are:

DS::Source - A source of a data stream. Sometimes just called a "source".
DS::Target - A target of a data stream. Sometimes just called a "target".
DS::Transformer - A source and target mixin that receives a data stream and passes it on (with possible modifications).
DS::Importer - A source that retrieves data from a source outside DS.

Processing chains

A processing chain is a linked list that starts with a source, continues with any number of transformers, and ends with a target. An open processing chain is a chain where the source or the target is missing.

Processing chains work by having the source pass data down the chain until it eventually reaches the target, where the data goes out of DSlib's scope. The data is passed by having each component in the chain call the following component, passing the data as a parameter. The only data type supported is hash references.
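
The calling pattern described above can be sketched in plain Perl without DS itself. This is only an illustration: make_target, make_transformer and run_source are hypothetical closure-based helpers, not part of the DS API, and the real classes may work differently.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the DS base classes, built from closures.
# A target consumes rows; here it collects them in an array.
sub make_target {
    my ($collected) = @_;
    return sub {
        my ($row) = @_;
        push @$collected, $row if defined $row;
        return;
    };
}

# A transformer receives a row, may modify it, and calls the next stage.
sub make_transformer {
    my ($modify, $next) = @_;
    return sub {
        my ($row) = @_;
        $modify->($row) if defined $row;
        $next->($row);    # undef is passed on unchanged
        return;
    };
}

# A source pushes every row down the chain, followed by undef.
sub run_source {
    my ($rows, $next) = @_;
    $next->($_) for @$rows;
    $next->(undef);
    return;
}

my @out;
my $chain = make_transformer(
    sub { $_[0]{price} *= 2 },    # double the price field in place
    make_target(\@out),
);
run_source([ { price => 10 }, { price => 21 } ], $chain);
```

After the run, @out holds the two modified hash references. Each stage only knows about the stage that follows it, which is what makes the chain a linked list.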

End of stream convention

The data type supported by DS is hash references, but to indicate that there are no more rows in the stream, undef is used as an end-of-stream marker.

It is vital that this marker is passed on by all components in the processing chain, since some components may need to clean up or pass on more rows at this point.
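
To show why the marker matters, here is a minimal sketch of a stateful stage that emits one extra row when the stream ends and then passes undef on. The helper make_summing_transformer and the summary field are hypothetical and exist only for this illustration; they are not part of DS.

```perl
use strict;
use warnings;

# Hypothetical stage that sums a field across all rows it sees.
# Because it holds state, it has work to do at end of stream.
sub make_summing_transformer {
    my ($field, $next) = @_;
    my $total = 0;
    return sub {
        my ($row) = @_;
        if (defined $row) {
            $total += $row->{$field};
            $next->($row);                                # pass the row on
        }
        else {
            $next->({ $field => $total, summary => 1 });  # emit one extra row
            $next->(undef);                               # then pass the marker on
        }
        return;
    };
}

my @seen;
my $stage = make_summing_transformer('amount', sub { push @seen, $_[0] });

my @rows = ({ amount => 3 }, { amount => 4 });
$stage->($_) for @rows;
$stage->(undef);    # end of stream
```

If an earlier stage swallowed the undef, this stage would never emit its summary row, and later stages would never learn that the stream had ended.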

Type specifications

Any source, target or transformer can have ingoing and outgoing types that can be used to ensure that the data passed to any target contains (but is not limited to) a specified list of fields.
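
The "contains, but is not limited to" rule can be sketched as a simple field check. This assumes, purely for illustration, that a type is represented as a list of required field names; the actual type classes in DS may look different, and missing_fields is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical representation: a type is a list of required field names.
# Returns the names of required fields that the row lacks.
sub missing_fields {
    my ($type, $row) = @_;
    return grep { !exists $row->{$_} } @$type;
}

my @price_type = qw(date index);

# Extra fields are allowed, so this row satisfies the type.
my @ok = missing_fields(
    \@price_type,
    { date => '2024-01', index => 101.5, note => 'x' },
);

# This row lacks the "index" field and does not satisfy the type.
my @bad = missing_fields(\@price_type, { date => '2024-01' });
```

An empty result means the row satisfies the type; a non-empty result names the fields that are missing.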

APIS SUBJECT TO CHANGE

I have decided to pursue a more general way of writing transformers, which will be available in version 3 of this package. I am certain that some APIs will change in ways that are not backwards compatible.

MISSING DOCUMENTATION

Some classes in this package are still undocumented. Send me an email if you run into trouble or just want something clarified. That may also encourage me to write the missing documentation.

SEE ALSO

DS::Importer::TabFile, DS::Importer::Sth, DS::Transformer, DS::Transformer::Sub, DS::Target::Sink.

AUTHOR

Written by Michael Zedeler.