
NAME

Iterator::Flex::Manual::Basics - Iterator Basics

VERSION

version 0.12

DESCRIPTION

Introduction

An iterator is something which encapsulates a source of data and parcels it out one chunk at a time. Iterators usually need to keep track of the state of the data stream and which chunk they should next return.

For example, imagine iterating through an array, returning one element at a time. The state required would be the array and the index of the next element to return. Here's a simple iterator which uses a hash to keep track of state:

  use v5.36;    # enables subroutine signatures and say

  sub iterate ( $state ) {
     my $array = $state->{array};
     return $state->{index} > $#$array ? undef : $array->[ $state->{index}++ ];
  }

We could use this via:

  my %state = ( array => [ 0 .. 20 ], index => 0 );

  while ( defined( my $value = iterate( \%state ) ) ) {
    say $value;
  }

This illustrates the three typical states of an iterator:

  1. Initialized: The iterator's state has been set up.

  2. Iteration: The iterator has returned at least one element of data, but may not know if there are more.

  3. Exhaustion: The iterator has definitely run out of data.

(There is also a fourth state, Error; see "Iterator Error" below.)
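The states above can be made explicit. The following is a minimal sketch in core Perl, not Iterator::Flex's implementation; the helper make_tracked_iterator and the state strings are assumptions for illustration. Note that after returning the last element the iterator is still in the Iteration state: it only discovers exhaustion on the following call.

```perl
use strict;
use warnings;

# Hypothetical helper: an array iterator that records which state it is in.
sub make_tracked_iterator {
    my @data  = @_;
    my $index = 0;
    my $state = 'initialized';

    my $next = sub {
        return undef if $state eq 'exhausted';
        if ( $index > $#data ) {
            $state = 'exhausted';    # only now do we know the data is gone
            return undef;
        }
        $state = 'iteration';
        return $data[ $index++ ];
    };

    my $get_state = sub { $state };
    return ( $next, $get_state );
}

my ( $next, $get_state ) = make_tracked_iterator( 1, 2 );
print $get_state->(), "\n";    # initialized
$next->() for 1 .. 2;
print $get_state->(), "\n";    # iteration (last element returned, end not yet seen)
$next->();
print $get_state->(), "\n";    # exhausted
```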

Returning Data and Signaling Exhaustion or Error

Iterator Exhaustion

Exhaustion is traditionally signaled via:

  1. Returning a sentinel value;

  2. Throwing an exception; or

  3. Setting a Boolean predicate in a multi-valued return, e.g.

     { value => $value, success => $bool }

There's no single right way to do it, just different trade-offs; see Iterator::Flex::Manual::PriorArt for how other languages and Perl modules handle it.
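For comparison, here is a minimal sketch of the three conventions applied to the same array, using plain closures rather than Iterator::Flex. The helper names sentinel_iter, exception_iter and record_iter are hypothetical.

```perl
use strict;
use warnings;

my @data = ( 1, 2, 3 );

# 1. Sentinel: return undef once the data runs out.
sub sentinel_iter {
    my @d = @_;
    my $i = 0;
    return sub { $i > $#d ? undef : $d[ $i++ ] };
}

# 2. Exception: die with a recognizable message.
sub exception_iter {
    my @d = @_;
    my $i = 0;
    return sub { die "exhausted\n" if $i > $#d; $d[ $i++ ] };
}

# 3. Multi-valued return: a hash with the value and a success flag.
sub record_iter {
    my @d = @_;
    my $i = 0;
    return sub {
        return { value => undef, success => 0 } if $i > $#d;
        return { value => $d[ $i++ ], success => 1 };
    };
}

my ( @s, @e, @r );

my $it = sentinel_iter(@data);
while ( defined( my $v = $it->() ) ) { push @s, $v }

$it = exception_iter(@data);
while (1) {
    my $v;
    my $ok = eval { $v = $it->(); 1 };
    if ( !$ok ) {
        last if $@ eq "exhausted\n";    # normal end of data
        die $@;                         # anything else is a real error
    }
    push @e, $v;
}

$it = record_iter(@data);
while (1) {
    my $rec = $it->();
    last unless $rec->{success};
    push @r, $rec->{value};
}

print "@s | @e | @r\n";    # 1 2 3 | 1 2 3 | 1 2 3
```

The exception-based loop rethrows anything other than the exhaustion message, so genuine errors are not mistaken for the end of the data.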

Returning a sentinel value is often good enough, but only if that value never occurs in your data stream. Our example iterator returns undef when it has exhausted the data source. However, imagine that the array contains temperature measurements taken at uniform intervals; an undef value might indicate that there was a problem taking a measurement (similar to how one would use NULL in a database), e.g.

    my @Temp = ( 22, 23.2, undef, 24, ... );

The iterator itself happily keeps going until it runs out of data, but when it returns the undef measurement, the example loop above interprets it as a signal of exhaustion and stops querying the iterator, silently skipping the remaining measurements. Obviously that's wrong.

One option is to use a value that cannot occur in the data. If your temperature is measured in Kelvin, which is never negative, a negative value can serve as a sentinel. However, because the right sentinel depends on the data, it must then be passed to the iterator as an input parameter.
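A minimal sketch of a caller-supplied sentinel, assuming temperatures in Kelvin so that -1 can never be a genuine reading. The helper make_iter_with_sentinel and the readings are illustrative only; undef now passes through as an ordinary (missing) value.

```perl
use strict;
use warnings;

# Hypothetical helper: the caller chooses the exhaustion sentinel.
sub make_iter_with_sentinel {
    my ( $sentinel, @data ) = @_;
    my $index = 0;
    return sub { $index > $#data ? $sentinel : $data[ $index++ ] };
}

my @temp = ( 295.15, 296.35, undef, 297.15 );    # illustrative; undef = failed measurement
my $iter = make_iter_with_sentinel( -1, @temp );

my ( $count, $missing ) = ( 0, 0 );
while (1) {
    my $value = $iter->();
    last if defined $value && $value == -1;      # the sentinel, not a reading
    $count++;
    $missing++ unless defined $value;
}
print "$count readings, $missing missing\n";    # 4 readings, 1 missing
```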

Iterator::Flex provides a signal_exhaustion method which currently supports either returning a user-defined sentinel or throwing an exception.
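The idea of a configurable exhaustion policy can be sketched in core Perl as follows. This is not Iterator::Flex's signal_exhaustion; the constructor make_policy_iter and its on_exhaustion and sentinel options are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical constructor: on_exhaustion => 'throw' or 'return' (the default).
sub make_policy_iter {
    my %args  = @_;
    my @data  = @{ $args{data} };
    my $index = 0;
    my $throw = ( $args{on_exhaustion} // 'return' ) eq 'throw';

    # Decide once, at construction time, how exhaustion will be signaled.
    my $signal_exhaustion =
      $throw ? sub { die "iterator exhausted\n" } : sub { $args{sentinel} };

    return sub {
        return $signal_exhaustion->() if $index > $#data;
        return $data[ $index++ ];
    };
}

my $it = make_policy_iter( data => [ 1, undef, 3 ], on_exhaustion => 'throw' );
my @seen;
while (1) {
    my $value;
    unless ( eval { $value = $it->(); 1 } ) {
        last if $@ eq "iterator exhausted\n";
        die $@;
    }
    push @seen, $value;
}
print scalar(@seen), " values seen\n";    # 3 values seen (the undef is kept as data)
```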

Iterator Error

Similar issues arise when the iterator must signal an error. For example, if the iterator retrieves from a database and there is a connection issue, the client code must be alerted. This can be done via any of the methods specified in "Iterator Exhaustion".

Most implementations (in other languages or in Perl modules) don't explicitly specify how to handle this. Iterator::Flex provides a signal_error method which currently supports throwing an exception.
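A minimal sketch of keeping the two signals apart: the hypothetical iterator below returns undef on exhaustion but throws when its (simulated) data source fails, so the caller's eval sees only errors. The 'FAIL' marker stands in for a real problem such as a lost database connection.

```perl
use strict;
use warnings;

# Hypothetical iterator over a simulated, unreliable data source.
sub make_row_iter {
    my @rows  = @_;
    my $index = 0;
    return sub {
        return undef if $index > $#rows;              # exhaustion: sentinel
        my $row = $rows[ $index++ ];
        die "connection lost\n" if $row eq 'FAIL';    # error: exception
        return $row;
    };
}

my $iter = make_row_iter( 'a', 'b', 'FAIL', 'c' );
my @got;
my $ok = eval {
    while ( defined( my $row = $iter->() ) ) { push @got, $row }
    1;
};
print $ok ? "done\n" : "error after @got: $@";    # error after a b: connection lost
```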

Iterator Capabilities

Apart from state, an iterator is mostly defined by its capabilities. The only one required is "next", which retrieves a value.

There is a limited set of additional capabilities which are not appropriate to all data sources or iterators, so they are optional.

Some capabilities can be emulated by iterator adapters. The supported capabilities are documented in Iterator::Flex::Manual::Overview.
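One way to model required and optional capabilities in plain Perl is a method every iterator class provides (next) plus methods only some provide (here, rewind), with callers probing via the core can method. The classes ArrayIter and StreamIter and the helper drain_twice are hypothetical and do not reflect Iterator::Flex's internals.

```perl
use strict;
use warnings;

# Array-backed iterator: can restart, so it offers rewind().
package ArrayIter {
    sub new    { my ( $class, @data ) = @_; bless { data => \@data, index => 0 }, $class }
    sub next   { my $s = shift; $s->{index} > $#{ $s->{data} } ? undef : $s->{data}[ $s->{index}++ ] }
    sub rewind { $_[0]{index} = 0; return }
}

# Wraps a code ref whose values cannot be replayed, so no rewind().
package StreamIter {
    sub new  { my ( $class, $source ) = @_; bless { source => $source }, $class }
    sub next { $_[0]{source}->() }
}

package main;

# Read everything; if the optional capability exists, read it all again.
sub drain_twice {
    my $iter = shift;
    my @out;
    while ( defined( my $v = $iter->next ) ) { push @out, $v }
    if ( $iter->can('rewind') ) {
        $iter->rewind;
        while ( defined( my $v = $iter->next ) ) { push @out, $v }
    }
    return @out;
}

my @n = ( 1, 2 );
print join( ' ', drain_twice( ArrayIter->new( 1, 2 ) ) ), "\n";              # 1 2 1 2
print join( ' ', drain_twice( StreamIter->new( sub { shift @n } ) ) ), "\n";  # 1 2
```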

Iterator Generators

An iterator generator creates an iterator from a data source, which may be real (such as a data structure in memory, a database, etc.) or virtual (such as a sequence of numbers). Iterator::Flex provides iterator generators via convenience wrappers and classes for arrays (iarray, Iterator::Flex::Array), numeric sequences (iseq, Iterator::Flex::Sequence), and array-like objects (Iterator::Flex::ArrayLike).
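As a rough illustration of what a generator does, the sketch below builds closure-based iterators over a real source (an array) and a virtual one (a numeric sequence). The helpers array_iter, seq_iter and drain are hypothetical and are not the implementations of iarray or iseq; seq_iter assumes a positive step.

```perl
use strict;
use warnings;

# Real data source: an array held in memory.
sub array_iter {
    my ($array) = @_;
    my $index = 0;
    return sub { $index > $#$array ? undef : $array->[ $index++ ] };
}

# Virtual data source: begin, begin+step, ... up to end (positive step only).
sub seq_iter {
    my ( $begin, $end, $step ) = @_;
    $step //= 1;
    my $next = $begin;
    return sub {
        return undef if $next > $end;
        my $value = $next;
        $next += $step;
        return $value;
    };
}

# Collect every value until the undef sentinel.
sub drain { my $it = shift; my @v; while ( defined( my $x = $it->() ) ) { push @v, $x } @v }

print join( ' ', drain( array_iter( [ 'a', 'b', 'c' ] ) ) ), "\n";    # a b c
print join( ' ', drain( seq_iter( 0, 10, 5 ) ) ), "\n";              # 0 5 10
```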

For others, writing an iterator is straightforward; see Iterator::Flex::Manual::Authoring.

Iterator Adapters

An iterator adapter acts as a filter or modifier on the output of another iterator. Applying an adapter to an iterator results in another iterator, which can be used as input to another adapter.
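A minimal sketch of adapters as functions that take an iterator and return a new one, which is what makes them chainable. grep_iter and map_iter are hypothetical helpers, not igrep and imap, and assume the source returns undef on exhaustion.

```perl
use strict;
use warnings;

sub array_iter {
    my @data  = @_;
    my $index = 0;
    return sub { $index > $#data ? undef : $data[ $index++ ] };
}

# Pass through only the values for which $pred returns true.
sub grep_iter {
    my ( $pred, $src ) = @_;
    return sub {
        while ( defined( my $v = $src->() ) ) {
            return $v if $pred->($v);
        }
        return undef;
    };
}

# Transform each value with $func.
sub map_iter {
    my ( $func, $src ) = @_;
    return sub {
        my $v = $src->();
        return defined $v ? $func->($v) : undef;
    };
}

# Chain: keep the even numbers, then square them.
my $it = map_iter( sub { $_[0]**2 }, grep_iter( sub { $_[0] % 2 == 0 }, array_iter( 1 .. 6 ) ) );
my @out;
while ( defined( my $v = $it->() ) ) { push @out, $v }
print "@out\n";    # 4 16 36
```

Because this sketch uses undef as the exhaustion sentinel, a mapping function that returns undef would end the iteration early; that is the sentinel trade-off described under "Iterator Exhaustion".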

Iterator::Flex provides adapters both via convenience wrappers and classes for

grep

igrep, Iterator::Flex::Grep

map

imap, Iterator::Flex::Map

cycle

icycle, Iterator::Flex::Cycle

Cartesian product

iproduct, Iterator::Flex::Product

Caching/Buffering

icache, Iterator::Flex::Cache

Continuous Serialization

ifreeze, Iterator::Flex::Freeze
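A cycle adapter has to restart its source once it is exhausted. The sketch below assumes the caller passes a factory that builds a fresh iterator; that design, and the names cycle_iter and array_iter, are assumptions of this example rather than how icycle works.

```perl
use strict;
use warnings;

sub array_iter {
    my @data  = @_;
    my $index = 0;
    return sub { $index > $#data ? undef : $data[ $index++ ] };
}

# Hypothetical cycle adapter built on an iterator factory.
sub cycle_iter {
    my ($factory) = @_;
    my $src = $factory->();
    return sub {
        my $v = $src->();
        return $v if defined $v;
        $src = $factory->();    # source exhausted: start over
        return $src->();        # undef only if the source is empty
    };
}

my $cycle = cycle_iter( sub { array_iter( 'r', 'g', 'b' ) } );
my @first = map { $cycle->() } 1 .. 7;
print "@first\n";    # r g b r g b r
```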

Iterator Wrappers

There are a number of existing iterator packages on CPAN (see Iterator::Flex::Manual::PriorArt). Iterator::Flex can wrap those iterators so that they can be used within the Iterator::Flex framework. See Iterator::Flex::Manual::Alien.
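Wrapping mostly means translating another module's protocol into a single convention. The sketch below uses a hypothetical ForeignIter class with more and take methods (not a real CPAN module) and wraps it in a closure that throws on exhaustion, so undef stays usable as data. The helper wrap_foreign is likewise hypothetical.

```perl
use strict;
use warnings;

# Stand-in for another iterator package with its own protocol.
package ForeignIter {
    sub new  { my ( $class, @d ) = @_; bless { d => [@d] }, $class }
    sub more { scalar @{ $_[0]{d} } }
    sub take { shift @{ $_[0]{d} } }
}

package main;

# Adapt it: return the next value, or die "exhausted\n" when none is left.
sub wrap_foreign {
    my ($foreign) = @_;
    return sub {
        die "exhausted\n" unless $foreign->more;
        return $foreign->take;
    };
}

my $it = wrap_foreign( ForeignIter->new( 1, undef, 3 ) );
my @got;
while (1) {
    my $v;
    unless ( eval { $v = $it->(); 1 } ) {
        last if $@ eq "exhausted\n";
        die $@;
    }
    push @got, $v;
}
print scalar(@got), " values\n";    # 3 values
```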

SUPPORT

Bugs

Please report any bugs or feature requests to bug-iterator-flex@rt.cpan.org or through the web interface at: https://rt.cpan.org/Public/Dist/Display.html?Name=Iterator-Flex

Source

Source is available at

  https://gitlab.com/djerius/iterator-flex

and may be cloned from

  https://gitlab.com/djerius/iterator-flex.git

SEE ALSO

Please see the other Iterator::Flex::Manual documents referenced above (Overview, PriorArt, Authoring and Alien) for more information related to this module.

AUTHOR

Diab Jerius <djerius@cpan.org>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018 by Smithsonian Astrophysical Observatory.

This is free software, licensed under:

  The GNU General Public License, Version 3, June 2007