NAME

Iterator::Records - a simple iterator for arrayref record sources

VERSION

Version 0.02

SYNOPSIS

Iterator::Records uses Iterator::Simple to work with iterators whose values are arrayrefs of named fields. These can be called record streams. A record stream can be seen as the same thing as a DBI retrieval, but without most of the machinery for DBI - and of course, a DBI query is one of the ways you can build a record stream.

The actual API of Iterator::Records isn't as simple or elegant as Iterator::Simple, simply because there's more to keep track of. But the basic approach is similar: an Iterator::Records object defines how to iterate something, then you use the iter() method to create an iterator from it. The result is an Iterator::Simple iterator known to return records, i.e. arrayrefs of fields that match the field list specified.

Note that the Iterator::Records object is an iterator *factory*, and the actual iterator itself is returned by the call to iter().

  use Iterator::Records;
  
  my $spec = Iterator::Records->new (<something iterable>, ['field 1', 'field 2']);
  
  my $iterator = $spec->iter();
  while (my $row = $iterator->()) {
     my ($field1, $field2) = @$row;
  }
  
  $iterator = $spec->iter_hash();
  while (my $row = $iterator->()) {
     print $row->{'field 1'};
  }
  
  my ($f1, $f2);
  $iterator = $spec->iter_bind(\$f1, \$f2);
  while ($iterator->()) {
     print "$f1 - $f2\n";
  }

Note that the iterator itself is just an Iterator::Simple iterator. Now hold on, though, because here's where things get interesting.

  my $recsource = Iterator::Records->new (sub { ... }, ['field 1', 'field 2']);
  my $iterator = $recsource->select ("field 1")->iter;
  while (my $row = $iterator->()) {
     my ($field1) = @$row;
  }
  
  my @fields = $recsource->fields();
  my $fields = $recsource->fields(); # Returns an arrayref in scalar context.
  
  my $rs = $recsource->where (sub { ... }, "field 1", "field 2");
  $rs = $recsource->fixup ("field 1", sub { ... } );
  $rs = $recsource->calc  ("field 3", sub { ... } );
  $rs = $rs->select ("field 2", "field 3", "field 1");
  $rs = $rs->select (["field 2", "field 3", "field 1"]);
  
  $rs = $recsource->transmogrify (["where", ["field 1", "=", "x"]],
                                  ["fixup", ["field 1", sub { ... }]]);

Since Iterator::Records is essentially a more generalized way of iterating DBI results, there are a few wrappers to make things easy.

  my $dbh = Iterator::Records::db->connect(--DBI syntax--);
  my $dbh = Iterator::Records::db->open('sqlite file');
  my $dbh = Iterator::Records::db->open(); # Defaults to an in-memory SQLite database
  

This is not the direct DBI handle; it's got simplified syntax as follows:

  my $value = $dbh->get ('select value from table where id=?', $id);  # Single value retrieval in one whack.
  $dbh->do ("insert ...");  # Regular insertion, just like in DBI, except simpler.
  my $record = $dbh->insert ("insert ..."); # Calls last_insert_id ('', '', '', ''), which will likely fail except with SQLite.

And then you have the actual iterator machinery.

  my $iter = $dbh->iterator ('select * from table')->iter();
  my $sth = $dbh->prepare (--DBI syntax--);
  $iter = $sth->iter ($value1, $value2);
  while (my $row = $iter->()) {
     my ($field1, $field2) = @$row;
  }
  

We can load an iterator into a table. If you have Data::Tab installed, it will make a Data::Tab with the column names from this iterator. Otherwise, it will simply return an arrayref of arrayrefs by calling Iterator::Simple's "list" method.

  my $data = $recsource->table;
  

The "report" method returns an Iterator::Simple iterator that passes each record in the record source through sprintf. If you supply a list of fields to dedupe, each of those fields is replaced with "" whenever its value is the same as in the previous row. This is useful for tabulated data where, for instance, the date may be the same from line to line and should then be displayed only once.

  my $report = $recsource->report ("%-20s %s", ["field 1"]); # Here, field 2 would not be deduped.
  my $text = join ("\n", $recsource->report (...)->list);

BASIC ITERATION

new (iterable, arrayref of fields)

To specify an Iterator::Records from scratch, just take whatever iterable thing you have, and specify a list of fields in the resulting records. If the iterable is anything but a coderef, Iterator::Records->iter will simply pass it straight to Iterator::Simple for iteration. If it's a coderef, it will be called, and its return value will be passed to Iterator::Simple. This allows record streams to be reused.

As an added bonus of this extra level of indirection, you can call "iter" with parameters that will be passed on to the coderef. This turns the Iterator::Records object into a parameterizable iterator factory.
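
The factory idea can be illustrated in plain Perl without the module. This is only a sketch of the pattern, not Iterator::Records' own code; make_range_factory and its parameters are hypothetical names chosen for the example. The outer coderef plays the role of the iterable passed to new, and the parameters it receives stand in for the ones passed through iter.

```perl
use strict;
use warnings;

# Hypothetical factory: each call builds a fresh iterator over the
# integers $from..$to, returned as one-field records (arrayrefs).
sub make_range_factory {
    return sub {
        my ($from, $to) = @_;
        my $next = $from;
        return sub {
            return if $next > $to;    # exhausted: undef in scalar context
            return [ $next++ ];
        };
    };
}

my $factory = make_range_factory();

# Two independent iterators from the same factory, with different parameters.
my $first  = $factory->(1, 3);
my $second = $factory->(10, 11);

my (@seen_first, @seen_second);
while (my $row = $first->())  { push @seen_first,  $row->[0] }
while (my $row = $second->()) { push @seen_second, $row->[0] }

print "@seen_first | @seen_second\n";    # prints "1 2 3 | 10 11"
```

Because the factory builds a new closure on every call, the stream can be iterated any number of times, which is the reuse described above.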

iter, iter_hash, iter_bind

Basic iteration of the record source returns an arrayref for each record. Alternatively, an iterator can be created which returns a hashref for each record, with the field names keying the return values in each record. This is less efficient, but it's often handy. The third option is to bind a list of scalar references that will be written automagically on each retrieval. The return value in this case is still the original arrayref record.
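
To make the difference between the three styles concrete, here is a plain-Perl sketch of what the hashref and bind variants do on top of a basic arrayref iterator. The helper names (rows, as_hash, as_bound) are hypothetical and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical source: returns a fresh arrayref iterator over fixed rows.
sub rows {
    my @data = (['a.txt', 10], ['b.txt', 20]);
    return sub { return shift @data };
}

# Hashref style: key each value by its field name.
sub as_hash {
    my ($it, @fields) = @_;
    return sub {
        my $row = $it->() or return;
        my %h;
        @h{@fields} = @$row;
        return \%h;
    };
}

# Bind style: copy each value into the caller's scalars, still return the row.
sub as_bound {
    my ($it, @refs) = @_;
    return sub {
        my $row = $it->() or return;
        ${ $refs[$_] } = $row->[$_] for 0 .. $#refs;
        return $row;
    };
}

my $hashes = as_hash(rows(), 'name', 'size');
my $first  = $hashes->();
print "$first->{name} is $first->{size}\n";    # prints "a.txt is 10"

my ($name, $size);
my $bound = as_bound(rows(), \$name, \$size);
my @lines;
while ($bound->()) { push @lines, "$name - $size" }
print join(", ", @lines), "\n";    # prints "a.txt - 10, b.txt - 20"
```

The hashref variant builds a new hash per record, which is where the extra cost mentioned above comes from.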

TRANSMOGRIFIERS

Since our record stream sources are very often provided by fairly simple drivers (like the filesystem walker in File::Org), it's not at all unusual to find ourselves in a position where we want to modify them on the fly, either filtering out some of the records or modifying the records as they go through. There are four basic "transmogrifiers" for record streams: where, select, calc, and fixup. The "where" transmogrifier discards records that don't match a particular pattern; "select" removes columns; "calc" adds a column that is calculated by an arbitrary coderef provided; and "fixup" applies a coderef to the record to modify individual field values. Two more, dedupe and rename, are described below, along with transmogrify, which chains any of them together.

Each transmogrifier takes an iterator specification, not an iterator - and returns a new specification that can be iterated. The source stream will then be iterated internally.

where (sub { ... }, 'field 1', 'field 2')

Filtration of records is not really any different from igrep - given a record stream, we provide a coderef that tells us to include or not to include. If fields are specified, their values for the record to be examined will be passed to the coderef as its parameters; otherwise the entire record is provided as an arrayref and the coderef can extract values on its own. The list of fields is not affected.
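
The parameter-passing rule can be sketched in plain Perl as follows. This is an illustration under assumptions, not the module's implementation: where_sketch is a hypothetical helper, and it takes the field list explicitly because it has no specification object to ask.

```perl
use strict;
use warnings;

# Sketch of a "where"-style filter. $fields is the stream's field list,
# $test is the predicate, @wanted are the field names to pass to it.
sub where_sketch {
    my ($it, $fields, $test, @wanted) = @_;
    my %pos;
    @pos{@$fields} = 0 .. $#$fields;    # field name -> column index
    my @idx = map { $pos{$_} } @wanted;
    return sub {
        while (my $row = $it->()) {
            # Named fields become parameters; otherwise pass the whole record.
            my @args = @idx ? @{$row}[@idx] : ($row);
            return $row if $test->(@args);
        }
        return;    # source exhausted
    };
}

my @data   = (['a.txt', 10], ['b.txt', 2000], ['c.txt', 30]);
my $source = sub { return shift @data };

my $small = where_sketch($source, ['name', 'size'], sub { $_[0] < 100 }, 'size');

my @names;
while (my $row = $small->()) { push @names, $row->[0] }
print "@names\n";    # prints "a.txt c.txt"
```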

select ('field 1', 'field 3')

Returns a spec for a new iterator that includes only the fields listed, in the order listed.

calc ('new field', sub { ... }, 'field 1', 'field 2')

Returns a spec for a new iterator that includes a new field calculated by the coderef provided; as for "where", if fields are listed they will be passed into the coderef as parameters, but otherwise the entire record will be passed in. The new field will appear at the end of the current field list.
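
A plain-Perl sketch of the same idea shows both effects of calc: the record gains a value and the field list gains a name. The helper calc_sketch is hypothetical and returns the new field list alongside the new iterator, since it has no specification object to carry it.

```perl
use strict;
use warnings;

# Sketch of a "calc"-style step: returns the new field list and a new iterator.
sub calc_sketch {
    my ($it, $fields, $new_field, $code, @wanted) = @_;
    my %pos;
    @pos{@$fields} = 0 .. $#$fields;
    my @idx = map { $pos{$_} } @wanted;
    my $new_it = sub {
        my $row  = $it->() or return;
        my @args = @idx ? @{$row}[@idx] : ($row);
        return [ @$row, scalar $code->(@args) ];    # new value goes last
    };
    return ([ @$fields, $new_field ], $new_it);
}

my @data   = (['widget', 3, 2.5], ['gadget', 2, 10]);
my $source = sub { return shift @data };

my ($fields, $it) = calc_sketch(
    $source, ['item', 'qty', 'price'],
    'total', sub { $_[0] * $_[1] }, 'qty', 'price',
);

my @out;
while (my $row = $it->()) { push @out, join(',', @$row) }

print join(',', @$fields), "\n";    # prints "item,qty,price,total"
print "$_\n" for @out;              # prints "widget,3,2.5,7.5" then "gadget,2,10,20"
```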

fixup (sub { ... })

Returns a spec for a new iterator in which each record is first visited by the coderef provided. This is just an imap in more record-based form. The field list is unchanged.

dedupe ('field 1', 'field 2')

Keeps track of the last values for field 1 and field 2; if the new value is a duplicate, passes an empty string through instead. Useful for reporting. The field list is unchanged.
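
The bookkeeping involved is small. The following plain-Perl sketch (dedupe_sketch is a hypothetical name, and the code is not the module's own) remembers the previous value per listed column and blanks repeats:

```perl
use strict;
use warnings;

sub dedupe_sketch {
    my ($it, $fields, @wanted) = @_;
    my %pos;
    @pos{@$fields} = 0 .. $#$fields;
    my @idx = map { $pos{$_} } @wanted;
    my @last;    # previous value seen in each deduped column
    return sub {
        my $row = $it->() or return;
        my @out = @$row;    # copy, so the source row is untouched
        for my $i (@idx) {
            my $value = $row->[$i];
            $out[$i] = '' if defined $last[$i] && $value eq $last[$i];
            $last[$i] = $value;
        }
        return \@out;
    };
}

my @data = (['2015-01-01', 'open'], ['2015-01-01', 'edit'], ['2015-01-02', 'close']);
my $source = sub { return shift @data };
my $it     = dedupe_sketch($source, ['date', 'event'], 'date');

my @lines;
while (my $row = $it->()) { push @lines, join('|', @$row) }
print "$_\n" for @lines;
# prints "2015-01-01|open", then "|edit", then "2015-01-02|close"
```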

rename ('field 1', 'new name', [more pairs])

To rename a field (or more than one), use 'rename'. The record is not changed.

transmogrify (['where', ...], ['calc', ...])

Any sequence of transmogrifiers can be chained together in a single step using the transmogrify method.

LOADING AND REPORTING

These are some handy utilities for dealing with record streams.

load ([limit]), load_parms(parms...), load_lparms(limit, parms...), load_iter(iterator, [limit])

The load function simply loads the stream into an arrayref of arrayrefs. If limit is specified, at most that many rows will be loaded; otherwise, the iterator runs as long as it has data.

Note that this is called directly on the definition of the stream, not on the resulting iterator. Consequently, load can't be used to "page" through an existing record stream - if you want to do that, you should look at Data::Tab, which was written specifically to support the buffered reading of record streams and manipulation of the resulting buffers.

This form of load can't be used on iterator factories that take parameters. If you have a factory that requires parameters, use load_parms. Finally, to use both a limit and parameters, use load_lparms.

All of these are just sugar for the core method load_iter, which, given a started iterator and an optional limit, loads it.
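
As a rough picture of what such a core loader does, here is a plain-Perl sketch. The name load_iter_sketch and the counting factory are hypothetical; the real load_iter may differ in detail.

```perl
use strict;
use warnings;

# Drain a started iterator into an arrayref of arrayrefs,
# stopping early if a limit is given.
sub load_iter_sketch {
    my ($it, $limit) = @_;
    my @rows;
    while (!defined $limit || @rows < $limit) {
        my $row = $it->() or last;
        push @rows, $row;
    }
    return \@rows;
}

# Hypothetical factory standing in for a stream definition.
my $factory = sub {
    my $n = 0;
    return sub { return $n < 5 ? [ ++$n ] : undef };
};

my $first_two = load_iter_sketch($factory->(), 2);
my $all       = load_iter_sketch($factory->());
print scalar(@$first_two), " then ", scalar(@$all), "\n";    # prints "2 then 5"
```

Each call here starts a fresh iterator from the factory, which mirrors why load on a stream definition always begins at the first record rather than paging onward.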

report (format, [dedupe list])

The report method is another retrieval method; that is, it returns an iterator when called. However, this iterator is not a record stream; instead, it is a string iterator. Each record in the defined stream is passed through sprintf with the format provided. For convenience, if a list of columns is provided, report performs a dedupe transmogrification on the incoming records before formatting them.
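
The formatting step on its own looks roughly like this in plain Perl. The helper report_sketch is a hypothetical name, and the sketch leaves out the optional dedupe pass.

```perl
use strict;
use warnings;

# Turn a record iterator into a string iterator via sprintf.
sub report_sketch {
    my ($it, $format) = @_;
    return sub {
        my $row = $it->() or return;
        return sprintf($format, @$row);
    };
}

my @data  = (['a.txt', 10], ['b.txt', 2000]);
my $lines = report_sketch(sub { return shift @data }, "%-8s|%4d");

my @out;
# Test with defined() so that an empty formatted line would not end the loop.
while (defined(my $line = $lines->())) { push @out, $line }
print join("\n", @out), "\n";
# prints "a.txt   |  10" and then "b.txt   |2000"
```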

table ([limit]), table_parms(parms...), table_lparms(limit, parms...), table_iter(iterator, [limit])

The table functions work just like the load functions, but load the iterator into a Data::Org::Table, if that module is installed.

open ([filename])

The open method opens an SQLite database file, or an in-memory database if no filename is provided.

connect(...)

The connect method is just the DBI connect method; we get it via inheritance.

get (query, [parms])

The get method takes some SQL, executes it with the parameters passed in (if any), retrieves the first row, and returns the value of the first field of that row.

select

The select method retrieves an array of arrayrefs for the rows returned from the query. In scalar context, it returns the arrayref from fetchall_arrayref.

select_one

The select_one method runs a query and returns the first row as an arrayref.

iterator (query, [parms]), itparms (query, fields)

This is the actual reason for putting this into the Iterator::Records namespace - given a query against the database, we return an iterator factory for iterators over the rows of the query. Like select, the basic iterator call will assemble a query and execute it. It will then ask DBI for the names of the fields in the query and use that information to build an Iterator::Records object that, when iterated, will return the query results. If iterated again, it will run a new query.

If you want to have parameterized queries instead, use itparms, then pass parameters to the factory it creates. In this case, since the query can't be run in advance, you have to provide the field names you expect. (They don't have to match the ones the database will give you, though, in this case.)

insert

The insert command calls last_insert_id after the insertion, and returns that value. Just a little shorthand. Since retrieval of the ID for the last row inserted is very database-specific, it may not work for your particular configuration.

load_table (table, iterator), load_sql (insert query, iterator)

For bulk loading, we have single-call methods load_table and load_sql. The former will build an appropriate insert query for the table in question using the iterator's field list. The second takes an arbitrary insert query, then executes it on each record coming from the iterator. This method can take either an Iterator::Records object, or any coderef or activated iterator that returns arrayrefs; if given the latter it will simply pass them to the execute call.
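
Building an insert query from a field list might look like the sketch below. This is an assumption about the general shape, not load_table's actual code: insert_sql_sketch is a hypothetical helper, and it does no quoting, so field names containing spaces or reserved words would need extra handling.

```perl
use strict;
use warnings;

# Build "insert into TABLE (f1, f2, ...) values (?, ?, ...)" with one
# placeholder per field, suitable for prepare and repeated execute.
sub insert_sql_sketch {
    my ($table, @fields) = @_;
    my $columns      = join ', ', @fields;
    my $placeholders = join ', ', ('?') x @fields;
    return "insert into $table ($columns) values ($placeholders)";
}

my $sql = insert_sql_sketch('files', 'name', 'size');
print "$sql\n";    # prints "insert into files (name, size) values (?, ?)"
```

With a statement like this prepared once, each arrayref record from the iterator can be handed to execute as the bind values.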

Each returns the number of rows inserted.

do

The do command works a little differently from the standard API; DBI's version wants a hashref of attributes that I never use and regularly screw up.

AUTHOR

Michael Roberts, <michael at cpan.org>

BUGS

Please report any bugs or feature requests to bug-Iterator-Records at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Iterator-Records. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Iterator::Records

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2015 Michael Roberts.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.