The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Spreadsheet::Read::Ingester - ingest and save csv and spreadsheet data to a perl data structure to avoid reparsing

SYNOPSIS

  use Spreadsheet::Read::Ingester;

  # ingest raw file, store parsed data file, and return data object
  my $data = Spreadsheet::Read::Ingester->new('/path/to/file');

  # the returned data object has all the methods of a L<Spreadsheet::Read> object
  my $num_cols = $data->sheet(1)->maxcol;

  # delete old data files older than 30 days to save disk space
  Spreadsheet::Read::Ingester->cleanup;

DESCRIPTION

This module is intended to be a drop-in replacement for Spreadsheet::Read and is a simple, unobtrusive wrapper for it.

Parsing spreadsheet and csv data files is time consuming, especially with large data sets. If a data file is ingested more than once, much time and processing power is wasted reparsing the same data. To avoid reparsing, this module uses Storable to save a parsed version of the data to disk when a new file is ingested. All subsequent ingestions are retrieved from the stored Perl data structure. Files are saved in the directory determined by File::UserConfig and is a function of the user's OS.

The stored data file names are the unique file signatures for the raw data file. The signature is used to detect if the original file changed, in which case the data is reingested from the raw file and a new parsed file is saved using an updated file signature. Arguments passed to the constructor are appended to the name of the file to ensure different parse options are accounted for. Parsed data files are kept indefinitely but can be deleted with the cleanup() method.

Consult the Spreadsheet::Read documentation for accessing the data object returned by this module.

METHODS

new( $path_to_file )

  my $data = Spreadsheet::Read::Ingester->new('/path/to/file');

Takes same arguments as the new constructor in Spreadsheet::Read module. Returns an object identical to the object returned by the Spreadsheet::Read module along with its corresponding methods.

cleanup( $file_age_in_days )

cleanup()

  Spreadsheet::Read::Ingester->cleanup(0);

Deletes all stored files from the user's application data directory. Takes an optional argument indicating the minimum number of days old the file must be before it is deleted. Defaults to 30 days. Passing a value of 0 deletes all files.

REQUIRES

SUPPORT

Perldoc

You can find documentation for this module with the perldoc command.

  perldoc Spreadsheet::Read::Ingester

Websites

The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.

Source Code

The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)

https://github.com/sdondley/Spreadsheet-Read-Ingester

  git clone git://github.com/sdondley/Spreadsheet-Read-Ingester.git

BUGS

Please report any bugs or feature requests on the bugtracker website https://github.com/sdondley/Spreadsheet-Read-Ingester/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

LIMITATIONS

If a new parser is installed (e.g. Text::CSV_XS) and a previous ingestion used a different parser (e.g. Text::CSV_PP), results from the previous parser will be returned. Most likely, this will have no practical consequence. But if you are concerned, you can avoid the problem by specifying the same parser using an environment variable per the Spreadsheet::Read documentation:

  env SPREADSHEET_READ_CSV=Text::CSV_PP ...

Similarly, upgrading to a newer version of a parser can cause the same problem. Currently, the only workaround is to delete the stored data files parsed with the old older parser version.

INSTALLATION

See perlmodinstall for information and options on installing Perl modules.

SEE ALSO

Spreadsheet::Read

AUTHOR

Steve Dondley <s@dondley.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Steve Dondley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.