- SEE ALSO
- COPYRIGHT AND LICENSE
Spreadsheet::Read::Ingester - ingest and save csv and spreadsheet data to a perl data structure to avoid reparsing
use Spreadsheet::Read::Ingester; # ingest raw file, store parsed data file, and return data object my $data = Spreadsheet::Read::Ingester->new('/path/to/file'); # the returned data object has all the methods of a L<Spreadsheet::Read> object my $num_cols = $data->sheet(1)->maxcol; # delete old data files older than 30 days to save disk space Spreadsheet::Read::Ingester->cleanup;
This module is intended to be a drop-in replacement for Spreadsheet::Read and is a simple, unobtrusive wrapper for it.
Parsing spreadsheet and csv data files is time consuming, especially with large data sets. If a data file is ingested more than once, much time and processing power is wasted reparsing the same data. To avoid reparsing, this module uses Storable to save a parsed version of the data to disk when a new file is ingested. All subsequent ingestions are retrieved from the stored Perl data structure. Files are saved in the directory determined by File::UserConfig and is a function of the user's OS.
The stored data file names are the unique file signatures for the raw data file. The signature is used to detect if the original file changed, in which case the data is reingested from the raw file and a new parsed file is saved using an updated file signature. Arguments passed to the constructor are appended to the name of the file to ensure different parse options are accounted for. Parsed data files are kept indefinitely but can be deleted with the
Consult the Spreadsheet::Read documentation for accessing the data object returned by this module.
my $data = Spreadsheet::Read::Ingester->new('/path/to/file');
Deletes all stored files from the user's application data directory. Takes an optional argument indicating the minimum number of days old the file must be before it is deleted. Defaults to 30 days. Passing a value of 0 deletes all files.
You can find documentation for this module with the perldoc command.
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
A modern, open-source CPAN search engine, useful to view POD in HTML format.
The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)
git clone git://github.com/sdondley/Spreadsheet-Read-Ingester.git
Please report any bugs or feature requests on the bugtracker website https://github.com/sdondley/Spreadsheet-Read-Ingester/issues
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
If a new parser is installed (e.g. Text::CSV_XS) and a previous ingestion used a different parser (e.g. Text::CSV_PP), results from the previous parser will be returned. Most likely, this will have no practical consequence. But if you are concerned, you can avoid the problem by specifying the same parser using an environment variable per the Spreadsheet::Read documentation:
env SPREADSHEET_READ_CSV=Text::CSV_PP ...
Similarly, upgrading to a newer version of a parser can cause the same problem. Currently, the only workaround is to delete the stored data files parsed with the old older parser version.
See perlmodinstall for information and options on installing Perl modules.
Steve Dondley <email@example.com>
This software is copyright (c) 2019 by Steve Dondley.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.