NAME

Excel::ValueWriter::XLSX - generating data-only Excel workbooks in XLSX format, fast

SYNOPSIS

my $writer = Excel::ValueWriter::XLSX->new;
$writer->add_sheet($sheet_name1, $table_name1, [qw/a b tot/], [[1, 2, '=[a]+[b]'],
                                                               [3, 4],
                                                               ['TRUE', 'FALSE'],
                                                              ]);
$writer->add_sheet($sheet_name2, $table_name2, \@headers, sub {...});
$writer->add_sheet($sheet_name3, $table_name3, $sth);       # DBI statement handle
$writer->add_sheet($sheet_name4, $table_name4, $statement); # DBIx::DataModel::Statement object
$writer->add_sheets_from_database($dbh);
$writer->add_defined_name($name, $formula, $comment);
$writer->save_as($filename);

$writer = Excel::ValueWriter::XLSX->new(bool_regex => qr[^(?:(VRAI)|FAUX)$]);
$writer->add_sheet($sheet_name1, $table_name1, [qw/a b/], [['I like Perl:', 'VRAI']]);

DESCRIPTION

The present module is aimed at fast and cost-effective production of "data-only" Excel workbooks, containing nothing but plain values and formulas.

CPU and memory usage are much lower than with the well-known Excel::Writer::XLSX; however the set of features is also much more restricted : there is no support for formats, colors, figures or other fancy Excel features.

Such workbooks with plain data are useful for example :

in architectures where Excel is used merely as a local database, for example in connection with a Power Pivot architecture (for that purpose, see also the companion module Excel::PowerPivot::Utils);
for fine tuning of the ZIP compression level to be applied to the generated .xlsx file;
for generating files with large amounts of data (this was the original motivation for writing this module, because Excel::Writer::XLSX tends to become very slow on files with large numbers of rows).

METHODS

new

my $writer = Excel::ValueWriter::XLSX->new(%options);

Constructor for a new writer object. Options are :

date_regex

A compiled regular expression for detecting data cells that contain dates. The default implementation recognizes dates in dd.mm.yyyy, yyyy-mm-dd and mm/dd/yyyy formats. User-supplied regular expressions should use named captures so that the day, month and year values can be found respectively in $+{d}, $+{m} and $+{y}. If this parameter receives undef, no date handling will be performed.

bool_regex

A compiled regular expression for detecting data cells that contain boolean values. The default implementation recognizes uppercase strings 'TRUE' or 'FALSE' as booleans. User-supplied regular expressions should put the word corresponding to 'TRUE' within parenthesis so that the content is captured in $1. Here is an example for french :

$writer = Excel::ValueWriter::XLSX->new(bool_regex => qr[^(?:(VRAI)|FAUX)$]);

If this parameter receives undef, no bool handling will be performed.

compression_level

A number from 0 (no compression) to 9 (maximum compression) specifying the desired ZIP compression level. High values produce smaller files but consume more CPU. The default is taken from COMPRESSION_LEVEL_DEFAULT in Archive::Zip, which amounts to 6.

add_sheet

$writer->add_sheet($sheet_name, $table_name[, $headers], $rows_maker[, \%options]);

Adds a new worksheet into the workbook.

The $sheet_name is mandatory; it must be unique and between 1 and 31 characters long.
If $table_name is not undef, the sheet contents will be registered as an Excel table of that name. Excel tables offer more features than regular ranges of cells, so generally it is a good idea to always assign a table name. Table names must be unique, of minimum 3 characters, without spaces or special characters. Technically table names could be equal to sheet names, but this is not recommended because it may create confusion if the data is later to be referred to from Power Query or from Power Pivot.
The $headers argument is optional; it may be undef or may even be absent. If present, it should contain an arrayref of scalar values, that will be used as column names for the table associated with that worksheet. Column names should be unique (otherwise Excel will automatically add a discriminating number). If $headers are not present, the first row produced by $rows_maker will be treated as headers.
The $rows_maker argument may be:
- a reference to a 2-dimensional array of values
- a reference to a callback function that will return a new row at each call, in the form of a 1-dimensional array reference. An empty return from the callback function signals the end of data (but intermediate empty rows may be returned as []). Callback functions should typically be closures over a lexical variable used to decide when the last row has been met. Here is an example of a callback function used to feed a sheet with 500 lines of 300 columns of random numbers:
```
my @headers_for_rand = map {"h$_"} 1 .. 300;
my $random_rows = do {my $count = 500; sub {$count-- > 0 ? [map {rand()} 1 .. 300] : undef}};
$writer->add_sheet(RAND_SHEET => rand => \@headers_for_rand, $random_rows);
```
- an executed DBI statement handle
- a DBIx::DataModel::Statement object (version 3.0 or greater)
\%options is an optional reference to a hash of additional options. Currently the only option is cols, which points to an arrayref of column specifications. The arrayref may contain :
- a list of numbers specifying the width of successive columns, like for example :
```
$writer->add_sheet(S_sales => sales => [qw/date product_id descr price/], $rows, {cols => [12, 8, 20, 8]);
```
  Column widths in Excel are expressed in "character units", i.e. a requested width of 8 should be just wide enough to display the string '00000000' (8 zero characters displayed with the normal font).
- a list of hashrefs specifying column attributes. The width attribute corresponds to column widths, as explained above. The style attribute is a numeric index into the list of styles defined within the workbook -- except that currently this module only supports one single style for dates, so this addition has no practical use at the moment.

Cells within a row must contain scalar values. Values that look like numbers are treated as numbers. String values that match the date_regex are converted into numbers and displayed through a date format, but only if the date is valid and is above 1900 January 1st -- otherwise it is treated as a string, like in Excel. String values that start with an initial '=' are treated as formulas; but like in Excel, if you want a plain string that starts with a '=', put a single quote just before the '=' -- that single quote will be removed from the string. Everything else is treated as a string. Strings are shared at the workbook level (hence a string that appears several times in the input data will be stored only once within the workbook).

add_sheets_from_database

$writer->add_sheets_from_database($dbh, $sheet_prefix, @table_names);

Gets data from database tables and adds them as sheets into the Excel workbook. Arguments are :

$dbh: An active DBI database handle
$sheet_prefix: A string that will be prepended at the beginning of each worksheet name, so that they are different from names of Excel tables. The default is 'S.'.
@table_names: The list of tables to be read from the database. If empty, table names are retrieved automatically from the database through the "table_info" in DBI method.

add_defined_name

$writer->add_defined_name($name, $formula, $comment);

Adds a "defined name" to the workbook. Defined names can be used in any formula within the workbook, and will be replaced by the corresponding content.

$name is mandatory and must be unique
$formula is mandatory and will be interpreted by Excel like a formula. References to ranges should include the sheet name and use absolute coordinates; for example for concatenating two cells in sheet 's1', the formula is :
```
$writer->add_defined_name(cells_1_and_2 => q{'s1'!$A$1&'s1'!$A$2});
```
If the intended content is just a constant string, it must be enclosed in double quotes, i.e.
```
$writer->add_defined_name(my_string => q{"my_constant_value"});
```
$comment is optional; it will appear when users consult the name manager in the Formulas tab.

save_as

$writer->save_as($target);

Writes the workbook contents into the specified $target, which can be either a filename or filehandle opened for writing.

ARCHITECTURAL NOTES

Object-orientedness

Although I'm a big fan of Moose and its variants, the present module is implemented in POPO (Plain Old Perl Object) : since the aim is to maximize cost-effectiveness, and since the object model is extremely simple, there was no ground for using a sophisticated object system.

Benchmarks

I did a couple of measurements that demonstrate the effectiveness of this module on large datasets; but the results cannot be included into this distribution yet, because the tests are not sufficiently formalized and they are based on on data that is not public.

To do

- options for workbook properties : author, etc.
- support for 1904 date schema
- reproducible benchmarks

AUTHOR

Laurent Dami, <dami at cpan.org>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

To install Excel::ValueWriter::XLSX, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Excel::ValueWriter::XLSX

CPAN shell

perl -MCPAN -e shell
install Excel::ValueWriter::XLSX

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)