The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Parser of XLSX format strings

SYNOPSYS

See the "SYNOPSYS" in Spreadsheet::XLSX::Reader::LibXML::FmtDefault

DESCRIPTION

To use the general package for excel parsing out of the box please review the documentation for Workbooks , Worksheets , and Cells

This is a general purpose Moose Role that will convert Excel format strings into Type::Tiny objects in order to implement the conversion defined by the format string. Excel defines the format strings as number conversions only (They do not act on text). Excel format strings can have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In Excel the text section is just a pass through. This is how excel handles dates earlier than 1900sh. This parser deviates from that for dates. Since this parser parses dates into a DateTime objects (and then potentially back to a differently formatted string) it also attempts to parse strings to DateTime objects if the cell has a date format applied. All other types of Excel number conversions still treat strings as a pass through.

To replace this module just build a Moose::Role that delivers the method parse_excel_format_string and get_defined_conversion. Then use it when building a replacement for Spreadsheet::XLSX::Reader::LibXML::FmtDefault.

The decimal (real number) to fractions conversion can be top heavy to build. If you are experiencing delays when reading values then this is another place to investigate. In order to get the most accurate answer this parser initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates an over/under numerator with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out set in this computation so if you reach this point for multi digit denominators it is computationally intensive. (Not that continued fractions are computationally so cheap.). However, doing the calculation this way generally yields the same result as Excel. In some few cases the result is more accurate. I was unable to duplicate the results from Excel exactly (or even come close otherwise). If you have a faster conversion then implemenation of the speed-up can be acheived by substituting the fraction coercion using "set_custom_formats( { $key => $conversion } )" in Spreadsheet::XLSX::Reader::LibXML::GetCell

requires

These are method(s) used by this role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role.

get_excel_region

set_error

    Definition: Used to set the error string in a shared error instance.

get_defined_excel_format

Primary Methods

These are the primary ways to use this Role. For additional ParseExcelFormatStrings options see the Attributes section.

parse_excel_format_string( $string, $name )

    Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built by this module allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to (positive and zero), negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers type with the exception of dates. When dates are built by this module it always adds a possible from-text conversion to process Excel pre-1900ish dates. This is because Excel does not record dates prior to 1900ish as numbers. All date unformatted values are then processed into and then potentially back out of DateTime objects. This requires "Chained Coercions" in Type::Tiny::Manual::Coercions. The two packages used for conversion to DateTime objects are DateTime::Format::Flexible and DateTimeX::Format::Excel.

    Accepts: an Excel number format string and a conversion name stored in the Type::Tiny object. This package will auto-generate a name if none is given

    Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string

    Delegated to the workbook class: yes

get_defined_conversion( $position )

Attributes

Data passed to new when creating the Spreadsheet::XLSX::Reader::LibXML::FmtDefault instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of Spreadsheet::XLSX::Reader::LibXML.

epoch_year

    Definition: This is the epoch year in the Excel sheet. It differentiates between Windows and Apple Excel implementations. For more information see DateTimeX::Format::Excel. It is generally (re)set by the workbook when the formatter instance is passed to the workbook.

    Default: 1900

    Range: 1900 or 1904

    attribute methods Methods provided to adjust this attribute

      get_epoch_year

        Definition: returns the value of the attribute

        Delegated to the workbook class: no

      set_epoch_year

        Definition: sets the value of the attribute

        Delegated to the workbook class: no

datetime_dates

    Definition: It may be that you desire the full DateTime object as output rather than the finalized datestring when converting unformatted date data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object level. It is generally (re)set by the workbook when the formatter instance is passed to the workbook.

    Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)

    attribute methods Methods provided to adjust this attribute.

      get_date_behavior

        Definition: returns the value of the attribute

        Delegated to the workbook class: yes

      set_date_behavior( $Bool )

        Definition: sets the attribute value (only new coercions are affected)

        Accepts: Boolean values

        Delegated to the workbook class: yes

cache_formats

    Definition: In order to save re-building the coercion each time they are used, the built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.

    Default: 1 = caching is on

    attribute methods Methods provided to adjust this attribute

      get_cache_behavior

        Definition: returns the value of the attribute

        Delegated to the workbook class: inherited

      set_cache_behavior

        Definition: sets the value of the attribute

        Range: Boolean 1 = cache formats, 0 = Don't cache formats

        Delegated to the workbook class: inherited

european_first

    Definition: This is a way to check for DD-MM-YY formatting of string dates prior to checking for MM-DD-YY. Since this checks both ways the goal is to catch ambiguous data where the substring for DD < 13 and assign it correctly.

    Default: 0 = MM-DD-YY is tested first

    attribute methods Methods provided to adjust this attribute

      get_european_first

        Definition: returns the value of the attribute

      set_european_first

        Definition: sets the value of the attribute

        Range: Boolean 0 = MM-DD-YY is tested first, 1 = DD-MM-YY is tested first

        Delegated to the workbook class: yes

SUPPORT

TODO

    1. Attempt to merge _split_decimal_integer and _integer_and_decimal

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014, 2015 by Jed Lund

DEPENDENCIES

SEE ALSO