NAME

Spreadsheet::XLSX::Reader::LibXML::XMLReader::SharedStrings - A LibXML::Reader sharedStrings base class

SYNOPSIS

        #!/usr/bin/env perl
        use Data::Dumper;
        use MooseX::ShortCut::BuildInstance qw( build_instance );
        use Spreadsheet::XLSX::Reader::LibXML::Error;
        use Spreadsheet::XLSX::Reader::LibXML::XMLReader::SharedStrings;

        my $file_instance = build_instance(
            package      => 'SharedStringsInstance',
            superclasses => ['Spreadsheet::XLSX::Reader::LibXML::XMLReader::SharedStrings'],
            file         => 'sharedStrings.xml',
            error_inst   => Spreadsheet::XLSX::Reader::LibXML::Error->new,
        );
        print Dumper( $file_instance->get_shared_string_position( 3 ) );
        print Dumper( $file_instance->get_shared_string_position( 12 ) );

        #######################################
        # SYNOPSIS Screen Output
        # $VAR1 = {
        #     'raw_text' => ' '
        # };
        # $VAR1 = {
        #     'raw_text' => 'Superbowl Audibles'
        # };
        #######################################
    

DESCRIPTION

This documentation explains how to use this module when writing your own Excel parser or extending this package. To use the general package for Excel parsing out of the box, please review the documentation for Workbooks, Worksheets, and Cells.

This class is written to extend Spreadsheet::XLSX::Reader::LibXML::XMLReader. It adds the functionality needed to read the sharedStrings portion (if any) of a workbook, which is most likely a sub file zipped into an .xlsx file. It does not provide a connection to other file types, or even to the elements from other files that are related to this file. This POD only describes the functionality incrementally provided by this module. For an overview of sharedStrings.xml reading see Spreadsheet::XLSX::Reader::LibXML::SharedStrings.

Methods

These are the primary ways to use this class. For additional SharedStrings options see the Attributes section.

get_shared_string_position( $positive_int )

Attributes

Data passed to new when creating an instance of this class. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. The easiest time to set these attributes is when a class instance is created, before it is passed to the workbook or parser.

cache_positions

    Definition: For sheets with a lot of stored text the parser can slow down considerably when accessing each position. An XML::LibXML::Reader cannot rewind; it must start from the beginning and step through the file until it reaches the target position. This is complicated by the fact that shared strings are not necessarily stored in a logical or cell order, especially for Excel sheets that have seen any significant level of manual intervention before being read. This attribute turns on caching for shared strings (the default) so the parser only has to read through the shared strings once. Once the read has gone all the way to the end, the shared strings file is also released to free up some space (a small win in exchange for the space taken by the cache).

    The trade-off is that all intermediate shared strings are fully read before the target string is returned, so early reads will be slower. For sheets that store only numbers, or at least very few strings, this will likely be neither a large startup hit nor a large speed improvement. The obvious risk is that the cache will impact memory. You can use this attribute to turn caching off, but a sheet large enough to need that will most likely read much more slowly without the cache. In exchange, the parser should not die from exhausting memory. To minimize the physical size of the cache, when a shared strings position holds only a text string, only the string itself is stored (not the definition recording that only a string exists).

    Default: 1 = caching is on

    Range: 1|0

    Attribute required: yes

    attribute methods Methods provided to adjust this attribute
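
    The cost of out-of-order access on a forward-only reader can be illustrated with a small simulation. This is a sketch only: ForwardOnlyStrings, get_position, steps, and count_reads are hypothetical names invented for the illustration, and the code does not use or reproduce this module's internals. It counts how many forward reads are needed with and without a cache.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a forward-only reader: it can step forward or
# restart from the top, but it can never step backward.
package ForwardOnlyStrings;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        strings => $args{strings},            # simulated file content
        caching => $args{cache_positions},    # 1 = cache, 0 = no cache
        next    => 0,                         # index of the next unread string
        cache   => [],                        # strings read so far
        steps   => 0,                         # count of forward reads
    };
    return bless $self, $class;
}

sub steps { return $_[0]{steps} }

sub get_position {
    my ( $self, $position ) = @_;
    return if $position < 0 or $position > $#{ $self->{strings} };

    # Serve from the cache when the position has already been read
    if ( $self->{caching} and $position < @{ $self->{cache} } ) {
        return $self->{cache}[$position];
    }

    # Without a cache, a target behind the cursor forces a restart
    $self->{next} = 0 if $position < $self->{next};

    my $value;
    while ( $self->{next} <= $position ) {
        $value = $self->{strings}[ $self->{next} ];
        push @{ $self->{cache} }, $value if $self->{caching};
        $self->{next}++;
        $self->{steps}++;
    }
    return $value;
}

package main;

# Read ten simulated strings in the given order and report the forward reads
sub count_reads {
    my ( $caching, @order ) = @_;
    my $reader = ForwardOnlyStrings->new(
        strings         => [ map { "string $_" } 0 .. 9 ],
        cache_positions => $caching,
    );
    $reader->get_position($_) for @order;
    return $reader->steps;
}

my @order = ( 7, 2, 9, 0 );    # positions requested out of file order
for my $caching ( 1, 0 ) {
    printf "cache_positions => %d: %d forward reads\n",
        $caching, count_reads( $caching, @order );
}
```

    In this simulation the cached reader needs 10 forward reads for the four requests and the uncached reader needs 19, because every request behind the cursor restarts from the top. The cached reader also holds every string it has passed, which is the memory cost described above.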

no_formats

    Definition: Quite often the goal of reading a spreadsheet is to get at the data in the cells rather than the visible presentation of the sheet. If so, reading the sharedStrings file can be sped up by skipping the stored text formatting when reading from the XML. This flag manages that choice.

    Default: 0 = format reading is on

    Range: 0|1

    Attribute required: yes

    attribute methods Methods provided to adjust this attribute
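
Both attributes can be set when the instance is built. The fragment below is a sketch modeled on the SYNOPSIS, assuming cache_positions and no_formats are accepted as constructor arguments in the same way as file and error_inst. The package name 'SharedStringsLowMemory' is a hypothetical label chosen for the example, and a sharedStrings.xml file is assumed to exist in the working directory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use MooseX::ShortCut::BuildInstance qw( build_instance );
use Spreadsheet::XLSX::Reader::LibXML::Error;
use Spreadsheet::XLSX::Reader::LibXML::XMLReader::SharedStrings;

my $file_instance = build_instance(
    package         => 'SharedStringsLowMemory',    # hypothetical name
    superclasses    => ['Spreadsheet::XLSX::Reader::LibXML::XMLReader::SharedStrings'],
    file            => 'sharedStrings.xml',         # assumed to exist
    error_inst      => Spreadsheet::XLSX::Reader::LibXML::Error->new,
    cache_positions => 0,    # assumption: passed to new; turns caching off
    no_formats      => 1,    # assumption: passed to new; skips text formatting
);
```

Turning caching off trades read speed for memory, and skipping formats suits callers that only need the cell text.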

SUPPORT

TODO

    1. Nothing yet

AUTHOR

    Jed Lund

    jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014, 2015 by Jed Lund

DEPENDENCIES

SEE ALSO