NAME

Spreadsheet::XLSX::Reader::LibXML::Worksheet - Top level XLSX::Reader Worksheet interface

SYNOPSIS

        use strict;
        use warnings;
        use Data::Dumper;
        
        use Spreadsheet::XLSX::Reader::LibXML;
        my $workbook =  Spreadsheet::XLSX::Reader::LibXML->new( #similar style to Spreadsheet::XLSX
                                                file_name => 't/test_files/TestBook.xlsx',# in the test folder of this package
                                                group_return_type => 'value',
                                        );

        if ( !$workbook->has_file_name ) {
                die $workbook->error(), ".\n";
        }

        my      $worksheet = $workbook->worksheet( 'Sheet5' );
                $worksheet->set_custom_formats( {
                        2 =>'yyyy-mm-dd',
                } );
        my $value;
        while( !$value or $value ne 'EOF' ){
                $value = $worksheet->fetchrow_arrayref;
                print Dumper( $value );
        }

        ###########################
        # SYNOPSIS Output
        # $VAR1 = [ 'Superbowl Audibles', 'Column Labels' ];
        # $VAR1 = [         'Row Labels',     2016-02-06', '2017-02-14', '2018-02-03', 'Grand Total' ];
        # $VAR1 = [               'Blue',            '10',          '7',           '',          '17' ];
        # $VAR1 = [              'Omaha',              '',           '',          '2',           '2' ];
        # $VAR1 = [                'Red',            '30',          '5',          '3',          '38' ];
        # $VAR1 = [        'Grand Total',            '40',         '12',          '5',          '57' ];
        # $VAR1 = 'EOF';
        ###########################

The best example for use of this module alone is the test file in this package t/Spreadsheet/XLSX/Reader/LibXML/10-worksheet.t

DESCRIPTION

This documentation is intended to cover all 'tabular' data worksheets. Even if they contain embedded charts. If the tab is a 'chartsheet' then please review the documentation for Chartsheets.

The worksheet class provided by this package is an amalgam of a class, a few roles, and a few traits aggregated at run time based on attribute settings from the workbook level class. This documentation shows the ways to use the resulting instance. First, it is best to generate a worksheet instance from the workbook class using one of the various worksheet methods. Once you have done that there are several ways to step through the data inside of each worksheet and access information from the identified location in the sheet of the .xlsx file.

For information on how to leverage this role and the other roles and classes I use for building your own worksheet parser please review the list of method requirements in DEPENDENCIES that are specific to this role and the documentation for the classes and roles I use to build a worksheet instance; Spreadsheet::XLSX::Reader::LibXML::XMLReader, Spreadsheet::XLSX::Reader::LibXML::XMLReader::WorksheetToRow, and MooseX::ShortCut::BuildInstance. The file t/Spreadsheet/XLSX/Reader/LibXML/10-worksheet.t in the distribution represents a good 'under the hood' example of the way all the elements are integrated into the larger worksheet class as a whole.

Information filtering

There is a an attribute set in the workbook instance called group_return_type . Setting this attribute will return either a full Spreadsheet::XLSX::Reader::LibXML::Cell instance, the raw xml value, just the unformatted value, or the formatted value. For more details on the data available in the Cell instance read the documentation for the Cell instance. Each lower level of information requirement will under certain circumstances provide increased speed since the parser will not be required to coallate that level of processing to the cell.

Methods

These are the various functions that are available (independent of tabualar sheet parser type) to select which cell(s) to read. When allowed the requested row and column numbers are interpreted using the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML. all the methods are object methods performed on the worksheet.

Example:

        my $cell_data = $worksheet->get_cell( $row, $column );

get_cell( $row, $column )

Definition: Indicate both the requested row and requested column and the information for that position will be returned. Both $row and $column are required

Accepts: the list ( $row, $column ) both required See the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML to understand which row and column are returned for $row and $colum.

Returns: see the attribute "group_return_type" in Spreadsheet::XLSX::Reader::LibXML for details on what is returned

get_next_value

Definition: Reading left to right and top to bottom this will return the next cell with a value. This actually includes cells with no value but some unique formatting such as cells that have been merged with other cells.

Accepts: nothing

Returns: see the attribute "group_return_type" in Spreadsheet::XLSX::Reader::LibXML for details on what is returned

fetchrow_arrayref( $row )

Definition: In an homage to DBI I included this function to return an array ref of the cells or values in the requested $row. If no row is requested this returns the 'next' row. In the array ref any empty and non unique cell will show as 'undef'.

Accepts: undef = next|$row = a row integer indicating the desired row See the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML to understand which row is returned for $row.

Returns: an array ref of all possible column positions in that row with data filled in per the attribute "group_return_type" in Spreadsheet::XLSX::Reader::LibXML.

fetchrow_array( $row )

Definition: This function is just like fetchrow_arrayref except it returns an array instead of an array ref

Accepts: undef = next|$row = a row integer indicating the desired row. See the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML to understand which row is returned for $row.

Returns: an array ref of all possible column positions in that row with data filled in per the attribute "group_return_type" in Spreadsheet::XLSX::Reader::LibXML.

set_headers( @header_row_list [ \&header_scrubber ] )

Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref. [ optionally: it is possible to pass a coderef to scrub the headers so they make some sence. for example; ]

        my $scrubber = sub{
                my $input = $_[0];
                $input =~ s/\n//g if $input;
                $input =~ s/\s/_/g if $input;
                return $input;
        };
        $self->set_headers( 2, 1, $scrubber ); # Called internally as $new_value = $scrubber->( $old_value );
        # Returns/stores the headers set at row 2 and 1 with values from row 2 taking precedence
        #  Then it scrubs the values by removing newlines and replacing spaces with underscores.

Accepts: a list of row numbers (modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML) and an optional closure . See the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML to understand which rows are used when the @header_row_list is called.

Returns: an array ref of the built headers for review.

fetchrow_hashref( $row )

Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return 'undef' and set the error instance.

Accepts: a target $row number for return values or undef meaning 'next' See the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML to understand which rows are targeted by $row.

Definition: This returns a list containing the minimum column number followed by the maximum column number. This list is affected by the workbook attributes from_the_edge, and count_from_zero

Accepts: nothing

Returns: ( $minimum_column, $maximum_column )

get_merged_areas

Definition: This method returns an array ref of cells that are merged. This method does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML Warning: This result is extracted from the sheet metadata, however if your sheet has been damaged or 'adjusted' by non-microsoft code (This is more common than you would think in the data processing world) then the return list may be wrong or missing when the sheet is first opened. The goal of this package is to minimize memory consumption so it will learn what the correct list is over the first pass through the sheet as you collect data but it does not attempt to validate this list in detail initially. If you have an idea of the range for a damaged sheet before you open it you can use EOR flags. Otherwise the methods get_next_value or fetchrow_arrayref are recomended. For missing values the minimum is set to the first column and the maximum is set to undef.

Accepts: nothing

Returns: An arrayref of arrayrefs of merged areas or undef if no merged areas

        [ [ $start_row_1, $start_col_1, $end_row_1, $end_col_1], etc.. ]

is_sheet_hidden

Definition: Method indicates if the excel program would hide the sheet or show it if the file were opened in the Microsoft Excel application

Accepts: nothing

Returns: a boolean value indicating if the sheet is hidden or not 1 = hidden

is_column_hidden

Definition: Method indicates if the excel program would hide the identified column(s) or show it|them if the file were opened in the Microsoft Excel application. If more than one column is passed then it returns true if any of the columns are hidden in scalar context and a list of 1 and 0 values for each of the requested positions in array (list) context. This method (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML. Unlike the method 'is_row_hidden' this method will always 'know' the correct answer since the information is stored outside of the dataa table in the xml file.

Accepts: integer values or column letter values selecting the columns in question

Returns: in scalar context it returns a boolean value indicating if any of the requested columns would be hidden by Excel. In array/list context it returns a list of boolean values for each requested column indicating it's hidden state for Excel. (1 = hidden)

is_row_hidden

Definition: Method indicates if the excel program would hide the identified row(s) or show it|them if the file were opened in the Microsoft Excel application. If more than one row is passed then it returns true if any of the rows are hidden in scalar context and a list of 1 and 0 values for each of the requested positions in array (list) context. This method (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML. Warning: This method will only be accurate after the user has read at least one cell from or past the row inspected for it's hidden state. This allows the sheet to avoid reading all the way through once before starting the cell parsing.

Accepts: integer values selecting the rows in question

Returns: in scalar context it returns a boolean value indicating if any of the requested rows would be hidden by Excel. In array/list context it returns a list of boolean values for each requested row indicating it's hidden state for Excel. (1 = hidden)

Attributes

These are attributes that affect the behaviour of the returned data in the worksheet instance. In general you would not set these on instance generation, Because the primary class will generate this instance for you. Rather you would use the attribue methods listed with each attribute to change the attribute after the worksheet instance has been generated. Additionally at the end of this list that a reference to the workbook is stored in one of the attributes as well so many workbook settings can be adjusted from the worksheet instance..

min_header_col

Definition: This attribute affects the hashref that is returned in the method fetchrow_hashref. This attribute tells fetchrow_hashref what column to use to start the hash ref build. This attribute (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML.

Default: undef (which is equivalent to the minimum column of the sheet)

Range: The minimum column of the sheet to or less than the max_header_col

attribute methods Methods provided to adjust this attribute

get_min_header_col

Definition: returns the value stored in the attribute

set_min_header_col

Definition: Sets a new value for the attribute

has_min_header_col

Definition: Indicates if the attribute has a stored value

max_header_col

Definition: This attribute affects the hashref that is returned in the method fetchrow_hashref. This attribute tells fetchrow_hashref what column to use to end the hash ref build. This attribute (input) does respond to the attribute "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML.

Default: undef (equal to the maximum column of the sheet)

Range: The maximum column of the sheet to or less than the min_header_col

attribute methods Methods provided to adjust this attribute

get_max_header_col

Definition: returns the value stored in the attribute

set_max_header_col

Definition: Sets a new value for the attribute

has_max_header_col

Definition: Indicates if the attribute has a stored value

custom_formats

Definition: This package will generate value conversions that generally match the numerical conversions set in the Excel spreadsheet. However, it may be that you want to convert the unformatted values for certain cells, rows, or columns in some user defined way. The simplest way to do this is by storing an Excel custom number format string in this attribute using 'set_custom_formats' against either a CellID a Row Number or a Column letter. As an example you could say;

        my $worksheet = $workbook->worksheet( 'TargetWorksheetName' );
        $worksheet->set_custom_formats( {
            A => '# ?/?',
        } );

And any subsequent call for a $cell->value from column 'A' will attempt to convert the contents of that cell to a fraction with on position in the denominator or less. If the cell is text then it will act as a pass-through.

For the truly adventurous you can build an object instance that has the two following methods; 'assert_coerce' and 'display_name'. Then add it to the attribute as above.

Default: {} = no custom conversions

Range: keys representing cell ID's, row numbers, or column letter callouts followed by values that are instance references for specific conversions. This does not follow the workbook count_from_zero conversions since it would create a disconnect between actual CellID's and Columns from the parser.

A Complicated Example:

Building a converter on the fly (or use Type::Tiny or MooseX::Types)

        use DateTimeX::Format::Excel;
        use DateTime::Format::Flexible;
        use Type::Coercion;
        use Type::Tiny;
        my @args_list  = ( system_type => 'apple_excel' );
        my $num_converter  = DateTimeX::Format::Excel->new( @args_list );
        
        # build conversion subroutines (number and strings to DateTime objects)
        my $string_via = sub{ 
              my $str = $_[0];
              return DateTime::Format::Flexible->parse_datetime( $str );
        };
        my $num_via     = sub{
              my $num = $_[0];
              return $num_converter->parse_datetime( $num );
        };
        
        # Combine conversion subroutines into a coercion object! 
        #  (Note numbers are attempted first)
        my $date_time_from_value = Type::Coercion->new( 
                type_coercion_map => [ Num, $num_via, Str, $string_via, ],
        );
        
        # Install the coercion in a type that ensures it passes through a DateTime check
        $date_time_type = Type::Tiny->new(
           name       => 'Custom_date_type',
           constraint => sub{ ref($_) eq 'DateTime' },
           coercion   => $date_time_from_value,
        );
        
        # Chained coercions! to handle first the $date_time_from_value coercion 
        #    and then build a specific date string output
        $string_type = Type::Tiny->new(
           name       => 'YYYYMMDD',
           constraint => sub{
              !$_ or (
                 $_ =~ /^\d{4}\-(\d{2})-(\d{2})$/ and
                 $1 > 0 and $1 < 13 and $2 > 0 and $2 < 32 
              )
           },
           coercion => Type::Coercion->new(
           type_coercion_map =>[
              $date_time_type->coercibles, sub{ 
                 my $tmp = $date_time_type->coerce( $_ );
                 $tmp->format_cldr( 'yyyy-MM-dd' )
              },
           ],
        ), );

Setting custom conversions to use for the worksheet

        my $worksheet = $workbook->worksheet( 'TargetWorksheetName' );
        $worksheet->set_custom_formats( {
            E10 => $date_time_type,
            10  => $string_type,
            D14 => $string_type,
        } );

attribute methods Methods provided to adjust this attribute

set_custom_formats

Definition: Sets a new (complete) hashref for the attribute

has_custom_format( $key )

Definition: checks if the specific custom format $key is set

set_custom_format( $key => $coercion, ... )

Definition: sets the specific custom format $key(s) with $coercion(s)

get_custom_format( $key )

Definition: returns the specific custom format for that $key (see has_custom_format )

sheet_rel_id

Definition: This is the relId of the sheet listed in the XML of the .xlsx file. You probably don't care and you should never set this value.

attribute methods Methods provided to adjust this attribute

rel_id

Definition: returns the value stored in the attribute

sheet_id

Definition: This is the Id of the sheet listed in the XML of the .xlsx file. I beleive this to be the number used in vbscript to reference the sheet. You should never set this value.

attribute methods Methods provided to adjust this attribute

sheet_id

Definition: returns the value stored in the attribute

sheet_position

Definition: This is the visual sheet position in the .xlsx file. You should never set this value.

attribute methods Methods provided to adjust this attribute

position

Definition: returns the value stored in the attribute

sheet_name

Definition: This is the visual sheet name in the .xlsx file. You should never set this value.

attribute methods Methods provided to adjust this attribute

get_name

Definition: returns the value stored in the attribute

sheet_type

Definition: There are two possible kinds of sheets in an Excel file; 'worksheets' and 'chartsheets' if you are not sure what kind of sheet you have this is where the information is stored.

attribute methods Methods provided to adjust this attribute

get_sheet_type

Definition: returns the value stored in the attribute (worksheet)

workbook_instance

Definition: This attribute holds a reference back to the workbook instance so that the worksheet has access to the global settings managed there. As a consequence many of the workbook methods are be exposed here. This includes some setter methods for workbook attributes. Beware that setting or adjusting the workbook level attributes with methods here will be universal and affect other worksheets. So don't forget to return the old value if you want the old behavour after you are done. If that doesn't make sense then don't use these methods. (Nothing to see here! Move along.)

Default: a Spreadsheet::XLSX::Reader::LibXML instance

attribute methods Methods of the workbook exposed here by the delegation of the instance to this class through this attribute

counting_from_zero

Definition: returns the "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML instance state

boundary_flag_setting

Definition: returns the "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML instance state

change_boundary_flag( $Bool )

Definition: sets the "file_boundary_flags" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)

get_shared_string_position( $int )

Definition: returns the shared string data stored in the sharedStrings file at position $int. For more information review Spreadsheet::XLSX::Reader::LibXML::SharedStrings. This is a delegation of a delegation!

get_format_position( $int, [$header] )

Definition: returns the format data stored in the styles file at position $int. If the optional $header is passed only the data for that header is returned. Otherwise all styles for that position are returned. For more information review Spreadsheet::XLSX::Reader::LibXML::Styles. This is a delegation of a delegation!

set_empty_is_end( $Bool )

Definition: sets the "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)

is_empty_the_end

Definition: returns the "empty_is_end" in Spreadsheet::XLSX::Reader::LibXML instance state.

get_group_return_type

Definition: returns the "group_return_type" in Spreadsheet::XLSX::Reader::LibXML instance state.

set_group_return_type( (instance|unformatted|value) )

Definition: sets the "group_return_type" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)

get_epoch_year

Definition: uses the "get_epoch_year" in Spreadsheet::XLSX::Reader::LibXML method.

get_date_behavior

Definition: This is a delegated method from the styles class (stored as a private instance in the workbook). It is held (and documented) in the Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings role. It will indicate how far unformatted transformation is carried for date coercions when returning formatted values.

set_date_behavior

Definition: This is a delegated method from the styles class (stored as a private instance in the workbook). It is held (and documented) in the Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings role. It will set how far unformatted transformation is carried for date coercions when returning formatted values.

get_values_only

Definition: gets the "values_only" in Spreadsheet::XLSX::Reader::LibXML instance state.

set_values_only

Definition: sets the "values_only" in Spreadsheet::XLSX::Reader::LibXML instance state (For the whole workbook!)