The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Spreadsheet::XLSX::Reader::LibXML::GetCell - Top level xlsx Worksheet interface

SYNOPSIS

If you are looking for the synopsis for the package see "SYNOPSIS" in Spreadsheet::XLSX::Reader::LibXML. Otherwise the best example for use of this module alone is the test file in this package t/Spreadsheet/XLSX/Reader/LibXML/10-get_cell.t

DESCRIPTION

This documentation is written to explain ways to use this module when writing your own excel parser. To use the general package for excel parsing out of the box please review the documentation for Workbooks, Worksheets, and Cells

This is the extracted Role to be used as a top level worksheet interface. This is the place where all the various details in each sub XML sheet are coallated into a set of data representing all the necessary information for a requested cell. Since this is the center of data coallation all elements that may be customized should reside outside of this role. This includes any specific elements that would be different between each of the sheet parser types and any element of Excel data presentation that may lend itself to customization. For instance all the XML parser methods, (Reader, DOM, and possibly SAX) should exist outside and preferebly below this role.

This role is also contains a layer of abstraction to allow for run time setting of count-from-one or count-from-zero mode. The layer of abstraction is use with the Moose around modifier.

requires

These are method(s) used by this Role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role. Since this is the center of data collation the list is long.

min_row

    Definition: Used to get the minimum row with data in the worksheet.

max_row

    Definition: Used to get the maximum row with data in the worksheet.

min_col

    Definition: Used to get the minimum column with data in the worksheet.

max_col

    Definition: Used to get the maximum column with data in the worksheet.

row_range

col_range

_get_next_value_cell

    Definition: This should return the next cell data from the worksheet file that contains unique formatting or information. The data is expected in a perl hash ref. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell information at this point. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

    Example of expected return data set

       {
          'r'          => 'A6',    # The cell ID
          'cell_merge' => 'A6:B6', # The merge range
          'row'        => 6,       # count by 1 (no 'around' performed on leading '_' methods)
          'col'        => 1,       # count by 1 (no 'around' performed on leading '_' methods)
          's'          => '11',    # Styles type (position 11 in the styles sheet)
          't'          => 's'      # Cell data type (string)
          'v' =>{                  # Cell data (since this cell is string 
             'raw_text' => '15'    # data this actually points to position 
          }                        #    15 in the sharedStrings.xml file )
       }

_get_next_cell

    Definition: Like _get_next_value_cell this method should return the next cell. The difference is it should return undef for empty cells rather than skipping them. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell information at this point. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

_get_col_row

    Definition: This method should provide a targeted way to return the worksheet file information on a cell. It should only accept count-from-one column and row numbers and the column should be required before the row. If the request is made for an out of row bounds position the method should provide an 'EOR' string. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.

The attribute workbook_instance must also be filled correctly since it exports all of the the workbook level functionality to this class.

Primary Methods

These are the various methods provided by this role. Each of them calls a sub method to get the base cell data and then coallates that information into the proper return value(s) defined by "group_return_type" in Spreadsheet::XLSX::Reader::LibXML.

get_cell( $row, $column )

    Definition: This calls the supplied method _get_col_row.

    Accepts: the list ( $row, $column ) both required (and modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML)

    Returns: if data to build a cell instance is provided then the instance is collated, built, and returned. Otherwise the value from '_get_col_row' is returned unfiltered.

get_next_value

    Definition: This calls the supplied method _get_next_value_cell

    Accepts: nothing

    Returns: if data to build a cell instance is provided then the instance is collated, built, and returned. Otherwise the value from '_get_next_value_cell' is returned unfiltered.

fetchrow_arrayref( $row )

    Definition: This calls the supplied method _get_row_all. It will return 'EOF' once instead of an array reference for the end of the file before resetting to the first row..

    Accepts: undef = next|$row = a row integer indicating the desired row (modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML)

    Returns: an array ref of all possible column positions in that row with data filled in as appropriate. (or 'EOF')

fetchrow_array( $row )

    Definition: This function calls fetchrow_arrayref except it returns an array instead of an array ref

    Accepts: undef = next|$row = a row integer indicating the desired row

    Returns: an array of all possible column positions in that row with data filled in as appropriate.

set_headers( @header_row_list [ \&header_scrubber ] )

    Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref. [ optionally: it is possible to pass a coderef to scrub the headers so they make some sence. for example; ]

            my $scrubber = sub{
                    my $input = $_[0];
                    $input =~ s/\n//g if $input;
                    $input =~ s/\s/_/g if $input;
                    return $input;
            };
            $self->set_headers( 2, 1, $scrubber ); # Called internally as $new_value = $scrubber->( $old_value );
            # Returns/stores the headers set at row 2 and 1 with values from row 2 taking precedence
            #  Then it scrubs the values by removing newlines and replacing spaces with underscores.

    Accepts: a list of row numbers (modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML) and an optional closure .

    Returns: an array ref of the built headers for review

fetchrow_hashref( $row )

    Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance. This function calls _get_row_all.

    Accepts: a target $row number for return values or undef meaning 'next'

    Returns: a hash ref of the values for that row

Attributes

Arguments that can be passed to new when creating a class instance or changed using one of the 'attribute methods'. Where an attribute is delegating the 'attribute method' from a method in the instance stored in the attribute the documentation will indicate that the 'attribute method' is 'delegated'. All 'delegated' methods are required for the instance to be accepted by the attribute. For more information on attributes see Moose::Manual::Attributes and Moose::Manual::Delegation.

last_header_row

    Definition: This is generally set by the method set_headers( @header_row_list ) method not during ->new and is the largest row number of the @header_row_list not necessarily the last number in the sequence.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      get_last_header_row

        Definition: returns the value of the attribute

      has_last_header_row

        Definition: predicate for the attribute

min_header_col

    Definition: When the method fetchrow_hashref is called it is possible to only return a set of information between two defined columns. This is the attribute that defines the start column.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_min_header_col

        Definition: sets the value of the attribute

        Range: integer values Integers less than min_col will be ignored

        get_min_header_col

          Definition: returns the value of the attribute

        has_min_header_col

          Definition: predicate for the attribute

        clear_min_header_col

          Definition: sets min_header_col to 'undef'

max_header_col

    Definition: When the method fetchrow_hashref is called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the end column.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_max_header_col

        Definition: sets the value of the attribute

        Range: integer values Integers larger than max_col will be ignored

        get_max_header_col

          Definition: returns the value of the attribute

        has_max_header_col

          Definition: predicate for the attribute

        clear_max_header_col

          Definition: sets min_header_col to 'undef'

custom_formats

    Definition: When this role is coallating data about a cell it will check this attribute before it checks the styles sheet to see if there is a format defined by the user for converting the unformatted data. The formats stored must have two methods 'assert_coerce' and 'display_name'. The cell instance builder will consult this attribute by first checking the cellID as a key, then it checks for just the column letter(s) as a key, and finally it checks the row number as a key. For an easy way to build custom conversion review the documentation for Type::Tiny and Type::Coercions. the Chained Coercions are very cool!.

    Default: undef

    attribute methods Methods provided to adjust this attribute

      set_custom_formats( { $key => $conversion } )

        Definition: a way to set all $key => $conversion pairs at once

        Accepts: a hashref of $key => $conversion pairs

        has_custom_format( $key )

          Definition: checks if the specific $key for a format is registered

        get_custom_format( $key )

          Definition: get the custom format for the requested $key

          Returns: the $conversion registered to the $key

        set_custom_format( $key => $conversion )

          Definition: set the custom format $conversion for the identified $key

workbook_instance

    Definition: This is where the workbook level methods are accessed by the worksheet. Because the workbook class is complex and I don't wan't to maintain duplicate documentation I request that you review the documentation for that class there. This attribute can/should only be set at ->new, however, it delegates to this class a number of methods that will update the workbook instance and therefore have universal effect when the other sheets are read.

    Default: none

    Required: yes

    attribute methods Methods provided to adjust this attribute

      counting_from_zero - delegated

      boundary_flag_setting - delegated

      change_boundary_flag - delegated

      _has_shared_strings_file - delegated

      get_shared_string_position - delegated

      _has_styles_file - delegated

      get_format_position - delegated

      set_empty_is_end - delegated

      is_empty_the_end - delegated

      _starts_at_the_edge - delegated

      get_group_return_type - delegated

      set_group_return_type - delegated

      get_epoch_year - delegated

      change_output_encoding - delegated

      get_date_behavior - delegated

      set_date_behavior - delegated

      get_empty_return_type - delegated

      set_error - delegated

      set_values_only - delegated

      get_values_only - delegated

SUPPORT

TODO

    1. Eliminate the min / max row / col calls from this role (and requireds) if possible.

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014, 2015 by Jed Lund

DEPENDENCIES

SEE ALSO