The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Data::XLSX::Parser - faster XLSX parser

SYNOPSIS

    use Data::Dumper;
    use Data::XLSX::Parser;
    
    my $parser = Data::XLSX::Parser->new;
    $parser->add_row_event_handler(sub {
        my ($row, $rowDetail) = @_;
        # array of cell values in parsed row
        print Dumper $row;
        # array of hashes with cell details (reference, value, column, row, style, etc.) in parsed row
        print Dumper $rowDetail;
    });
    $parser->open('foo.xlsx');
    
    # parse sheet with sheet name
    $parser->sheet_by_rid( $parser->workbook->sheet_rid( 'Sheet1' ) );
    
    # .. or parse sheet with sheet Id
    $parser->sheet_by_id(1);
    
    # -----------
    # print values of all sheets on the commandline
    use Text::ASCIITable;
    
    # get names of all sheets in the workbook
    my @rows;
    
    my $xlsx_parser = Data::XLSX::Parser->new;
    $xlsx_parser->add_row_event_handler( sub{
        push @rows, $_[0];
    });
    
    $xlsx_parser->open( 'test.xlsx' );
    my @names = $xlsx_parser->workbook->names;
    
    for my $name ( @names ) {
        say "Table $name:";
    
        my $table = Text::ASCIITable->new;
        my $rid   = $xlsx_parser->workbook->sheet_id( $name );
        $xlsx_parser->sheet_by_rid( $rid );
    
        my $headers = shift @rows;
        $table->setCols( @{ $headers || [] } );
    
        for my $row ( @rows ) {
            $table->addRow( @{ $row || [] } );
        }
        
        print $table;
    
        @rows = ();
    }

DESCRIPTION

Data::XLSX::Parser provides a fast way to parse Microsoft Excel's .xlsx files. The implementation of this module is highly inspired from Python's FastXLSX library.

The module uses a SAX based parser, so you can parse very large XLSX file with lower memory usage.

METHODS

new

Create new parser object.

add_row_event_handler

Add sub reference to row handler. Two arguments are returned, the first is an array with the cell values of the parsed row, the second is an array of hashes with the details of the parsed row cells:

    |key |Content  
    -------------------------
    | i  |STYLE_INDEX        
    | s  |STYLE OF CELL      
    | f  |FORMAT OF CELL     
    | r  |REFERENCE          
    | c  |COLUMN OF CELL     
    | v  |VALUE OF CELL      
    | t  |TYPE OF CELL       
    | s  |TYPE_SHARED_STRING 
    | g  |GENERATED_CELL     
    | row|ROW OF CELL        

Cell values are returned 'as is', except date values (where the format tag indicates this) are converted to epoch values.

open

Open a workbook to be parsed.

sheet_by_id

Start parsing of sheet identified by sheet Id.

sheet_by_rid

Start parsing of sheet identified by sheet relation Id.

workbook

returns the Data::XLSX::Parser::Workbook object (representation of xl/workbook.xml, used to get sheets).

shared_strings

returns the Data::XLSX::Parser::SharedStrings object (representation of xl/sharedStrings.xml).

styles

returns the Data::XLSX::Parser::Styles object (representation of xl/styles.xml).

relationships

returns the Data::XLSX::Parser::Relationships object (representation of xl/_rels/workbook.xml.rels).

AUTHOR

Daisuke Murase <typester@cpan.org>