The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Spreadsheet::ParseExcel - Get information from Excel file

SYNOPSIS

    use strict;
    use Spreadsheet::ParseExcel;
    my $oExcel = new Spreadsheet::ParseExcel;

    #1.1 Normal Excel97
    my $oBook = $oExcel->Parse('Excel/Test97.xls');
    my($iR, $iC, $oWkS, $oWkC);
    print "FILE  :", $oBook->{File} , "\n";
    print "COUNT :", $oBook->{SheetCount} , "\n";
    print "AUTHOR:", $oBook->{Author} , "\n";
    for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
        $oWkS = $oBook->{Worksheet}[$iSheet];
        print "--------- SHEET:", $oWkS->{Name}, "\n";
        for(my $iR = $oWkS->{MinRow} ; 
                defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
            for(my $iC = $oWkS->{MinCol} ;
                            defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {
                $oWkC = $oWkS->{Cells}[$iR][$iC];
                print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);  # Formatted Value
                print "( $iR , $iC ) =>", $oWkC->{Val}, "\n" if($oWkC);  # Original Value
            }
        }
    }

new interface

    use strict;
    use Spreadsheet::ParseExcel;
    my $oBook = 
        Spreadsheet::ParseExcel::Workbook->Parse('Excel/Test97.xls');
    my($iR, $iC, $oWkS, $oWkC);
    foreach my $oWkS (@{$oBook->{Worksheet}}) {
        print "--------- SHEET:", $oWkS->{Name}, "\n";
        for(my $iR = $oWkS->{MinRow} ; 
                defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
            for(my $iC = $oWkS->{MinCol} ;
                            defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) {
                $oWkC = $oWkS->{Cells}[$iR][$iC];
                print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);
            }
        }
    }

DESCRIPTION

Spreadsheet::ParseExcel makes you to get information from Excel95, Excel97, Excel2000 file.

Functions

new

$oExcel = new Spreadsheet::ParseExcel( [ CellHandler => \&subCellHandler, NotSetCell => undef | 1, ]);

Constructor.

CellHandler (experimental)

specify callback function when a cell is detected.

subCellHandler gets arguments like below:

sub subCellHandler ($oBook, $iSheet, $iRow, $iCol, $oCell);

CAUTION : The atributes of Workbook may not be complete. This function will be called almost order by rows and columns. Take care almost, not perfectly.

NotSetCell (experimental)

specify set or not cell values to Workbook object.

Parse

$oWorkbook = $oParse->Parse($sFileName [, $oFmt]);

return "Workbook" object. if error occurs, returns undef.

$sFileName

name of the file to parse

From 0.12 (with OLE::Storage_Lite v.0.06), scalar reference of file contents (ex. \$sBuff) or IO::Handle object (inclucdng IO::File etc.) are also available.

$oFmt

"Formatter Class" to format the value of cells.

ColorIdxToRGB

$sRGB = $oParse->ColorIdxToRGB($iColorIdx);

ColorIdxToRGB returns RGB string corresponding to specified color index. RGB string has 6 charcters, representing RGB hex value. (ex. red = 'FF0000')

Workbook

Spreadsheet::ParseExcel::Workbook

Workbook class has these methods :

Parse

(class method) : same as Spreadsheet::ParseExcel

Worksheet

$oWorksheet = $oBook->Worksheet($sName);

Worksheet returns a Worksheet object with $sName or undef. If there is no worksheet with $sName and $sName contains only digits, it returns a Worksheet object at that position.

Workbook class has these properties :

File

Name of the file

Author

Author of the file

Flag1904

If this flag is on, date of the file count from 1904.

Version

Version of the file

SheetCount

Numbers of "Worksheet" s in that Workbook

Worksheet[SheetNo]

Array of "Worksheet"s class

PrintArea[SheetNo]

Array of PrintArea array refs.

Each PrintArea is : [ StartRow, StartColumn, EndRow, EndColumn]

PrintTitle[SheetNo]

Array of PrintTitle hash refs.

Each PrintTitle is : { Row => [StartRow, EndRow], Column => [StartColumn, EndColumn]}

Worksheet

Spreadsheet::ParseExcel::Worksheet

Worksheet class has these methods:

Cell ( ROW, COL )

Return the Cell iobject at row ROW and column COL if it is defined. Otherwise return undef.

RowRange ()

Return a two-element list (MIN, MAX) containing the minimum and maximum of defined rows in the worksheet If there is no row defined MAX is smaller than MIN.

ColRange ()

Return a two-element list (MIN, MAX) containing the minimum and maximum of defined columns in the worksheet If there is no row defined MAX is smaller than MIN.

Worksheet class has these properties:

Name

Name of that Worksheet

DefRowHeight

Default height of rows

DefColWidth

Default width of columns

RowHeight[Row]

Array of row height

ColWidth[Col]

Array of column width (undef means DefColWidth)

Cells[Row][Col]

Array of "Cell"s infomation in the worksheet

Landscape

Print in horizontal(0) or vertical (1).

Scale

Print scale.

FitWidth

Number of pages with fit in width.

FitHeight

Number of pages with fit in height.

PageFit

Print with fit (or not).

PaperSize

Papar size. The value is like below:

  Letter               1, LetterSmall          2, Tabloid              3 ,
  Ledger               4, Legal                5, Statement            6 ,
  Executive            7, A3                   8, A4                   9 ,
  A4Small             10, A5                  11, B4                  12 ,
  B5                  13, Folio               14, Quarto              15 ,
  10x14               16, 11x17               17, Note                18 ,
  Envelope9           19, Envelope10          20, Envelope11          21 ,
  Envelope12          22, Envelope14          23, Csheet              24 ,
  Dsheet              25, Esheet              26, EnvelopeDL          27 ,
  EnvelopeC5          28, EnvelopeC3          29, EnvelopeC4          30 ,
  EnvelopeC6          31, EnvelopeC65         32, EnvelopeB4          33 ,
  EnvelopeB5          34, EnvelopeB6          35, EnvelopeItaly       36 ,
  EnvelopeMonarch     37, EnvelopePersonal    38, FanfoldUS           39 ,
  FanfoldStdGerman    40, FanfoldLegalGerman  41, User                256
PageStart

Start page number.

UsePage

Use own start page number (or not).

LeftMergin, RightMergin, TopMergin, BottomMergin, HeaderMergin, FooterMergin

Mergins for left, right, top, bottom, header and footer.

HCenter

Print in horizontal center (or not)

VCenter

Print in vertical center (or not)

Content of print header. Please refer Excel Help.

Content of print footer. Please refer Excel Help.

PrintGrid

Print with Gridlines (or not)

PrintHeaders

Print with headings (or not)

NoColor

Print in black-white (or not).

Draft

Print in draft mode (or not).

Notes

Print with notes (or not).

LeftToRight

Print left to right(0) or top to down(1).

HPageBreak

Array ref of horizontal page breaks.

VPageBreak

Array ref of vertical page breaks.

MergedArea

Array ref of merged areas. Each merged area is : [ StartRow, StartColumn, EndRow, EndColumn]

Cell

Spreadsheet::ParseExcel::Cell

Cell class has these properties:

Value

Method Formatted value of that cell

Val

Original Value of that cell

Type

Kind of that cell ('Text', 'Numeric', 'Date')

Code

Character code of that cell (undef, 'ucs2', '_native_') undef tells that cell seems to be ascii. '_native_' tells that cell seems to be 'sjis' or something like that.

Format

"Format" for that cell.

Merged

That cells is merged (or not).

Rich

Array ref of font informations about each characters.

Each entry has : [ Start Position, Font Object]

For more information please refer sample/dmpExR.pl

Format

Spreadsheet::ParseExcel::Format

Format class has these properties:

Font

"Font" object for that Format.

AlignH

Horizontal Alignment.

  0: (standard), 1: left,       2: center,     3: right,      
  4: fill ,      5: justify,    7:equal_space  

Notice: 6 may be merge but it seems not to work.

AlignV

Vertical Alignment.

    0: top,  1: vcenter, 2: bottom, 3: vjustify, 4: vequal_space
Indent

Number of indent

Wrap

Wrap (or not).

Shrink

Display in shrinking (or not)

Rotate

In Excel97, 2000 : degrees of string rotation. In Excel95 or earlier : 0: No rotation, 1: Top down, 2: 90 degrees anti-clockwise, 3: 90 clockwise

JustLast

JustLast (or not). I have never seen this attribute.

ReadDir

Direction for read.

BdrStyle

Array ref of boder styles : [Left, Right, Top, Bottom]

BdrColor

Array ref of boder color indexes : [Left, Right, Top, Bottom]

BdrDiag

Array ref of diag boder kind, style and color index : [Kind, Style, Color] Kind : 0: None, 1: Right-Down, 2:Right-Up, 3:Both

Fill

Array ref of fill pattern and color indexes : [Pattern, Front Color, Back Color]

Lock

Locked (or not).

Hidden

Hiddedn (or not).

Style

Style format (or Cell format)

Font

Spreadsheet::ParseExcel::Font

Format class has these properties:

Name

Name of that font.

Bold

Bold (or not).

Italic

Italic (or not).

Height

Size (height) of that font.

Underline

Underline (or not).

UnderlineStyle

0: None, 1: Single, 2: Double, 0x21: Single(Account), 0x22: Double(Account)

Color

Color index for that font.

Strikeout

Strikeout (or not).

Super

0: None, 1: Upper, 2: Lower

Formatter class

Spreadsheet::ParseExcel::Fmt*

Formatter class will convert cell data.

Spreadsheet::ParseExcel includes 2 formatter classes: FmtDefault and FmtJapanese. You can create your own FmtClass as you like.

Formatter class(Spreadsheet::ParseExcel::Fmt*) should provide these functions:

ChkType($oSelf, $iNumeric, $iFmtIdx)

tells type of the cell that has specified value.

$oSelf

Formatter itself

$iNumeric

If on, the value seems to be number

$iFmtIdx

Format index number of that cell

TextFmt($oSelf, $sText, $sCode)

converts original text into applicatable for Value.

$oSelf

Formatter itself

$sText

Original text

$sCode

Character code of Original text

ValFmt($oSelf, $oCell, $oBook)

converts original value into applicatable for Value.

$oSelf

Formatter itself

$oCell

Cell object

$oBook

Workbook object

FmtString($oSelf, $oCell, $oBook)

get format string for the $oCell.

$oSelf

Formatter itself

$oCell

Cell object

$oBook

WorkBook object contains that cell

KNOWN PROBLEM

This module can not get the values of fomulas in Excel files made with Spreadsheet::WriteExcel. Normaly (ie. By Excel application), formula has the result with it. But Spreadsheet::WriteExcel writes formula with no result. If you set your Excel application "Auto Calculation" off. (maybe [Tool]-[Option]-[Calculation] or something) You will see the same result.

AUTHOR

Kawai Takanori (Hippo2000) kwitknr@cpan.org

    http://member.nifty.ne.jp/hippo2000/            (Japanese)
    http://member.nifty.ne.jp/hippo2000/index_e.htm (English)

SEE ALSO

XLHTML, OLE::Storage, Spreadsheet::WriteExcel, OLE::Storage_Lite

This module is based on herbert within OLE::Storage and XLHTML.

TODO

- Spreadsheet::ParseExcel : Password protected data, Formulas support, HyperLink support, Named Range support

- Spreadsheet::ParseExcel::SaveParser : Catch up Spreadsheet::WriteExce feature, Create new Excel fle

COPYRIGHT

Copyright (c) 2000-2002 Kawai Takanori All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.

ACKNOWLEDGEMENTS

First of all, I would like to acknowledge valuable program and modules : XHTML, OLE::Storage and Spreadsheet::WriteExcel.

In no particular order: Yamaji Haruna, Simamoto Takesi, Noguchi Harumi, Ikezawa Kazuhiro, Suwazono Shugo, Hirofumi Morisada, Michael Edwards, Kim Namusk and many many people + Kawai Mikako.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 1031:

Unknown directive: =cmmt