NAME
OpenOffice::Parse::SXC - Perl extension for parsing OpenOffice SXC files
SYNOPSIS
use OpenOffice::Parse::SXC qw( parse_sxc );
# Non-OO way:
my @rows = parse_sxc( "file.sxc" );
for( @rows ) {
print join(",", @$_ ),"\n"; # each row is an array reference
}
# OO way:
package MyDataHandler; # Set up a handler object
sub new {
my $type = shift;
my $self = {};
bless $self, $type;
return $self;
}
sub row {
my $self = shift;
my $SXC = shift;
my $row_data = shift;
print $self->{worksheet},": ",join(",", @$row_data),"\n"; # Simple csv values printed...
}
sub worksheet {
my $self = shift;
my $SXC = shift;
my $worksheet = shift;
$self->{worksheet} = $worksheet;
}
sub workbook {
my $self = shift;
my $SXC = shift;
my $workbook = shift || "unknown_workbook";
}
1;
package main;
my $SXC = OpenOffice::Parse::SXC->new( OPTIONS );
$SXC->set_data_handler( MyDataHandler->new );
$SXC->parse_file( "file.sxc" );
DESCRIPTION
OpenOffice::Parse::SXC parses an SXC file (OpenOffice spreadsheet) and passes data back through a callback object that you register with the SXC object.
The major benefit of being able to read directly from an OpenOffice spreadsheet is that it allows SXC files to be directly used as a development tool.
The data returned contains no formatting or formula information, only what text is displayed in the spreadsheet.
This module requires XML::Parser and the compression utility unzip to be installed.
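Because the module depends on both XML::Parser and an external unzip program, a script may want to check for them before parsing. The following is a minimal sketch, not part of the module: find_in_path is a hypothetical helper, and it assumes the executable is named exactly "unzip" (no handling of suffixes such as .exe).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not provided by OpenOffice::Parse::SXC):
# return the full path of $program if it is found in PATH, else undef.
sub find_in_path {
    my ($program) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x $candidate;
    }
    return undef;
}

# require dies if the module is missing, so wrap it in eval.
my $has_xml_parser = eval { require XML::Parser; 1 } ? 1 : 0;
my $unzip_path     = find_in_path('unzip');

print "XML::Parser: ", ( $has_xml_parser ? "found" : "missing" ), "\n";
print "unzip:       ", ( defined $unzip_path ? $unzip_path : "missing" ), "\n";
```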
DATA CONVERSIONS
The data this module provides is exactly what you would see in the OpenOffice application, which may differ from what you entered. For example, the module provides the result of a function, not the function itself. If you enter 19.95 into a cell and format that cell as a currency type, you would see $19.95 (for example), and that is what you get when you parse the spreadsheet with this module.
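Since cells arrive as displayed text, a caller that needs numbers has to convert them back. The sketch below shows one way to do that; displayed_to_number is a hypothetical helper, not part of the module, and it assumes "." is the decimal separator and "," a thousands separator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn displayed text such as "$1,234.50" into a number.
# Returns undef when the text does not look numeric after cleaning.
sub displayed_to_number {
    my ($text) = @_;
    return undef unless defined $text;
    ( my $clean = $text ) =~ s/[^0-9.\-]//g;    # drop currency symbols, commas, spaces
    return undef unless $clean =~ /^-?\d+(?:\.\d+)?$/;
    return $clean + 0;
}

for my $cell ( '$19.95', '$1,234.50', 'n/a' ) {
    my $n = displayed_to_number($cell);
    print "$cell => ", ( defined $n ? $n : 'not numeric' ), "\n";
}
```

Other locales or formats (percentages, dates, parenthesised negatives) would need their own handling.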
EXPORT
None by default.
EXPORT_OK
- parse_sxc SXC_FILENAME:
-
Parses an SXC file returning a list of lists containing the cell data.
- csv_quote STRING:
-
Quotes a string in "CSV format". The transformation converts each double-quote to two double-quotes, then wraps the entire string in double-quotes. All newlines are removed!
- dump_sxc_file SXC_FILENAME:
-
Prints out a Dumper'ed version of the entire SXC XML tree. Used for debugging.
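To illustrate the list-of-lists result and the quoting rules described for csv_quote, here is a self-contained sketch. csv_quote_sketch is a hypothetical stand-in written from the description above, and the sample rows are made up; the module's real implementation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical stand-in for csv_quote, based only on its documented behaviour.
sub csv_quote_sketch {
    my ($string) = @_;
    $string = '' unless defined $string;
    $string =~ s/\n//g;      # newlines are removed
    $string =~ s/"/""/g;     # each double-quote becomes two
    return qq{"$string"};    # wrap the whole string in double-quotes
}

# Made-up data in the list-of-lists shape that parse_sxc is described as returning.
my @rows = (
    [ 'Item',   'Note' ],
    [ 'Widget', qq{6" wide,\nblue} ],
);

for my $row (@rows) {
    print join( ",", map { csv_quote_sketch($_) } @$row ), "\n";
}
```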
PUBLIC METHODS
- new OPTIONS
-
Create a new SXC object.
- parse FILEHANDLE
-
Parse the data in filehandle FILEHANDLE.
- parse_file SXC_FILENAME
-
Parse file SXC_FILENAME. This method calls parse().
- get_current_worksheet_name
-
Returns the name of the current worksheet. This is only useful to the DATA HANDLER object (i.e., during processing).
- get_option OPTION_NAME
-
Gets an option.
- set_options OPTION_NAME => VALUE, ...
-
Sets one or more options.
- set_data_handler
-
Sets the DATA HANDLER. See the synopsis, and the DATA HANDLER section for details.
- get_data_handler
-
Gets the DATA HANDLER.
OPTIONS
The following options can be used (in new() or set_options()):
- worksheets => [ LIST_OF_WORKSHEETS_TO_PROCESS ]
-
An SXC 'workbook' consists of multiple 'worksheets' (internally referred to as tables). You can specify which worksheets you would like to process; if this option is not used, ALL of them are processed.
- no_trim => 1
-
If NOT specified, the trailing empty cells in each row are removed.
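The default trimming behaviour can be pictured with the following sketch. trim_trailing_empty is a hypothetical function, and treating undef and the empty string as "empty" is an assumption rather than something taken from the module source.

```perl
use strict;
use warnings;

# Hypothetical illustration of trailing-cell trimming. Empty cells in the
# middle of a row are kept; only the run at the end is dropped.
sub trim_trailing_empty {
    my ($row) = @_;
    my @cells = @$row;    # work on a copy, leave the caller's row alone
    pop @cells while @cells && ( !defined $cells[-1] || $cells[-1] eq '' );
    return \@cells;
}

my $trimmed = trim_trailing_empty( [ 'a', '', 'b', '', '' ] );
print scalar(@$trimmed), " cells: ", join( "|", @$trimmed ), "\n";
```

With no_trim => 1, the row would presumably be passed through with all five cells intact.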
DATA HANDLER
The DATA HANDLER is what the SXC object calls upon to do work while it parses an SXC file. It expects the DATA HANDLER object to implement the following methods:
- row:
-
Handle row data
- worksheet:
-
Called each time a new worksheet is encountered. Note: there is no callback for when a worksheet ends.
- workbook:
-
Called each time a new workbook is encountered. (This helps when the same SXC object is used to process multiple files.) As with worksheet(), there is no callback for the end of a workbook.
Each method gets the SXC object as the first argument, and the data as the second argument: worksheet gets the name of the worksheet, workbook gets the filename of the SXC file, and row receives a list reference to all the cells in that row.
The interesting callback is the row() function, and often it's the only function of any interest. If you want to avoid creating a class and just want to implement a row() callback, you can do something like this:
sub Whatever::row {
my($self, $SXC, $row_data) = @_;
print join(",", map { csv_quote( $_ ) } @$row_data ),"\n";
}
sub Whatever::worksheet {}
sub Whatever::workbook {}
$SXC->set_data_handler( bless {}, "Whatever" );
$SXC->parse_file( ... );
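A handler does not have to print anything; it can also collect rows for later use. The sketch below groups rows by worksheet name. RowCollector is a hypothetical class, and to keep the example runnable without an SXC file it simulates the callbacks by hand, using the argument order described above (SXC object first, then the data).

```perl
use strict;
use warnings;

package RowCollector;    # hypothetical handler class, for illustration only

sub new { return bless { current => undef, rows => {} }, shift }

sub workbook { }         # nothing to do per file in this sketch

sub worksheet {
    my ( $self, $sxc, $name ) = @_;
    $self->{current} = $name;
    $self->{rows}{$name} ||= [];
}

sub row {
    my ( $self, $sxc, $row_data ) = @_;
    push @{ $self->{rows}{ $self->{current} } }, [@$row_data];    # store a copy
}

package main;

my $handler = RowCollector->new;

# Simulate the calls the SXC object is described as making. With the module
# installed you would instead register the handler with set_data_handler()
# and call parse_file().
my $sxc = undef;    # stand-in for the SXC object
$handler->workbook( $sxc, "file.sxc" );
$handler->worksheet( $sxc, "Sheet1" );
$handler->row( $sxc, [ 1, 2, 3 ] );
$handler->row( $sxc, [ 4, 5 ] );

print "Sheet1 has ", scalar( @{ $handler->{rows}{Sheet1} } ), " rows\n";
```

Copying each row before storing it is a defensive choice; whether the module reuses the row array between callbacks is not stated in this document.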
AUTHOR
Desmond Lee <deslee@shaw.ca>
SEE ALSO