NAME

OpenOffice::Parse::SXC - Perl extension for parsing OpenOffice SXC files

SYNOPSIS

  use OpenOffice::Parse::SXC qw( parse_sxc );

  # Non-OO way:

  my @rows      = parse_sxc( "file.sxc" );
  for( @rows ) {
    print join(",", @$_ ),"\n";   # each row is an array reference
  }

  # OO way:

  package MyDataHandler;        # Set up a handler object
  sub new {
    my $type            = shift;
    my $self            = {};
    bless $self, $type;
    return $self;
  }
  sub row {
    my $self            = shift;
    my $SXC             = shift;
    my $row_data        = shift;
    print $self->{worksheet},": ",join(",", @$row_data),"\n";   # Simple csv values printed...
  }
  sub worksheet {
    my $self            = shift;
    my $SXC             = shift;
    my $worksheet       = shift;
    $self->{worksheet}  = $worksheet;
  }
  sub workbook {
    my $self            = shift;
    my $SXC             = shift;
    my $workbook        = shift || "unknown_workbook";
  }
  1;

  package main;

  my $SXC       = OpenOffice::Parse::SXC->new( OPTIONS );
  $SXC->set_data_handler( MyDataHandler->new );
  $SXC->parse_file( "file.sxc" );

DESCRIPTION

OpenOffice::Parse::SXC parses an SXC file (OpenOffice spreadsheet) and passes data back through a callback object that you register with the SXC object.

The major benefit of being able to read directly from an OpenOffice spreadsheet is that it allows SXC files to be directly used as a development tool.

The data returned contains no formatting or formula information, only the text displayed in the spreadsheet.

This module requires XML::Parser and the compression utility unzip to be installed.

DATA CONVERSIONS

The data that this module will provide you with is exactly the same as what you would see in the OpenOffice application. This could be different than what you entered. For example, this module would provide the results of a function, not the function itself. If you enter 19.95 into a cell, and format that cell as a currency type, you would see $19.95 (for example), and that is what you would get using this module to parse the spreadsheet.
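
Because cells arrive as displayed text, numeric work usually means cleaning the string first. The following is a minimal sketch, assuming a US-style display format ("," as thousands separator, "." as decimal point). The helper name display_to_number is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of OpenOffice::Parse::SXC): turn displayed
# text such as "$1,234.50" back into a plain number. Assumes "," is the
# thousands separator and "." is the decimal point.
sub display_to_number {
    my ($text) = @_;
    return undef unless defined $text;
    (my $clean = $text) =~ s/[^0-9.\-]//g;    # drop currency symbols, commas, spaces
    return undef unless $clean =~ /^-?\d*\.?\d+$/;
    return $clean + 0;                        # force numeric context
}

print display_to_number('$19.95'), "\n";
```

Text that does not look like a number after cleaning yields undef, so the caller can decide how to treat labels and blank cells.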

EXPORT

None by default.

EXPORT_OK

parse_sxc SXC_FILENAME:

Parses an SXC file returning a list of lists containing the cell data.

csv_quote STRING:

Quotes a string in "CSV format". Each double-quote is converted to two double-quotes, and the entire string is then wrapped in double-quotes. All newlines are removed!

dump_sxc_file SXC_FILENAME:

Prints out a Dumper'ed version of the entire SXC XML tree. Used for debugging.
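
The transformation described for csv_quote can be sketched in a few lines. This is an illustrative re-implementation under the name my_csv_quote (hypothetical); the module's own csv_quote may differ in detail, for example in how it treats undefined values.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only; not the module's code.
sub my_csv_quote {
    my ($string) = @_;
    $string = '' unless defined $string;   # assumption: treat undef as empty
    $string =~ s/\n//g;                    # newlines are removed
    $string =~ s/"/""/g;                   # each " becomes ""
    return qq{"$string"};                  # wrap the result in double-quotes
}

print my_csv_quote(qq{He said "hi"\nthere}), "\n";
```

Note that removing newlines is lossy: a multi-line cell comes out as a single run of text.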

PUBLIC METHODS

new OPTIONS

Create a new SXC object.

parse FILEHANDLE

Parse the data in filehandle FILEHANDLE.

parse_file SXC_FILENAME

Parse file SXC_FILENAME. This method calls parse().

get_current_worksheet_name

Returns the name of the current worksheet. This is only useful to the DATA HANDLER object (i.e., during processing).

get_option OPTION_NAME

Gets an option.

set_options OPTION_NAME => VALUE, ...

Sets one or more options.

set_data_handler

Sets the DATA HANDLER. See the synopsis, and the DATA HANDLER section for details.

get_data_handler

Gets the DATA HANDLER.

OPTIONS

The following options can be used (in new() or set_options()):

worksheets => [ LIST_OF_WORKSHEETS_TO_PROCESS ]

An SXC 'workbook' consists of multiple 'worksheets' (internally referred to as tables). You can specify which worksheets you would like to process; if this option is not used, ALL of them are processed.

no_trim => 1

If NOT specified, the trailing empty cells in each row are removed before the row is passed to the DATA HANDLER.
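
The default trimming behaviour can be pictured with the sketch below. It is not the module's code: the helper name trim_trailing_empty is hypothetical, and it assumes a cell counts as "empty" when it is undefined or the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the default (no_trim not set) behaviour.
# Assumption: a cell is "empty" when it is undef or the empty string.
sub trim_trailing_empty {
    my @cells = @_;
    pop @cells while @cells && (!defined $cells[-1] || $cells[-1] eq '');
    return @cells;
}

my @row = ('id', '', 'name', '', '');
print join('|', trim_trailing_empty(@row)), "\n";
```

Only trailing cells are dropped; empty cells between populated ones keep their position, so column indexes stay meaningful.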

DATA HANDLER

The DATA HANDLER is what the SXC object calls upon to do work while it parses an SXC file. It expects the DATA HANDLER object to implement the following methods:

row:

Handle row data

worksheet:

Called each time a new worksheet is encountered. Note: there is no callback for when a worksheet ends.

workbook:

Called each time a new workbook is encountered. (This helps when the same SXC object is used to process multiple files.) As with worksheet(), there is no callback for the end of a workbook.

Each method gets the SXC object as the first argument, and the data as the second argument: worksheet gets the name of the worksheet, workbook gets the filename of the SXC file, and row receives a list reference to all the cells in that row.

The interesting callback is the row() function, and often it's the only function of any interest. If you want to avoid creating a class and just want to implement a row() callback, you can do something like this:

  sub Whatever::row {
    my($self, $SXC, $row_data) = @_;
    print join(",", map { csv_quote( $_ ) } @$row_data ),"\n";
  }
  sub Whatever::worksheet {}
  sub Whatever::workbook {}
  $SXC->set_data_handler( bless {}, "Whatever" );
  $SXC->parse_file( ... );
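
A handler does not have to print; it can also accumulate data for later use. The sketch below collects rows per worksheet. The class name CollectingHandler is hypothetical, and the callbacks are driven by hand here (passing undef in place of the SXC object) so the example runs without an SXC file. It assumes only the callback signatures described above.

```perl
use strict;
use warnings;

# Hypothetical handler class: collects rows per worksheet instead of printing.
package CollectingHandler;

sub new {
    my $class = shift;
    return bless { current => undef, sheets => {} }, $class;
}

sub workbook { }    # not needed for this sketch

sub worksheet {
    my ($self, $sxc, $name) = @_;
    $self->{current} = $name;            # remember which sheet rows belong to
    $self->{sheets}{$name} ||= [];
}

sub row {
    my ($self, $sxc, $row_data) = @_;
    push @{ $self->{sheets}{ $self->{current} } }, [@$row_data];   # copy the row
}

package main;

my $handler = CollectingHandler->new;

# Stand-in for the parser: these calls mimic the callback order described
# above. With the real module you would instead call
# $SXC->set_data_handler($handler) and then $SXC->parse_file(...).
$handler->workbook(undef, 'file.sxc');
$handler->worksheet(undef, 'Sheet1');
$handler->row(undef, ['a', 'b']);
$handler->row(undef, ['c', 'd']);

printf "%s: %d rows\n", $_, scalar @{ $handler->{sheets}{$_} }
    for sort keys %{ $handler->{sheets} };
```

Copying each row with [@$row_data] is a defensive choice, in case the parser reuses the same array reference between calls.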

AUTHOR

Desmond Lee <deslee@shaw.ca>

SEE ALSO

sxc2csv.
