NAME

ODF::lpOD - Languages & platform for OpenDocument

SYNOPSIS

        use ODF::lpOD;
        
        my $document = odf_get_document("report.odt");

        my $meta = $document->get_part(META);
        $meta->set_title("The best document format");
        
        my $content = $document->get_part(CONTENT);
        my $context = $content->get_body;
        my $paragraph = $context->get_paragraph(content => "I look for it");
        $paragraph->set_text("I found it");
        $paragraph->set_style("Standout");
        my $new_paragraph = odf_create_paragraph (
                                style => "Standard",
                                text => "A new content"
                                );
        $context->append_element($new_paragraph);
        my $table = odf_create_table (
                "Main Figures", height => 20, width => 16
                );
        $context->insert_element($table, before => $paragraph);
        my $cell = $table->get_cell("B4");
        $cell->set_text("Here B4");

        $document->save;
        exit;
        

The code example shown above loads a document from an existing "report.odt" file, updates various data in the document, then saves the changes. The following actions are done in the document:

1) The title is set to "The best document format";

2) The first paragraph containing "I look for it" is retrieved (this paragraph is supposed to exist; otherwise get_paragraph would return undef);

3) The content of the found paragraph is replaced by "I found it", and its style is set to "Standout" (this style is supposed to exist or to be defined later);

4) A new paragraph, whose text is "A new content" and style is "Standard", is created then appended to the document body;

5) A new table whose name is "Main Figures" and size is 20x16 is created then inserted just before the first retrieved paragraph;

6) The "B4" cell (i.e. the cell belonging to the 4th row and the 2nd column, whatever the document type) is retrieved, and its content is set to "Here B4" (the cell data type is automatically set to 'string').

DESCRIPTION

This is the Perl implementation of the lpOD project.

lpOD is a Free Software project that offers, for high level use cases, an application programming interface dedicated to document processing with the Python, Perl and Ruby languages. It complies with the OASIS Open Document Format (ODF), i.e. the ISO/IEC 26300 international standard.

lpOD is designed according to a top-down approach. The API is bound to the document functional structure and the user's point of view. As a consequence, it may be used without full knowledge of the ODF specification, and allows the application developer to be focused on the business needs instead of the low level storage concerns.

The lpOD API is object oriented.

The present distribution is an early developer version. It implements only a small part of the lpOD functional specification. However, it's reasonably usable. The documented features will be extended and improved, but should not be removed in the stable version.

Basic document access principles

The general access to the documents uses the odf_document class. Before processing a document, an odf_document instance must be created using one of the allowed constructors. While an odf_document object encapsulates the physical resource access logic, the real data must be handled through document parts, knowing that each part represents a specialized aspect of the document.

Each part contains a set of odf_element objects, knowing that odf_element is the common base class for any kind of document simple or complex element (an odf_element may be a visible object, such as a paragraph or a table, as well as a piece of data that specifies the layout or the behaviour of other objects, such as a text style or a page layout). Each part contains a root element, that is a special odf_element containing all the elements of the part. A part may contain a body element, that is a more restricted but in some cases more interesting context than the root.

lpOD is a read-write API. However, the changes made by the applications aren't automatically persistent. The API provides methods that insert, delete, or update elements in memory. In order to make the change persistent, explicit odf_part and odf_document methods must be used.

Global document initialization

A few specialized constructors may be used in order to create odf_document objects. All these constructors return an odf_document object in case of success, a FALSE value otherwise.

Once an odf_document is created, its content may be written back to persistent storage using its save method.
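
Because every constructor returns a false value on failure, the result can be checked before any further processing. The following is only a sketch, assuming ODF::lpOD is installed; the load_document helper and its error message are illustrative names, not part of lpOD:

```perl
use strict;
use warnings;
use ODF::lpOD;

# Hypothetical helper: wraps odf_get_document and stops the program
# when the constructor reports a failure by returning a false value.
sub load_document {
    my ($source) = @_;
    my $document = odf_get_document($source)
        or die "Cannot load an ODF document from '$source'\n";
    return $document;
}

# Typical use (the file name is an assumption):
#     my $document = load_document("report.odt");
```

The same check applies to odf_new_document and odf_new_document_from_template, which follow the same return convention.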

odf_get_document(source)

Instantiates an odf_document object which is a read-write interface to an existing ODF package corresponding to the given source. The package should be an ODF-compliant zip file (odt, ods, odp, and so on). Example:

        my $document = odf_get_document('C:\Path\Doc.odt');

In the present version, the source argument must be provided either as a regular file path or as an IO::Handle object.

odf_new_document(document_type)

Returns a new odf_document corresponding to the given ODF document type. Allowed document types are presently 'text', 'spreadsheet', 'presentation', and 'drawing'. Example:

        my $document = odf_new_document('spreadsheet');

Technically, the new document is generated as a clone of an existing template document, provided with the lpOD distribution. It operates in the same way as odf_new_document_from_template, but the user doesn't need to provide the template document.

odf_new_document_from_template(source)

Returns a new odf_document instantiated from an existing ODF template package. Same as odf_get_document, but the source package is read-only.

save([destination])

This function is a method. It must be called from an odf_document instance.

Without argument, it attempts to write its content back to the resource that was used to create it. A warning is issued and nothing is done if the document has been created without source file or from a read-only template (i.e. through odf_new_document or odf_new_document_from_template).

This method produces a file whose basic format is the same as the format of the source document or template (whatever the target filename, if any).

If the optional parameter target is provided, it's regarded as the storage destination. Its value may be a regular file path or an IO::Handle. This parameter is mandatory if the odf_document instance has been created through odf_new_document_from_template or odf_new_document_from_type.

Example:

        $document->save(target => "/myfiles/target.odt");

Document part initialization and handling

A regular ODF document contains various parts, some of them mandatory. The interesting parts in the lpOD scope are 'content', 'styles', 'meta', 'settings', and 'manifest'.

The odf_document class provides a get_part() method, that must be used with an argument that specifies the needed part. Example:

        my $content = $document->get_part(CONTENT);
        my $meta = $document->get_part(META);

The sequence above gives access to the content and meta parts of a previously created odf_document instance.

Beware: if get_part() is called twice or more from the same odf_document instance and with the same part designation, it returns the same object. As a consequence, after the sequence below, $p1 and $p2 will be synonyms:

        my $p1 = $document->get_part(CONTENT);
        my $p2 = $document->get_part(CONTENT);
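
The shared-object behaviour described above can be observed by comparing the addresses of the two references. This is a minimal sketch, assuming ODF::lpOD is installed; it works on a new in-memory text document and uses refaddr from the core Scalar::Util module:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use ODF::lpOD;

my $document = odf_new_document('text');

# Two requests for the same part designation.
my $p1 = $document->get_part(CONTENT);
my $p2 = $document->get_part(CONTENT);

# Both variables should refer to one and the same object.
my $same = refaddr($p1) == refaddr($p2);
print "same object\n" if $same;
```

In practice this means that a change made through $p1 is immediately visible through $p2; there is no independent copy to keep in sync.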

serialize() returns an XML export of the whole part (the application is then responsible for the fate of this export). An optional pretty argument, if set to TRUE, specifies that the XML output must be human-readable. Example:

        my $content = $document->get_part(CONTENT);
        # here some content processing
        my $xml = $content->serialize(pretty => TRUE);

Basic ODF element handling

Every odf_part object provides a low level get_element method whose first argument is an XPath expression and the second one a numeric position. The numeric argument specifies the order number of the required element among the set of elements matching the XPath. If the order number is negative, the position is regarded as counted backward from the end. The position is zero-based (i.e. a zero value means the first matching element). As an example, the instruction below returns the last paragraph of the document.

        my $document = odf_get_document($source);
        my $content = $document->get_part(CONTENT);
        my $p = $content->get_element("//text:p", -1);

However, this way is not the smartest one because it requires the knowledge of the ODF schema (and some XPath skills for more complicated cases).
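
The position convention described above (zero-based, with negative values counted backward from the end) is the one Perl itself uses for array subscripts. The sketch below does not call lpOD at all; the @matches array and the match_at helper are purely illustrative stand-ins for the ordered set of elements matching an XPath expression:

```perl
use strict;
use warnings;

# Stand-in for the ordered set of elements matching an XPath expression.
my @matches = ("first paragraph", "second paragraph", "last paragraph");

# Return the match at a zero-based position; a negative position is
# counted backward from the end of the set.
sub match_at {
    my ($position) = @_;
    return $matches[$position];
}

print match_at(0),  "\n";    # first paragraph
print match_at(-1), "\n";    # last paragraph
```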

lpOD provides more user-friendly, XPath-free methods for the most used elements in the CONTENT part of a document. These methods are provided through the odf_element class. Any individual element in a part is an odf_element object. There is a shortcut to get the top (or root) element of any part: the get_root() method. Once selected, the top element provides all the context methods of the lpOD API.

A context method is a method owned by an element (the context) and whose effect is related to the children and descendants of this element. So, the get_xxx method of a given element is a retrieval method intended to select something below the current element. Thanks to the get_paragraph method provided by the odf_element class, the last example could be written as shown below:

        my $document = odf_get_document($source);
        my $context = $document->get_part(CONTENT)->get_root;
        my $p = $context->get_paragraph(-1);

In most cases (including the previous example), get_root may be replaced by get_body, which returns a context containing all the visible elements (including the paragraphs).

There is a generic context-based get_element that differs from the part-based one. It allows the user to select an element according to its text content, one of its attributes, and/or its sequential position in the context. As an example, the sequence below displays the name of the last page that uses the draw page style "dp1" (assuming we are using a presentation or drawing document):

        my $context = $document->get_part(CONTENT)->get_body;
        my $page = $context->get_element(
                'draw:page',
                attribute       => 'style name',
                value           => 'dp1',
                position        => -1
        );
        say $page->get_attribute('name');

lpOD provides special name-based retrieval methods for some elements that own unique names. For example, the instruction below selects the table whose name is "T1" (if any):

        $table = $context->get_table_by_name("T1");

The meta document part, unlike others such as the content one, provides direct get and set accessors for the content of the usual metadata, so there is no need for a context element, as shown in the following example that displays the title of a document:

        my $document = odf_get_document($source);
        my $meta = $document->get_part(META);
        say $meta->get_title;

The title (like any other metadata value) may be updated or created with the corresponding set accessor:

        $meta->set_title("The new title");

All the properties of a previously selected element are stored in one or more attributes and in a text. So, for any odf_element lpOD provides corresponding get and set accessors.

get_text returns the current text, while set_text replaces the current content by a new text (possibly empty). Without argument, get_text returns the text directly contained in the calling element, but with a recursive optional named parameter set to TRUE, it returns the concatenated texts of all the descendants of the calling element. On the other hand, set_text deletes any previous content (i.e. direct text content and embedded elements such as bookmarks, variable fields, text segments with special styles, and so on).
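
As a small illustration of these two accessors, the sketch below works on a free paragraph (one that is not attached to any document). It assumes ODF::lpOD is installed; the sample strings are arbitrary:

```perl
use strict;
use warnings;
use ODF::lpOD;

# A free paragraph created with an initial text.
my $paragraph = odf_create_paragraph(text => "Draft wording");

# set_text discards the previous content and installs the new text.
$paragraph->set_text("Final wording");

# Text directly contained in the paragraph.
my $text = $paragraph->get_text;

# Same request including the text of all descendants; for a paragraph
# holding only plain text the two results are expected to be identical.
my $all = $paragraph->get_text(recursive => TRUE);

print "$text\n";
```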

The get_attribute method requires the name of the needed attribute. This name may be the technical name according to the OpenDocument specification, or a simpler and more meaningful name. For example, assuming $item is a list item, and knowing that such an object may own a so-called text:restart-numbering attribute telling that the list numbering must be restarted at this point from a given value, the following instruction sets this value to 6:

        $item->set_attribute('restart numbering' => 6);

set_attribute deletes an existing attribute as soon as the given value is undef; so the instruction below cancels the restart numbering feature:

        $item->set_attribute('restart numbering' => undef);

Note that set_attribute, provided with a non-null value, automatically creates the attribute if it doesn't exist; there is no need to separately check an attribute for existence and create it before setting a value.

It's possible to get or set more than one attribute in a single call using get_attributes or set_attributes. The first one returns the attributes as a hash reference (with the real ODF names), while the second one requires a hash reference as argument.

An element may be removed (with all its descendants) using its delete method. (Beware: the deletion of a high level element may destroy a lot of content!) It's possible to delete the whole content of an element without removing the element itself by issuing a set_text with an empty string.
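
The effect of delete can be checked by searching for the element before and after the call. This is a hedged sketch, assuming ODF::lpOD is installed and that get_paragraph returns a false value when nothing matches; the paragraph text is an arbitrary sample:

```perl
use strict;
use warnings;
use ODF::lpOD;

my $document = odf_new_document('text');
my $context  = $document->get_part(CONTENT)->get_body;

# Attach a scratch paragraph to the document body.
my $scratch = odf_create_paragraph(
        style => "Standard", text => "Temporary note"
        );
$context->append_element($scratch);
my $found_before = $context->get_paragraph(content => "Temporary note");

# Remove the paragraph (and anything it contains) from the document.
$scratch->delete;
my $found_after = $context->get_paragraph(content => "Temporary note");

print $found_after ? "still present\n" : "removed\n";
```

Nothing is saved here, so the change only exists in memory, in line with the persistence rules described earlier.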

The user is allowed to create a new element using the odf_create_element constructor, that requires an appropriate ODF tag (corresponding to the type of element) or a valid XML string. Fortunately, lpOD provides a set of specialized constructors (such as odf_create_paragraph, odf_create_table, and so on) that may be used without knowledge of the XML stuff. Once created through such a constructor, the new element is not automatically included in a document. To do so, lpOD provides the insert_element and append_element methods, both context-based, i.e. called from an existing element that will become the parent of the new element. As an example, the sequence below creates a new paragraph (with given style and content), then appends it to a selected section:

        my $document = odf_get_document($source);
        my $context = $document->get_part(CONTENT)->get_body;
        my $section = $context->get_section("Prologue");
        my $paragraph = odf_create_paragraph(
                style => "Standard", text => "The End of the Beginning"
                );
        $section->append_element($paragraph);

Elements may be created by replication of existing elements, thanks to the clone method. The result of the instruction below is a copy of an existing section (with all its content); this copy is a "free" element (i.e. it's not included in any document, and it has no link with its prototype element), so it may be inserted elsewhere in the same document or in another document:

        my $section = $context->get_section("Reusable");
        my $free_section = $section->clone;
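
The independence between a clone and its prototype can be illustrated with a free paragraph instead of a section. This is a minimal sketch, assuming ODF::lpOD is installed; the texts are arbitrary samples:

```perl
use strict;
use warnings;
use ODF::lpOD;

# The prototype element and its copy.
my $original = odf_create_paragraph(text => "Reusable text");
my $copy     = $original->clone;

# Changing the copy is not expected to affect the prototype.
$copy->set_text("Edited copy");

my $original_text = $original->get_text;
my $copy_text     = $copy->get_text;
print "$original_text / $copy_text\n";
```

The copy can then be attached to any context with append_element or insert_element, like an element produced by a constructor.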

Getting started

The "Hello World" example

Unsurprisingly, we suggest that you test your lpOD installation and your knowledge of the big picture through this simple program:

        use ODF::lpOD;
        
        my $doc = odf_new_document('text');
        my $content = $doc->get_part(CONTENT);
        my $context = $content->get_body;
        $context->append_element(
                odf_create_paragraph(
                        style => "Standard",
                        text => "Hello World !"
                        )
                );
        $content->store;
        $doc->save(target => "helloworld.odt");
        exit;

If this script runs without warning, open the "helloworld.odt" file using your favourite ODF-compliant text processor, and look at the text content. You may then introduce more sophistication using the metadata part of the document. To do so, you can (for example) insert the lines below somewhere before the save instruction (and after the odf_new_document one).

        my $meta = $doc->get_part(META);
        $meta->set_title("Hello World Test");
        $meta->set_creator("Me");
        $meta->set_creation_date(iso_date);
        $meta->set_modification_date(iso_date);
        $meta->store;

After execution of the extended version, check the author's name and the creation & modification dates through the File/Properties dialog of your text editor.

Using the documentation

The detailed documentation is split into several manual chapters which are not necessarily linked to Perl packages.

For details about the main lpOD classes, look at the following manual pages, which cover most of the presently implemented features (that is, a small part of the full lpOD scope):

You may have a look at http://docs.lpod-project.org in order to know the scope of the lpOD project and get an idea about the additional features that will be implemented soon.

COPYRIGHT & LICENSE

Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.

This work was sponsored by the Agence Nationale de la Recherche (http://www.agence-nationale-recherche.fr).

lpOD is free software; you can redistribute it and/or modify it under the terms of either:

a) the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. lpOD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with lpOD. If not, see http://www.gnu.org/licenses/.

b) the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0