NAME

Treex::PML - Perl implementation for the Prague Markup Language (PML).

SYNOPSIS

  use Treex::PML;

  my $file="trees.pml";
  my $document = Treex::PML::Factory->createDocumentFromFile($file);
  foreach my $tree ($document->trees) {
     my $node = $tree;
     while ($node) {
       ...  # do something on node
       $node = $node->following; # depth-first traversal
     }
  }
  $document->save();

INTRODUCTION

This package provides an API for manipulating linguistically annotated treebanks. The module implements a generic data model of an XML-based format called PML (http://ufal.mff.cuni.cz/jazz/PML/) and features pluggable I/O backends and on-the-fly XSLT transformation to support other data formats.

About PML

Prague Markup Language (PML) is an XML-based, universally applicable data format based on abstract data types intended primarily for interchange of linguistic annotations. It is completely independent of a particular annotation schema. It can capture simple linear annotations as well as annotations with one or more richly structured interconnected annotation layers, dependency or constituency trees. A concrete PML-based format for a specific annotation is defined by describing the data layout and XML vocabulary in a special file called PML Schema and referring to this schema file from individual data files (instances). The schema can be used to validate the instances. It is also used by applications to "understand" the structure of the data and to choose an optimal in-memory representation. The generic nature of PML makes it very easy to convert data from other formats to PML without loss of information.

History

PML was developed at the Institute of Formal and Applied Linguistics of the Charles University in Prague. It was first used in the Prague Dependency Treebank 2.0 and has been used in several other treebanks since. Conversion tools for various existing treebank formats are also available.

This library was originally developed for the TrEd framework (http://ufal.mff.cuni.cz/~pajas/tred) and evolved gradually from an older library called Fslib, which implemented an older data format called FS format (http://ufal.mff.cuni.cz/pdt2.0/doc/data-formats/fs/index.html). This format is still fully supported by the current implementation.

DESCRIPTION

Treex::PML provides, among others, the following classes:

Treex::PML::Factory

a factory class which delegates object creation to a default factory class, which can be specified by the user (defaults to Treex::PML::StandardFactory). It is important that both user and library code use the create methods from Treex::PML::Factory to create new objects rather than calling constructors from an explicit object class.

This classical Factory Pattern allows the user to replace the standard family of Treex::PML classes with customized versions by setting up a customized factory as default. Then, all objects created by the Treex::PML library and applications will be from the customized family.
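
The delegation described above can be illustrated with a minimal, self-contained sketch. All package and method names below (My::Factory, My::StandardFactory, My::CustomFactory, set_default, create_node) are hypothetical stand-ins chosen for illustration and are not part of Treex::PML; the sketch only shows how a front-end factory can forward creation calls to a replaceable default factory.

```perl
use strict;
use warnings;

# Hypothetical node classes (stand-ins, not Treex::PML classes).
package My::Node;
sub new { my ($class, %args) = @_; return bless({%args}, $class) }

package My::CustomNode;
our @ISA = ('My::Node');

# Hypothetical standard factory: creates plain nodes.
package My::StandardFactory;
sub create_node { my ($class, %args) = @_; return My::Node->new(%args) }

# Hypothetical customized factory: creates nodes of a derived class.
package My::CustomFactory;
sub create_node { my ($class, %args) = @_; return My::CustomNode->new(%args) }

# Front-end factory: forwards every request to the current default factory.
package My::Factory;
my $default = 'My::StandardFactory';
sub set_default { my ($class, $factory) = @_; $default = $factory; return }
sub create_node { my ($class, @args) = @_; return $default->create_node(@args) }

package main;

my $plain = My::Factory->create_node(id => 'n1');
My::Factory->set_default('My::CustomFactory');
my $custom = My::Factory->create_node(id => 'n2');

print ref($plain),  "\n";   # class chosen by the standard factory
print ref($custom), "\n";   # class chosen by the customized factory
```

Because the calling code only ever names the front-end factory, switching the default factory changes the class of every object created afterwards without touching the callers.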

Treex::PML::StandardFactory

the standard factory class.

Treex::PML::Document

representing a PML document consisting of a set of trees.

Treex::PML::Node

representing a node of a tree (including the root node, which also represents the whole tree), see "Representation of trees" in Treex::PML::Node for details.

Treex::PML::Schema

representing a PML schema.

Treex::PML::Instance

implementing a PML instance.

Treex::PML::List

implementing a PML list.

Treex::PML::Alt

implementing a PML alternative.

Treex::PML::Seq

implementing a PML sequence.

Treex::PML::Container

implementing a PML container.

Treex::PML::Struct

implementing a PML attribute-value structure.

Treex::PML::FSFormat

representing an old-style document format for documents in the FS format.

Resource paths

Some I/O backends require additional resources (such as schemas, DTDs, configuration files, XSLT stylesheets, dictionaries, etc.). For this purpose, Treex::PML maintains a list of so-called "resource paths", which I/O backends may conveniently search for their resources.

See "PACKAGE FUNCTIONS" for a description of the functions related to pluggable I/O backends and the list of resource paths.

PACKAGE FUNCTIONS

Treex::PML::does ($thing,$role)
Parameters

$thing - any Perl scalar (an object, a reference or a non-reference)

$role - the name of a role (interface), such as a class name or a reference type like 'ARRAY'

Description

This function is an alias for the very useful function UNIVERSAL::DOES::does(), which checks whether $thing performs the interface (role) $role. If the thing is an object or class, it simply checks $thing->DOES($role) (see UNIVERSAL::DOES or UNIVERSAL in Perl >= 5.10.1). Otherwise it tells whether the thing can be dereferenced as an array/hash/etc.

Unlike UNIVERSAL::isa(), it is semantically correct to call does on a value of unknown type and to use it to test the reference type.

This function also handles overloading. For example, does($thing, 'ARRAY') returns true if the thing is an array reference, or if the thing is an object with overloaded @{}.

Using this function (or UNIVERSAL::DOES::does()) is the recommended method for testing types of objects in the Treex::PML hierarchy (Treex::PML::Node, Treex::PML::Document, etc.)

Returns

A true value if $thing performs the role $role, a false value otherwise.
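
The checks described above can be approximated in plain Perl. The following is a rough sketch under stated assumptions, not the implementation of UNIVERSAL::DOES::does(): the helper performs_role and the class My::Listish are hypothetical, and for blessed objects the sketch consults only DOES and dereference overloading.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# Reference type names and the dereference operators that can be overloaded.
my %deref_op = (
    ARRAY  => '@{}',
    HASH   => '%{}',
    SCALAR => '${}',
    CODE   => '&{}',
    GLOB   => '*{}',
);

# Hypothetical helper approximating the behaviour described above.
sub performs_role {
    my ($thing, $role) = @_;
    return 0 unless defined $thing;
    if (blessed($thing)) {
        # Objects: class membership first, then overloaded dereferencing.
        return 1 if $thing->DOES($role);
        my $op = $deref_op{$role};
        return ($op && overload::Method($thing, $op)) ? 1 : 0;
    }
    # Plain references: compare the underlying reference type.
    return (reftype($thing) eq $role ? 1 : 0) if ref $thing;
    # Non-references: the string may be a class name.
    return eval { $thing->DOES($role) } ? 1 : 0;
}

# Hypothetical class whose objects can be dereferenced as arrays.
{
    package My::Listish;
    use overload '@{}' => sub { $_[0]{items} };
    sub new { my ($class) = @_; return bless({ items => [1, 2, 3] }, $class) }
}

my $list   = My::Listish->new;
my @checks = (
    [ 'array reference as ARRAY',   performs_role([1, 2], 'ARRAY') ],
    [ 'hash reference as ARRAY',    performs_role({}, 'ARRAY') ],
    [ 'overloaded object as ARRAY', performs_role($list, 'ARRAY') ],
    [ 'object as its own class',    performs_role($list, 'My::Listish') ],
);
printf "%-28s %s\n", $_->[0], ($_->[1] ? 'yes' : 'no') for @checks;
```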

Treex::PML::UseBackends (@backends)
Parameters

@backends - a list of backend names

Description

Loads the given modules on demand and uses them as the initial set of I/O backends. The initial set of backends is returned by Backends(). This set is used as the default set of backends by Treex::PML::Document->load (unless a different list of backends was specified in a parameter).

Returns

In a list context, the list of backends successfully loaded; in scalar context, a true value if and only if all requested backends were successfully loaded.
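
The context-dependent return value described above follows a common Perl idiom based on wantarray. The sketch below is a generic illustration only: the helper load_modules is hypothetical, and it takes full module names, which is an assumption and not necessarily how Treex::PML maps backend names to modules.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load each named module on demand.
sub load_modules {
    my @requested = @_;
    my @loaded;
    for my $module (@requested) {
        # Accept only plain module names before handing them to string eval.
        next unless $module =~ /\A\w+(?:::\w+)*\z/;
        push @loaded, $module if eval "require $module; 1";
    }
    # List context: names that loaded; scalar context: true only if all did.
    return wantarray ? @loaded : (@loaded == @requested ? 1 : 0);
}

# The second module name is deliberately one that should not exist.
my @ok     = load_modules('List::Util', 'No::Such::Module::Here');
my $all_ok = load_modules('List::Util', 'No::Such::Module::Here');

print "loaded: @ok\n";
print "all loaded: ", ($all_ok ? 'yes' : 'no'), "\n";
```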

Treex::PML::AddBackends (@backends)
Parameters

@backends - a list of backend names

Description

Loads the given modules as I/O backends on demand, unless they are already available, and adds them to the current set of backends.

Returns

In a list context, the list of requested backends that were already available or successfully loaded; in scalar context, a true value if and only if all requested backends were already available or successfully loaded.

Treex::PML::Backends ()
Description

Returns the initial set of backends. This set is used as the default set of backends by Treex::PML::Document->load.

Returns

A list of backends already available or successfully loaded.

Treex::PML::BackendCanRead ($backend)
Parameters

$backend - a name of an I/O backend

Returns

Returns true if the backend provides all methods required for reading.

Treex::PML::BackendCanWrite ($backend)
Parameters

$backend - a name of an I/O backend

Returns

Returns true if the backend provides all methods required for writing.

Treex::PML::ImportBackends (@backends)
Parameters

@backends - a list of backend names

Description

Loads the given modules as I/O backends on demand and returns a list of the names of the backends successfully loaded. This list may then be passed to Treex::PML::Document I/O calls.

Returns

List of names of successfully loaded I/O backends.

Treex::PML::CloneValue ($scalar,$old_values?, $new_values?)
Parameters

$scalar - arbitrary Perl scalar

$old_values - array reference (optional)

$new_values - array reference (optional)

Description

Returns a deep copy of the Perl structures contained in a given scalar.

The optional argument $old_values can be an array reference consisting of values (references) that are either to be preserved (if $new_values is undefined) or mapped to the corresponding values in the array $new_values. This means that if $scalar contains (possibly deeply nested) reference to an object $A, and $old_values is [$A], then if $new_values is undefined, the resulting copy of $scalar will also refer to the object $A rather than to a deep copy of $A; if $new_values is [$B], all references to $A will be replaced by $B in the resulting copy. Note also that the effect of using [$A] as both $old_values and $new_values is the same as leaving $new_values undefined.

Returns

a deep copy of $scalar as described above
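
The preserve-or-map behaviour of $old_values and $new_values can be illustrated with a simplified stand-in. The helper clone_value below is hypothetical and is not the Treex::PML implementation: it copies only unblessed array and hash references, shares all other reference types, and does not detect cycles.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Recursive worker: $map translates reference addresses to replacements.
sub _clone {
    my ($value, $map) = @_;
    return $value unless ref $value;
    my $addr = refaddr($value);
    return $map->{$addr} if exists $map->{$addr};
    if (ref $value eq 'ARRAY') {
        return [ map { _clone($_, $map) } @$value ];
    }
    if (ref $value eq 'HASH') {
        my %copy;
        $copy{$_} = _clone($value->{$_}, $map) for keys %$value;
        return \%copy;
    }
    return $value;    # other reference types are shared, not copied
}

# Hypothetical deep-copy helper mirroring the description above.
sub clone_value {
    my ($value, $old_values, $new_values) = @_;
    my %map;
    for my $i (0 .. $#{ $old_values || [] }) {
        # Without $new_values each old value maps to itself (is preserved).
        $map{ refaddr($old_values->[$i]) }
            = $new_values ? $new_values->[$i] : $old_values->[$i];
    }
    return _clone($value, \%map);
}

my $shared = { name => 'A' };
my $other  = { name => 'B' };
my $data   = { items => [ 1, 2 ], link => $shared };

my $copy      = clone_value($data);                       # link is a copy
my $preserved = clone_value($data, [$shared]);            # link is $shared
my $remapped  = clone_value($data, [$shared], [$other]);  # link is $other
```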

Treex::PML::ResourcePaths ()

Returns the current list of directories used by Treex::PML to search for resources.

Treex::PML::SetResourcePaths (@paths)
Parameters

@paths - a list of directory paths

Description

Specify the complete set of directories to be used by Treex::PML when looking up resources.

Treex::PML::AddResourcePath (@paths)
Parameters

@paths - a list of directory paths

Description

Add given paths to the end of the list of directories searched by Treex::PML for resources.

Treex::PML::AddResourcePathAsFirst (@paths)
Parameters

@paths - a list of directory paths

Description

Add given paths to the beginning of the list of directories searched for resources.

Treex::PML::RemoveResourcePath (@paths)
Parameters

@paths - a list of directory paths

Description

Remove given paths from the list of directories searched for resources.

Treex::PML::FindInResourcePaths ($filename, \%options?)
Parameters

$filename - a relative path to a file

Description

If a given filename is a relative forward path (e.g. containing no up-dir '..' directory parts) of a file found in the resource paths, return:

If the option 'all' is true, a list of absolute paths to all occurrences found (may be empty).

If the option 'strict' is true, an absolute path to the first occurrence or an empty list if there is no occurrence of the file in the resource paths.

Otherwise act as with 'strict', but return unmodified $filename if no occurrence is found.

If $filename is an absolute path, it is always returned unmodified as a single return value.

Options are passed in an optional second argument as key-value pairs of a HASH reference:

  FindInResources($filename, {
    # 'strict' => 0 or 1
    # 'all'    => 0 or 1
  });

Treex::PML::FindInResources ($filename)

Alias for FindInResourcePaths($filename).
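
The lookup rules for the 'all' and 'strict' options can be summarized in a small stand-alone sketch. The helper find_in_paths is hypothetical: unlike the real function, it takes the list of directories as an explicit argument instead of using the list maintained by Treex::PML, and it is meant only to restate the rules above in code.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the lookup rules described above.
sub find_in_paths {
    my ($paths, $filename, $opts) = @_;
    $opts ||= {};
    # Absolute paths are always returned unmodified.
    return $filename if File::Spec->file_name_is_absolute($filename);
    my @found;
    # Only forward relative paths (no up-dir parts) are searched.
    unless (grep { $_ eq File::Spec->updir } File::Spec->splitdir($filename)) {
        @found = grep { -f $_ }
                 map  { File::Spec->catfile($_, $filename) } @$paths;
    }
    return @found if $opts->{all};                             # every match
    return (@found ? $found[0] : ()) if $opts->{strict};       # first or nothing
    return @found ? $found[0] : $filename;                     # first or as given
}

# Two temporary resource directories, each containing schema.xml.
my @dirs = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
for my $dir (@dirs) {
    open(my $fh, '>', File::Spec->catfile($dir, 'schema.xml')) or die $!;
    close $fh;
}

my @all     = find_in_paths(\@dirs, 'schema.xml',  { all    => 1 });
my ($first) = find_in_paths(\@dirs, 'schema.xml',  { strict => 1 });
my @none    = find_in_paths(\@dirs, 'missing.xml', { strict => 1 });
my ($as_is) = find_in_paths(\@dirs, 'missing.xml');

print scalar(@all), " occurrences, first at $first\n";
```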

Treex::PML::FindDirInResourcePaths ($dirname)
Parameters

$dirname - a relative path to a directory

Description

If a given directory name is a relative path of a sub-directory located in one of the resource directories, return an absolute path for that sub-directory. Otherwise return $dirname.

Treex::PML::FindDirInResources ($dirname)

Alias for FindDirInResourcePaths($dirname).

Treex::PML::ResolvePath ($ref_path,$filename,$search_resource_paths?)
Parameters

$ref_path - a reference filename

$filename - a relative path to a file

$search_resource_paths - 0 or 1

Description

If $filename is an absolute path or an absolute URL, it is returned unmodified. If it is a relative path and $ref_path is a local path or a file:// URL, the function tries to locate the file relative to $ref_path and, if such a file exists, returns an absolute filename or file:// URL of the file. Otherwise, it returns the value of FindInResourcePaths($filename) if the $search_resource_paths argument was true, or an absolute path or URL resolved relative to $ref_path otherwise.

The rationale behind this function is as follows: paths that are relative to remote resources are to be preferably located in ResourcePaths; paths that are relative to a local resource are preferably located in the actual location and then in ResourcePaths.
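
The order of preference for local paths can be sketched as follows. The helper resolve_local is hypothetical and covers only plain filesystem paths (no URL handling); its optional $fallback argument is a code reference standing in for the resource path lookup, which is an assumption made to keep the sketch self-contained.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sketch of the local-path branch described above.
sub resolve_local {
    my ($ref_path, $filename, $fallback) = @_;
    # Absolute paths are returned unmodified.
    return $filename if File::Spec->file_name_is_absolute($filename);
    # First look next to the reference file.
    my $candidate = File::Spec->rel2abs($filename, dirname($ref_path));
    return $candidate if -f $candidate;
    # Then search the resources if requested, else return the resolved path.
    return $fallback ? $fallback->($filename) : $candidate;
}

# A temporary directory holding a data file and the schema it refers to.
my $dir = tempdir(CLEANUP => 1);
for my $name ('data.pml', 'schema.xml') {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die $!;
    close $fh;
}
my $data_file = File::Spec->catfile($dir, 'data.pml');

my $local    = resolve_local($data_file, 'schema.xml');
my $searched = resolve_local($data_file, 'other.xml', sub { "RESOURCE:$_[0]" });
my $resolved = resolve_local($data_file, 'other.xml');

print "$local\n$searched\n$resolved\n";
```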

EXPORTED SYMBOLS

For backward compatibility reasons only, Treex::PML exports by default the following function symbol:

ImportBackends

For this reason, it is recommended to load Treex::PML as:

  use Treex::PML ();

The following function symbols can be imported on demand:

ImportBackends, CloneValue, ResourcePaths, FindInResources, FindDirInResources, FindDirInResourcePaths, ResolvePath, AddResourcePath, AddResourcePathAsFirst, SetResourcePaths, RemoveResourcePath

SEE ALSO

Tree editor TrEd: http://ufal.mff.cuni.cz/~pajas/tred

Prague Markup Language (PML) format: http://ufal.mff.cuni.cz/jazz/PML/

Description of FS format: http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/Doc/fs.html

Related packages: Treex::PML::Schema, Treex::PML::Instance, Treex::PML::Document, Treex::PML::Node, Treex::PML::Factory

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Treex::PML

COPYRIGHT AND LICENSE

Copyright (C) 2006-2010 by Petr Pajas

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.