Lee ♫ Goddard

NAME

DATR2XML.pm - manipulate DATR .dtr, XML, HTML

SYNOPSIS

        #! perl -w
        use DATR2XML;

        undef $DATR2XML::includeNodePath;

        $datr_eg1 = new DATR2XML('D:\DATR\perl\eg.dtr');
        $datr_eg2 = new DATR2XML('D:/DATR/perl/eg.dtr', "on");
        $datr_eg3 = new DATR2XML('http://somewhere/doc.dtr', "verbose");

        # Call methods only on an object that has already been constructed
        $datr_eg1 -> set_stylesheet('D:/DATR/XSLT/datr.xsl');

        viewAll $datr_eg1;
        $datr_eg2 -> viewHeader;

        $datr_eg3 -> printHeader;
        printOpening $datr_eg3;
        printNodes $datr_eg3;
        printClosing $datr_eg3;

        printAll $datr_eg3;

        save $datr_eg3;

        DATR2XML::convert('D:\DATR\XSLT\eg_opening.dtr');

DESCRIPTION

This module parses a DATR .dtr-formatted file into a Perl data structure. The format is the one defined in Gerald Gazdar's 'DATR By Example', published on the DATR web pages at the University of Sussex < http://www.sussex.ac.uk/ >.

Particular respect was paid to datanode31.html, though I confess the formal definitions found elsewhere on the site made no sense to me.

LOGGING

Process logging may be set to "off", to "on" (or "true"), or to "verbose".

REQUIRED MODULES

If internet access is required, the following modules must be installed and on the @INC path:

        LWP::UserAgent
        HTTP::Request

If no internet access is required, these modules will not be called.
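
One way to load modules only when a network source is requested is to call require at run time. The sketch below illustrates that idea; the helper names looks_like_uri and fetch_source are hypothetical and are not part of DATR2XML, and the test for what counts as a URI is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a source string looks like a URI.
# Assumption: only http, https and ftp schemes count as network sources.
sub looks_like_uri {
    my ($source) = @_;
    return $source =~ m{^(?:https?|ftp)://}i ? 1 : 0;
}

# Hypothetical helper: fetch a document, loading the network modules
# at run time and only when the source really is a URI.
sub fetch_source {
    my ($source) = @_;
    return unless looks_like_uri($source);
    require LWP::UserAgent;
    require HTTP::Request;
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->request( HTTP::Request->new( GET => $source ) );
    die "Could not fetch $source: ", $response->status_line, "\n"
        unless $response->is_success;
    return split /^/, $response->content;    # one list entry per line
}
```

With this arrangement a script that only reads local files never touches LWP::UserAgent or HTTP::Request.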

DIAGNOSTICS

The module emits the usual warnings if it cannot read from or write to a file.

EXPORTS

The module exports nothing to the calling namespace.

CAVEATS

The module does not fully support The DATR Standard Library RFC, Version 2.20. Specifically, it does not support the proposed path cut operator, written as a full stop within a path: all full stops are taken to signify the end of a clause.
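
The effect of this caveat can be seen in a minimal sketch that treats every full stop as a clause terminator. The helper split_clauses is hypothetical and only illustrates the behaviour described; it is not the module's own parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split DATR text into clauses, treating
# every full stop as the end of a clause.
sub split_clauses {
    my ($text) = @_;
    my @clauses;
    for my $clause ( split /\./, $text ) {
        $clause =~ s/^\s+//;    # trim leading whitespace
        $clause =~ s/\s+$//;    # trim trailing whitespace
        push @clauses, $clause if length $clause;
    }
    return @clauses;
}

# Two ordinary clauses split as expected.
print "$_\n" for split_clauses('Noun: <cat> == noun. Dog: <> == Noun.');

# A full stop inside a path is also taken as a terminator,
# so this single sentence is wrongly cut in two.
print "$_\n" for split_clauses('N: <a.b> == c.');
```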

TO DO

        * Support The DATR Standard Library RFC, Version 2.20
        * Change mechanism of _parseOpeningClosing to allow
          line-spanning of contents.
        * Support interpolation of directives within body
          as specified by the style sheet
        * Fully support comment printing as specified by DATR XML DTD.
          Currently lumps all comments together.



GLOBAL VARIABLES

These variables can adjust the output of the DTR parser: when they are undefined (using $DATR2XML::var = undef) they prevent the DTR parser from outputting any element which has a default value, as defined in the DATR DTD; when they are defined with any value, they force XML output in full.

$printComments

Set to any defined value to print comments; set to undef to suppress them.

$includeNodePath

The DTD provides the default path as a null path, but this can be adjusted by setting $includeNodePath to 1. This can be reset by calling undef upon the variable. See also include_sentence_type.

$includeSentenceType

The DATR DTD provides == as the default type. When this variable is set, which is its default state, the type attribute is written out in full; when it is undefined, the attribute is left to the DTD default. See also include_sentence_type.
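
The following sketch shows one possible reading of how such a flag could affect output. It uses a local stand-in variable rather than the real $DATR2XML::includeSentenceType, and the helper equation_tag and the exact attribute layout are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Stand-in for $DATR2XML::includeSentenceType; defined by default.
our $includeSentenceType = 1;

# Hypothetical helper: build the opening tag of an EQUATION element.
# Assumption: the attribute is dropped only when the flag is undefined
# and the value equals the DTD default (==).
sub equation_tag {
    my ($type) = @_;
    return '<EQUATION>' if !defined $includeSentenceType && $type eq '==';
    return qq{<EQUATION type="$type">};
}

print equation_tag('=='), "\n";    # flag defined: attribute written in full
undef $includeSentenceType;
print equation_tag('=='), "\n";    # flag undefined: default value left out
print equation_tag('='),  "\n";    # non-default value is still written
```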

$location_xsl

The path to the required XSLT stylesheet. The default is http://www.leegoddard.com/DATR/XSLT/datr.xsl. See also the method and procedure set_stylesheet.

$location_dtd

The SYSTEM location of (that is, the path to) the DATR DTD. The default is http://www.leegoddard.com/DATR/DTD/DATR1.0.dtd. See also the method and procedure set_dtd.

$datr_root

This is literally the root element as printed, and may contain references, such as one to an XML Schema.

        Eg:
        $datr_root = '<DATR xmlns="x-schema:http://www.leegoddard.com/DATR/DTD/DATR1.0.xml">';

The default is simply the opening of the DATR element. See also set_schema.

PUBLIC METHODS

Constructor (new)

Creates a new DATR2XML object from file, URI or DATR .dtr source.

Accepts: DATR source as a scalar, an array, a reference to a scalar or array, or a path to a DATR file. If the source is a scalar or a reference to a scalar, it is assumed to be just a list of node definitions for the BODY slot.

                Optionally accepts a second argument to set logging: see the manual entry
                for the logging method for details.

Returns: reference to object.

Object Structure: a hash with the following fields:

        LOCATION - the name of the file, if any

        HEADER   - the file header (as defined in datrnode44.html#fileheader)

        OPENING  - opening declarations/directives as defined in datrnode45.html#openingdeclarations

        BODY     - node definitions, itself an array of hashes of the format defined in _parseNodes

        CLOSING  - closing declarations/directives as defined in datrnode47.html#closingdeclarations
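
As an illustration of this layout, here is a hand-built hash with the same field names. All values are invented for the example, the header field names are hypothetical, and treating CLOSING as an array in the same way as OPENING is an assumption.

```perl
use strict;
use warnings;

# Illustrative only: the field names come from the list above,
# every value is made up.
my $datr = {
    LOCATION => 'eg.dtr',
    HEADER   => { File => 'eg.dtr', Purpose => 'example' },    # hypothetical fields
    OPENING  => [ '#vars $abc: a b c.' ],
    BODY     => [
        {
            NODE    => 'Dog',
            PATH    => '<cat>',
            TYPE    => '==',
            VALUE   => 'noun',
            COMMENT => [],
        },
    ],
    CLOSING => [ '#show <cat>.' ],    # assumed to be an array, like OPENING
};

# Walk the BODY array and rebuild each sentence.
for my $sentence ( @{ $datr->{BODY} } ) {
    print "$sentence->{NODE}: $sentence->{PATH} $sentence->{TYPE} $sentence->{VALUE}.\n";
}
```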

include_sentence_type

Sets or resets the type attribute of EQUATION elements.

Calling with an argument value of 1 includes the type attribute (default); calling with 0 forces the type attribute to be omitted.

Call without a value to stop comment printing; call with a value to restart comment printing. Default is to print comments.

set_stylesheet

Sets the path to the required XSLT stylesheet. See also location_xsl in the section Global Variables.

set_dtd

Sets the location of the DTD as used in the DOCTYPE SYSTEM declaration. See also location_dtd in the section Global Variables.

set_schema

Sets the location of the XML Schema as used in the root element. If called with no argument value, removes all references to an XML Schema, setting $datr_root to the opening of the DATR root tag without attributes.

Calling with a value of 1 sets the Schema to the author's, located at http://www.leegoddard.com/DATR/DTD/DATR1.0.xml. See also datr_root in the section Global Variables.

logging

Turns logging off or on, verbose or minimal.

        Accepts:        "true|on|minimal" or "verbose" or "off|none|silent"
        Returns:        None
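
The accepted keywords fall into three groups. A minimal sketch of how they could be mapped to levels follows; the helper normalise_log_level is hypothetical, and treating a missing argument as "off" is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: map the documented logging keywords
# onto three levels.
sub normalise_log_level {
    my ($arg) = @_;
    return 'off' unless defined $arg;    # assumption: no argument means no logging
    return 'minimal' if $arg =~ /^(?:true|on|minimal)$/i;
    return 'verbose' if $arg =~ /^verbose$/i;
    return 'off'     if $arg =~ /^(?:off|none|silent)$/i;
    die "Unknown logging level: $arg\n";
}

print normalise_log_level($_), "\n" for qw(on verbose silent);
```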

viewAll

Provides a rough printout of all records

        Accepts:        object ref;
        Returns:        none

viewHeader

Provides a rough printout of the file header

        Accepts:        object ref;
        Returns:        none

viewOpening

Provides a rough view of the opening directives/definitions

        Accepts:        object ref;
        Returns:        none

viewClosing

Provides a rough view of the closing directives/definitions

        Accepts:        object ref;
        Returns:        none

viewNodes

Provides a rough printout of all nodes

        Accepts:        object ref;
        Returns:        none

save

Saves to local filesystem an XML printout of all records

        Accepts:        object ref;
                        optional file path to save at
                        or, for internal use, typeglob for Perl filehandle.
        Returns:        none
        Notes:          simply calls printAll, passing filehandle if necessary.

convert

Convert one or more DATR files to XML.

        Accepts:        Either:
                        a filepath with an extension,
                        optionally with an additional destination filepath or directory,
                        or,
                        for batch operation, a directory location.
        Returns:        nothing, will die on errors
        Notes:          Does not accept URLs and does not process sub-directories.
                        Minimizes logging during operation.
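
For batch operation, the note about sub-directories amounts to a flat directory scan. The sketch below shows such a scan using only core modules; the helper dtr_files_in is hypothetical and is not the routine used by convert.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the .dtr files directly inside a
# directory, without descending into sub-directories.
sub dtr_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!\n";
    my @files;
    for my $name ( sort readdir $dh ) {
        my $path = File::Spec->catfile( $dir, $name );
        # Keep plain files with a .dtr extension; skip directories.
        push @files, $path if $name =~ /\.dtr$/i && -f $path;
    }
    closedir $dh;
    return @files;
}
```

Each path returned could then be handed to DATR2XML::convert one at a time.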

printAll

Provides an XML printout of all records

        Accepts:        object ref;
                        optional file path to save at.
                        or, for internal use, typeglob for Perl filehandle
        Returns:        none

printHeader

Provides an XML printout of the file header

        Accepts:        object ref;
                        optional file path
                        or, for internal use, typeglob for Perl filehandle
        Returns:        none

printOpening; printClosing

Provides an XML printout of the opening/closing directives/definitions block element. Without passing a filepath or typeglob for filehandle, outputs to STDOUT. Just a wrapper for _printOpeningClosing.

        Accepts:        object ref;
                        optionally a file path
                        or, for internal use, typeglob for Perl filehandle
        Returns:        none

printNodes

Provides an XML printout of all nodes. Basically writes the EQUATION element and calls _parsePath on each value of the object's {BODY} key.

        Accepts:        object ref
        Returns:        none

PRIVATE METHODS

All private method subroutine names are prefixed with an underscore.

_loadFile (private method)

Load a dtr file from the local file system.

        Accepts:        object reference
        Returns:        an array of file contents

_loadURI (private method)

Load a dtr document from a URI

        Accepts:        object reference
        Returns:        an array of file contents

_parseHeader (private method)

Parses a .dtr-format file header into the class record

        Accepts:        object ref;
        Returns:        none
        Struct:         This method fills the hash held in $self->{HEADER}
                        with whatever fields the .dtr file header contains that match
                        a name/value pair delimited with a colon.
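
A minimal sketch of this kind of colon-delimited parsing is given below. The helper parse_header_lines is hypothetical, and it assumes that header lines are DATR comments beginning with '%'; the module's own matching rules may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the header parsing described above.
# Assumption: header lines are comments that start with '%'.
sub parse_header_lines {
    my (@lines) = @_;
    my %header;
    for my $line (@lines) {
        # Keep only lines of the form "% Name: value".
        next unless $line =~ /^\s*%\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $header{$1} = $2;
    }
    return \%header;
}

my $header = parse_header_lines(
    '% File:     eg.dtr',
    '% Purpose:  a small example',
    '% no name/value pair here',
);
print "$_ => $header->{$_}\n" for sort keys %$header;
```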

_parseOpening (private method)

Extracts opening directives, those occurring before node definitions, and places them into the self-object's OPENING array.

        Accepts:        object ref, ref to DATR data
        Returns:        none

_parseClosing (private method)

Extracts closing directives, those occurring after node definitions

        Accepts:        object ref; reference to array of DATR data
        Returns:        none
        Notes:          reverses @_ then applies same proc as _parseOpening, then reverses output
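
The reverse-and-reuse approach mentioned in the notes can be sketched as follows. Both helpers are hypothetical, and recognising a directive by a leading '#' is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the run of directive lines at the
# start of the input (assumed to begin with '#').
sub leading_directives {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        last unless $line =~ /^\s*#/;
        push @found, $line;
    }
    return @found;
}

# Closing directives: reverse the input, take the leading run
# with the same routine, then reverse the result back.
sub trailing_directives {
    my (@lines) = @_;
    my @found = reverse leading_directives( reverse @lines );
    return @found;
}

my @source = (
    '#vars $x: a b.',
    'Dog: <cat> == noun.',
    '#show <cat>.',
    '#hide Dog.',
);
print "opening: $_\n" for leading_directives(@source);
print "closing: $_\n" for trailing_directives(@source);
```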

_parseNodes (private method)

Parse a list of nodes to the class BODY record.

        Accepts:        an obj ref and a reference to an array
                        of DATR data
        Returns:        none
        Struct:         This method creates the array of hashes held in $self->{BODY}
                        with the following fields:

                        NODE    - the name of the current node
                        PATH    - the (left-hand) path
                        TYPE    - the sentence-type signifier: = or ==
                        VALUE   - the (right-hand) value
                        COMMENT - an array of comments, index reflecting source line number
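
To make the field layout concrete, the sketch below splits one simple sentence into these fields. The helper parse_sentence is hypothetical; it handles only a single "Node: <path> == value" clause without its final full stop, and keeping the angle brackets in PATH is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split one simple DATR clause into the
# documented fields. Returns nothing if the clause does not match.
sub parse_sentence {
    my ($clause) = @_;
    return
        unless $clause =~ /^\s*(\w+)\s*:\s*(<[^>]*>)\s*(==?)\s*(.*?)\s*$/;
    return {
        NODE    => $1,
        PATH    => $2,    # assumption: angle brackets are kept
        TYPE    => $3,    # either = or ==
        VALUE   => $4,
        COMMENT => [],
    };
}

my $record = parse_sentence('Dog: <cat> == noun');
print join( ' | ', @{$record}{qw(NODE PATH TYPE VALUE)} ), "\n";
```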

_parsePath (private pseudo-method)

Decodes path attributes into an XML structure.

        Accepts:        a string of DATR path (as in $$hash{VALUE});
                        optionally a second argument, being the name of a node to
                        build-out the sentence (cf. geraldg@cogs.susx.ac.uk, 06/07/00)
        Returns:        a string of XML structure
        Notes:          a bit of a hack, really.

_preFormatNodes (private method)

Formats nodes for processing by removing comments/directives/linefeeds

        Accepts:        strings or array of DATR node/path/value sentences
        Returns:        one string of DATR node/path/value sentences, without linebreaks
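
A rough sketch of this kind of pre-formatting follows. The helper preformat is hypothetical, and it assumes that '%' starts a comment and that directive lines start with '#'; it is not the module's own routine.

```perl
use strict;
use warnings;

# Hypothetical helper: strip comments, directive lines and line
# breaks, returning one string of sentences.
sub preformat {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        next if $line =~ /^\s*#/;    # assumption: directives start with '#'
        $line =~ s/%.*$//;           # assumption: '%' starts a comment
        $line =~ s/^\s+|\s+$//g;     # trim whitespace, including the newline
        push @kept, $line if length $line;
    }
    return join ' ', @kept;
}

print preformat(
    "Dog: <cat> == noun % a comment\n",
    "#show <cat>.\n",
    "     <num> == sing.\n",
), "\n";
```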

_setupOutput (private method)

Sets up a filehandle for output, whether STDOUT or not

        Accepts:        string of a filepath, or a filehandle, or a (ref to a) typeglob, or undef
        Returns:        a reference to a typeglob that is the filehandle
        See also:       "Passing Filehandles" in perlfaq7 Perl documentation
        Note:           Would it be better not to default to STDOUT but
                        to default to a filename specified at object construction time?
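
The behaviour described can be sketched in a few lines. The helper setup_output is hypothetical and covers only three cases: no argument, a reference to a typeglob, and a file path; a bare typeglob is not handled here.

```perl
use strict;
use warnings;

# Hypothetical sketch of choosing an output filehandle.
sub setup_output {
    my ($target) = @_;
    return \*STDOUT unless defined $target;       # default to STDOUT
    return $target if ref($target) eq 'GLOB';     # already a typeglob reference
    open my $fh, '>', $target
        or die "Cannot write to $target: $!\n";
    return $fh;                                   # lexical handle for the file
}

my $out = setup_output();
print {$out} "<DATR>\n";
```

Returning a reference in every case lets the caller print with the same syntax whether the output goes to STDOUT or to a file.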

_printOpeningClosing (private pseudo-method)

Prints as XML the contents of opening/closing, as requested.

AUTHOR and COPYRIGHT

Author: Lee Goddard code@leegoddard.com, leego@cogs.susx.ac.uk

Copyright: © Lee Goddard, 09/06/00 and as above. All Rights Reserved. License: The GNU General Public License applies: copies available from www.gnu.org/. You are free to distribute and modify this module under the same terms as those of Perl itself.
