NAME

XML::DOM - A perl module for building DOM Level 1 compliant document structures

SYNOPSIS

 use XML::DOM;

 my $parser = new XML::DOM::Parser;
 my $doc = $parser->parsefile ("file.xml");

 # print all HREF attributes of all CODEBASE elements
 my $nodes = $doc->getElementsByTagName ("CODEBASE");
 my $n = $nodes->getLength;

 for (my $i = 0; $i < $n; $i++)
 {
     my $node = $nodes->item ($i);
     my $href = $node->getAttributeNode ("HREF");
     print $href->getValue . "\n";
 }

 $doc->printToFile ("out.xml");

 print $doc->toString;

DESCRIPTION

This module extends the XML::Parser module by Clark Cooper. The XML::Parser module is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library.

XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data structure that conforms to the API of the Document Object Model as described at http://www.w3.org/TR/REC-DOM-Level-1. See the XML::Parser manpage for other available features of the XML::DOM::Parser class. Note that the 'Style' property should not be used (it is set internally.)

The XML::Parser NoExpand option is more or less supported, in that it will generate EntityReference objects whenever an entity reference is encountered in character data. I'm not sure how useful this is. Any comments are welcome.

As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods create an XML::DOM::Document object from the specified input. This Document object can then be examined, modified and written back out to a file or converted to a string.
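
The parse, examine, modify and write-back cycle described above can be sketched with an in-memory string instead of a file. This is a minimal sketch: the sample XML is made up, and getDocumentElement is a Document method that this section does not describe.

```perl
use strict;
use warnings;
use XML::DOM;

# Parse from an in-memory string instead of a file.
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse('<catalog><book id="b1">Perl</book></catalog>');

# Examine: fetch the root Element and its first child.
my $root = $doc->getDocumentElement;
my $book = $root->getFirstChild;

# Modify: change an attribute on the parsed element.
$book->setAttribute('id', 'b2');

# Write back out: serialize the root element to a string.
my $xml = $root->toString;
print "$xml\n";

# Break circular references once the document is no longer needed.
$doc->dispose;
```

Serializing the root element rather than the whole Document keeps the output free of any XML declaration or trailing whitespace the Document may add.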

When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option KeepCDATA to 1 will store CDATASections in CDATASection nodes, instead of converting them to Text nodes. Subsequent CDATASection nodes will be merged into one. Let me know if this is a problem.
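
To illustrate the KeepCDATA option, the following sketch parses the same made-up input twice, once with default options and once with KeepCDATA set, and compares the node type of the first child. It assumes a version of XML::Parser recent enough (2.19 or later) for the option to take effect.

```perl
use strict;
use warnings;
use XML::DOM;

my $xml = '<script><![CDATA[if (a < b) { run(); }]]></script>';

# Default options: the CDATA content is converted to a Text node.
my $plain_doc = XML::DOM::Parser->new->parse($xml);
my $plain     = $plain_doc->getDocumentElement->getFirstChild;

# KeepCDATA => 1: the content is stored in a CDATASection node.
my $cdata_doc = XML::DOM::Parser->new(KeepCDATA => 1)->parse($xml);
my $cdata     = $cdata_doc->getDocumentElement->getFirstChild;

printf "default:   %s\n", $plain->getNodeTypeName;
printf "KeepCDATA: %s\n", $cdata->getNodeTypeName;
```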

A Document has a tree structure consisting of Node objects. A Node may contain other nodes, depending on its type. A Document may have Element, Text, Comment, and CDATASection nodes. Element nodes may have Attr, Element, Text, Comment, and CDATASection nodes. The other nodes may not have any child nodes.

This module adds several node types that are not part of the DOM spec (yet.) These are: ElementDecl (for <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?> declarations) and AttDef (for attribute definitions in an AttlistDecl.)

DOM API

XML::DOM
Constant definitions

The following predefined constants indicate which type of node it is.

 UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)

 ELEMENT_NODE (1)                The node is an Element.
 ATTRIBUTE_NODE (2)              The node is an Attr.
 TEXT_NODE (3)                   The node is a Text node.
 CDATA_SECTION_NODE (4)          The node is a CDATASection.
 ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
 ENTITY_NODE (6)                 The node is an Entity.
 PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
 COMMENT_NODE (8)                The node is a Comment.
 DOCUMENT_NODE (9)               The node is a Document.
 DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
 DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
 NOTATION_NODE (12)              The node is a Notation.

 ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
 ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
 XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
 ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

 Usage:

   if ($node->getNodeType == ELEMENT_NODE)
   {
       print "It's an Element";
   }

Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never encounter it. The last 4 node types were added to support the 4 added node classes.

Global Variables

$VERSION

The variable $XML::DOM::VERSION contains the version number of this implementation, e.g. "1.07".

Additional methods not in the DOM Spec

getIgnoreReadOnly and ignoreReadOnly (readOnly)

The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the DocumentType, so by default this implementation throws DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can be disabled by (temporarily) setting the global IgnoreReadOnly flag.

The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous value. The getIgnoreReadOnly method simply returns its current value.

 my $oldIgnore = XML::DOM::ignoreReadOnly (1);
 eval {
     # ... do whatever you want, catching any other exceptions ...
 };
 XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value

isValidName (name)

Returns whether the specified name is a valid "Name" as defined in the XML spec. Characters with Unicode values > 127 are also supported.
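
A short sketch of calling isValidName follows. It is a plain function in the XML::DOM package rather than a method, and the candidate names here are invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

# Names to check: a Name must not start with a digit or contain spaces.
my @candidates = ('title', '1title', 'has space');

my %valid;
for my $name (@candidates) {
    # Normalize the boolean result to 1 or 0 for printing.
    $valid{$name} = XML::DOM::isValidName($name) ? 1 : 0;
    printf "%-10s %s\n", $name, $valid{$name} ? 'valid' : 'invalid';
}
```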

getAllowReservedNames and allowReservedNames (boolean)

The first method returns whether reserved names are allowed. The second takes a boolean argument and sets whether reserved names are allowed. The initial value is 1 (i.e. allow reserved names.)

The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for future use. (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very rule by defining an ENTITY with the name 'xmlpio'.) A "Name" in this context means the Name token as found in the BNF rules in the XML spec.

XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built by the XML::DOM::Parser.

setTagCompression (funcref)

There are 3 possible styles for printing empty Element tags:

Style 0
 <empty/> or <empty attr="val"/>

XML::DOM uses this style by default for all Elements.

Style 1
 <empty></empty> or <empty attr="val"></empty>

Style 2
 <empty /> or <empty attr="val" />

This style is sometimes desired when using XHTML (note the extra space before the slash "/"). See http://www.w3.org/TR/WD-html-in-xml Appendix C for more details.

By default XML::DOM compresses all empty Element tags (style 0.) You can control which style is used for a particular Element by calling XML::DOM::setTagCompression with a reference to a function that takes 2 arguments. The first is the tag name of the Element, the second is the XML::DOM::Element that is being printed. The function should return 0, 1 or 2 to indicate which style should be used to print the empty tag. E.g.

 XML::DOM::setTagCompression (\&my_tag_compression);

 sub my_tag_compression
 {
    my ($tag, $elem) = @_;

    # Print empty br, hr and img tags like this: <br />
    return 2 if $tag =~ /^(br|hr|img)$/;

    # Print other empty tags like this: <empty></empty>
    return 1;
 }

XML::DOM::Node

Global Variables

@NodeNames

The variable @XML::DOM::Node::NodeNames maps the node type constants to strings. It is used by XML::DOM::Node::getNodeTypeName.

Methods

getNodeType

Return an integer indicating the node type. See XML::DOM constants.

getNodeName

Return a property or a hardcoded string, depending on the node type. Here are the corresponding functions or values:

 Attr                   getName
 AttDef                 getName
 AttlistDecl            getName
 CDATASection           "#cdata-section"
 Comment                "#comment"
 Document               "#document"
 DocumentType           getNodeName
 DocumentFragment       "#document-fragment"
 Element                getTagName
 ElementDecl            getName
 EntityReference        getEntityName
 Entity                 getNotationName
 Notation               getName
 ProcessingInstruction  getTarget
 Text                   "#text"
 XMLDecl                "#xml-declaration"

Not In DOM Spec: AttDef, AttlistDecl, ElementDecl and XMLDecl were added for completeness.

getNodeValue and setNodeValue (value)

Returns a string or undef, depending on the node type. This method is provided for completeness; in other languages it saves the programmer an upcast. The value is either available through some other method defined in the subclass, or else undef is returned. Here are the corresponding methods: Attr::getValue, Text::getData, CDATASection::getData, Comment::getData, ProcessingInstruction::getData.
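
The mapping of getNodeName and getNodeValue to the type-specific accessors can be seen by printing both for a few node types. This is a sketch over a made-up document; getDocumentElement is a Document method not described in this section.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<note lang="en">hi<!--todo--></note>');
my $note = $doc->getDocumentElement;

# The element itself, its attribute, and its two children (Text, Comment).
my @nodes = ($note, $note->getAttributeNode('lang'), $note->getChildNodes);

for my $node (@nodes) {
    my $value = $node->getNodeValue;
    printf "%-10s %s\n", $node->getNodeName,
        defined $value ? "'$value'" : 'undef';
}
```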

getParentNode and setParentNode (parentNode)

The parent of this node. All nodes, except Document, DocumentFragment, and Attr may have a parent. However, if a node has just been created and not yet added to the tree, or if it has been removed from the tree, this is undef.

getChildNodes

A NodeList that contains all children of this node. If there are no children, this is a NodeList containing no nodes. The content of the returned NodeList is "live" in the sense that, for instance, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the NodeList accessors; it is not a static snapshot of the content of the node. This is true for every NodeList, including the ones returned by the getElementsByTagName method.

NOTE: this implementation does not return a "live" NodeList for getElementsByTagName. See CAVEATS.

When this method is called in a list context, it returns a regular perl list containing the child nodes. Note that this list is not "live". E.g.

 @list = $node->getChildNodes;        # returns a perl list
 $nodelist = $node->getChildNodes;    # returns a NodeList (object reference)
 for my $kid ($node->getChildNodes)   # iterate over the children of $node
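
The difference between the "live" NodeList and the perl list can be made visible by modifying the tree after both have been fetched. A minimal sketch, using a made-up document and the Document methods getDocumentElement and createElement (not described in this section):

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list><item/><item/></list>');
my $list = $doc->getDocumentElement;

my $live     = $list->getChildNodes;   # scalar context: NodeList object
my @snapshot = $list->getChildNodes;   # list context: plain perl list

# Add a third child after both results were obtained.
$list->appendChild($doc->createElement('item'));

printf "NodeList: %d, perl list: %d\n", $live->getLength, scalar @snapshot;
```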

getFirstChild

The first child of this node. If there is no such node, this returns undef.

getLastChild

The last child of this node. If there is no such node, this returns undef.

getPreviousSibling

The node immediately preceding this node. If there is no such node, this returns undef.

getNextSibling

The node immediately following this node. If there is no such node, this returns undef.
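
The four navigation methods above can be combined to walk the children of a node in either direction without a NodeList. A small sketch over a made-up document:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<r><a/><b/><c/></r>');
my $root = $doc->getDocumentElement;

# Walk forward from the first child until getNextSibling returns undef.
my @forward;
for (my $n = $root->getFirstChild; defined $n; $n = $n->getNextSibling) {
    push @forward, $n->getNodeName;
}

# Walk backward from the last child until getPreviousSibling returns undef.
my @backward;
for (my $n = $root->getLastChild; defined $n; $n = $n->getPreviousSibling) {
    push @backward, $n->getNodeName;
}

print "forward:  @forward\n";
print "backward: @backward\n";
```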

getAttributes

A NamedNodeMap containing the attributes (Attr nodes) of this node (if it is an Element) or undef otherwise. Note that adding attributes to, or removing attributes from, the returned object also adds them to, or removes them from, the Element node that the NamedNodeMap came from.

getOwnerDocument

The Document object associated with this node. This is also the Document object used to create new nodes. When this node is a Document this is undef.

insertBefore (newChild, refChild)

Inserts the node newChild before the existing child node refChild. If refChild is undef, insert newChild at the end of the list of children.

If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. If the newChild is already in the tree, it is first removed.

Return Value: The node being inserted.

DOMExceptions:

  • HIERARCHY_REQUEST_ERR

    Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to insert is one of this node's ancestors.

  • WRONG_DOCUMENT_ERR

    Raised if newChild was created from a different document than the one that created this node.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

  • NOT_FOUND_ERR

    Raised if refChild is not a child of this node.

replaceChild (newChild, oldChild)

Replaces the child node oldChild with newChild in the list of children, and returns the oldChild node. If the newChild is already in the tree, it is first removed.

Return Value: The node replaced.

DOMExceptions:

  • HIERARCHY_REQUEST_ERR

    Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to put in is one of this node's ancestors.

  • WRONG_DOCUMENT_ERR

    Raised if newChild was created from a different document than the one that created this node.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

  • NOT_FOUND_ERR

    Raised if oldChild is not a child of this node.

removeChild (oldChild)

Removes the child node indicated by oldChild from the list of children, and returns it.

Return Value: The node removed.

DOMExceptions:

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

  • NOT_FOUND_ERR

    Raised if oldChild is not a child of this node.

appendChild (newChild)

Adds the node newChild to the end of the list of children of this node. If the newChild is already in the tree, it is first removed. If it is a DocumentFragment object, the entire contents of the document fragment are moved into the child list of this node.

Return Value: The node added.

DOMExceptions:

  • HIERARCHY_REQUEST_ERR

    Raised if this node is of a type that does not allow children of the type of the newChild node, or if the node to append is one of this node's ancestors.

  • WRONG_DOCUMENT_ERR

    Raised if newChild was created from a different document than the one that created this node.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.
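
The four tree-editing methods (insertBefore, replaceChild, removeChild, appendChild) can be exercised together. This is a sketch on a made-up document; createElement and getDocumentElement are Document methods not described in this section.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list><b/></list>');
my $list = $doc->getDocumentElement;
my $b    = $list->getFirstChild;

my $first = $doc->createElement('a');
my $last  = $doc->createElement('c');
my $new   = $doc->createElement('d');

$list->insertBefore($first, $b);             # children: a, b
$list->appendChild($last);                   # children: a, b, c
my $old = $list->replaceChild($new, $b);     # children: a, d, c; returns b
$list->removeChild($first);                  # children: d, c

my @names = map { $_->getTagName } $list->getChildNodes;
print join(',', @names), "\n";
print $list->toString, "\n";
```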

hasChildNodes

This is a convenience method to allow easy determination of whether a node has any children.

Return Value: 1 if the node has any children, 0 otherwise.

cloneNode (deep)

Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns undef).

Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node.

Parameters:

  • deep

    If true, recursively clone the subtree under the specified node. If false, clone only the node itself (and its attributes, if it is an Element).

Return Value: The duplicate node.
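
The difference between a shallow and a deep clone of an Element can be sketched as follows, using a made-up one-element document:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<p id="1">text</p>');
my $p   = $doc->getDocumentElement;

my $shallow = $p->cloneNode(0);   # element and attributes only
my $deep    = $p->cloneNode(1);   # element, attributes and child Text node

print 'shallow: ', $shallow->toString, "\n";
print 'deep:    ', $deep->toString, "\n";
```

Both clones are detached: they have no parent until they are inserted somewhere in a tree.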

normalize

Puts all Text nodes in the full depth of the sub-tree underneath this Element into a "normal" form where only markup (e.g., tags, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are no adjacent Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer lookups) that depend on a particular document tree structure are to be used.

Not In DOM Spec: In the DOM Spec this method is defined in the Element and Document class interfaces only, but it doesn't hurt to have it here...
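
As an illustration of normalize, the sketch below creates two adjacent Text nodes by hand and then merges them. The document is made up, and createTextNode and getDocumentElement are Document methods not described in this section.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<msg/>');
my $msg = $doc->getDocumentElement;

# appendChild keeps the two Text nodes separate (unlike addText).
$msg->appendChild($doc->createTextNode('Hello, '));
$msg->appendChild($doc->createTextNode('world'));
my $before = $msg->getChildNodes->getLength;

# Merge adjacent Text nodes into one.
$msg->normalize;
my $after = $msg->getChildNodes->getLength;

print "before: $before, after: $after\n";
```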

getElementsByTagName (name [, recurse])

Returns a NodeList of all descendant elements with a given tag name, in the order in which they would be encountered in a preorder traversal of the Element tree.

Parameters:

  • name

    The name of the tag to match on. The special value "*" matches all tags.

  • recurse

    Whether it should return only direct child nodes (0) or any descendant that matches the tag name (1). This argument is optional and defaults to 1. It is not part of the DOM spec.

Return Value: A list of matching Element nodes.

NOTE: this implementation does not return a "live" NodeList for getElementsByTagName. See CAVEATS.

When this method is called in a list context, it returns a regular perl list containing the result nodes. E.g.

 @list = $node->getElementsByTagName("tag");       # returns a perl list
 $nodelist = $node->getElementsByTagName("tag");   # returns a NodeList (object ref.)
 for my $elem ($node->getElementsByTagName("tag")) # iterate over the result nodes
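
The effect of the optional recurse argument can be sketched on a small made-up document in which one matching element is nested one level deeper than the other:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<r><item/><group><item/></group></r>');
my $root = $doc->getDocumentElement;

# recurse defaults to 1: any descendant named "item" matches.
my $all = $root->getElementsByTagName('item');

# recurse set to 0: only direct children are considered.
my $direct = $root->getElementsByTagName('item', 0);

# "*" matches every tag name; list context yields a perl list.
my @every = $root->getElementsByTagName('*');

printf "all: %d, direct: %d, any tag: %d\n",
    $all->getLength, $direct->getLength, scalar @every;
```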

Additional methods not in the DOM Spec

getNodeTypeName

Return the string describing the node type. E.g. returns "ELEMENT_NODE" if getNodeType returns ELEMENT_NODE. It uses @XML::DOM::Node::NodeNames.

toString

Returns the entire subtree as a string.

printToFile (filename)

Prints the entire subtree to the file with the specified filename.

Croaks: if the file could not be opened for writing.

printToFileHandle (handle)

Prints the entire subtree to the file handle. E.g. to print to STDOUT:

 $node->printToFileHandle (\*STDOUT);

print (obj)

Prints the entire subtree using the object's print method. E.g. to print to a FileHandle object:

 use FileHandle;
 $f = new FileHandle ("file.out", "w");
 $node->print ($f);

getChildIndex (child)

Returns the index of the child node in the list returned by getChildNodes.

Return Value: the index or -1 if the node is not found.

getChildAtIndex (index)

Returns the child node at the specified index, or undef if there is no child at that index.

addText (text)

Appends the specified string to the last child if it is a Text node, or else appends a new Text node (with the specified text.)

Return Value: the last child if it was a Text node or else the new Text node.
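
The convenience methods getChildIndex, getChildAtIndex and addText can be sketched together. The document here is made up; note how the second addText call extends the Text node created by the first.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc      = XML::DOM::Parser->new->parse('<greeting><b/></greeting>');
my $greeting = $doc->getDocumentElement;

# The last child is <b/>, so a new Text node is appended.
my $first = $greeting->addText('Hello');

# The last child is now a Text node, so the string is appended to it.
my $second = $greeting->addText(', world');

my $index = $greeting->getChildIndex($first);
my $child = $greeting->getChildAtIndex(0);

print $greeting->toString, "\n";
print "text node index: $index, first child: ", $child->getNodeName, "\n";
```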

dispose

Removes all circular references in this node and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards.

setOwnerDocument (doc)

Sets the ownerDocument property of this node and all its children (and attributes etc.) to the specified document. This allows the user to cut and paste document subtrees between different XML::DOM::Documents. The node should be removed from the original document first, before calling setOwnerDocument.

This method does nothing when called on a Document node.
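
Moving a subtree from one document to another, following the order of steps described above (remove first, then change the owner, then insert), might look like this. Both documents are made up for the sketch.

```perl
use strict;
use warnings;
use XML::DOM;

my $src = XML::DOM::Parser->new->parse('<a><x>1</x></a>');
my $dst = XML::DOM::Parser->new->parse('<b/>');

my $src_root = $src->getDocumentElement;
my $dst_root = $dst->getDocumentElement;

# 1. Detach the subtree from its original document.
my $x = $src_root->removeChild($src_root->getFirstChild);

# 2. Re-home the node and everything below it.
$x->setOwnerDocument($dst);

# 3. Insert it into the target document.
$dst_root->appendChild($x);

print $src_root->toString, "\n";
print $dst_root->toString, "\n";
```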

isAncestor (parent)

Returns 1 if parent is an ancestor of this node or if it is this node itself.

expandEntityRefs (str)

Expands all the entity references in the string and returns the result. The entity references can be character references (e.g. "&#123;" or "&#x1fc2;"), default entity references ("&quot;", "&gt;", "&lt;", "&apos;" and "&amp;") or entity references defined in Entity objects as part of the DocumentType of the owning Document. Character references are expanded into UTF-8. Parameter entity references (e.g. %ent;) are not expanded.
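
A small sketch of expandEntityRefs, restricted to default entity references and an ASCII character reference so that no DocumentType is needed. The input string and document are made up.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<r/>');
my $root = $doc->getDocumentElement;

# Default entities (&lt; &amp; &gt;) and one decimal character reference.
my $expanded = $root->expandEntityRefs('a &lt; b &amp;&amp; c &gt; &#65;');

print "$expanded\n";
```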

Interface XML::DOM::NodeList

The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or constraining how this collection is implemented.

The items in the NodeList are accessible via an integral index, starting from 0.

Although the DOM spec states that all NodeLists are "live" in that they always reflect changes to the DOM tree, the NodeList returned by getElementsByTagName is not live in this implementation. See CAVEATS for details.

item (index)

Returns the indexth item in the collection. If index is greater than or equal to the number of nodes in the list, this returns undef.

getLength

The number of nodes in the list. The range of valid child node indices is 0 to length-1 inclusive.

Additional methods not in the DOM Spec

dispose

Removes all circular references in this NodeList and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards.

Interface XML::DOM::NamedNodeMap

Objects implementing the NamedNodeMap interface are used to represent collections of nodes that can be accessed by name. Note that NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not maintained in any particular order. Objects contained in an object implementing NamedNodeMap may also be accessed by an ordinal index, but this is simply to allow convenient enumeration of the contents of a NamedNodeMap, and does not imply that the DOM specifies an order to these Nodes.

Note that in this implementation, the objects added to a NamedNodeMap are kept in order.

getNamedItem (name)

Retrieves a node specified by name.

Return Value: A Node (of any type) with the specified name, or undef if the specified name did not identify any node in the map.

setNamedItem (arg)

Adds a node using its nodeName attribute.

As the nodeName attribute is used to derive the name which the node must be stored under, multiple nodes of certain types (those that have a "special" string value) cannot be stored as the names would clash. This is seen as preferable to allowing nodes to be aliased.

Parameters:

  • arg

    A node to store in a named node map.

The node will later be accessible using the value of the nodeName attribute of the node. If a node with that name is already present in the map, it is replaced by the new one.

Return Value: If the new Node replaces an existing node with the same name the previously existing Node is returned, otherwise undef is returned.

DOMExceptions:

  • WRONG_DOCUMENT_ERR

    Raised if arg was created from a different document than the one that created the NamedNodeMap.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this NamedNodeMap is readonly.

  • INUSE_ATTRIBUTE_ERR

    Raised if arg is an Attr that is already an attribute of another Element object. The DOM user must explicitly clone Attr nodes to re-use them in other elements.

removeNamedItem (name)

Removes a node specified by name. If the removed node is an Attr with a default value it is immediately replaced.

Return Value: The node removed from the map or undef if no node with such a name exists.

DOMException:

  • NOT_FOUND_ERR

    Raised if there is no node named name in the map.

item (index)

Returns the indexth item in the map. If index is greater than or equal to the number of nodes in the map, this returns undef.

Return Value: The node at the indexth position in the NamedNodeMap, or undef if that is not a valid index.

getLength

Returns the number of nodes in the map. The range of valid child node indices is 0 to length-1 inclusive.
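
The NamedNodeMap accessors can be sketched using the attribute map of an Element, which is the most common way to obtain one. The element and its attributes are made up.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc   = XML::DOM::Parser->new->parse('<img src="a.png" alt="A" width="10"/>');
my $img   = $doc->getDocumentElement;
my $attrs = $img->getAttributes;

# Enumerate by ordinal index.
my $count_before = $attrs->getLength;
for my $i (0 .. $count_before - 1) {
    my $attr = $attrs->item($i);
    printf "%s=%s\n", $attr->getName, $attr->getValue;
}

# Look up by name; an unknown name yields undef.
my $src    = $attrs->getNamedItem('src');
my $height = $attrs->getNamedItem('height');

# Removing from the map also removes the attribute from the Element.
my $removed     = $attrs->removeNamedItem('width');
my $count_after = $attrs->getLength;
```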

Additional methods not in the DOM Spec

getValues

Returns a NodeList with the nodes contained in the NamedNodeMap. The NodeList is "live", in that it reflects changes made to the NamedNodeMap.

When this method is called in a list context, it returns a regular perl list containing the values. Note that this list is not "live". E.g.

 @list = $map->getValues;        # returns a perl list
 $nodelist = $map->getValues;    # returns a NodeList (object ref.)
 for my $val ($map->getValues)   # iterate over the values

getChildIndex (node)

Returns the index of the node in the NodeList as returned by getValues, or -1 if the node is not in the NamedNodeMap.

dispose

Removes all circular references in this NamedNodeMap and its descendants so the objects can be claimed for garbage collection. The objects should not be used afterwards.

Interface XML::DOM::CharacterData extends XML::DOM::Node

The CharacterData interface extends Node with a set of attributes and methods for accessing character data in the DOM. For clarity this set is defined here rather than on each object that uses these attributes and methods. No DOM objects correspond directly to CharacterData, though Text, Comment and CDATASection do inherit the interface from it. All offsets in this interface start from 0.

getData and setData (data)

The character data of the node that implements this interface. The DOM implementation may not put arbitrary limits on the amount of data that may be stored in a CharacterData node. However, implementation limits may mean that the entirety of a node's data may not fit into a single DOMString. In such cases, the user may call substringData to retrieve the data in appropriately sized pieces.

getLength

The number of characters that are available through data and the substringData method below. This may have the value zero, i.e., CharacterData nodes may be empty.

substringData (offset, count)

Extracts a range of data from the node.

Parameters:

  • offset

    Start offset of substring to extract.

  • count

    The number of characters to extract.

Return Value: The specified substring. If the sum of offset and count exceeds the length, then all characters to the end of the data are returned.

appendData (str)

Appends the string to the end of the character data of the node. Upon success, data provides access to the concatenation of data and the DOMString specified.

insertData (offset, arg)

Inserts a string at the specified character offset.

Parameters:

  • offset

    The character offset at which to insert.

  • arg

    The DOMString to insert.

deleteData (offset, count)

Removes a range of characters from the node. Upon success, data and length reflect the change. If the sum of offset and count exceeds length then all characters from offset to the end of the data are deleted.

Parameters:

  • offset

    The offset from which to remove characters.

  • count

    The number of characters to delete.

replaceData (offset, count, arg)

Replaces the characters starting at the specified character offset with the specified string.

Parameters:

  • offset

    The offset from which to start replacing.

  • count

    The number of characters to replace.

  • arg

    The DOMString with which the range must be replaced.

If the sum of offset and count exceeds length, then all characters to the end of the data are replaced (i.e., the effect is the same as a remove method call with the same range, followed by an append method invocation).
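
The CharacterData editing methods can be traced step by step on a single Text node. This sketch assumes a throwaway document created only to obtain a Text node via createTextNode, a Document method not described in this section; the comments show the node's data after each call.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<r/>');
my $text = $doc->createTextNode('Hello World');

my $head = $text->substringData(0, 5);   # "Hello" (data is unchanged)

$text->appendData('!');                  # "Hello World!"
$text->insertData(5, ',');               # "Hello, World!"
$text->deleteData(0, 7);                 # "World!"
$text->replaceData(0, 5, 'Perl');        # "Perl!"

printf "%s (%d characters)\n", $text->getData, $text->getLength;
```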

XML::DOM::Attr extends XML::DOM::Node

The Attr nodes built by the XML::DOM::Parser always have one child node which is a Text node containing the expanded string value (i.e. EntityReferences are always expanded.) EntityReferences may be added when modifying or creating a new Document.

The Attr interface represents an attribute in an Element object. Typically the allowable values for the attribute are defined in a document type definition.

Attr objects inherit the Node interface, but since they are not actually child nodes of the element they describe, the DOM does not consider them part of the document tree. Thus, the Node attributes parentNode, previousSibling, and nextSibling have an undef value for Attr objects. The DOM takes the view that attributes are properties of elements rather than having a separate identity from the elements they are associated with; this should make it more efficient to implement such features as default attributes associated with all elements of a given type. Furthermore, Attr nodes may not be immediate children of a DocumentFragment. However, they can be associated with Element nodes contained within a DocumentFragment. In short, users and implementors of the DOM need to be aware that Attr nodes have some things in common with other objects inheriting the Node interface, but they also are quite distinct.

The attribute's effective value is determined as follows: if this attribute has been explicitly assigned any value, that value is the attribute's effective value; otherwise, if there is a declaration for this attribute, and that declaration includes a default value, then that default value is the attribute's effective value; otherwise, the attribute does not exist on this element in the structure model until it has been explicitly added. Note that the nodeValue attribute on the Attr instance can also be used to retrieve the string version of the attribute's value(s).

In XML, where the value of an attribute can contain entity references, the child nodes of the Attr node provide a representation in which entity references are not expanded. These child nodes may be either Text or EntityReference nodes. Because the attribute type may be unknown, there are no tokenized attribute values.

getValue

On retrieval, the value of the attribute is returned as a string. Character and general entity references are replaced with their values.

setValue (str)

DOM Spec: On setting, this creates a Text node with the unparsed contents of the string.

getName

Returns the name of this attribute.
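
Working with an Attr object directly can be sketched as follows. The element and attribute are made up; the attribute value contains an escaped ampersand to show that getValue returns the expanded string.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<link href="x.html?a=1&amp;b=2"/>');
my $link = $doc->getDocumentElement;

# Fetch the Attr node itself rather than just its string value.
my $attr  = $link->getAttributeNode('href');
my $name  = $attr->getName;
my $value = $attr->getValue;    # entity references are expanded

# setValue replaces the content of the Attr with a new Text node.
$attr->setValue('y.html');

print "$name was '$value', now '", $link->getAttribute('href'), "'\n";
```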

XML::DOM::Element extends XML::DOM::Node

By far the vast majority of objects (apart from text) that authors encounter when traversing a document are Element nodes. Assume the following XML document:

     <elementExample id="demo">
       <subelement1/>
       <subelement2><subsubelement/></subelement2>
     </elementExample>

When represented using DOM, the top node is an Element node for "elementExample", which contains two child Element nodes, one for "subelement1" and one for "subelement2". "subelement1" contains no child nodes.

Elements may have attributes associated with them; since the Element interface inherits from Node, the generic Node interface method getAttributes may be used to retrieve the set of all attributes for an element. There are methods on the Element interface to retrieve either an Attr object by name or an attribute value by name. In XML, where an attribute value may contain entity references, an Attr object should be retrieved to examine the possibly fairly complex sub-tree representing the attribute value. On the other hand, in HTML, where all attributes have simple string values, methods to directly access an attribute value can safely be used as a convenience.

getTagName

The name of the element. For example, in:

               <elementExample id="demo">
                       ...
               </elementExample>

tagName has the value "elementExample". Note that this is case-preserving in XML, as are all of the operations of the DOM.

getAttribute (name)

Retrieves an attribute value by name.

Return Value: The Attr value as a string, or the empty string if that attribute does not have a specified or default value.

setAttribute (name, value)

Adds a new attribute. If an attribute with that name is already present in the element, its value is changed to be that of the value parameter. This value is a simple string, it is not parsed as it is being set. So any markup (such as syntax to be recognized as an entity reference) is treated as literal text, and needs to be appropriately escaped by the implementation when it is written out. In order to assign an attribute value that contains entity references, the user must create an Attr node plus any Text and EntityReference nodes, build the appropriate subtree, and use setAttributeNode to assign it as the value of an attribute.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the specified name contains an invalid character.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

removeAttribute (name)

Removes an attribute by name. If the removed attribute has a default value it is immediately replaced.

DOMExceptions:

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

getAttributeNode (name)

Retrieves an Attr node by name.

Return Value: The Attr node with the specified attribute name or undef if there is no such attribute.

setAttributeNode (newAttr)

Adds a new attribute. If an attribute with that name is already present in the element, it is replaced by the new one.

Return Value: If the newAttr attribute replaces an existing attribute with the same name, the previously existing Attr node is returned, otherwise undef is returned.

DOMExceptions:

  • WRONG_DOCUMENT_ERR

    Raised if newAttr was created from a different document than the one that created the element.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

  • INUSE_ATTRIBUTE_ERR

    Raised if newAttr is already an attribute of another Element object. The DOM user must explicitly clone Attr nodes to re-use them in other elements.

removeAttributeNode (oldAttr)

Removes the specified attribute. If the removed Attr has a default value it is immediately replaced. If the Attr already is the default value, nothing happens and nothing is returned.

Parameters:

  • oldAttr

    The Attr node to remove from the attribute list.

Return Value: The Attr node that was removed.

DOMExceptions:

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.

  • NOT_FOUND_ERR

    Raised if oldAttr is not an attribute of the element.
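
The string-based attribute methods on Element can be sketched as follows, including the documented return values for an attribute that does not exist. The element and attribute names are made up.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<user/>');
my $user = $doc->getDocumentElement;

$user->setAttribute('name', 'alice');   # adds the attribute
$user->setAttribute('name', 'bob');     # changes the existing value
my $name = $user->getAttribute('name');

# A missing attribute gives the empty string from getAttribute,
# but undef from getAttributeNode.
my $missing_value = $user->getAttribute('email');
my $missing_node  = $user->getAttributeNode('email');

$user->removeAttribute('name');
print $user->toString, "\n";
```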

Additional methods not in the DOM Spec

setTagName (newTagName)

Sets the tag name of the Element. Note that this method is not portable between DOM implementations.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the specified name contains an invalid character.

XML::DOM::Text extends XML::DOM::CharacterData

The Text interface represents the textual content (termed character data in XML) of an Element or Attr. If there is no markup inside an element's content, the text is contained in a single object implementing the Text interface that is the only child of the element. If there is markup, it is parsed into a list of elements and Text nodes that form the list of children of the element.

When a document is first made available via the DOM, there is only one Text node for each block of text. Users may create adjacent Text nodes that represent the contents of a given element without any intervening markup, but should be aware that there is no way to represent the separations between these nodes in XML or HTML, so they will not (in general) persist between DOM editing sessions. The normalize() method on Element merges any such adjacent Text objects into a single node for each block of text; this is recommended before employing operations that depend on a particular document structure, such as navigation with XPointers.
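
The effect of normalize on adjacent Text nodes can be illustrated with a small sketch. It assumes an empty element to which two Text nodes are appended by hand; addText is one of this implementation's additional methods and is not part of the DOM Spec.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<p/>');
my $p   = $doc->getDocumentElement;

# Appending two Text nodes by hand leaves them as separate siblings.
$p->appendChild($doc->createTextNode('foo'));
$p->appendChild($doc->createTextNode('bar'));
my @before = $p->getChildNodes;    # list context returns a plain Perl list

# normalize merges adjacent Text nodes into a single node.
$p->normalize;
my @after  = $p->getChildNodes;
my $merged = $after[0]->getData;

# addText appends to the last child if it is a Text node,
# so no new node is created here.
$p->addText('!');
my @final = $p->getChildNodes;

printf "%d node(s) before, %d after, text is '%s'\n",
    scalar @before, scalar @after, $final[0]->getData;
```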

Not Implemented: By default the XML::DOM::Parser converts all CDATASections to regular text (see the KeepCDATA parser option for keeping them as CDATASection nodes at parse time). If you add CDATASection nodes to a Document yourself, they will be preserved.

splitText (offset)

Breaks this Text node into two Text nodes at the specified offset, keeping both in the tree as siblings. This node then contains only the content up to the offset point, and a new Text node, inserted as the next sibling of this node, contains all the content at and after the offset point.

Parameters: offset The offset at which to split, starting from 0.

Return Value: The new Text node.

DOMExceptions:

  • INDEX_SIZE_ERR

    Raised if the specified offset is negative or greater than the number of characters in data.

  • NO_MODIFICATION_ALLOWED_ERR

    Raised if this node is readonly.
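
A minimal sketch of splitText, assuming a paragraph whose first child is a single Text node followed by an empty element. The markup is invented for the example.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<p>HelloWorld<br/></p>');
my $p   = $doc->getDocumentElement;

my $head = $p->getFirstChild;      # Text node holding "HelloWorld"
my $tail = $head->splitText(5);    # offsets start at 0, so this splits before "W"

# Both halves stay in the tree; the new node follows the original one.
my @kids      = $p->getChildNodes;
my $head_text = $head->getData;
my $tail_text = $tail->getData;

print "head='$head_text' tail='$tail_text' children=", scalar @kids, "\n";
```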

XML::DOM::Comment extends XML::DOM::CharacterData

This represents the content of a comment, i.e., all the characters between the starting '<!--' and ending '-->'. Note that this is the definition of a comment in XML, and, in practice, HTML, although some HTML tools may implement the full SGML comment structure.

XML::DOM::CDATASection extends XML::DOM::CharacterData

CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the CDATA section. CDATA sections can not be nested. The primary purpose is for including material such as XML fragments, without needing to escape all the delimiters.

The DOMString attribute of the Text node holds the text that is contained by the CDATA section. Note that this may contain characters that need to be escaped outside of CDATA sections and that, depending on the character encoding ("charset") chosen for serialization, it may be impossible to write out some characters as part of a CDATA section.

The CDATASection interface inherits the CharacterData interface through the Text interface. Adjacent CDATASection nodes are not merged by use of the Element.normalize() method.

Not Implemented: see Text node comments about CDATASections being converted to Text nodes when parsing XML input.

XML::DOM::ProcessingInstruction extends XML::DOM::Node

The ProcessingInstruction interface represents a "processing instruction", used in XML as a way to keep processor-specific information in the text of the document. An example:

 <?PI processing instruction?>

Here, "PI" is the target and "processing instruction" is the data.

getTarget

The target of this processing instruction. XML defines this as being the first token following the markup that begins the processing instruction.

getData and setData (data)

The content of this processing instruction. This is from the first non white space character after the target to the character immediately preceding the ?>.
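
As a rough illustration of the accessors above, the following sketch builds the example processing instruction with the Document factory method and reads it back. The host document is an assumption made for the example.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<doc/>');

# Same target and data as in the <?PI processing instruction?> example.
my $pi = $doc->createProcessingInstruction('PI', 'processing instruction');

my $target = $pi->getTarget;
my $data   = $pi->getData;

# The data can be changed afterwards.
$pi->setData('other data');
my $new_data = $pi->getData;

print "target='$target' data='$data' new data='$new_data'\n";
```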

XML::DOM::Notation extends XML::DOM::Node

This node represents a Notation, e.g.

 <!NOTATION gs SYSTEM "GhostScript">

 <!NOTATION name PUBLIC "pubId">

 <!NOTATION name PUBLIC "pubId" "sysId">

 <!NOTATION name SYSTEM "sysId">

getName and setName (name)

Returns (or sets) the Notation name, which is the first token after the NOTATION keyword.

getSysId and setSysId (sysId)

Returns (or sets) the system ID, which is the token after the optional SYSTEM keyword.

getPubId and setPubId (pubId)

Returns (or sets) the public ID, which is the token after the optional PUBLIC keyword.

getBase

Returns the base that XML::Parser passes to the Notation handler, i.e. the base to be used for resolving a relative URI (see addNotation). It may be undef.

getNodeName

Returns the same as getName.

XML::DOM::Entity extends XML::DOM::Node

This node represents an Entity declaration, e.g.

 <!ENTITY % draft 'INCLUDE'>

 <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif>

The first one is called a parameter entity and is referenced as %draft; in the DTD. The second is a (regular) general entity and is referenced as &hatch-pic; in the document.

getNotationName

Returns the name of the notation for the entity.

Not Implemented: The DOM Spec says: For unparsed entities, the name of the notation for the entity. For parsed entities, this is null. (This implementation does not support unparsed entities.)

getSysId

Returns the system id, or undef.

getPubId

Returns the public id, or undef.

Additional methods not in the DOM Spec

isParameterEntity

Returns whether this is a parameter entity (%ent;) or a general entity (&ent;).

getValue

Returns the entity value.

getNdata

Returns the NDATA declaration (for general unparsed entities), or undef.

XML::DOM::DocumentType extends XML::DOM::Node

Each Document has a doctype attribute whose value is either null or a DocumentType object. The DocumentType interface in the DOM Level 1 Core provides an interface to the list of entities that are defined for the document, and little else because the effect of namespaces and the various XML scheme efforts on DTD representation are not clearly understood as of this writing. The DOM Level 1 doesn't support editing DocumentType nodes.

Not In DOM Spec: This implementation has added a lot of extra functionality to the DOM Level 1 interface. To allow editing of the DocumentType nodes, see XML::DOM::ignoreReadOnly.

getName

Returns the name of the DTD, i.e. the name immediately following the DOCTYPE keyword.

getEntities

A NamedNodeMap containing the general entities, both external and internal, declared in the DTD. Duplicates are discarded. For example in:

 <!DOCTYPE ex SYSTEM "ex.dtd" [
  <!ENTITY foo "foo">
  <!ENTITY bar "bar">
  <!ENTITY % baz "baz">
 ]>
 <ex/>

the interface provides access to foo and bar but not baz. Every node in this map also implements the Entity interface.

The DOM Level 1 does not support editing entities, therefore entities cannot be altered in any way.

Not In DOM Spec: See XML::DOM::ignoreReadOnly to edit the DocumentType etc.

getNotations

A NamedNodeMap containing the notations declared in the DTD. Duplicates are discarded. Every node in this map also implements the Notation interface.

The DOM Level 1 does not support editing notations, therefore notations cannot be altered in any way.

Not In DOM Spec: See XML::DOM::ignoreReadOnly to edit the DocumentType etc.

Additional methods not in the DOM Spec

Creating and setting the DocumentType

A new DocumentType can be created with:

        $doctype = $doc->createDocumentType ($name, $sysId, $pubId, $internal);

To set (or replace) the DocumentType for a particular document, use:

        $doc->setDoctype ($doctype);

getSysId and setSysId (sysId)

Returns or sets the system id.

getPubId and setPubId (pubId)

Returns or sets the public id.

setName (name)

Sets the name of the DTD, i.e. the name immediately following the DOCTYPE keyword. Note that this should always be the same as the element tag name of the root element.

getAttlistDecl (elemName)

Returns the AttlistDecl for the Element with the specified name, or undef.

getElementDecl (elemName)

Returns the ElementDecl for the Element with the specified name, or undef.

getEntity (entityName)

Returns the Entity with the specified name, or undef.

addAttlistDecl (elemName)

Adds a new AttlistDecl node with the specified elemName if one doesn't exist yet. Returns the AttlistDecl (new or existing) node.

addElementDecl (elemName, model)

Adds a new ElementDecl node with the specified elemName and model if one doesn't exist yet. Returns the ElementDecl (new or existing) node. The model is ignored if one already existed.

addEntity (parameter, notationName, value, sysId, pubId, ndata)

Adds a new Entity node. Don't use createEntity and appendChild, because it should be added to the internal NamedNodeMap containing the entities.

Parameters:

  • parameter: whether it is a parameter entity (%ent;) or not (&ent;).

  • notationName: the entity name.

  • value: the entity value.

  • sysId: the system id (if any.)

  • pubId: the public id (if any.)

  • ndata: the NDATA declaration (if any, for general unparsed entities.)

SysId, pubId and ndata may be undefined.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the notationName does not conform to the XML spec.

addNotation (name, base, sysId, pubId)

Adds a new Notation object.

Parameters:

  • name: the notation name.

  • base: the base to be used for resolving a relative URI.

  • sysId: the system id.

  • pubId: the public id.

Base, sysId, and pubId may all be undefined. (These parameters are passed by the XML::Parser Notation handler.)

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the notation name does not conform to the XML spec.

addAttDef (elemName, attrName, type, default, fixed)

Adds a new attribute definition. It will add the AttDef node to the AttlistDecl if it exists. If an AttDef with the specified attrName already exists for the given elemName, this function only generates a warning.

See XML::DOM::AttDef::new for the other parameters.

getDefaultAttrValue (elem, attr)

Returns the default attribute value as a string or undef, if none is available.

Parameters: elem The element tagName. attr The attribute name.

expandEntity (entity [, parameter])

Expands the specified entity or parameter entity (if parameter=1) and returns its value as a string, or undef if the entity does not exist. (The entity name should not contain the '%', '&' or ';' delimiters.)
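
The DocumentType accessors can be tried out on a document with an internal DTD subset. The sketch below is illustrative only; the DTD, the entity names and their values are invented for the example.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical document with two general entities in its internal subset.
my $xml = <<'XML';
<!DOCTYPE ex [
  <!ENTITY foo "foo value">
  <!ENTITY bar "bar value">
]>
<ex/>
XML

my $doc     = XML::DOM::Parser->new->parse($xml);
my $doctype = $doc->getDoctype;

# The name immediately following the DOCTYPE keyword.
my $dtd_name = $doctype->getName;

# Look up an Entity node by name and read its value.
my $foo_value = $doctype->getEntity('foo')->getValue;

# expandEntity returns the value as a string, or undef for unknown names.
# The name is passed without the '&' and ';' delimiters.
my $bar_value = $doctype->expandEntity('bar');
my $missing   = $doctype->expandEntity('nope');

print "DTD '$dtd_name': foo='$foo_value', bar='$bar_value'\n";
print "entity 'nope' is not declared\n" unless defined $missing;
```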

XML::DOM::DocumentFragment extends XML::DOM::Node

DocumentFragment is a "lightweight" or "minimal" Document object. It is very common to want to be able to extract a portion of a document's tree or to create a new fragment of a document. Imagine implementing a user command like cut or rearranging a document by moving fragments around. It is desirable to have an object which can hold such fragments and it is quite natural to use a Node for this purpose. While it is true that a Document object could fulfil this role, a Document object can potentially be a heavyweight object, depending on the underlying implementation. What is really needed for this is a very lightweight object. DocumentFragment is such an object.

Furthermore, various operations -- such as inserting nodes as children of another Node -- may take DocumentFragment objects as arguments; this results in all the child nodes of the DocumentFragment being moved to the child list of this node.

The children of a DocumentFragment node are zero or more nodes representing the tops of any sub-trees defining the structure of the document. DocumentFragment nodes do not need to be well-formed XML documents (although they do need to follow the rules imposed upon well-formed XML parsed entities, which can have multiple top nodes). For example, a DocumentFragment might have only one child and that child node could be a Text node. Such a structure model represents neither an HTML document nor a well-formed XML document.

When a DocumentFragment is inserted into a Document (or indeed any other Node that may take children) the children of the DocumentFragment and not the DocumentFragment itself are inserted into the Node. This makes the DocumentFragment very useful when the user wishes to create nodes that are siblings; the DocumentFragment acts as the parent of these nodes so that the user can use the standard methods from the Node interface, such as insertBefore() and appendChild().
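
The insertion behavior described above can be sketched as follows: sibling nodes are collected in a DocumentFragment, and appending the fragment moves its children, not the fragment itself, into the target element. The tag names are made up for the example.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list/>');
my $list = $doc->getDocumentElement;

# Collect three sibling elements under a temporary fragment.
my $frag = $doc->createDocumentFragment;
$frag->appendChild($doc->createElement($_)) for qw(a b c);

# Appending the fragment inserts its children into <list>.
$list->appendChild($frag);

my @kids  = $list->getChildNodes;
my @names = map { $_->getNodeName } @kids;

print "children of list: @names\n";
```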

XML::DOM::DOMImplementation

The DOMImplementation interface provides a number of methods for performing operations that are independent of any particular instance of the document object model.

The DOM Level 1 does not specify a way of creating a document instance, and hence document creation is an operation specific to an implementation. Future Levels of the DOM specification are expected to provide methods for creating documents directly.

hasFeature (feature, version)

Returns 1 if and only if feature equals "XML" and version equals "1.0".
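
A quick check of hasFeature, assuming the DOMImplementation object is obtained from a parsed document through getImplementation:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<doc/>');
my $impl = $doc->getImplementation;

# Only the "XML" feature at version "1.0" is reported as supported.
my $has_xml  = $impl->hasFeature('XML',  '1.0') ? 1 : 0;
my $has_html = $impl->hasFeature('HTML', '1.0') ? 1 : 0;

print "XML 1.0 supported: $has_xml, HTML 1.0 supported: $has_html\n";
```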

XML::DOM::Document extends XML::DOM::Node

This is the main root of the document structure as returned by XML::DOM::Parser::parse and XML::DOM::Parser::parsefile.

Since elements, text nodes, comments, processing instructions, etc. cannot exist outside the context of a Document, the Document interface also contains the factory methods needed to create these objects. The Node objects created have a getOwnerDocument method which associates them with the Document within whose context they were created.

getDocumentElement

This is a convenience method that allows direct access to the child node that is the root Element of the document.

getDoctype

The Document Type Declaration (see DocumentType) associated with this document. For HTML documents as well as XML documents without a document type declaration this returns undef. The DOM Level 1 does not support editing the Document Type Declaration.

Not In DOM Spec: This implementation allows editing the doctype. See XML::DOM::ignoreReadOnly for details.

getImplementation

The DOMImplementation object that handles this document. A DOM application may use objects from multiple implementations.

createElement (tagName)

Creates an element of the type specified. Note that the instance returned implements the Element interface, so attributes can be specified directly on the returned object.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the tagName does not conform to the XML spec.

createTextNode (data)

Creates a Text node given the specified string.

createComment (data)

Creates a Comment node given the specified string.

createCDATASection (data)

Creates a CDATASection node given the specified string.

createAttribute (name [, value [, specified ]])

Creates an Attr of the given name. Note that the Attr instance can then be set on an Element using the setAttributeNode method.

Not In DOM Spec: The DOM Spec does not allow passing the value or the specified property in this method. In this implementation they are optional.

Parameters:

  • value: The attribute's value. See Attr::setValue for details. If the value is not supplied, the specified property is set to 0.

  • specified: Whether the attribute value was specified or whether the default value was used. If not supplied, it's assumed to be 1.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the name does not conform to the XML spec.

createProcessingInstruction (target, data)

Creates a ProcessingInstruction node given the specified name and data strings.

Parameters: target The target part of the processing instruction. data The data for the node.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the target does not conform to the XML spec.

createDocumentFragment

Creates an empty DocumentFragment object.

createEntityReference (name)

Creates an EntityReference object.
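
The factory methods above can be combined to build a small subtree. The sketch below is only an illustration: the tag names and contents are invented, and the error handling assumes the default behavior in which an invalid name raises a DOMException object that can be inspected with getName.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<root/>');
my $root = $doc->getDocumentElement;

# Nodes are created by the Document and then attached explicitly.
my $item = $doc->createElement('item');
$item->setAttribute('id', '1');
$item->appendChild($doc->createTextNode('hello'));
$item->appendChild($doc->createComment(' a note '));
$root->appendChild($item);

my $text     = $item->getFirstChild->getData;
my $comment  = $item->getLastChild->getData;
my $id_value = $item->getAttributeNode('id')->getValue;

# Every created node remembers the Document that created it.
my $same_owner = ($item->getOwnerDocument == $doc) ? 1 : 0;

# A name that is not a valid XML name is rejected.
my $ok  = eval { $doc->createElement('1bad'); 1 };
my $err = $ok ? undef : $@;
if (ref $err && $err->isa('XML::DOM::DOMException')) {
    print "createElement failed: ", $err->getName, "\n";
}

print $root->toString, "\n";
```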

Additional methods not in the DOM Spec

getXMLDecl and setXMLDecl (xmlDecl)

Returns the XMLDecl for this Document or undef if none was specified. Note that XMLDecl is not part of the list of child nodes.

setDoctype (doctype)

Sets or replaces the DocumentType. NOTE: Don't use appendChild or insertBefore to set the DocumentType. Even though doctype will be part of the list of child nodes, it is handled specially.

getDefaultAttrValue (elem, attr)

Returns the default attribute value as a string or undef, if none is available.

Parameters: elem The element tagName. attr The attribute name.

getEntity (name)

Returns the Entity with the specified name.

createXMLDecl (version, encoding, standalone)

Creates an XMLDecl object. All parameters may be undefined.

createDocumentType (name, sysId, pubId)

Creates a DocumentType object. SysId and pubId may be undefined.

createNotation (name, base, sysId, pubId)

Creates a new Notation object. Consider using XML::DOM::DocumentType::addNotation!

createEntity (parameter, notationName, value, sysId, pubId, ndata)

Creates an Entity object. Consider using XML::DOM::DocumentType::addEntity!

createElementDecl (name, model)

Creates an ElementDecl object.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the element name (tagName) does not conform to the XML spec.

createAttlistDecl (name)

Creates an AttlistDecl object.

DOMExceptions:

  • INVALID_CHARACTER_ERR

    Raised if the element name (tagName) does not conform to the XML spec.

expandEntity (entity [, parameter])

Expands the specified entity or parameter entity (if parameter=1) and returns its value as a string, or undef if the entity does not exist. (The entity name should not contain the '%', '&' or ';' delimiters.)

EXTRA NODE TYPES

XML::DOM::XMLDecl extends XML::DOM::Node

This node contains the XML declaration, e.g.

 <?xml version="1.0" encoding="UTF-16" standalone="yes"?>

See also XML::DOM::Document::getXMLDecl.

getVersion and setVersion (version)

Returns and sets the XML version. At the time of this writing the version should always be "1.0".

getEncoding and setEncoding (encoding)

Returns and sets the encoding. undef may be specified for the encoding value.

getStandalone and setStandalone (standalone)

Returns and sets the standalone value. undef may be specified for the standalone value.
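
A minimal sketch of attaching an XMLDecl to a document and reading it back, assuming the input had no XML declaration of its own. The encoding and standalone values are chosen only for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<doc/>');

# Create the declaration with the Document factory method and attach it.
# The XMLDecl is not part of the list of child nodes.
$doc->setXMLDecl($doc->createXMLDecl('1.0', 'UTF-8', 'yes'));

my $decl       = $doc->getXMLDecl;
my $version    = $decl->getVersion;
my $encoding   = $decl->getEncoding;
my $standalone = $decl->getStandalone;

print "version=$version encoding=$encoding standalone=$standalone\n";
```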

XML::DOM::ElementDecl extends XML::DOM::Node

This node represents an Element declaration, e.g.

 <!ELEMENT address (street+, city, state, zip, country?)>

getName

Returns the Element tagName.

getModel and setModel (model)

Returns and sets the model as a string, e.g. "(street+, city, state, zip, country?)" in the above example.

XML::DOM::AttlistDecl extends XML::DOM::Node

This node represents an ATTLIST declaration, e.g.

 <!ATTLIST person
   sex      (male|female)  #REQUIRED
   hair     CDATA          "bold"
   eyes     (none|one|two) "two"
   species  (human)        #FIXED "human"> 

Each attribute definition is stored in a separate AttDef node. The AttDef nodes can be retrieved with getAttDef and added with addAttDef. (The AttDef nodes are stored in a NamedNodeMap internally.)

getName

Returns the Element tagName.

getAttDef (attrName)

Returns the AttDef node for the attribute with the specified name.

addAttDef (attrName, type, default, [ fixed ])

Adds an AttDef node for the attribute with the specified name.

Parameters:

  • attrName: the attribute name.

  • type: the attribute type (e.g. "CDATA" or "(male|female)".)

  • default: the default value enclosed in quotes (!), the string #IMPLIED or the string #REQUIRED.

  • fixed: whether the attribute is '#FIXED' (default is 0.)

XML::DOM::AttDef extends XML::DOM::Node

Each object of this class represents one attribute definition in an AttlistDecl.

getName

Returns the attribute name.

getDefault

Returns the default value, or undef.

isFixed

Whether the attribute value is fixed (see #FIXED keyword.)

isRequired

Whether the attribute value is required (see #REQUIRED keyword.)

isImplied

Whether the attribute value is implied (see #IMPLIED keyword.)
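
The declaration nodes described in this section can be inspected after parsing a document with an internal DTD subset. The sketch below is illustrative only: the DTD is invented, and the content model is only pattern-matched because its exact string form (for example its whitespace) may depend on the XML::Parser version.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical document with element and attribute-list declarations.
my $xml = <<'XML';
<!DOCTYPE person [
  <!ELEMENT person (name, nickname?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT nickname (#PCDATA)>
  <!ATTLIST person
    sex      (male|female)  #REQUIRED
    species  (human)        #FIXED "human">
]>
<person sex="female"><name>Ada</name></person>
XML

my $doc     = XML::DOM::Parser->new->parse($xml);
my $doctype = $doc->getDoctype;

# ElementDecl: the content model of <person>.
my $model = $doctype->getElementDecl('person')->getModel;

# AttlistDecl: one AttDef node per declared attribute.
my $attlist = $doctype->getAttlistDecl('person');
my $sex     = $attlist->getAttDef('sex');
my $species = $attlist->getAttDef('species');

print "model of person: $model\n";
print $sex->getName, " is required\n"  if $sex->isRequired;
print $species->getName, " is fixed\n" if $species->isFixed;
```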

IMPLEMENTATION DETAILS

  • Perl Mappings

    The value undef was used when the DOM Spec said null.

    The DOM Spec says: Applications must encode DOMString using UTF-16 (defined in Appendix C.3 of [UNICODE] and Amendment 1 of [ISO-10646]). In this implementation we use plain old Perl strings encoded in UTF-8 instead of UTF-16.

  • Text and CDATASection nodes

    The Expat parser expands EntityReferences and CDATASections to raw strings and does not indicate where they were found. This implementation therefore converts both to Text nodes at parse time. CDATASection and EntityReference nodes that are added to an existing Document (by the user) will be preserved.

    Also, subsequent Text nodes are always merged at parse time. Text nodes that are added later can be merged with the normalize method. Consider using the addText method when adding Text nodes.

  • Printing and toString

    When printing (and converting an XML Document to a string) the strings have to be encoded differently depending on where they occur. E.g. in a CDATASection all substrings are allowed except for "]]>". In regular text, certain characters are not allowed, e.g. ">" has to be converted to "&gt;". These routines should be verified by someone who knows the details.

  • Quotes

    Certain sections in XML are quoted, like attribute values in an Element. XML::Parser strips these quotes and the print methods in this implementation always use double quotes, so when parsing and printing a document, single quotes may be converted to double quotes. The default value of an attribute definition (AttDef) in an AttlistDecl, however, will maintain its quotes.

  • AttlistDecl

    Attribute declarations for a certain Element are always merged into a single AttlistDecl object.

  • Comments

    Comments in the DOCTYPE section are not kept in the right place. They will become child nodes of the Document.

SEE ALSO

The Japanese version of this document by Takanori Kawai (Hippo2000) at http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm

The DOM Level 1 specification at http://www.w3.org/TR/REC-DOM-Level-1

The XML spec (Extensible Markup Language 1.0) at http://www.w3.org/TR/REC-xml

The XML::Parser and XML::Parser::Expat manual pages.

CAVEATS

The method getElementsByTagName() does not return a "live" NodeList. Whether this is an actual caveat is debatable, but a few people on the www-dom mailing list seemed to think so. I haven't decided yet. It's a pain to implement, it slows things down and the benefits seem marginal. Let me know what you think.

(To subscribe to the www-dom mailing list send an email with the subject "subscribe" to www-dom-request@w3.org. I only look here occasionally, so don't send bug reports or suggestions about XML::DOM to this list, send them to enno@att.com instead.)

AUTHORS

Enno Derksen <enno@att.com> and Clark Cooper <coopercl@sch.ge.com>. Please send bugs, comments and suggestions to Enno.
