
NAME

HTML::Object::DOM - HTML Object DOM Class

SYNOPSIS

    use HTML::Object::DOM;
    my $this = HTML::Object::DOM->new || die( HTML::Object::DOM->error, "\n" );

VERSION

    v0.2.0

DESCRIPTION

This module implements a DOM-like interface to HTML objects. It inherits from HTML::Object, so you can use this module in its place to parse HTML data, and the resulting tree of objects will have DOM capabilities.

DOM stands for Document Object Model.

There are two divergences from the standard:

1. nodeName

nodeName returns the tag name in lower case instead of upper case.

2. Space and text

This interface distinguishes between text and space-only text, whereas the DOM standard specification treats both as text nodes.

This leads to a non-standard nodeType value for space nodes, SPACE_NODE (13), and to the non-standard constant SHOW_SPACE in HTML::Object::DOM::NodeFilter.
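
To make the second divergence concrete, the sketch below classifies character data along the text/space split described above. The helper node_type_for and its whitespace test are assumptions made for illustration, not the module's actual implementation; only the values TEXT_NODE (3) and SPACE_NODE (13) come from this documentation.

```perl
use strict;
use warnings;

# Values as documented for HTML::Object::DOM::Node; redefined locally so the
# sketch runs without the module installed.
use constant TEXT_NODE  => 3;
use constant SPACE_NODE => 13;

# Hypothetical helper: whitespace-only data maps to the non-standard
# SPACE_NODE, anything else to the standard TEXT_NODE.
sub node_type_for
{
    my $data = shift;
    return( $data =~ /\A\s+\z/ ? SPACE_NODE : TEXT_NODE );
}

my @types = map { node_type_for( $_ ) } ( "Hello", "\n    ", " world " );
print( join( ', ', @types ), "\n" ); # 3, 13, 3
```

Code that expects standard DOM behaviour would need to treat both values as text when walking the tree.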

INHERITANCE

    +--------------+     +-------------------+
    | HTML::Object | --> | HTML::Object::DOM |
    +--------------+     +-------------------+

CONSTRUCTOR

new

Provided with a hash or hash reference of options, this returns a new HTML::Object::DOM object. The available options are the same as the available methods.

METHODS

current_parent

This represents the parent of the element currently being processed by the parser.

document

Set or get the element object for the document.

get_definition

Provided with a tag name, this returns its corresponding hash reference, or undef if there is no such tag or an error occurred.

get_dom

Get the value of the global variable $GLOBAL_DOM, which should be an HTML::Object::DOM::Document object.

new_closing

Instantiates a new closing element, passing it any arguments received, and returns the new object.

new_comment

Instantiates a new comment element, passing it any arguments received, and returns the new object.

new_declaration

Instantiates a new declaration element, passing it any arguments received, and returns the new object.

new_document

Instantiates a new document element, passing it any arguments received, and returns the new object.

The new document object has its property defaultView set to a new window object.

new_element

Instantiates a new element, passing it any arguments received, and returns the new object.

new_space

Instantiates a new space element, passing it any arguments received, and returns the new object.

new_text

Instantiates a new text element, passing it any arguments received, and returns the new object.

new_window

Instantiates a new window object, passing it any arguments received, and returns the new object.

onload

Set or get the code reference to be executed when the parsing of the HTML data has been completed.

The value of this code reference is provided to the new document when it is instantiated.

Upon execution, $_ is set to the HTML document object, and a new event of type readstate is passed, with its detail property providing the following data:

document

The document object

state

The state of the document parsing.

The event target property is also set to the document object.

onreadystatechange

Set or get the code reference to be executed whenever the state of the document changes. Three states are available: loading, interactive and complete.

The value of this code reference is provided to the new document when it is instantiated.

Upon execution, $_ is set to the HTML document object, and a new event of type readstate is passed, with its detail property providing the following data:

document

The document object

state

The state of the document parsing.

The event target property is also set to the document object.

parseFromString

Provided with some HTML data, this parses it and returns a new document object, or undef if an error occurred.
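
A minimal sketch of calling parseFromString, reusing the constructor idiom from the SYNOPSIS. The HTML string is made up for the example, and the eval/require guard is only there so the snippet exits quietly on systems where the module is not installed.

```perl
use strict;
use warnings;

# Guard: load the module at run time so the sketch degrades gracefully.
my $available = eval { require HTML::Object::DOM; 1 };
my $doc;
if( $available )
{
    my $parser = HTML::Object::DOM->new || die( HTML::Object::DOM->error, "\n" );
    # Per the description above: a document object, or undef on error.
    $doc = $parser->parseFromString( '<html><body><p>Hello</p></body></html>' );
    print( defined( $doc ) ? "parsed\n" : "parsing failed\n" );
}
else
{
    print( "HTML::Object::DOM is not installed\n" );
}
```

Checking the return value for undef before using the document follows the error convention stated for this method.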

screen

Returns the HTML::Object::DOM::Screen object.

set_dom

Set the global variable $GLOBAL_DOM, which must be an HTML::Object::DOM::Document object.

window

Set or get the window object for this new parser and the document it creates.

CONSTANTS

The following constants can be exported and used, such as:

    use HTML::Object::DOM qw( :event );
    # or directly
    use HTML::Object::Event qw( :all );

NONE (0)

The event is not being processed at this time.

CAPTURING_PHASE (1)

The event is being propagated through the target's ancestor objects. This process starts with the Document, then the HTML html element, and so on through the elements until the target's parent is reached. Event listeners registered for capture mode when "addEventListener" in HTML::Object::EventTarget was called are triggered during this phase.

AT_TARGET (2)

The event has arrived at the event's target. Event listeners registered for this phase are called at this time. If "bubbles" is false, processing the event is finished after this phase is complete.

BUBBLING_PHASE (3)

The event is propagating back up through the target's ancestors in reverse order, starting with the parent, and eventually reaching the containing document. This is known as bubbling, and occurs only if "bubbles" is true. Event listeners registered for this phase are triggered during this process.

CANCEL_PROPAGATION (1)

State of the propagation being cancelled.

    $event->stopPropagation();
    $event->cancelled == CANCEL_PROPAGATION;

CANCEL_IMMEDIATE_PROPAGATION (2)

State of immediate propagation being cancelled.

    $event->stopImmediatePropagation();
    $event->cancelled == CANCEL_IMMEDIATE_PROPAGATION;
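
The order in which the three phases visit nodes can be illustrated with a small standalone simulation. This is a sketch of the propagation model described above, not code from HTML::Object::Event; the helper propagation_order and the node names are hypothetical, and the phase values are redefined locally.

```perl
use strict;
use warnings;

# Phase values as documented for HTML::Object::Event; redefined locally so
# the sketch runs without the module installed.
use constant CAPTURING_PHASE => 1;
use constant AT_TARGET       => 2;
use constant BUBBLING_PHASE  => 3;

# Hypothetical helper: given the path from the document down to the target,
# return [ node, phase ] pairs in the order listeners would be visited.
sub propagation_order
{
    my( $path, $bubbles ) = @_;
    my @nodes  = @$path;
    my $target = pop( @nodes );
    # Capturing: from the document down to the target's parent.
    my @order  = map { [ $_, CAPTURING_PHASE ] } @nodes;
    push( @order, [ $target, AT_TARGET ] );
    # Bubbling: back up from the parent, only when "bubbles" is true.
    push( @order, map { [ $_, BUBBLING_PHASE ] } reverse( @nodes ) ) if( $bubbles );
    return( @order );
}

my @steps = propagation_order( [qw( document html body div )], 1 );
print( join( ' ', map { $_->[0] . ':' . $_->[1] } @steps ), "\n" );
# document:1 html:1 body:1 div:2 body:3 html:3 document:3
```

Passing a false second argument ends the sequence at the target, which matches the note under AT_TARGET about events that do not bubble.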

For HTML::Object::DOM::Element::Media:

    use HTML::Object::DOM qw( :media );
    # or directly from HTML::Object::DOM::Element::Media
    use HTML::Object::DOM::Element::Media qw( :all );

NETWORK_EMPTY (0)

There is no data yet. Also, readyState is HAVE_NOTHING.

NETWORK_IDLE (1)

Media element is active and has selected a resource, but is not using the network.

NETWORK_LOADING (2)

The browser is downloading HTML::Object::DOM::Element::Media data.

NETWORK_NO_SOURCE (3)

No HTML::Object::DOM::Element::Media src found.

For HTML::Object::DOM::Element::Track:

    use HTML::Object::DOM qw( :track );
    # or directly from HTML::Object::DOM::Element::Track
    use HTML::Object::DOM::Element::Track qw( :all );

NONE (0)

Indicates that the text track's cues have not been obtained.

Also used in HTML::Object::Event to indicate the event is not being processed at this time.

LOADING (1)

Indicates that the text track is loading and there have been no fatal errors encountered so far. Further cues might still be added to the track by the parser.

LOADED (2)

Indicates that the text track has been loaded with no fatal errors.

ERROR (3)

Indicates that the text track was enabled, but when the user agent attempted to obtain it, this failed in some way. Some or all of the cues are likely missing and will not be obtained.

For HTML::Object::DOM::Node:

    use HTML::Object::DOM qw( :node );
    # or directly from HTML::Object::DOM::Node
    # Automatically exported
    use HTML::Object::DOM::Node;

  • DOCUMENT_POSITION_IDENTICAL (0 or in bits: 000000)

    Elements are identical.

  • DOCUMENT_POSITION_DISCONNECTED (1 or in bits: 000001)

    No relationship, both nodes are in different documents or different trees in the same document.

  • DOCUMENT_POSITION_PRECEDING (2 or in bits: 000010)

    The specified node precedes the current node.

  • DOCUMENT_POSITION_FOLLOWING (4 or in bits: 000100)

    The specified node follows the current node.

  • DOCUMENT_POSITION_CONTAINS (8 or in bits: 001000)

    The otherNode is an ancestor of / contains the current node.

  • DOCUMENT_POSITION_CONTAINED_BY (16 or in bits: 010000)

    The otherNode is a descendant of / contained by the current node.

  • DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC (32 or in bits: 100000)

    The specified node and the current node have no common container node or the two nodes are different attributes of the same node.
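
Because these values are bit flags, a single comparison result can combine several of them. The sketch below decodes such a mask into flag names. The helper position_flags is hypothetical, the constants are redefined locally from the values listed above, and the input 20 is just an example (in the standard DOM, a descendant node yields CONTAINED_BY combined with FOLLOWING).

```perl
use strict;
use warnings;

# Values as documented above; kept in a local table so the sketch runs
# without the module installed.
my @flags = (
    [ DOCUMENT_POSITION_DISCONNECTED            => 1 ],
    [ DOCUMENT_POSITION_PRECEDING               => 2 ],
    [ DOCUMENT_POSITION_FOLLOWING               => 4 ],
    [ DOCUMENT_POSITION_CONTAINS                => 8 ],
    [ DOCUMENT_POSITION_CONTAINED_BY            => 16 ],
    [ DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC => 32 ],
);

# Hypothetical helper: list the names of the flags set in a position mask.
sub position_flags
{
    my $mask = shift;
    return( 'DOCUMENT_POSITION_IDENTICAL' ) if( $mask == 0 );
    return( map { $_->[0] } grep { $mask & $_->[1] } @flags );
}

my @names = position_flags( 20 ); # 16 | 4
print( join( ' | ', @names ), "\n" );
# DOCUMENT_POSITION_FOLLOWING | DOCUMENT_POSITION_CONTAINED_BY
```

Testing individual bits with & is more robust than comparing the whole value with ==, since more than one flag may be set.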

And also the following constants:

ELEMENT_NODE (1)
ATTRIBUTE_NODE (2)
TEXT_NODE (3)
CDATA_SECTION_NODE (4)
ENTITY_REFERENCE_NODE (5)
ENTITY_NODE (6)
PROCESSING_INSTRUCTION_NODE (7)
COMMENT_NODE (8)
DOCUMENT_NODE (9)
DOCUMENT_TYPE_NODE (10)
DOCUMENT_FRAGMENT_NODE (11)
NOTATION_NODE (12)
SPACE_NODE (13)

For HTML::Object::DOM::NodeFilter:

    use HTML::Object::DOM qw( :filter );
    # or directly from HTML::Object::DOM::NodeFilter
    # Exportable constants
    use HTML::Object::DOM::NodeFilter qw( :all );

SHOW_ALL (4294967295)

Shows all nodes.

SHOW_ELEMENT (1)

Shows Element nodes.

SHOW_ATTRIBUTE (2)

Shows Attribute nodes.

SHOW_TEXT (4)

Shows Text nodes.

SHOW_CDATA_SECTION (8)

Always returns nothing, because XML documents are not supported.

SHOW_ENTITY_REFERENCE (16)

Legacy, no longer used.

SHOW_ENTITY (32)

Legacy, no longer used.

SHOW_PROCESSING_INSTRUCTION (64)

Shows ProcessingInstruction nodes.

SHOW_COMMENT (128)

Shows Comment nodes.

SHOW_DOCUMENT (256)

Shows Document nodes.

SHOW_DOCUMENT_TYPE (512)

Shows DocumentType nodes.

SHOW_DOCUMENT_FRAGMENT (1024)

Shows HTML::Object::DOM::DocumentFragment nodes.

SHOW_NOTATION (2048)

Legacy, no longer used.

SHOW_SPACE (4096)

Shows Space nodes. This is a non-standard extension specific to this Perl framework.
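
Each SHOW_* value listed above equals 1 shifted left by the corresponding nodeType minus one, which is how the non-standard SPACE_NODE (13) ends up as SHOW_SPACE (4096). The sketch below uses that relationship to build a combined filter mask. The helper show_bit is hypothetical and the constants are redefined locally from the documented values.

```perl
use strict;
use warnings;

# Node type values as documented for HTML::Object::DOM::Node; redefined
# locally so the sketch runs without the module installed.
use constant ELEMENT_NODE => 1;
use constant TEXT_NODE    => 3;
use constant COMMENT_NODE => 8;
use constant SPACE_NODE   => 13;

# Hypothetical helper: the SHOW_* bit for a node type is 1 << ( nodeType - 1 ).
sub show_bit { return( 1 << ( $_[0] - 1 ) ); }

# Build a mask that accepts element, text and space nodes.
my $mask = show_bit( ELEMENT_NODE ) | show_bit( TEXT_NODE ) | show_bit( SPACE_NODE );
print( "$mask\n" ); # 4101

# A node type passes the filter when its bit is set in the mask.
print( ( $mask & show_bit( COMMENT_NODE ) ) ? "comments shown\n" : "comments hidden\n" );
# comments hidden
```

With the exported constants, the same mask would be written as SHOW_ELEMENT | SHOW_TEXT | SHOW_SPACE.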

For HTML::Object::DOM::XPathResult:

    use HTML::Object::DOM qw( :xpath );
    # or directly from HTML::Object::DOM::XPathResult
    # Automatically exported
    use HTML::Object::DOM::XPathResult;

ANY_TYPE (0)

A result set containing whatever type naturally results from evaluation of the expression. Note that if the result is a node-set then UNORDERED_NODE_ITERATOR_TYPE is always the resulting type.

NUMBER_TYPE (1)

A result containing a single number. This is useful, for example, in an XPath expression using the count() function.

STRING_TYPE (2)

A result containing a single string.

BOOLEAN_TYPE (3)

A result containing a single boolean value. This is useful, for example, in an XPath expression using the not() function.

UNORDERED_NODE_ITERATOR_TYPE (4)

A result node-set containing all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.

ORDERED_NODE_ITERATOR_TYPE (5)

A result node-set containing all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.

UNORDERED_NODE_SNAPSHOT_TYPE (6)

A result node-set containing snapshots of all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.

ORDERED_NODE_SNAPSHOT_TYPE (7)

A result node-set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.

ANY_UNORDERED_NODE_TYPE (8)

A result node-set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.

FIRST_ORDERED_NODE_TYPE (9)

A result node-set containing the first node in the document that matches the expression.

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Mozilla documentation

Mozilla documentation on HTML DOM API

W3C standard, HTML elements specifications

COPYRIGHT & LICENSE

Copyright(c) 2021 DEGUEST Pte. Ltd.

All rights reserved

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.