NAME

HTML::Object::DOM::NodeIterator - HTML Object DOM Node Iterator Class

SYNOPSIS

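With just one argument, this defaults to searching for everything (SHOW_ALL) and uses the default filter, which always returns FILTER_ACCEPT

    use HTML::Object::DOM::NodeIterator;
    # Exports the SHOW_* and FILTER_* constants used in the examples below
    use HTML::Object::DOM::NodeFilter qw( :all );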
    my $nodes = HTML::Object::DOM::NodeIterator->new( $root_node ) || 
        die( HTML::Object::DOM::NodeIterator->error, "\n" );

Or, passing an anonymous subroutine as the filter

    my $nodes = HTML::Object::DOM::NodeIterator->new(
        $root_node,
        $what_to_show_bit,
        sub{ return( FILTER_ACCEPT ); }
    ) || die( HTML::Object::DOM::NodeIterator->error, "\n" );

Or, passing a hash reference with a property 'acceptNode' whose value is an anonymous subroutine, as the filter

    my $nodes = HTML::Object::DOM::NodeIterator->new(
        $root_node,
        $what_to_show_bit,
        {
            acceptNode => sub{ return( FILTER_ACCEPT ); }
        }
    ) || die( HTML::Object::DOM::NodeIterator->error, "\n" );

Or, passing an object that implements the method "acceptNode"

    my $nodes = HTML::Object::DOM::NodeIterator->new(
        $root_node,
        $what_to_show_bit,
        # This object must implement the acceptNode method
        My::Customer::NodeFilter->new
    ) || die( HTML::Object::DOM::NodeIterator->error, "\n" );
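
As an illustration of the last form, here is a minimal sketch of what such a filter class might look like. The 'tag' option and the way the node is received are assumptions made for this example: the callback examples in this document read the node from $_, so the sketch accepts it either as an argument or from $_. Check the module source before relying on either.

    package My::Customer::NodeFilter;
    use strict;
    use warnings;
    # Provides FILTER_ACCEPT and FILTER_SKIP
    use HTML::Object::DOM::NodeFilter qw( :all );

    # Hypothetical constructor; the 'tag' option is not part of the module API
    sub new
    {
        my( $class, %opts ) = @_;
        my $self = { tag => ( defined( $opts{tag} ) ? $opts{tag} : 'div' ) };
        return( bless( $self => $class ) );
    }

    # Assumption: the node is passed as the first argument, or is available in $_
    sub acceptNode
    {
        my $self = shift( @_ );
        my $node = @_ ? shift( @_ ) : $_;
        # Accept only elements with the configured tag name, skip everything else
        return( $node->getName eq $self->{tag} ? FILTER_ACCEPT : FILTER_SKIP );
    }

    1;

Keeping the matching rule inside an object makes the filter reusable across several iterators, which is the main reason to prefer this form over an anonymous subroutine.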

There is also HTML::Object::DOM::TreeWalker, which performs a somewhat similar function.

Choose HTML::Object::DOM::NodeIterator when you only need a simple iterator to filter and browse the selected nodes, and choose HTML::Object::DOM::TreeWalker when you need access to the node and its siblings.

VERSION

    v0.2.0

DESCRIPTION

The NodeIterator interface represents an iterator over the members of a list of the nodes in a subtree of the DOM. The nodes will be returned in document order.

A NodeIterator can be created using the createNodeIterator method of HTML::Object::DOM::Document, as follows:

    use HTML::Object::DOM;
    my $parser = HTML::Object::DOM->new;
    my $doc = $parser->parse_data( $some_html_data ) || die( $parser->error );
    my $nodeIterator = $doc->createNodeIterator( $root, $whatToShow, $filter ) ||
        die( $doc->error );

PROPERTIES

expandEntityReferences

Normally this is read-only, but under Perl you can set whatever boolean value you want.

Under JavaScript, this is a boolean value indicating whether, when an EntityReference is discarded, its whole sub-tree must be discarded at the same time.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    my $expand = $nodeIterator->expandEntityReferences;

See also Mozilla documentation

filter

Normally this is read-only, but under Perl you can set it to any new HTML::Object::DOM::NodeFilter object, even after object instantiation.

Returns the HTML::Object::DOM::NodeFilter used to select the relevant nodes.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    my $nodeFilter = $nodeIterator->filter;

See also Mozilla documentation

pointerBeforeReferenceNode

Normally this is read-only, but under Perl you can set whatever boolean value you want. Defaults to true.

Returns a boolean flag indicating whether the NodeIterator is anchored before the anchor node (true) or after it (false).

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    my $flag = $nodeIterator->pointerBeforeReferenceNode;

See also Mozilla documentation

pos

Read-only.

This is a non-standard property, which returns the 0-based position in the array of the anchor element's children.

You can poll this to find out where the iterator currently is.

Example:

    use feature 'say';
    use HTML::Object::DOM::NodeFilter qw( :all );
    # You need to first declare $nodeIterator to be able to use it in the callback
    my $nodeIterator;
    $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub
        {
            say( "Current position is: ", $nodeIterator->pos );
            return( $_->getName eq 'div' ? FILTER_ACCEPT : FILTER_SKIP );
        },
    );

referenceNode

Read-only.

Returns the Node to which the iterator is anchored.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    my $node = $nodeIterator->referenceNode;

See also Mozilla documentation

root

Normally this is read-only, but under Perl you can set whatever node value you want.

Returns a Node representing the root node as specified when the NodeIterator was created.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    my $root = $nodeIterator->root; # $doc->body in this case

See for more information

whatToShow

Normally this is read-only, but under Perl you can set whatever number value you want.

Returns an unsigned long being a bitmask made of constants describing the types of Node that must be presented. Non-matching nodes are skipped, but their children may be included, if relevant.

Possible constant values (exported by HTML::Object::DOM::NodeFilter) are:

SHOW_ALL (4294967295)

Shows all nodes.

SHOW_ELEMENT (1)

Shows Element nodes.

SHOW_ATTRIBUTE (2)

Shows Attribute nodes. This is meaningful only when creating a NodeIterator with an Attribute node as its root; in this case, it means that the attribute node will appear in the first position of the iteration or traversal. Since attributes are never children of other nodes, they do not appear when traversing over the document tree.

SHOW_TEXT (4)

Shows Text nodes.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        ( SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT ),
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
    );
    # SHOW_ALL has every bit set, so testing the SHOW_COMMENT bit also covers it
    if( $nodeIterator->whatToShow & SHOW_COMMENT )
    {
        # $nodeIterator will show comments
    }

SHOW_CDATA_SECTION (8)

Will always return nothing, because there is no support for XML documents.

SHOW_ENTITY_REFERENCE (16)

Legacy, no longer used.

SHOW_ENTITY (32)

Legacy, no longer used.

SHOW_PROCESSING_INSTRUCTION (64)

Shows ProcessingInstruction nodes.

SHOW_COMMENT (128)

Shows Comment nodes.

SHOW_DOCUMENT (256)

Shows Document nodes.

SHOW_DOCUMENT_TYPE (512)

Shows DocumentType nodes.

SHOW_DOCUMENT_FRAGMENT (1024)

Shows HTML::Object::DOM::DocumentFragment nodes.

SHOW_NOTATION (2048)

Legacy, no longer used.

SHOW_SPACE (4096)

Shows Space nodes. This is a non-standard extension under this Perl framework.

See for more information
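
To make the bitmask arithmetic concrete, here is a small standalone sketch. It redefines a few constants locally, with the values listed above, only so that it runs without the module installed; in real code, import them from HTML::Object::DOM::NodeFilter instead. The helper shows() is hypothetical and not part of the module.

    use strict;
    use warnings;

    # Values copied from the list above, redefined here for illustration only
    use constant {
        SHOW_ALL                    => 4294967295,
        SHOW_ELEMENT                => 1,
        SHOW_TEXT                   => 4,
        SHOW_PROCESSING_INSTRUCTION => 64,
        SHOW_COMMENT                => 128,
    };

    # Hypothetical helper: true if every bit of $flag is set in $bits
    sub shows
    {
        my( $bits, $flag ) = @_;
        return( ( $bits & $flag ) == $flag ? 1 : 0 );
    }

    # Combine node types with a bitwise OR: 1 + 4 + 128
    my $mask = SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT;
    printf( "mask = %d\n", $mask ); # prints "mask = 133"

    printf( "comments: %s\n", shows( $mask, SHOW_COMMENT ) ? 'yes' : 'no' );
    printf( "processing instructions: %s\n",
        shows( $mask, SHOW_PROCESSING_INSTRUCTION ) ? 'yes' : 'no' );
    # SHOW_ALL has all 32 bits set, so it includes every node type
    printf( "SHOW_ALL includes comments: %s\n",
        shows( SHOW_ALL, SHOW_COMMENT ) ? 'yes' : 'no' );

Comparing the masked value against the flag itself, rather than testing it for truth, also works for flags made of several bits.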

CONSTRUCTOR

new

Provided with a root node, an optional bitmask representing what to show, and an optional filter callback, this returns a new node iterator.

METHODS

detach

This operation is a no-op. It does not do anything. Previously, it told the web browser engine that the NodeIterator was no longer in use, but this is now unnecessary.

See also Mozilla documentation

nextNode

Returns the next Node in the document, or undef if there are none.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
        0 # false; this optional argument is not used any more
    );
    my $currentNode = $nodeIterator->nextNode(); # returns the next node

See also Mozilla documentation
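
Since nextNode returns undef once there are no more nodes, a typical use is to call it in a loop until it is exhausted. The following is a minimal sketch using only calls shown elsewhere in this document; the HTML string is a made-up sample, and reading the node from $_ inside the callback follows the pos example above.

    use strict;
    use warnings;
    use HTML::Object::DOM;
    use HTML::Object::DOM::NodeFilter qw( :all );

    # Hypothetical sample markup, for illustration only
    my $html = '<html><body><div>one</div><p>two</p><div>three</div></body></html>';
    my $parser = HTML::Object::DOM->new;
    my $doc = $parser->parse_data( $html ) || die( $parser->error );

    # Keep only div elements; other elements are skipped
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( $_->getName eq 'div' ? FILTER_ACCEPT : FILTER_SKIP ); },
    ) || die( $doc->error );

    # Walk the accepted nodes in document order until nextNode returns undef
    my @names;
    while( defined( my $node = $nodeIterator->nextNode ) )
    {
        push( @names, $node->getName );
    }
    print( scalar( @names ), " node(s) accepted: @names\n" );

Testing with defined() rather than plain truth keeps the loop correct even if a node object were to overload boolean context.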

previousNode

Returns the previous Node in the document, or undef if there are none.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT,
        sub{ return( FILTER_ACCEPT ); },
        # or
        # { acceptNode => sub{ return( FILTER_ACCEPT ); } },
        0 # false; this optional argument is not used any more
    );
    my $currentNode = $nodeIterator->nextNode(); # returns the next node
    my $previousNode = $nodeIterator->previousNode(); # same result, since we backtracked to the previous node

See also Mozilla documentation

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Mozilla documentation, StackOverflow topic on NodeIterator, W3C specifications

COPYRIGHT & LICENSE

Copyright(c) 2021 DEGUEST Pte. Ltd.

All rights reserved

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.