NAME

HTML::Object::DOM::NodeFilter - HTML Object DOM Node Filter

SYNOPSIS

    use HTML::Object::DOM::NodeFilter;
    my $filter = HTML::Object::DOM::NodeFilter->new || 
        die( HTML::Object::DOM::NodeFilter->error, "\n" );

VERSION

    v0.2.0

DESCRIPTION

A NodeFilter interface represents an object used to filter the nodes in an HTML::Object::DOM::NodeIterator or HTML::Object::DOM::TreeWalker. A NodeFilter knows nothing about the document or how nodes are traversed; it only knows how to evaluate a single node against the provided filter.

PROPERTIES

There are no properties.

METHODS

acceptNode

Returns an unsigned short that tells the HTML::Object::DOM::NodeIterator or HTML::Object::DOM::TreeWalker iteration algorithm whether a given node is to be accepted.

This method is expected to be written by the user of a NodeFilter. Possible return values are:

FILTER_ACCEPT

Value to be returned by the "acceptNode" method when a node should be accepted.

FILTER_REJECT

Value to be returned by the "acceptNode" method when a node should be rejected. For HTML::Object::DOM::TreeWalker, child nodes are also rejected.

For NodeIterator, this flag is synonymous with FILTER_SKIP.

FILTER_SKIP

Value to be returned by "acceptNode" for nodes to be skipped by the HTML::Object::DOM::NodeIterator or HTML::Object::DOM::TreeWalker object.

The children of skipped nodes are still considered. This is treated as "skip this node but not its children".

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        # Node to use as root
        $doc->getElementById('someId'),

        # Only consider nodes that are text nodes (nodeType 3)
        SHOW_TEXT,

        # Object containing the sub to use for the acceptNode method
        # of the NodeFilter
        { acceptNode => sub
            {
                my $node = shift( @_ ); # also available as $_
                # Logic to determine whether to accept, reject or skip node
                # In this case, only accept nodes that have content other than whitespace
                if( $node->data !~ /^\s*$/ )
                {
                    return( FILTER_ACCEPT );
                }
                # Otherwise return an explicit verdict instead of an empty value
                return( FILTER_REJECT );
            }
        },
        0 # false
    );

    # Show the content of every non-empty text node that is a descendant of root
    # (say() requires "use feature 'say';" or "use v5.10;")
    my $node;
    while( ( $node = $nodeIterator->nextNode() ) )
    {
        say( $node->data );
    }
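
The difference between FILTER_REJECT and FILTER_SKIP for a TreeWalker, as described above, can be illustrated without the library. The following is a minimal pure-Perl sketch, not the module's actual algorithm: the toy tree of hashes and the walk function are hypothetical, and the constants are local stand-ins that reuse the values documented on this page.

```perl
use strict;
use warnings;

# Local stand-ins using the values documented for this module; real code
# would import them from HTML::Object::DOM::NodeFilter instead.
use constant {
    FILTER_ACCEPT => 1,
    FILTER_REJECT => 2,
    FILTER_SKIP   => 3,
};

# Hypothetical toy tree: each node is a hash with a name and child nodes.
my $tree = {
    name => 'div', children => [
        { name => 'nav', children => [
            { name => 'a', children => [] },
        ] },
        { name => 'p', children => [
            { name => 'em', children => [] },
        ] },
    ],
};

# Simplified TreeWalker-style descent (an assumption, not the library's code):
# ACCEPT keeps the node, SKIP drops the node but still descends into its
# children, REJECT drops the node together with its whole subtree.
sub walk
{
    my( $node, $filter ) = @_;
    my @seen;
    foreach my $child ( @{$node->{children}} )
    {
        my $verdict = $filter->( $child );
        push( @seen, $child->{name} ) if( $verdict == FILTER_ACCEPT );
        push( @seen, walk( $child, $filter ) ) if( $verdict != FILTER_REJECT );
    }
    return( @seen );
}

my @with_reject = walk( $tree, sub{ $_[0]->{name} eq 'nav' ? FILTER_REJECT : FILTER_ACCEPT } );
my @with_skip   = walk( $tree, sub{ $_[0]->{name} eq 'nav' ? FILTER_SKIP   : FILTER_ACCEPT } );

print( "reject: @with_reject\n" ); # reject: p em
print( "skip:   @with_skip\n" );   # skip:   a p em
```

With FILTER_REJECT the "a" element under "nav" never reaches the filter, whereas with FILTER_SKIP it does. For a NodeIterator, both values would behave like the second case.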

See also Mozilla documentation

CONSTANTS

SHOW_ALL (4294967295)

Shows all nodes.

SHOW_ELEMENT (1)

Shows Element nodes.

SHOW_ATTRIBUTE (2)

Shows Attribute nodes. This is meaningful only when creating a NodeIterator with an Attribute node as its root; in this case, it means that the attribute node will appear in the first position of the iteration or traversal. Since attributes are never children of other nodes, they do not appear when traversing over the document tree.

SHOW_TEXT (4)

Shows Text nodes.

Example:

    use HTML::Object::DOM::NodeFilter qw( :all );
    my $nodeIterator = $doc->createNodeIterator(
        $doc->body,
        SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT,
        { acceptNode => sub{ return( FILTER_ACCEPT ); } },
        0 # false
    );
    # Compare against SHOW_ALL with ==, since "& SHOW_ALL" is true for any non-zero mask
    if( ( $nodeIterator->whatToShow == SHOW_ALL ) ||
        ( $nodeIterator->whatToShow & SHOW_COMMENT ) )
    {
        # $nodeIterator will show comments
    }

SHOW_CDATA_SECTION (8)

Always returns nothing, because XML documents are not supported.

SHOW_ENTITY_REFERENCE (16)

Legacy; no longer used.

SHOW_ENTITY (32)

Legacy; no longer used.

SHOW_PROCESSING_INSTRUCTION (64)

Shows ProcessingInstruction nodes.

SHOW_COMMENT (128)

Shows Comment nodes.

SHOW_DOCUMENT (256)

Shows Document nodes.

SHOW_DOCUMENT_TYPE (512)

Shows DocumentType nodes.

SHOW_DOCUMENT_FRAGMENT (1024)

Shows HTML::Object::DOM::DocumentFragment nodes.

SHOW_NOTATION (2048)

Legacy; no longer used.

SHOW_SPACE (4096)

Shows Space nodes. This is a non-standard extension specific to this Perl framework.

And for the callback control:

FILTER_ACCEPT (1)
FILTER_REJECT (2)
FILTER_SKIP (3)
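
Because the SHOW_* constants are bit flags, they can be combined with a bitwise OR and tested with a bitwise AND. The sketch below is illustrative only: the constants are local stand-ins carrying the values listed above, and the helpers shows and flag_for_node_type are hypothetical names, not part of this module. The node type mapping reflects the DOM standard and is only assumed to hold for the standard node types, not for the SHOW_SPACE extension.

```perl
use strict;
use warnings;

# Local stand-ins with the values listed in this document; real code would
# import them with: use HTML::Object::DOM::NodeFilter qw( :all );
use constant {
    SHOW_ALL     => 4294967295,
    SHOW_ELEMENT => 1,
    SHOW_TEXT    => 4,
    SHOW_COMMENT => 128,
};

# Combine flags with a bitwise OR
my $what = SHOW_ELEMENT | SHOW_TEXT;

# Hypothetical helper: returns 1 if every bit of $flag is set in $mask, else 0
sub shows
{
    my( $mask, $flag ) = @_;
    return( ( $mask & $flag ) == $flag ? 1 : 0 );
}

# In the DOM standard, the flag for a node type is 1 << ( nodeType - 1 ),
# which is why text nodes (nodeType 3) correspond to SHOW_TEXT (4).
sub flag_for_node_type
{
    my $node_type = shift( @_ );
    return( 1 << ( $node_type - 1 ) );
}

printf( "mask: %d\n", $what );                                             # mask: 5
printf( "shows text: %d\n", shows( $what, SHOW_TEXT ) );                   # shows text: 1
printf( "shows comment: %d\n", shows( $what, SHOW_COMMENT ) );             # shows comment: 0
printf( "SHOW_ALL shows comment: %d\n", shows( SHOW_ALL, SHOW_COMMENT ) ); # SHOW_ALL shows comment: 1
printf( "flag for nodeType 3: %d\n", flag_for_node_type( 3 ) );            # flag for nodeType 3: 4
```

Testing with "( $mask & $flag ) == $flag" rather than a plain "&" keeps the check correct even if a combined flag is passed in.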

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Mozilla documentation, W3C specifications

COPYRIGHT & LICENSE

Copyright(c) 2022 DEGUEST Pte. Ltd.

All rights reserved

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.