NAME

HTML5::DOM - Super fast html5 DOM library with css selectors (based on Modest/MyHTML)

SYNOPSIS

 use warnings;
 use strict;
 use HTML5::DOM;

 # create parser object
 my $parser = HTML5::DOM->new;

 # parse some html
 my $tree = $parser->parse('
  <label>Some list of OS:</label>
  <ul class="list" data-what="os" title="OS list">
     <li>UNIX</li>
     <li>Linux</li>
     <!-- comment -->
     <li>OSX</li>
     <li>Windows</li>
     <li>FreeBSD</li>
  </ul>
 ');

 # find one element by CSS selector
 my $ul = $tree->at('ul.list');

 # prints tag
 print $ul->tag."\n"; # ul

 # check if <ul> has class list
 print "<ul> has class .list\n" if ($ul->classList->has('list'));

 # add some class
 $ul->classList->add('os-list');

 # prints <ul> classes
 print $ul->className."\n"; # list os-list

 # prints <ul> attribute title
 print $ul->attr("title")."\n"; # OS list

 # changing <ul> attribute title
 $ul->attr("title", "OS names list");

 # find all os names
 $ul->find('li')->each(sub {
  my ($node, $index) = @_;
  print "OS #$index: ".$node->text."\n";
 });

 # we can use precompiled selectors
 my $css_parser = HTML5::DOM::CSS->new;
 my $selector = $css_parser->parseSelector('li');

 # remove OSX from OS
 $ul->find($selector)->[2]->remove();

 # serialize tree
 print $tree->html."\n";
 
 # TODO: more examples in SYNOPSIS
 # But you can explore API documentation.
 # My lib have simple API, which is intuitively familiar to anyone who used the DOM.

DESCRIPTION

HTML5::DOM is a fast HTML5 parser and DOM manipulatin library with CSS4 selectors, fully conformant with the HTML5 specification.

It based on https://github.com/lexborisov/Modest as selector engine and https://github.com/lexborisov/myhtml as HTML5 parser.

Key features

  • Really fast HTML parsing.

  • Supports parsing by chunks.

  • Fully conformant with the HTML5 specification.

  • Fast CSS4 selectors.

  • Any manipulations using DOM-like API.

  • Auto-detect input encoding.

  • Fully integration in perl and memory management. You don't care about "free" or "destroy".

  • Supports async parsing, with optional event-loop intergration.

  • Perl utf8-enabled strings supports (See "WORK WITH UTF8" for details.)

HTML5::DOM

HTML5 parser object.

new

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser;
 
 # with default options
 $parser = HTML5::DOM->new;
 
 # or override some options, if you need
 $parser = HTML5::DOM->new({
    threads                 => 0,
    ignore_whitespace       => 0, 
    ignore_doctype          => 0, 
    scripts                 => 0, 
    encoding                => "auto", 
    default_encoding        => "UTF-8", 
    encoding_use_meta       => 1, 
    encoding_use_bom        => 1, 
    encoding_prescan_limit  => 1024
 });

Creates new parser object with options. See "PARSER OPTIONS" for details.

parse

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 my $html = '<div>Hello world!</div>';
 
 my $tree;
 
 # parsing with options defined in HTML5::DOM->new
 $tree = $parser->parse($html);
 
 # parsing with custom options (extends options defined in HTML5::DOM->new)
 $tree = $parser->parse($html, {
     scripts     => 0, 
 });

Parse html string and return HTML5::DOM::Tree object.

parseChunkStart

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 # start chunked parsing with options defined in HTML5::DOM->new
 # call parseChunkStart without options is useless, 
 # because first call of parseChunk automatically call parseChunkStart. 
 $parser->parseChunkStart();
 
 # start chunked parsing with custom options (extends options defined in HTML5::DOM->new)
 $parser->parseChunkStart({
    scripts     => 0, 
 });

Init chunked parsing. See "PARSER OPTIONS" for details.

parseChunk

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 $parser->parseChunkStart()->parseChunk('<')->parseChunk('di')->parseChunk('v>');

Parse chunk of html stream.

parseChunkTree

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 # start some chunked parsing
 $parser->parseChunk('<')->parseChunk('di')->parseChunk('v>');
 
 # get current tree
 my $tree = $parser->parseChunkTree;
 
 print $tree->html."\n"; # <html><head></head><body><div></div></body></html>
 
 # more parse html
 $parser->parseChunk('<div class="red">red div?</div>');
 
 print $tree->html."\n"; # <html><head></head><body><div><div class="red">red div?</div></div></body></html>
 
 # end parsing
 $parser->parseChunkEnd();
 
 print $tree->html."\n"; # <html><head></head><body><div><div class="red">red div?</div></div></body></html>

Return current HTML5::DOM::Tree object (live result of all calls parseChunk).

parseChunkEnd

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 # start some chunked parsing
 $parser->parseChunk('<')->parseChunk('di')->parseChunk('v>');
 
 # end parsing and get tree
 my $tree = $parser->parseChunkEnd();
 
 print $tree->html; # <html><head></head><body><div></div></body></html>

Completes chunked parsing and return HTML5::DOM::Tree object.

parseAsync

Parsing html in background thread. Can use with different ways:

1. Manual wait parsing completion when you need.

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 my $html = '<div>Hello world!</div>';
 
 my $async;
 
 # start async parsing
 $async = $parser->parseAsync($html);
 
 # or with options
 $async = $parser->parseAsync($html, { scripts => 0 });
 
 # ...do some work...
 
 # wait for parsing done
 my $tree = $async->wait;
 
 # work with tree
 print $tree->html;

$async->wait returns HTML5::DOM::AsyncResult object.

2. Non-blocking check for parsing completion.

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 
 my $html = '<div>Hello world!</div>';
 
 my $tree;
 my $async;
 
 # start async parsing
 $async = $parser->parseAsync($html);
 
 # or with options
 $async = $parser->parseAsync($html, { scripts => 0 });
 
 while (!$async->parsed) {
     # do some work
 }
 $tree = $async->tree; # HTML5::DOM::Tree
 # work with $tree
 print $tree->root->at('div')->text."\n"; # Hello world!
 
 # or another way
 
 # start async parsing
 $async = $parser->parseAsync($html);
 
 # or with options
 $async = $parser->parseAsync($html, { scripts => 0 });
 
 while (!($tree = $async->tree)) {
     # do some work
 }
 # work with $tree
 print $tree->root->at('div')->text."\n"; # Hello world!

$async->parsed returns 1 if parsing done. Else returns 0.

$async->tree returns HTML5::DOM::Tree object if parsing done. Else returns undef.

3. Intergation with EV

Required packages (only if you want use event loop):

 use warnings;
 use strict;
 use EV;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 my $html = '<div>Hello world!</div>';
 
 my $custom_options = { scripts => 0 };
 
 $parser->parseAsync($html, $custom_options, sub {
     my $tree = shift;
     # work with $tree
     print $tree->root->at('div')->text."\n"; # Hello world!
 });
 
 # do some work
 
 EV::loop;

Function returns HTML5::DOM::AsyncResult object.

$tree in callback is a HTML5::DOM::Tree object.

4. Intergation with custom event-loop (example with AnyEvent loop)

 use warnings;
 use strict;
 use AnyEvent;
 use AnyEvent::Util;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new;
 my $html = '<div>Hello world!</div>';
 
 my $custom_options = { scripts => 0 };
 
 # create pipe
 my ($r, $w) = AnyEvent::Util::portable_pipe();
 AnyEvent::fh_unblock $r;
 
 # fd for parseAsync communications
 my $write_fd = fileno($w);
 
 # after parsing complete module writes to $write_fd:
 # value "1" - if success
 # value "0" - if error
 my $async = $parser->parseAsync($html, $custom_options, $write_fd);
 
 # watch for value
 my $async_watcher;
 $async_watcher = AE::io $r, 0, sub {
     <$r>; # read "1" or "0"
     $async_watcher = undef; # destroy watcher
     
     # work with $tree
     my $tree = $async->wait;
     print $tree->root->at('div')->text."\n"; # Hello world!
 };
 
 # ...do some work...
 
 AE::cv->recv;

$tree in callback is a HTML5::DOM::Tree object.

HTML5::DOM::Tree

DOM tree object.

createElement

 # create new node with tag "div"
 my $node = $tree->createElement("div");
 
 # create new node with tag "g" with namespace "svg"
 my $node = $tree->createElement("div", "svg");

Create new HTML5::DOM::Element with specified tag and namespace.

createComment

 # create new comment
 my $node = $tree->createComment(" ololo ");
 
 print $node->html; # <!-- ololo -->

Create new HTML5::DOM::Comment with specified value.

createTextNode

 # create new text node
 my $node = $tree->createTextNode("psh psh ololo i am driver of ufo >>>");
 
 print $node->html; # psh psh ololo i am driver of ufo &gt;&gt;&gt;

Create new HTML5::DOM::Text with specified value.

parseFragment

 my $fragment = $tree->parseFragment($html);
 my $fragment = $tree->parseFragment($html, $context);
 my $fragment = $tree->parseFragment($html, $context, $context_ns);
 my $fragment = $tree->parseFragment($html, $context, $context_ns, $options);

Parse fragment html and create new HTML5::DOM::Fragment. For more details about fragments: https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments

  • $html - html fragment string

  • $context - context tag name, default div

  • $context_ns - context tag namespace, default html

  • $options - parser options

    See "PARSER OPTIONS" for details.

 # simple create new fragment
 my $node = $tree->parseFragment("some <b>bold</b> and <i>italic</i> text");

 # create new fragment node with custom context tag/namespace and options
 my $node = $tree->parseFragment("some <b>bold</b> and <i>italic</i> text", "div", "html", {
    # some options override
    encoding => "windows-1251"
 });
 
 print $node->html; # some <b>bold</b> and <i>italic</i> text

document

 my $node = $tree->document;

Return HTML5::DOM::Document node of current tree;

root

 my $node = $tree->root;

Return root node of current tree. (always <html>)

 my $node = $tree->head;

Return <head> node of current tree.

body

 my $node = $tree->body;

Return <body> node of current tree.

at

querySelector

 my $node = $tree->at($selector);
 my $node = $tree->querySelector($selector); # alias

Find one element node in tree using CSS Selectors Level 4

Return node, or undef if not find.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $node = $tree->at('body > div.red');
 print $node->html; # <div class="red">red</div>

find

querySelectorAll

 my $collection = $tree->find($selector);
 my $collection = $tree->querySelectorAll($selector); # alias

Find all element nodes in tree using CSS Selectors Level 4

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $collection = $tree->at('body > div.red, body > div.blue');
 print $collection->[0]->html; # <div class="red">red</div>
 print $collection->[1]->html; # <div class="red">blue</div>

findId

getElementById

 my $collection = $tree->findId($tag);
 my $collection = $tree->getElementById($tag); # alias

Find element node with specified id.

Return HTML5::DOM::Node or undef.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue" id="test">blue</div>')
 my $node = $tree->findId('test');
 print $node->html; # <div class="blue" id="test">blue</div>

findTag

getElementsByTagName

 my $collection = $tree->findTag($tag);
 my $collection = $tree->getElementsByTagName($tag); # alias

Find all element nodes in tree with specified tag name.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $collection = $tree->findTag('div');
 print $collection->[0]->html; # <div class="red">red</div>
 print $collection->[1]->html; # <div class="red">blue</div>

findClass

getElementsByClassName

 my $collection = $tree->findClass($class);
 my $collection = $tree->getElementsByClassName($class); # alias

Find all element nodes in tree with specified class name. This is more fast equivalent to [class~="value"] selector.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new
    ->parse('<div class="red color">red</div><div class="blue color">blue</div>');
 my $collection = $tree->findClass('color');
 print $collection->[0]->html; # <div class="red color">red</div>
 print $collection->[1]->html; # <div class="red color">blue</div>

findAttr

getElementByAttribute

 # Find all elements with attribute
 my $collection = $tree->findAttr($attribute);
 my $collection = $tree->getElementByAttribute($attribute); # alias
 
 # Find all elements with attribute and mathcing value
 my $collection = $tree->findAttr($attribute, $value, $case = 0, $cmp = '=');
 my $collection = $tree->getElementByAttribute($attribute, $value, $case = 0, $cmp = '='); # alias

Find all element nodes in tree with specified attribute and optional matching value.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new
    ->parse('<div class="red color">red</div><div class="blue color">blue</div>');
 my $collection = $tree->findAttr('class', 'CoLoR', 1, '~');
 print $collection->[0]->html; # <div class="red color">red</div>
 print $collection->[1]->html; # <div class="red color">blue</div>
 

CSS selector analogs:

 # [$attribute=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '=');
 
 # [$attribute=$value i]
 my $collection = $tree->findAttr($attribute, $value, 1, '=');
 
 # [$attribute~=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '~');
 
 # [$attribute|=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '|');
 
 # [$attribute*=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '*');

 # [$attribute^=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '^');

 # [$attribute$=$value]
 my $collection = $tree->findAttr($attribute, $value, 0, '$');

encoding

encodingId

 print "encoding: ".$tree->encoding."\n"; # UTF-8
 print "encodingId: ".$tree->encodingId."\n"; # 0

Return current tree encoding. See "ENCODINGS" for details.

tag2id

 print "tag id: ".HTML5::DOM->TAG_A."\n"; # tag id: 4
 print "tag id: ".$tree->tag2id("a")."\n"; # tag id: 4

Convert tag name to id. Return 0 (HTML5::DOM->TAG__UNDEF), if tag not exists in tree. See "TAGS" for tag constants list.

id2tag

 print "tag name: ".$tree->id2tag(4)."\n"; # tag name: a
 print "tag name: ".$tree->id2tag(HTML5::DOM->TAG_A)."\n"; # tag name: a

Convert tag id to name. Return undef, if tag id not exists in tree. See "TAGS" for tag constants list.

namespace2id

 print "ns id: ".HTML5::DOM->NS_HTML."\n"; # ns id: 1
 print "ns id: ".$tree->namespace2id("html")."\n"; # ns id: 1

Convert namespace name to id. Return 0 (HTML5::DOM->NS_UNDEF), if namespace not exists in tree. See "NAMESPACES" for namespace constants list.

id2namespace

 print "ns name: ".$tree->id2namespace(1)."\n"; # ns name: html
 print "ns name: ".$tree->id2namespace(HTML5::DOM->NS_HTML)."\n"; # ns name: html

Convert namespace id to name. Return undef, if namespace id not exists. See "NAMESPACES" for namespace constants list.

parser

 my $parser = $tree->parser;

Return parent HTML5::DOM.

utf8

As getter - get 1 if all methods returns all strings with utf8 flag.

Example with utf8:

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $tree = HTML5::DOM->new->parse("<b>тест</b>");
 my $is_utf8_enabled = $tree->utf8;
 print "is_utf8_enabled=".($tree ? "true" : "false")."\n"; # true

Or example with bytes:

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $tree = HTML5::DOM->new->parse("<b>тест</b>");
 my $is_utf8_enabled = $tree->utf8;
 print "is_utf8_enabled=".($tree ? "true" : "false")."\n"; # false

As setter - enable or disable utf8 flag on all returned strings.

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $tree = HTML5::DOM->new->parse("<b>тест</b>");
 
 print "is_utf8_enabled=".($tree->utf8 ? "true" : "false")."\n"; # true
 print length($tree->at('b')->text)." chars\n"; # 4 chars
 
 $selector->utf8(0);
 
 print "is_utf8_enabled=".($tree->utf8 ? "true" : "false")."\n"; # false
 print length($tree->at('b')->text)." bytes\n"; # 8 bytes

HTML5::DOM::Node

DOM node object.

tag

nodeName

 my $tag_name = $node->tag;
 my $tag_name = $node->nodeName; # uppercase
 my $tag_name = $node->tagName;  # uppercase

Return node tag name (eg. div or span)

 $node->tag($tag);
 $node->nodeName($tag); # alias
 $node->tagName($tag);  # alias

Set new node tag name. Allow only for HTML5::DOM::Element nodes.

 print $node->html; # <div></div>
 $node->tag('span');
 print $node->html; # <span></span>
 print $node->tag; # span
 print $node->tag; # SPAN

tagId

 my $tag_id = $node->tagId;

Return node tag id. See "TAGS" for tag constants list.

 $node->tagId($tag_id);

Set new node tag id. Allow only for HTML5::DOM::Element nodes.

 print $node->html; # <div></div>
 $node->tagId(HTML5::DOM->TAG_SPAN);
 print $node->html; # <span></span>
 print $node->tagId; # 117

namespace

 my $tag_ns = $node->namespace;

Return node namespace (eg. html or svg)

 $node->namespace($namespace);

Set new node namespace name. Allow only for HTML5::DOM::Element nodes.

 print $node->namespace; # html
 $node->namespace('svg');
 print $node->namespace; # svg

namespaceId

 my $tag_ns_id = $node->namespaceId;

Return node namespace id. See "NAMESPACES" for tag constants list.

 $node->namespaceId($tag_id);

Set new node namespace by id. Allow only for HTML5::DOM::Element nodes.

 print $node->namespace; # html
 $node->namespaceId(HTML5::DOM->NS_SVG);
 print $node->namespaceId; # 3
 print $node->namespace; # svg

tree

 my $tree = $node->tree;
 

Return parent HTML5::DOM::Tree.

nodeType

 my $type = $node->nodeType;
 

Return node type. All types:

 HTML5::DOM->ELEMENT_NODE                   => 1, 
 HTML5::DOM->ATTRIBUTE_NODE                 => 2,   # not supported
 HTML5::DOM->TEXT_NODE                      => 3, 
 HTML5::DOM->CDATA_SECTION_NODE             => 4,   # not supported
 HTML5::DOM->ENTITY_REFERENCE_NODE          => 5,   # not supported
 HTML5::DOM->ENTITY_NODE                    => 6,   # not supported
 HTML5::DOM->PROCESSING_INSTRUCTION_NODE    => 7,   # not supported
 HTML5::DOM->COMMENT_NODE                   => 8, 
 HTML5::DOM->DOCUMENT_NODE                  => 9, 
 HTML5::DOM->DOCUMENT_TYPE_NODE             => 10, 
 HTML5::DOM->DOCUMENT_FRAGMENT_NODE         => 11, 
 HTML5::DOM->NOTATION_NODE                  => 12   # not supported

Compatible with: https://developer.mozilla.org/ru/docs/Web/API/Node/nodeType

next

nextElementSibling

 my $node2 = $node->next;
 my $node2 = $node->nextElementSibling; # alias

Return next sibling element node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $li = $tree->at('ul li');
 print $li->text;               # Linux
 print $li->next->text;         # OSX
 print $li->next->next->text;   # Windows

prev

previousElementSibling

 my $node2 = $node->prev;
 my $node2 = $node->previousElementSibling; # alias

Return previous sibling element node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li class="win">Windows</li>
    </ul>
 ');
 my $li = $tree->at('ul li.win');
 print $li->text;               # Windows
 print $li->prev->text;         # OSX
 print $li->prev->prev->text;   # Linux

nextNode

nextSibling

 my $node2 = $node->nextNode;
 my $node2 = $node->nextSibling; # alias

Return next sibling node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $li = $tree->at('ul li');
 print $li->text;                       # Linux
 print $li->nextNode->text;             # <!-- comment -->
 print $li->nextNode->nextNode->text;   # OSX

prevNode

previousSibling

 my $node2 = $node->prevNode;
 my $node2 = $node->previousSibling; # alias

Return previous sibling node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li class="win">Windows</li>
    </ul>
 ');
 my $li = $tree->at('ul li.win');
 print $li->text;                       # Windows
 print $li->prevNode->text;             # OSX
 print $li->prevNode->prevNode->text;   # <!-- comment -->

first

firstElementChild

 my $node2 = $node->first;
 my $node2 = $node->firstElementChild; # alias

Return first children element

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <!-- comment -->
        <li>Linux</li>
        <li>OSX</li>
        <li class="win">Windows</li>
    </ul>
 ');
 my $ul = $tree->at('ul');
 print $ul->first->text; # Linux

last

lastElementChild

 my $node2 = $node->last;
 my $node2 = $node->lastElementChild; # alias

Return last children element

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <li>OSX</li>
        <li class="win">Windows</li>
        <!-- comment -->
    </ul>
 ');
 my $ul = $tree->at('ul');
 print $ul->last->text; # Windows

firstNode

firstChild

 my $node2 = $node->firstNode;
 my $node2 = $node->firstChild; # alias

Return first children node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <!-- comment -->
        <li>Linux</li>
        <li>OSX</li>
        <li class="win">Windows</li>
    </ul>
 ');
 my $ul = $tree->at('ul');
 print $ul->firstNode->html; # <!-- comment -->

lastNode

lastChild

 my $node2 = $node->lastNode;
 my $node2 = $node->lastChild; # alias

Return last children node

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <li>OSX</li>
        <li class="win">Windows</li>
        <!-- comment -->
    </ul>
 ');
 my $ul = $tree->at('ul');
 print $ul->lastNode->html; # <!-- comment -->

html

Universal html serialization and fragment parsing acessor, for single human-friendly api.

 my $html = $node->html();
 my $node = $node->html($new_html);
 my $tree = HTML5::DOM->new->parse('<div id="test">some   text <b>bold</b></div>');
 
 # get text content for element
 my $node = $tree->at('#test');
 print $node->html;                     # <div id="test">some   text <b>bold</b></div>
 $comment->html('<b>new</b>');
 print $comment->html;                  # <div id="test"><b>new</b></div>
 
 my $comment = $tree->createComment(" comment text ");
 print $comment->html;                  # <!-- comment text -->
 $comment->html(' new comment text ');
 print $comment->html;                  # <!-- new comment text -->
 
 my $text_node = $tree->createTextNode("plain text >");
 print $text_node->html;                # plain text &gt;
 $text_node->html('new>plain>text');
 print $text_node->html;                # new&gt;plain&gt;text

innerHTML

outerHTML

  • HTML serialization of the node's descendants.

     my $html = $node->html;
     my $html = $node->outerHTML;

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
     print $tree->getElementById('test')->outerHTML;   # <div id="test">some <b>bold</b> test</div>
     print $tree->createComment(' test ')->outerHTML;  # <!-- test -->
     print $tree->createTextNode('test')->outerHTML;   # test
  • HTML serialization of the node and its descendants.

     # serialize descendants, without node
     my $html = $node->innerHTML;

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
     print $tree->getElementById('test')->innerHTML;   # some <b>bold</b> test
     print $tree->createComment(' test ')->innerHTML;  # <!-- test -->
     print $tree->createTextNode('test')->innerHTML;   # test
  • Removes all of the element's descendants and replaces them with nodes constructed by parsing the HTML given in the string $new_html.

     # parse fragment and replace child nodes with it
     my $html = $node->html($new_html);
     my $html = $node->innerHTML($new_html);

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
     print $tree->at('#test')->innerHTML('<i>italic</i>');
     print $tree->body->innerHTML;  # <div id="test"><i>italic</i></div>
  • HTML serialization of entire document

      my $html = $tree->document->html;
      my $html = $tree->document->outerHTML;

    Example:

      my $tree = HTML5::DOM->new->parse('<!DOCTYPE html><div id="test">some <b>bold</b> test</div>');
      print $tree->document->outerHTML;   # <!DOCTYPE html><html><head></head><body><div id="test">some <b>bold</b> test</div></body></html>
  • Replaces the element and all of its descendants with a new DOM tree constructed by parsing the specified $new_html.

     # parse fragment and node in parent node childs with it
     my $html = $node->outerHTML($new_html);

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
     print $tree->at('#test')->outerHTML('<i>italic</i>');
     print $tree->body->innerHTML;  # <i>italic</i>

See, for more info:

https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML

https://developer.mozilla.org/en-US/docs/Web/API/Element/outerHTML

text

Universal text acessor, for single human-friendly api.

 my $text = $node->text();
 my $node = $node->text($new_text);
 my $tree = HTML5::DOM->new->parse('<div id="test">some   text <b>bold</b></div>');
 
 # get text content for element
 my $node = $tree->at('#test');
 print $node->text;                     # some   text bold
 $comment->text('<new node content>');
 print $comment->html;                  # &lt;new node conten&gt;
 
 my $comment = $tree->createComment("comment text");
 print $comment->text;                  # comment text
 $comment->text(' new comment text ');
 print $comment->html;                  # <!-- new comment text -->
 
 my $text_node = $tree->createTextNode("plain text");
 print $text_node->text;                # plain text
 $text_node->text('new>plain>text');
 print $text_node->html;                # new&gt;plain&gt;text

innerText

outerText

textContent

  • Represents the "rendered" text content of a node and its descendants. Using default CSS "display" property for tags based on Firefox user-agent style.

    Only works for elements, for other nodes return undef.

     my $text = $node->innerText;
     my $text = $node->outerText; # alias

    Example:

     my $tree = HTML5::DOM->new->parse('
        <div id="test">
            some       
            <b>      bold     </b>       
            test
            <script>alert()</script>
        </div>
     ');
     print $tree->body->innerText; # some bold test

    See, for more info: https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute

  • Removes all of its children and replaces them with a text nodes and <br> with the given value. Only works for elements, for other nodes throws exception.

    • All new line chars (\r\n, \r, \n) replaces to <br />

    • All other text content replaces to text nodes

     my $node = $node->innerText($text);

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some text <b>bold</b></div>');
     $tree->at('#test')->innerText("some\nnew\ntext >");
     print $tree->at('#test')->html;    # <div id="test">some<br />new<br />text &gt;</div>

    See, for more info: https://html.spec.whatwg.org/multipage/dom.html#the-innertext-idl-attribute

  • Removes the current node and replaces it with the given text. Only works for elements, for other nodes throws exception.

    • All new line chars (\r\n, \r, \n) replaces to <br />

    • All other text content replaces to text nodes

    • Similar to innerText($text), but removes current node

     my $node = $node->outerText($text);

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some text <b>bold</b></div>');
     $tree->at('#test')->outerText("some\nnew\ntext >");
     print $tree->body->html;   # <body>some<br />new<br />text &gt;</body>

    See, for more info: https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/outerText

  • Represents the text content of a node and its descendants.

    Only works for elements, for other nodes return undef.

     my $text = $node->text;
     my $text = $node->textContent; # alias

    Example:

     my $tree = HTML5::DOM->new->parse('<b>    test      </b><script>alert()</script>');
     print $tree->body->text; #     test      alert()

    See, for more info: https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent

  • Removes all of its children and replaces them with a single text node with the given value.

     my $node = $node->text($new_text);
     my $node = $node->textContent($new_text);

    Example:

     my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
     print $tree->at('#test')->text('<bla bla bla>');
     print $tree->at('#test')->html;  # <div id="test">&lt;bla bla bla&gt;</div>

    See, for more info: https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent

nodeHtml

 my $html = $node->nodeHtml();

Serialize to html, without descendants and closing tag.

 my $tree = HTML5::DOM->new->parse('<div id="test">some <b>bold</b> test</div>');
 print $tree->at('#test')->nodeHtml(); # <div id="test">

nodeValue

data

 my $value = $node->nodeValue();
 my $value = $node->data(); # alias

 my $node = $node->nodeValue($new_value);
 my $node = $node->data($new_value); # alias

Get or set value of node. Only works for non-element nodes, such as HTML5::DOM::Element, HTML5::DOM::Element, HTML5::DOM::Element. Return undef for other.

 my $tree = HTML5::DOM->new->parse('');
 my $comment = $tree->createComment("comment text");
 print $comment->nodeValue;                 # comment text
 $comment->nodeValue(' new comment text ');
 print $comment->html;                      # <!-- new comment text -->

isConnected

 my $flag = $node->isConnected;

Return true, if node has parent.

 my $tree = HTML5::DOM->new->parse('
    <div id="test"></div>
 ');
 print $tree->at('#test')->isConnected;             # 1
 print $tree->createElement("div")->isConnected;    # 0

parent

parentElement

 my $node = $node->parent;
 my $node = $node->parentElement; # alias

Return parent node. Return undef, if node detached.

 my $tree = HTML5::DOM->new->parse('
    <div id="test"></div>
 ');
 print $tree->at('#test')->parent->tag; # body

document

ownerDocument

 my $doc = $node->document;
 my $doc = $node->ownerDocument; # alias

Return parent HTML5::DOM::Document.

 my $tree = HTML5::DOM->new->parse('
    <div id="test"></div>
 ');
 print ref($tree->at('#test')->document);   # HTML5::DOM::Document

append

appendChild

 my $node = $node->append($child);
 my $child = $node->appendChild($child); # alias

Append node to child nodes.

append - returned value is the self node, for chain calls

appendChild - returned value is the appended child except when the given child is a HTML5::DOM::Fragment, in which case the empty HTML5::DOM::Fragment is returned.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 $tree->at('div')
    ->append($tree->createElement('br'))
    ->append($tree->createElement('br'));
 print $tree->at('div')->html; # <div>some <b>bold</b> text<br /><br /></div>

prepend

prependChild

 my $node = $node->prepend($child);
 my $child = $node->prependChild($child); # alias

Prepend node to child nodes.

prepend - returned value is the self node, for chain calls

prependChild - returned value is the prepended child except when the given child is a HTML5::DOM::Fragment, in which case the empty HTML5::DOM::Fragment is returned.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 $tree->at('div')
    ->prepend($tree->createElement('br'))
    ->prepend($tree->createElement('br'));
 print $tree->at('div')->html; # <div><br /><br />some <b>bold</b> text</div>

replace

replaceChild

 my $old_node = $old_node->replace($new_node);
 my $old_node = $old_node->parent->replaceChild($new_node, $old_node); # alias

Replace node in parent child nodes.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 my $old = $tree->at('b')->replace($tree->createElement('br'));
 print $old->html;              # <b>bold</b>
 print $tree->at('div')->html;  # <div>some <br /> text</div>

before

insertBefore

 my $node = $node->before($new_node);
 my $new_node = $node->parent->insertBefore($new_node, $node); # alias

Insert new node before current node.

before - returned value is the self node, for chain calls

insertBefore - returned value is the added child except when the given child is a HTML5::DOM::Fragment, in which case the empty HTML5::DOM::Fragment is returned.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 $tree->at('b')->before($tree->createElement('br'));
 print $tree->at('div')->html; # <div>some <br /><b>bold</b> text</div>

after

insertAfter

 my $node = $node->after($new_node);
 my $new_node = $node->parent->insertAfter($new_node, $node); # alias

Insert new node after current node.

after - returned value is the self node, for chain calls

insertAfter - returned value is the added child except when the given child is a HTML5::DOM::Fragment, in which case the empty HTML5::DOM::Fragment is returned.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 $tree->at('b')->after($tree->createElement('br'));
 print $tree->at('div')->html; # <div>some <b>bold</b><br /> text</div>

remove

removeChild

 my $node = $node->remove;
 my $node = $node->parent->removeChild($node); # alias

Remove node from parent. Return removed node.

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 print $tree->at('b')->remove->html;    # <b>bold</b>
 print $tree->at('div')->html;          # <div>some  text</div>

clone

cloneNode

 # clone node to current tree
 my $node = $node->clone($deep = 0);
 my $node = $node->cloneNode($deep = 0); # alias

 # clone node to foreign tree
 my $node = $node->clone($deep, $new_tree);
 my $node = $node->cloneNode($deep, $new_tree); # alias

Clone node.

deep = 0 - only specified node, without childs.

deep = 1 - deep copy with all child nodes.

new_tree - destination tree (if need copy to foreign tree)

 my $tree = HTML5::DOM->new->parse('
    <div>some <b>bold</b> text</div>
 ');
 print $tree->at('b')->clone(0)->html; # <b></b>
 print $tree->at('b')->clone(1)->html; # <b>bold</b>

void

 my $flag = $node->void;

Return true if node is void. For more details: http://w3c.github.io/html-reference/syntax.html#void-elements

 print $tree->createElement('br')->void; # 1

selfClosed

 my $flag = $node->selfClosed;

Return true if node self closed.

 print $tree->createElement('br')->selfClosed; # 1

position

 my $position = $node->position;

Return offsets in input buffer.

 print Dumper($node->position);
 # $VAR1 = {'raw_length' => 3, 'raw_begin' => 144, 'element_begin' => 143, 'element_length' => 5}

isSameNode

 my $flag = $node->isSameNode($other_node);

Tests whether two nodes are the same, that is if they reference the same object.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>test</li>
        <li>not test</li>
        <li>test</li>
    </ul>
 ');
 my $li = $tree->find('li');
 print $li->[0]->isSameNode($li->[0]); # 1
 print $li->[0]->isSameNode($li->[1]); # 0
 print $li->[0]->isSameNode($li->[2]); # 0

HTML5::DOM::Element

DOM node object for elements. Inherit all methods from HTML5::DOM::Node.

children

 my $collection = $node->children;

Returns all child elements of current node in HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Perl</li>
        <!-- comment -->
        <li>PHP</li>
        <li>C++</li>
    </ul>
 ');
 my $collection = $tree->at('ul')->children;
 print $collection->[0]->html; # <li>Perl</li>
 print $collection->[1]->html; # <li>PHP</li>
 print $collection->[2]->html; # <li>C++</li>

childrenNode

childNodes

 my $collection = $node->childrenNode;
 my $collection = $node->childNodes; # alias

Returns all child nodes of current node in HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Perl</li>
        <!-- comment -->
        <li>PHP</li>
        <li>C++</li>
    </ul>
 ');
 my $collection = $tree->at('ul')->childrenNode;
 print $collection->[0]->html; # <li>Perl</li>
 print $collection->[1]->html; # <!-- comment -->
 print $collection->[2]->html; # <li>PHP</li>
 print $collection->[3]->html; # <li>C++</li>

attr

removeAttr

Universal attributes accessor, for single human-friendly api.

 # attribute get
 my $value = $node->attr($key);
 
 # attribute set
 my $node = $node->attr($key, $value);
 my $node = $node->attr($key => $value);
 
 # attribute remove
 my $node = $node->attr($key, undef);
 my $node = $node->attr($key => undef);
 my $node = $node->removeAttr($key);
 
 # bulk attributes set
 my $node = $node->attr({$key => $value, $key2 => $value2});
 
 # bulk attributes remove
 my $node = $node->attr({$key => undef, $key2 => undef});
 
 # bulk get all attributes in hash
 my $hash = $node->attr;

Example:

 my $tree = HTML5::DOM->new->parse('
    <div id="test" data-test="test value" data-href="#"></div>
 ');
 my $div = $tree->at('#test');
 $div->attr("data-new", "test");
 print $div->attr("data-test");     # test value
 print $div->{"data-test"};         # test value
 print $div->attr->{"data-test"};   # test value
 
 # {id => "test", "data-test" => "test value", "data-href" => "#", "data-new" => "test"}
 print Dumper($div->attr);
 
 $div->removeAttr("data-test");
 
 # {id => "test", "data-href" => "#", "data-new" => "test"}
 print Dumper($div->attr);

attrArray

 my $arr = $node->attrArray;

Get all attributes in array (in tree order).

 my $tree = HTML5::DOM->new->parse('
    <div id="test" data-test="test value" data-href="#"></div>
 ');
 my $div = $tree->at('#test');
 
 # [{key => 'id', value => 'test'}, {key => 'data-test', value => 'test'}, {key => 'data-href', value => '#'}]
 print Dumper($div->attrArray);

getAttribute

 my $value = $node->getAttribute($key);
 my $value = $node->attr($key); # alias

Get attribute value by key.

setAttribute

 my $node = $node->setAttribute($key, $value);
 my $node = $node->attr($key, $value); # alias

Set new value or create new attibute.

removeAttribute

 my $node = $node->removeAttribute($key);
 my $node = $node->removeAttr($key); # alias

Remove attribute.

className

 my $classes = $node->className;
 # alias for
 my $classes = $node->attr("class");

classList

 my $class_list = $node->classList;
 
 # has class
 my $flag = $class_list->has($class_name);
 my $flag = $class_list->contains($class_name);
 
 # add class
 my $class_list = $class_list->add($class_name);
 my $class_list = $class_list->add($class_name, $class_name1, $class_name2, ...);

 # add class
 my $class_list = $class_list->remove($class_name);
 my $class_list = $class_list->remove($class_name, $class_name1, $class_name2, ...);

 # toggle class
 my $state = $class_list->toggle($class_name);
 my $state = $class_list->toggle($class_name, $force_state);

Manipulations with classes. Returns HTML5::DOM::TokenList.

Similar to https://developer.mozilla.org/en-US/docs/Web/API/Element/classList

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div>')
 my $node = $tree->body->at('.red');
 print $node->has('red');                       # 1
 print $node->has('blue');                      # 0
 $node->add('blue', 'red', 'yellow', 'orange');
 print $node->className;                        # red blue yellow orange
 $node->remove('blue', 'orange');
 print $node->className;                        # red yellow
 print $node->toggle('blue');                   # 1
 print $node->className;                        # red yellow blue
 print $node->toggle('blue');                   # 0
 print $node->className;                        # red yellow

at

querySelector

 my $node = $node->at($selector);
 my $node = $node->at($selector, $combinator);
 my $node = $node->querySelector($selector); # alias
 my $node = $node->querySelector($selector, $combinator); # alias

Find one element node in current node descendants using CSS Selectors Level 4

Return node, or undef if not find.

  • $selector - selector query as plain text or precompiled as HTML5::DOM::CSS::Selector or HTML5::DOM::CSS::Selector.

  • $combinator - custom selector combinator, applies to current node

    • >> - descendant selector (default)

    • > - child selector

    • + - adjacent sibling selector

    • ~ - general sibling selector

    • || - column combinator

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $node = $tree->body->at('body > div.red');
 print $node->html; # <div class="red">red</div>

find

querySelectorAll

 my $collection = $node->find($selector);
 my $collection = $node->find($selector, $combinator);
 my $collection = $node->querySelectorAll($selector); # alias
 my $collection = $node->querySelectorAll($selector, $combinator); # alias

Find all element nodes in current node descendants using CSS Selectors Level 4

Return HTML5::DOM::Collection.

  • $selector - selector query as plain text or precompiled as HTML5::DOM::CSS::Selector or HTML5::DOM::CSS::Selector.

  • $combinator - custom selector combinator, applies to current node

    • >> - descendant selector (default)

    • > - child selector

    • + - adjacent sibling selector

    • ~ - general sibling selector

    • || - column combinator

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $collection = $tree->body->at('body > div.red, body > div.blue');
 print $collection->[0]->html; # <div class="red">red</div>
 print $collection->[1]->html; # <div class="red">blue</div>

findId

getElementById

 my $node = $node->findId($tag);
 my $node = $node->getElementById($tag); # alias

Find element node with specified id in current node descendants.

Return HTML5::DOM::Node or undef.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue" id="test">blue</div>')
 my $node = $tree->body->findId('test');
 print $node->html; # <div class="blue" id="test">blue</div>

findTag

getElementsByTagName

 my $node = $node->findTag($tag);
 my $node = $node->getElementsByTagName($tag); # alias

Find all element nodes in current node descendants with specified tag name.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new->parse('<div class="red">red</div><div class="blue">blue</div>')
 my $collection = $tree->body->findTag('div');
 print $collection->[0]->html; # <div class="red">red</div>
 print $collection->[1]->html; # <div class="red">blue</div>

findClass

getElementsByClassName

 my $collection = $node->findClass($class);
 my $collection = $node->getElementsByClassName($class); # alias

Find all element nodes in current node descendants with specified class name. This is more fast equivalent to [class~="value"] selector.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new
    ->parse('<div class="red color">red</div><div class="blue color">blue</div>');
 my $collection = $tree->body->findClass('color');
 print $collection->[0]->html; # <div class="red color">red</div>
 print $collection->[1]->html; # <div class="red color">blue</div>

findAttr

getElementByAttribute

 # Find all elements with attribute
 my $collection = $node->findAttr($attribute);
 my $collection = $node->getElementByAttribute($attribute); # alias
 
 # Find all elements with attribute and mathcing value
 my $collection = $node->findAttr($attribute, $value, $case = 0, $cmp = '=');
 my $collection = $node->getElementByAttribute($attribute, $value, $case = 0, $cmp = '='); # alias

Find all element nodes in tree with specified attribute and optional matching value.

Return HTML5::DOM::Collection.

 my $tree = HTML5::DOM->new
    ->parse('<div class="red color">red</div><div class="blue color">blue</div>');
 my $collection = $tree->body->findAttr('class', 'CoLoR', 1, '~');
 print $collection->[0]->html; # <div class="red color">red</div>
 print $collection->[1]->html; # <div class="red color">blue</div>

CSS selector analogs:

 # [$attribute=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '=');
 
 # [$attribute=$value i]
 my $collection = $node->findAttr($attribute, $value, 1, '=');
 
 # [$attribute~=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '~');
 
 # [$attribute|=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '|');
 
 # [$attribute*=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '*');

 # [$attribute^=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '^');

 # [$attribute$=$value]
 my $collection = $node->findAttr($attribute, $value, 0, '$');

getDefaultBoxType

 my $display = $node->getDefaultBoxType;

Get default CSS "display" property for tag (useful for functions like a innerText).

 my $tree = HTML5::DOM->new
    ->parse('<div class="red color">red</div><script>alert()</script><b>bbb</b>');
 print $tree->at('div')->getDefaultBoxType();       # block
 print $tree->at('script')->getDefaultBoxType();    # none
 print $tree->at('b')->getDefaultBoxType();         # inline

HTML5::DOM::Document

DOM node object for document. Inherit all methods from HTML5::DOM::Element.

HTML5::DOM::Fragment

DOM node object for fragments. Inherit all methods from HTML5::DOM::Element.

HTML5::DOM::Text

DOM node object for text. Inherit all methods from HTML5::DOM::Node.

HTML5::DOM::Comment

DOM node object for comments. Inherit all methods from HTML5::DOM::Node.

HTML5::DOM::DocType

DOM node object for document type. Inherit all methods from HTML5::DOM::Node.

name

 my $name = $node->name;
 my $node = $node->name($new_name);

Return or change root element name from doctype.

 my $tree = HTML5::DOM->new->parse('
        <!DOCTYPE svg>
 ');
 
 # get
 print $tree->document->firstChild->name; # svg
 
 # set
 $tree->document->firstChild->name('html');
 print $tree->document->firstChild->html; # <!DOCTYPE html>

publicId

 my $public_id = $node->publicId;
 my $node = $node->publicId($new_public_id);

Return or change public id from doctype.

 my $tree = HTML5::DOM->new->parse('
        <!DOCTYPE svg:svg PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
 ');
 
 # get
 print $tree->document->firstChild->publicId; # -//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN
 
 # set
 print $tree->document->firstChild->publicId('-//W3C//DTD SVG 1.1//EN');
 print $tree->document->firstChild->html; # <!DOCTYPE svg:svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">

systemId

 my $system_id = $node->systemId;
 my $node = $node->systemId($new_system_id);

Return or change public id from doctype.

 my $tree = HTML5::DOM->new->parse('
        <!DOCTYPE svg:svg PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
 ');
 
 # get
 print $tree->document->firstChild->systemId; # http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd
 
 # set
 print $tree->document->firstChild->systemId('http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd');
 print $tree->document->firstChild->html; # <!DOCTYPE svg:svg PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">

HTML5::DOM::Collection

CSS Parser object

new

 my $collection = HTML5::DOM::Collection->new($nodes);

Creates new collection from $nodes (reference to array with HTML5::DOM::Node).

each

 $collection->each(sub {...});
 $collection->each(sub {...}, @additional_args);

Foreach all nodes in collection. Returns self.

Example:

 $collection->each(sub {
    my ($node, $index) = @_;
    print "FOUND: node[$index] is a '$node'\n";
 });

 # Also can bypass additional arguments
 $collection->each(sub {
    my ($node, $index, $title) = @_;
    print $title."node[$index] is a '$node'\n";
 }, "FOUND: ");

map

 my $new_collection = $collection->map(sub {
    my ($token, $index) = @_;
    return "FOUND: ".$node->tag." => $index";
 });

 # Also can bypass additional arguments
 my $new_collection = $collection->map(sub {
    my ($token, $index, $title) = @_;
    return $title.$node->tag." => $index";
 }, "FOUND: ");

Apply callback for each node in collection. Returns new array from results.

 my $new_collection = $collection->map($method, @args);

Call method for each node in collection. Returns new HTML5::DOM::Collection from results.

Example:

 # set text 'test!' for all nodes
 $collection->map('text', 'test!');

 # get all tag names as array
 my $new_collection = $collection->map('tag');

 # remove all nodes in collection
 $collection->map('remove');

add

 my $collection = $collection->add($node);

Add new item to collection.

length

 my $length = $collection->length;

Items count in collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print $collection->length; # 3

grep

 my $new_collection = $collection->grep(qr/regexp/);

Evaluates regexp for html code of each element in collection and creates new collection with all matched elements.

 my $new_collection = $collection->grep(sub {...});
 my $new_collection = $collection->grep(sub {...}, @args);

Evaluates callback foreach element in collection and creates new collection with all elements for which callback returned true.

Example for regexp:

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX (not supported)</li>
        <li>Windows (not supported)</li>
    </ul>
 ');
 my $collection = $tree->find('ul li')->grep(qr/not supported/);
 print $collection->length; # 2

Example for callback:

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX (not supported)</li>
        <li>Windows (not supported)</li>
    </ul>
 ');
 my $collection = $tree->find('ul li')->grep(sub { $_->html =~ /not supported/ });
 print $collection->length; # 2

first

 my $node = $collection->first;

Get first item in collection.

 my $node = $collection->first(qr/regexp/);

Get first element in collection which html code matches regexp.

 my $node = $collection->first(sub {...});
 my $node = $collection->first(sub {...}, @args);

Get first element in collection which where callback returned true.

Example for regexp:

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX (not supported)</li>
        <li>Windows (not supported)</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print $collection->first->html; # <li>Linux</li>
 print $collection->first(qr/not supported/)->html; # <li>OSX (not supported)</li>

Example for callback:

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX (not supported)</li>
        <li>Windows (not supported)</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print $collection->first->html; # <li>Linux</li>
 print $collection->first(sub { $_->html =~ /not supported })->html; # <li>OSX (not supported)</li>

last

 my $node = $collection->last;

Get last item in collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print $collection->last->html; # <li>Windows</li>

item

 my $node = $collection->item($index);
 my $node = $collection->[$index];

Get item by $index in collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print $collection->item(1)->html;      # <li>OSX</li>
 print $collection->[1]->html;          # <li>OSX</li>

reverse

 my $reversed_collection = $collection->reverse;

Returns copy of collection in reverse order.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->map('text')};            # Linux, OSX, Windows
 print join(', ', @{$collection->reverse()->map('text')}; # Windows, OSX, Linux

shuffle

 my $shuffled_collection = $collection->shuffle;

Returns copy of collection in random order.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->shuffle()->map('text')}; # Windows, Linux, OSX
 print join(', ', @{$collection->shuffle()->map('text')}; # Windows, OSX, Linux
 print join(', ', @{$collection->shuffle()->map('text')}; # OSX, Windows, Linux

head

 my $new_collection = $collection->head($length);

Returns copy of collection with only first $length items.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->head(2)->map('text')}; # Linux, OSX

tail

 my $new_collection = $collection->tail($length);

Returns copy of collection with only last $length items.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->tail(2)->map('text')}; # OSX, Windows

slice

 my $new_collection = $collection->slice($offset);

Returns new collection with sequence by specified $offset.

If $offset is positive, the sequence will start at that $offset in the $collection. If $offset is negative, the sequence will start that far from the end of the $collection.

 my $new_collection = $collection->slice($offset, $length);

Returns new collection with sequence by specified $offset and $length.

If $offset is positive, the sequence will start at that $offset in the $collection.

If $offset is negative, the sequence will start that far from the end of the $collection.

If $length is positive, then the sequence will have up to that many elements in it.

If the $collection is shorter than the $length, then only the available $collection elements will be present.

If $length is negative then the sequence will stop that many elements from the end of the $collection.

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li>Linux</li>
        <!-- comment -->
        <li>NetBSD</li>
        <li>OSX</li>
        <li>Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->slice(1)->map('text')};      # NetBSD, OSX, Windows
 print join(', ', @{$collection->slice(1, 2)->map('text')};   # NetBSD, OSX
 print join(', ', @{$collection->slice(-2)->map('text')};     # OSX, Windows
 print join(', ', @{$collection->slice(-2, 1)->map('text')};  # OSX
 print join(', ', @{$collection->slice(-3, -1)->map('text')}; # NetBSD, OSX

uniq

 my $new_collection = $collection->uniq();

Returns copy of collection with only uniq nodes.

 my $new_collection = $collection->uniq(sub {...});

Returns copy of collection with only unique nodes which unique identifier of each node returned by callback.

Example:

 my $tree = HTML5::DOM->new->parse('
    <ul>
        <li data-kernel="linux">Ubuntu</li>
        <li data-kernel="linux">Arch Linux</li>
        <!-- comment -->
        <li data-kernel="darwin">OSX</li>
        <li data-kernel="nt">Windows</li>
    </ul>
 ');
 my $collection = $tree->find('ul li');
 print join(', ', @{$collection->uniq->map('text')};                                   # Ubuntu, Arch Linux, OSX, Windows
 print join(', ', @{$collection->uniq(sub { $_->attr("data-kernel") })->map('text')};  # Ubuntu, OSX, Windows

array

 my $node = $collection->array();

Get collection items as array.

html

 my $html = $collection->html;

Concat <outerHTML|/outerHTML> from all items.

text

 my $text = $collection->text;

Concat <textContent|/textContent> from all items.

HTML5::DOM::TokenList

Similar to https://developer.mozilla.org/en-US/docs/Web/API/DOMTokenList

has

contains

 my $flag = $tokens->has($token);
 my $flag = $tokens->contains($token); # alias

Check if token contains in current tokens list.

add

 my $tokens = $tokens->add($token);
 my $tokens = $tokens->add($token, $token2, ...);

Add new token (or tokens) to current tokens list. Returns self.

remove

 my $tokens = $tokens->add($token);
 my $tokens = $tokens->add($token, $token2, ...);

Remove one or more tokens from current tokens list. Returns self.

toggle

 my $state = $tokens->toggle($token);
 my $state = $tokens->toggle($token, $force_state);
  • $token - specified token name

  • $force_state - optional force state.

    If 1 - similar to add

    If 0 - similar to remove

Toggle specified token in current tokens list.

  • If token exists - remove it

  • If token not exists - add it

length

 my $length = $tokens->length;

Returns tokens count in current list.

item

 my $token = $tokens->item($index);
 my $token = $tokens->[$index];

Return token by index.

each

 my $token = $tokens->each(sub {
    my ($token, $index) = @_;
    print "tokens[$index] is a '$token'\n";
 });

Forach all tokens in list.

HTML5::DOM::AsyncResult

Get result and check status from async parsing.

parsed

Non-blocking check status.

 use warnings;
 use strict;
 use HTML5::DOM;

 my $parser = HTML5::DOM->new;
 my $async = $parser->parseAsync('<div>Hello world!</div>' x 1000);
 
 my $is_parsed;
 while (!($is_parsed = $async->parsed)) {
     print "is_parsed=$is_parsed\n";
 }

Returns 1 if async parsing done. Otherwise returns 0.

tree

Non-blocking get result.

 use warnings;
 use strict;
 use HTML5::DOM;

 my $parser = HTML5::DOM->new;
 my $async = $parser->parseAsync('<div>Hello world!</div>' x 1000);
 
 my $tree;
 while (!($tree = $async->tree)) {
     print "is_parsed=".($tree ? 1 : 0)."\n";
 }
 
 print $tree->at('div')->text."\n"; # Hello world!

Returns HTML5::DOM::Tree object if async parsing done. Otherwise returns undef.

wait

 use warnings;
 use strict;
 use HTML5::DOM;

 my $parser = HTML5::DOM->new;
 my $async = $parser->parseAsync('<div>Hello world!</div>' x 1000);
 
 my $tree = $async->wait;
 
 print $tree->at('div')->text."\n"; # Hello world!

Blocking waits for parsing done and returns HTML5::DOM::Tree object.

HTML5::DOM::CSS

CSS Parser object

new

 # with default options
 my $css = HTML5::DOM::CSS->new;

 # or override some options, if you need
 my $css = HTML5::DOM::CSS->new({
     utf8 => 0
 });

Create new css parser object wuth options. See "CSS PARSER OPTIONS" for details.

parseSelector

 my $selector = HTML5::DOM::CSS->parseSelector($selector_text);

Parse $selector_text and return HTML5::DOM::CSS::Selector.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 
 # with custom options (extends options defined in HTML5::DOM::CSS->new)
 my $selector = $css->parseSelector('body div.red, body span.blue', { utf8 => 0 });

HTML5::DOM::CSS::Selector

CSS Selector object (precompiled selector)

new

 my $selector = HTML5::DOM::CSS::Selector->new($selector_text);

Parse $selector_text and create new css selector object. If your need parse many selectors, more efficient way using single instance of parser HTML5::DOM::CSS and parseSelector method.

text

 my $selector_text = $selector->text;

Serialize selector to text.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 print $selector->text."\n"; # body div.red, body span.blue

ast

 my $ast = $entry->ast;

Serialize selector to very simple AST format.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('div > .red');
 print Dumper($selector->ast);
 
 # $VAR1 = [[
 #     {
 #         'value' => 'div',
 #         'type' => 'tag'
 #     },
 #     {
 #         'type'  => 'combinator',
 #         'value' => 'child'
 #     },
 #     {
 #         'type' => 'class',
 #         'value' => 'red'
 #     }
 # ]];

length

 my $length = $selector->length;

Get selector entries count (selectors separated by "," combinator)

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 print $selector->length."\n"; # 2

entry

 my $entry = $selector->entry($index);

Get selector entry by $index end return HTML5::DOM::CSS::Selector::Entry.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 print $selector->entry(0)->text."\n"; # body div.red
 print $selector->entry(1)->text."\n"; # body span.blue

utf8

As getter - get 1 if current selector object returns all strings with utf8 flag.

Example with utf8:

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $selector = HTML5::DOM::CSS->new->parseSelector("[name=\"тест\"]");
 my $is_utf8_enabled = $selector->utf8;
 print "is_utf8_enabled=".($is_utf8_enabled ? "true" : "false")."\n"; # true

Or example with bytes:

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $selector = HTML5::DOM::CSS->new->parseSelector("[name=\"тест\"]");
 my $is_utf8_enabled = $selector->utf8;
 print "is_utf8_enabled=".($is_utf8_enabled ? "true" : "false")."\n"; # false

As setter - enable or disable utf8 flag on all returned strings.

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $selector = HTML5::DOM::CSS->new->parseSelector("[name=\"тест\"]");
 
 print "is_utf8_enabled=".($selector->utf8 ? "true" : "false")."\n"; # true
 print length($selector->text)." chars\n"; # 13 chars
 
 $selector->utf8(0);
 
 print "is_utf8_enabled=".($selector->utf8 ? "true" : "false")."\n"; # false
 print length($selector->text)." bytes\n"; # 17 bytes

HTML5::DOM::CSS::Selector::Entry

CSS selector entry object (precompiled selector)

text

 my $selector_text = $entry->text;

Serialize entry to text.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 my $entry = $selector->entry(0);
 print $entry->text."\n"; # body div.red

pseudoElement

 my $pseudo_name = $entry->pseudoElement;

Return pseudo-element name for entry.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('div::after');
 my $entry = $selector->entry(0);
 print $entry->pseudoElement."\n"; # after

ast

 my $ast = $entry->ast;

Serialize entry to very simple AST format.

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('div > .red');
 my $entry = $selector->entry(0);
 print Dumper($entry->ast);
 
 # $VAR1 = [
 #     {
 #         'value' => 'div',
 #         'type' => 'tag'
 #     },
 #     {
 #         'type'  => 'combinator',
 #         'value' => 'child'
 #     },
 #     {
 #         'type' => 'class',
 #         'value' => 'red'
 #     }
 # ];

specificity

 my $specificity = $entry->specificity;

Get specificity in hash {a, b, c}

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 my $entry = $selector->entry(0);
 print Dumper($entry->specificity); # {a => 0, b => 1, c => 2}

specificityArray

 my $specificity = $entry->specificityArray;

Get specificity in array [a, b, c] (ordered by weight)

 my $css = HTML5::DOM::CSS->new;
 my $selector = $css->parseSelector('body div.red, body span.blue');
 my $entry = $selector->entry(0);
 print Dumper($entry->specificityArray); # [0, 1, 2]

HTML5::DOM::Encoding

Encoding detection.

See for available encodings: "ENCODINGS"

id2name

 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);

Get encoding name by id.

 print HTML5::DOM::Encoding::id2name(HTML5::DOM::Encoding->UTF_8); # UTF-8

name2id

 my $encoding_id = HTML5::DOM::Encoding::name2id($encoding);

Get id by name.

 print HTML5::DOM::Encoding->UTF_8;             # 0
 print HTML5::DOM::Encoding::id2name("UTF-8");  # 0

detectAuto

 my ($encoding_id, $new_text) = HTML5::DOM::Encoding::detectAuto($text, $max_length = 0);

Auto detect text encoding using (in this order):

Returns array with encoding id and new text without BOM, if success.

If fail, then encoding id equal HTML5::DOM::Encoding->NOT_DETERMINED.

 my ($encoding_id, $new_text) = HTML5::DOM::Encoding::detectAuto("ололо");
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # UTF-8

detect

 my $encoding_id = HTML5::DOM::Encoding::detect($text, $max_length = 0);

Detect text encoding. Single method for both detectCyrillic and detectUnicode.

Returns encoding id, if success. And returns HTML5::DOM::Encoding->NOT_DETERMINED if fail.

 my $encoding_id = HTML5::DOM::Encoding::detect("ололо");
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # UTF-8

detectCyrillic

 my $encoding_id = HTML5::DOM::Encoding::detectCyrillic($text, $max_length = 0);

Detect cyrillic text encoding (using lowercase trigrams), such as windows-1251, koi8-r, iso-8859-5, x-mac-cyrillic, ibm866.

Returns encoding id, if success. And returns HTML5::DOM::Encoding->NOT_DETERMINED if fail.

This method also have aliases for compatibility reasons: detectUkrainian, detectRussian

detectUnicode

 my $encoding_id = HTML5::DOM::Encoding::detectUnicode($text, $max_length = 0);

Detect unicode family text encoding, such as UTF-8, UTF-16LE, UTF-16BE.

Returns encoding id, if success. And returns HTML5::DOM::Encoding->NOT_DETERMINED if fail.

 # get UTF-16LE data for test
 my $str = "ололо";
 Encode::from_to($str, "UTF-8", "UTF-16LE");
 
 my $encoding_id = HTML5::DOM::Encoding::detectUnicode($str);
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # UTF-16LE

detectByPrescanStream

 my $encoding_id = HTML5::DOM::Encoding::detectByPrescanStream($text, $max_length = 0);

Detect encoding by parsing <meta> tags in html.

Returns encoding id, if success. And returns HTML5::DOM::Encoding->NOT_DETERMINED if fail.

See for more info: https://html.spec.whatwg.org/multipage/syntax.html#prescan-a-byte-stream-to-determine-its-encoding

 my $encoding_id = HTML5::DOM::Encoding::detectByPrescanStream('
    <meta http-equiv="content-type" content="text/html; charset=windows-1251">
 ');
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # WINDOWS-1251

detectByCharset

 my $encoding_id = HTML5::DOM::Encoding::detectByCharset($text, $max_length = 0);

Extracting character encoding from string. Find "charset=" and see encoding. Return found raw data.

For example: "text/html; charset=windows-1251". Return HTML5::DOM::Encoding->WINDOWS_1251

And returns HTML5::DOM::Encoding->NOT_DETERMINED if fail.

See for more info: https://html.spec.whatwg.org/multipage/infrastructure.html#algorithm-for-extracting-a-character-encoding-from-a-meta-element

 my $encoding_id = HTML5::DOM::Encoding::detectByPrescanStream('
    <meta http-equiv="content-type" content="text/html; charset=windows-1251">
 ');
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # WINDOWS-1251

detectBomAndCut

 my ($encoding_id, $new_text) = HTML5::DOM::Encoding::detectBomAndCut($text, $max_length = 0);

Returns array with encoding id and new text without BOM.

If fail, then encoding id equal HTML5::DOM::Encoding->NOT_DETERMINED.

 my ($encoding_id, $new_text) = HTML5::DOM::Encoding::detectBomAndCut("\xEF\xBB\xBFололо");
 my $encoding = HTML5::DOM::Encoding::id2name($encoding_id);
 print $encoding; # UTF-8
 print $new_text; # ололо

NAMESPACES

Supported namespace names

 html, matml, svg, xlink, xml, xmlns

Supported namespace id constants

 HTML5::DOM->NS_UNDEF
 HTML5::DOM->NS_HTML
 HTML5::DOM->NS_MATHML
 HTML5::DOM->NS_SVG
 HTML5::DOM->NS_XLINK
 HTML5::DOM->NS_XML
 HTML5::DOM->NS_XMLNS
 HTML5::DOM->NS_ANY
 HTML5::DOM->NS_LAST_ENTRY

TAGS

 HTML5::DOM->TAG__UNDEF
 HTML5::DOM->TAG__TEXT
 HTML5::DOM->TAG__COMMENT
 HTML5::DOM->TAG__DOCTYPE
 HTML5::DOM->TAG_A
 HTML5::DOM->TAG_ABBR
 HTML5::DOM->TAG_ACRONYM
 HTML5::DOM->TAG_ADDRESS
 HTML5::DOM->TAG_ANNOTATION_XML
 HTML5::DOM->TAG_APPLET
 HTML5::DOM->TAG_AREA
 HTML5::DOM->TAG_ARTICLE
 HTML5::DOM->TAG_ASIDE
 HTML5::DOM->TAG_AUDIO
 HTML5::DOM->TAG_B
 HTML5::DOM->TAG_BASE
 HTML5::DOM->TAG_BASEFONT
 HTML5::DOM->TAG_BDI
 HTML5::DOM->TAG_BDO
 HTML5::DOM->TAG_BGSOUND
 HTML5::DOM->TAG_BIG
 HTML5::DOM->TAG_BLINK
 HTML5::DOM->TAG_BLOCKQUOTE
 HTML5::DOM->TAG_BODY
 HTML5::DOM->TAG_BR
 HTML5::DOM->TAG_BUTTON
 HTML5::DOM->TAG_CANVAS
 HTML5::DOM->TAG_CAPTION
 HTML5::DOM->TAG_CENTER
 HTML5::DOM->TAG_CITE
 HTML5::DOM->TAG_CODE
 HTML5::DOM->TAG_COL
 HTML5::DOM->TAG_COLGROUP
 HTML5::DOM->TAG_COMMAND
 HTML5::DOM->TAG_COMMENT
 HTML5::DOM->TAG_DATALIST
 HTML5::DOM->TAG_DD
 HTML5::DOM->TAG_DEL
 HTML5::DOM->TAG_DETAILS
 HTML5::DOM->TAG_DFN
 HTML5::DOM->TAG_DIALOG
 HTML5::DOM->TAG_DIR
 HTML5::DOM->TAG_DIV
 HTML5::DOM->TAG_DL
 HTML5::DOM->TAG_DT
 HTML5::DOM->TAG_EM
 HTML5::DOM->TAG_EMBED
 HTML5::DOM->TAG_FIELDSET
 HTML5::DOM->TAG_FIGCAPTION
 HTML5::DOM->TAG_FIGURE
 HTML5::DOM->TAG_FONT
 HTML5::DOM->TAG_FOOTER
 HTML5::DOM->TAG_FORM
 HTML5::DOM->TAG_FRAME
 HTML5::DOM->TAG_FRAMESET
 HTML5::DOM->TAG_H1
 HTML5::DOM->TAG_H2
 HTML5::DOM->TAG_H3
 HTML5::DOM->TAG_H4
 HTML5::DOM->TAG_H5
 HTML5::DOM->TAG_H6
 HTML5::DOM->TAG_HEAD
 HTML5::DOM->TAG_HEADER
 HTML5::DOM->TAG_HGROUP
 HTML5::DOM->TAG_HR
 HTML5::DOM->TAG_HTML
 HTML5::DOM->TAG_I
 HTML5::DOM->TAG_IFRAME
 HTML5::DOM->TAG_IMAGE
 HTML5::DOM->TAG_IMG
 HTML5::DOM->TAG_INPUT
 HTML5::DOM->TAG_INS
 HTML5::DOM->TAG_ISINDEX
 HTML5::DOM->TAG_KBD
 HTML5::DOM->TAG_KEYGEN
 HTML5::DOM->TAG_LABEL
 HTML5::DOM->TAG_LEGEND
 HTML5::DOM->TAG_LI
 HTML5::DOM->TAG_LINK
 HTML5::DOM->TAG_LISTING
 HTML5::DOM->TAG_MAIN
 HTML5::DOM->TAG_MAP
 HTML5::DOM->TAG_MARK
 HTML5::DOM->TAG_MARQUEE
 HTML5::DOM->TAG_MENU
 HTML5::DOM->TAG_MENUITEM
 HTML5::DOM->TAG_META
 HTML5::DOM->TAG_METER
 HTML5::DOM->TAG_MTEXT
 HTML5::DOM->TAG_NAV
 HTML5::DOM->TAG_NOBR
 HTML5::DOM->TAG_NOEMBED
 HTML5::DOM->TAG_NOFRAMES
 HTML5::DOM->TAG_NOSCRIPT
 HTML5::DOM->TAG_OBJECT
 HTML5::DOM->TAG_OL
 HTML5::DOM->TAG_OPTGROUP
 HTML5::DOM->TAG_OPTION
 HTML5::DOM->TAG_OUTPUT
 HTML5::DOM->TAG_P
 HTML5::DOM->TAG_PARAM
 HTML5::DOM->TAG_PLAINTEXT
 HTML5::DOM->TAG_PRE
 HTML5::DOM->TAG_PROGRESS
 HTML5::DOM->TAG_Q
 HTML5::DOM->TAG_RB
 HTML5::DOM->TAG_RP
 HTML5::DOM->TAG_RT
 HTML5::DOM->TAG_RTC
 HTML5::DOM->TAG_RUBY
 HTML5::DOM->TAG_S
 HTML5::DOM->TAG_SAMP
 HTML5::DOM->TAG_SCRIPT
 HTML5::DOM->TAG_SECTION
 HTML5::DOM->TAG_SELECT
 HTML5::DOM->TAG_SMALL
 HTML5::DOM->TAG_SOURCE
 HTML5::DOM->TAG_SPAN
 HTML5::DOM->TAG_STRIKE
 HTML5::DOM->TAG_STRONG
 HTML5::DOM->TAG_STYLE
 HTML5::DOM->TAG_SUB
 HTML5::DOM->TAG_SUMMARY
 HTML5::DOM->TAG_SUP
 HTML5::DOM->TAG_SVG
 HTML5::DOM->TAG_TABLE
 HTML5::DOM->TAG_TBODY
 HTML5::DOM->TAG_TD
 HTML5::DOM->TAG_TEMPLATE
 HTML5::DOM->TAG_TEXTAREA
 HTML5::DOM->TAG_TFOOT
 HTML5::DOM->TAG_TH
 HTML5::DOM->TAG_THEAD
 HTML5::DOM->TAG_TIME
 HTML5::DOM->TAG_TITLE
 HTML5::DOM->TAG_TR
 HTML5::DOM->TAG_TRACK
 HTML5::DOM->TAG_TT
 HTML5::DOM->TAG_U
 HTML5::DOM->TAG_UL
 HTML5::DOM->TAG_VAR
 HTML5::DOM->TAG_VIDEO
 HTML5::DOM->TAG_WBR
 HTML5::DOM->TAG_XMP
 HTML5::DOM->TAG_ALTGLYPH
 HTML5::DOM->TAG_ALTGLYPHDEF
 HTML5::DOM->TAG_ALTGLYPHITEM
 HTML5::DOM->TAG_ANIMATE
 HTML5::DOM->TAG_ANIMATECOLOR
 HTML5::DOM->TAG_ANIMATEMOTION
 HTML5::DOM->TAG_ANIMATETRANSFORM
 HTML5::DOM->TAG_CIRCLE
 HTML5::DOM->TAG_CLIPPATH
 HTML5::DOM->TAG_COLOR_PROFILE
 HTML5::DOM->TAG_CURSOR
 HTML5::DOM->TAG_DEFS
 HTML5::DOM->TAG_DESC
 HTML5::DOM->TAG_ELLIPSE
 HTML5::DOM->TAG_FEBLEND
 HTML5::DOM->TAG_FECOLORMATRIX
 HTML5::DOM->TAG_FECOMPONENTTRANSFER
 HTML5::DOM->TAG_FECOMPOSITE
 HTML5::DOM->TAG_FECONVOLVEMATRIX
 HTML5::DOM->TAG_FEDIFFUSELIGHTING
 HTML5::DOM->TAG_FEDISPLACEMENTMAP
 HTML5::DOM->TAG_FEDISTANTLIGHT
 HTML5::DOM->TAG_FEDROPSHADOW
 HTML5::DOM->TAG_FEFLOOD
 HTML5::DOM->TAG_FEFUNCA
 HTML5::DOM->TAG_FEFUNCB
 HTML5::DOM->TAG_FEFUNCG
 HTML5::DOM->TAG_FEFUNCR
 HTML5::DOM->TAG_FEGAUSSIANBLUR
 HTML5::DOM->TAG_FEIMAGE
 HTML5::DOM->TAG_FEMERGE
 HTML5::DOM->TAG_FEMERGENODE
 HTML5::DOM->TAG_FEMORPHOLOGY
 HTML5::DOM->TAG_FEOFFSET
 HTML5::DOM->TAG_FEPOINTLIGHT
 HTML5::DOM->TAG_FESPECULARLIGHTING
 HTML5::DOM->TAG_FESPOTLIGHT
 HTML5::DOM->TAG_FETILE
 HTML5::DOM->TAG_FETURBULENCE
 HTML5::DOM->TAG_FILTER
 HTML5::DOM->TAG_FONT_FACE
 HTML5::DOM->TAG_FONT_FACE_FORMAT
 HTML5::DOM->TAG_FONT_FACE_NAME
 HTML5::DOM->TAG_FONT_FACE_SRC
 HTML5::DOM->TAG_FONT_FACE_URI
 HTML5::DOM->TAG_FOREIGNOBJECT
 HTML5::DOM->TAG_G
 HTML5::DOM->TAG_GLYPH
 HTML5::DOM->TAG_GLYPHREF
 HTML5::DOM->TAG_HKERN
 HTML5::DOM->TAG_LINE
 HTML5::DOM->TAG_LINEARGRADIENT
 HTML5::DOM->TAG_MARKER
 HTML5::DOM->TAG_MASK
 HTML5::DOM->TAG_METADATA
 HTML5::DOM->TAG_MISSING_GLYPH
 HTML5::DOM->TAG_MPATH
 HTML5::DOM->TAG_PATH
 HTML5::DOM->TAG_PATTERN
 HTML5::DOM->TAG_POLYGON
 HTML5::DOM->TAG_POLYLINE
 HTML5::DOM->TAG_RADIALGRADIENT
 HTML5::DOM->TAG_RECT
 HTML5::DOM->TAG_SET
 HTML5::DOM->TAG_STOP
 HTML5::DOM->TAG_SWITCH
 HTML5::DOM->TAG_SYMBOL
 HTML5::DOM->TAG_TEXT
 HTML5::DOM->TAG_TEXTPATH
 HTML5::DOM->TAG_TREF
 HTML5::DOM->TAG_TSPAN
 HTML5::DOM->TAG_USE
 HTML5::DOM->TAG_VIEW
 HTML5::DOM->TAG_VKERN
 HTML5::DOM->TAG_MATH
 HTML5::DOM->TAG_MACTION
 HTML5::DOM->TAG_MALIGNGROUP
 HTML5::DOM->TAG_MALIGNMARK
 HTML5::DOM->TAG_MENCLOSE
 HTML5::DOM->TAG_MERROR
 HTML5::DOM->TAG_MFENCED
 HTML5::DOM->TAG_MFRAC
 HTML5::DOM->TAG_MGLYPH
 HTML5::DOM->TAG_MI
 HTML5::DOM->TAG_MLABELEDTR
 HTML5::DOM->TAG_MLONGDIV
 HTML5::DOM->TAG_MMULTISCRIPTS
 HTML5::DOM->TAG_MN
 HTML5::DOM->TAG_MO
 HTML5::DOM->TAG_MOVER
 HTML5::DOM->TAG_MPADDED
 HTML5::DOM->TAG_MPHANTOM
 HTML5::DOM->TAG_MROOT
 HTML5::DOM->TAG_MROW
 HTML5::DOM->TAG_MS
 HTML5::DOM->TAG_MSCARRIES
 HTML5::DOM->TAG_MSCARRY
 HTML5::DOM->TAG_MSGROUP
 HTML5::DOM->TAG_MSLINE
 HTML5::DOM->TAG_MSPACE
 HTML5::DOM->TAG_MSQRT
 HTML5::DOM->TAG_MSROW
 HTML5::DOM->TAG_MSTACK
 HTML5::DOM->TAG_MSTYLE
 HTML5::DOM->TAG_MSUB
 HTML5::DOM->TAG_MSUP
 HTML5::DOM->TAG_MSUBSUP
 HTML5::DOM->TAG__END_OF_FILE
 HTML5::DOM->TAG_LAST_ENTRY

ENCODINGS

Supported encoding names

 AUTO, NOT-DETERMINED, X-USER-DEFINED, 
 BIG5, EUC-JP, EUC-KR, GB18030, GBK, IBM866, MACINTOSH, X-MAC-CYRILLIC, SHIFT_JIS, 
 ISO-2022-JP, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, ISO-8859-2, 
 ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-8-I, 
 WINDOWS-1250, WINDOWS-1251, WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, 
 WINDOWS-1255, WINDOWS-1256, WINDOWS-1257, WINDOWS-1258, WINDOWS-874, 
 UTF-8, UTF-16BE, UTF-16LE, KOI8-R, KOI8-U

Supported encoding id consts

 HTML5::DOM::Encoding->DEFAULT
 HTML5::DOM::Encoding->AUTO
 HTML5::DOM::Encoding->NOT_DETERMINED
 HTML5::DOM::Encoding->UTF_8
 HTML5::DOM::Encoding->UTF_16LE
 HTML5::DOM::Encoding->UTF_16BE
 HTML5::DOM::Encoding->X_USER_DEFINED
 HTML5::DOM::Encoding->BIG5
 HTML5::DOM::Encoding->EUC_JP
 HTML5::DOM::Encoding->EUC_KR
 HTML5::DOM::Encoding->GB18030
 HTML5::DOM::Encoding->GBK
 HTML5::DOM::Encoding->IBM866
 HTML5::DOM::Encoding->ISO_2022_JP
 HTML5::DOM::Encoding->ISO_8859_10
 HTML5::DOM::Encoding->ISO_8859_13
 HTML5::DOM::Encoding->ISO_8859_14
 HTML5::DOM::Encoding->ISO_8859_15
 HTML5::DOM::Encoding->ISO_8859_16
 HTML5::DOM::Encoding->ISO_8859_2
 HTML5::DOM::Encoding->ISO_8859_3
 HTML5::DOM::Encoding->ISO_8859_4
 HTML5::DOM::Encoding->ISO_8859_5
 HTML5::DOM::Encoding->ISO_8859_6
 HTML5::DOM::Encoding->ISO_8859_7
 HTML5::DOM::Encoding->ISO_8859_8
 HTML5::DOM::Encoding->ISO_8859_8_I
 HTML5::DOM::Encoding->KOI8_R
 HTML5::DOM::Encoding->KOI8_U
 HTML5::DOM::Encoding->MACINTOSH
 HTML5::DOM::Encoding->SHIFT_JIS
 HTML5::DOM::Encoding->WINDOWS_1250
 HTML5::DOM::Encoding->WINDOWS_1251
 HTML5::DOM::Encoding->WINDOWS_1252
 HTML5::DOM::Encoding->WINDOWS_1253
 HTML5::DOM::Encoding->WINDOWS_1254
 HTML5::DOM::Encoding->WINDOWS_1255
 HTML5::DOM::Encoding->WINDOWS_1256
 HTML5::DOM::Encoding->WINDOWS_1257
 HTML5::DOM::Encoding->WINDOWS_1258
 HTML5::DOM::Encoding->WINDOWS_874
 HTML5::DOM::Encoding->X_MAC_CYRILLIC
 HTML5::DOM::Encoding->LAST_ENTRY

PARSER OPTIONS

Options for:

threads

Threads count, if < 2 - parsing in single mode without threads (default 0)

This option affects only for HTML5::DOM::new.

Originaly, MyHTML can use mulithread parsing.

But in real cases this mode slower than single mode (threads=0). Result speed very OS-specific and depends on input html.

Not recommended use if don't known what you do. Single mode faster in 99.9% cases.

ignore_whitespace

Ignore whitespace tokens (default 0)

ignore_doctype

Do not parse DOCTYPE (default 0)

scripts

If 1 - <noscript> contents parsed to single text node (default)

If 0 - <noscript> contents parsed to child nodes

encoding

Encoding of input HTML, if auto - library can tree to automaticaly determine encoding. (default "auto")

Allowed both encoding name or id.

default_encoding

Default encoding, this affects only if encoding set to auto and encoding not determined. (default "UTF-8")

Allowed both encoding name or id.

See for available encodings: "ENCODINGS"

encoding_use_meta

Allow use <meta> tags to determine input HTML encoding. (default 1)

See detectByPrescanStream.

encoding_prescan_limit

Limit string length to determine encoding by <meta> tags. (default 1024, from spec)

See detectByPrescanStream.

encoding_use_bom

Allow use detecding BOM to determine input HTML encoding. (default 1)

See detectBomAndCut.

utf8

Default: "auto"

If 1, then all returned strings have utf8 flag (chars).

If 0, then all returned strings haven't utf8 flag (bytes).

If "auto", then utf8 flag detected by input string. Automaticaly enables utf8=1 if input string have utf8 flag.

"auto" works only in parse, parseChunk, parseAsync methods.

CSS PARSER OPTIONS

Options for:

utf8

Default: "auto"

If 1, then all returned strings have utf8 flag (chars).

If 0, then all returned strings haven't utf8 flag (bytes).

If "auto", then utf8 flag detected by input string. Automaticaly enables utf8=1 if input string have utf8 flag.

HTML5 SUPPORT

Tested with html5lib-tests (at 2021-06-26)

 -------------------------------------------------------------
 test                        total    ok      fail    skip
 -------------------------------------------------------------
 foreign-fragment.dat        66       54      12      0
 tests26.dat                 19       16      3       0
 menuitem-element.dat        19       16      3       0
 tests11.dat                 12       11      1       0
 tests1.dat                  112      112     0       0
 tests4.dat                  6        6       0       0
 tests6.dat                  51       51      0       0
 ruby.dat                    20       20      0       0
 adoption01.dat              17       17      0       0
 tests14.dat                 6        6       0       0
 tests19.dat                 104      104     0       0
 tests7.dat                  30       30      0       0
 noscript01.dat              17       17      0       0
 tests17.dat                 12       12      0       0
 tests23.dat                 4        4       0       0
 pending-spec-changes.dat    2        2       0       0
 tables01.dat                16       16      0       0
 entities02.dat              25       25      0       0
 tests22.dat                 4        4       0       0
 tests10.dat                 53       53      0       0
 tests15.dat                 13       13      0       0
 inbody01.dat                3        3       0       0
 template.dat                107      107     0       0
 plain-text-unsafe.dat       32       32      0       0
 comments01.dat              15       15      0       0
 scriptdata01.dat            26       26      0       0
 svg.dat                     7        7       0       0
 tests25.dat                 25       25      0       0
 tests3.dat                  23       23      0       0
 tests20.dat                 43       43      0       0
 tests12.dat                 1        1       0       0
 tests21.dat                 24       24      0       0
 math.dat                    7        7       0       0
 webkit01.dat                49       49      0       0
 main-element.dat            2        2       0       0
 adoption02.dat              1        1       0       0
 domjs-unsafe.dat            48       48      0       0
 tests16.dat                 196      196     0       0
 blocks.dat                  47       47      0       0
 tests5.dat                  16       16      0       0
 tests8.dat                  9        9       0       0
 tricky01.dat                8        8       0       0
 tests18.dat                 35       35      0       0
 webkit02.dat                20       20      0       0
 tests24.dat                 7        7       0       0
 html5test-com.dat           23       23      0       0
 isindex.dat                 3        3       0       0
 doctype01.dat               36       36      0       0
 entities01.dat              74       74      0       0
 tests2.dat                  61       61      0       0
 tests9.dat                  26       26      0       0
 tests_innerHTML_1.dat       84       84      0       0
 summary                     1666     1647    19      0

Tested with examples/html5lib_tests.pl

 perl examples/html5lib_tests.pl --dir=../html5lib-tests/tree-construction --colordiff

Send patches to lexborisov's MyHTML if you want improve this result.

WORK WITH UTF8

In normal cases you must don't care about utf8. Everything works out of the box.

By default utf8 mode enabled automaticaly if you specify string with utf8 flag.

For example:

Perfect work with use utf8:

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $parser = HTML5::DOM->new;
 my $str = HTML5::DOM->new->parse('<b>тест тест</b>')->at('b')->text;
 print "length=".length($str)." [$str]\n"; # length=9 [тест тест]

Perfect work without use utf8:

 use warnings;
 use strict;
 use HTML5::DOM;
 
 # Perfect work with default mode of perl strings (bytes)
 my $parser = HTML5::DOM->new;
 my $str = HTML5::DOM->new->parse('<b>тест тест</b>')->at('b')->text;
 print "length=".length($str)." [$str]\n"; # length=17 [тест тест]

 # You can pass string with utf8 flag without "use utf8" and it perfect works
 use Encode;
 my $test = '<b>тест тест</b>';
 Encode::_utf8_on($test);
 
 $str = HTML5::DOM->new->parse($test)->at('b')->text;
 print "length=".length($str)." [$str]\n"; # length=9 [тест тест]

But you can override this behavior - see "PARSER OPTIONS" for details.

Force use bytes:

 use warnings;
 use strict;
 use HTML5::DOM;
 use utf8;
 
 my $parser = HTML5::DOM->new({ utf8 => 0 });
 my $str = $parser->parse('<b>тест тест</b>')->at('b')->text;
 print "length=".length($str)." [$str]\n"; # length=17 [тест тест]

Force use utf8:

 use warnings;
 use strict;
 use HTML5::DOM;
 
 my $parser = HTML5::DOM->new({ utf8 => 1 });
 my $str = $parser->parse('<b>тест тест</b>')->at('b')->text;
 print "length=".length($str)." [$str]\n"; # length=13 [тест тест]

BUGS

https://github.com/Azq2/perl-html5-dom/issues

SEE ALSO

  • HTML::MyHTML - more low-level myhtml bindings.

  • Mojo::DOM - pure perl HTML5 DOM library with CSS selectors.

AUTHOR

Kirill Zhumarin <kirill.zhumarin@gmail.com>

LICENSE