Mojo::DOM - Minimalistic HTML5/XML DOM parser with CSS3 selectors
  use Mojo::DOM;

  # Parse
  my $dom = Mojo::DOM->new('<div><p id="a">A</p><p id="b">B</p></div>');

  # Find
  my $b = $dom->at('#b');
  say $b->text;

  # Walk
  say $dom->div->p->[0]->text;
  say $dom->div->children('p')->first->{id};

  # Iterate
  $dom->find('p[id]')->each(sub { say shift->{id} });

  # Loop
  for my $e ($dom->find('p[id]')->each) {
    say $e->text;
  }

  # Modify
  $dom->div->p->[1]->append('<p id="c">C</p>');

  # Render
  say $dom;
Mojo::DOM is a minimalistic and relaxed HTML5/XML DOM parser with CSS3 selector support. It will even try to interpret broken XML, so you should not use it for validation.
Mojo::DOM defaults to HTML5 semantics, which means that all tags and attributes are lowercased and selectors need to be lowercase as well.
  my $dom = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
  say $dom->at('p')->text;
  say $dom->p->{id};
If XML processing instructions are found, the parser will automatically switch into XML mode and everything becomes case sensitive.
  my $dom = Mojo::DOM->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');
  say $dom->at('P')->text;
  say $dom->P->{ID};
XML detection can also be disabled with the xml method.
  # Force XML semantics
  $dom->xml(1);

  # Force HTML5 semantics
  $dom->xml(0);
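As a rough illustration of how the mode affects matching, the sketch below parses the same markup once with the default HTML5 semantics and once with XML semantics forced before parsing. The sample markup and the variable names are made up for this example.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical sample markup with uppercase tag and attribute names
my $markup = '<P ID="greeting">Hi!</P>';

# HTML5 semantics (default): names are lowercased while parsing
my $html = Mojo::DOM->new($markup);
say $html->at('p')->{id};

# XML semantics forced before parsing: names keep their case,
# so the selector and the attribute key must match it exactly
my $xml = Mojo::DOM->new->xml(1)->parse($markup);
say $xml->at('P')->{ID};
```

Both lookups print "greeting"; only the case used in the selector and the attribute key differs.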
Mojo::DOM implements the following methods.
new
  my $dom = Mojo::DOM->new;
  my $dom = Mojo::DOM->new('<foo bar="baz">test</foo>');
Construct a new Mojo::DOM object.
all_text
  my $trimmed = $dom->all_text;
  my $untrimmed = $dom->all_text(0);
Extract all text content from the DOM structure. Smart whitespace trimming is enabled by default.
  # "foo bar baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text;

  # "foo\nbarbaz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->all_text(0);
append
$dom = $dom->append('<p>Hi!</p>');
Append to element, placing the new content directly after it as a sibling.
  # "<div><h1>A</h1><h2>B</h2></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->append('<h2>B</h2>')->root;
append_content
$dom = $dom->append_content('<p>Hi!</p>');
Append to element content.
  # "<div><h1>AB</h1></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->append_content('B')->root;
at
my $result = $dom->at('html title');
Find a single element with CSS3 selectors. All selectors from Mojo::DOM::CSS are supported.
  # Find first element with "svg" namespace definition
  my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};
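When no element matches, chaining a method call directly onto the result of at can fail. The sketch below assumes that at returns undef in that case (an assumption about the documented version, not something stated above) and guards the lookup. The markup and the "#missing" selector are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><p id="a">A</p></div>');

# Assumption: "at" yields undef when the selector matches nothing
my $hit  = $dom->at('#a');
my $miss = $dom->at('#missing');

# Only call element methods on results that are defined
say defined $hit  ? $hit->text  : 'no match for #a';
say defined $miss ? $miss->text : 'no match for #missing';
```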
attrs
  my $attrs = $dom->attrs;
  my $foo = $dom->attrs('foo');
  $dom = $dom->attrs({foo => 'bar'});
  $dom = $dom->attrs(foo => 'bar');
Element attributes.
charset
  my $charset = $dom->charset;
  $dom = $dom->charset('UTF-8');
Alias for "charset" in Mojo::DOM::HTML.
children
  my $collection = $dom->children;
  my $collection = $dom->children('div');
Return a Mojo::Collection object containing the children of this element, similar to find.
  # Show type of random child element
  say $dom->children->shuffle->first->type;
content_xml
  my $xml = $dom->content_xml;
Render content of this element to XML. Note that the XML will be encoded if a charset has been defined.
  # "<b>test</b>"
  $dom->parse('<div><b>test</b></div>')->div->content_xml;
find
  my $collection = $dom->find('html title');
Find elements with CSS3 selectors and return a Mojo::Collection object. All selectors from Mojo::DOM::CSS are supported.
  # Find a specific element and extract information
  my $id = $dom->find('div')->[23]{id};

  # Extract information from multiple elements
  my @headers = $dom->find('h1, h2, h3')->pluck('text')->each;
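A minimal sketch of collecting attribute values from every match, using each in list context as the loop at the top of this page does. The markup is invented for illustration, and a plain map is used here instead of pluck purely as a matter of taste.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical markup: two paragraphs with an id, one without
my $dom = Mojo::DOM->new(
  '<div><p id="a">A</p><p>no id</p><p id="b">B</p></div>');

# "each" without a callback returns the matched elements as a list
my @ids = map { $_->{id} } $dom->find('p[id]')->each;

say join ', ', @ids;    # prints "a, b"
```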
namespace
my $namespace = $dom->namespace;
Find element namespace.
  # Find namespace for an element with namespace prefix
  my $namespace = $dom->at('svg > svg\:circle')->namespace;

  # Find namespace for an element that may or may not have a namespace prefix
  my $namespace = $dom->at('svg > circle')->namespace;
parent
my $parent = $dom->parent;
Parent of element.
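For illustration, a small sketch that steps from a matched element up to its parent and reads one of the parent's attributes. The markup and the id values are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="outer"><p id="inner">Hi</p></div>');

# Start at the inner paragraph and move one level up
my $p      = $dom->at('#inner');
my $parent = $p->parent;

say $parent->{id};    # prints "outer"
```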
parse
$dom = $dom->parse('<foo bar="baz">test</foo>');
Alias for "parse" in Mojo::DOM::HTML.
  # Parse UTF-8 encoded XML
  my $dom = Mojo::DOM->new->charset('UTF-8')->xml(1)->parse($xml);
prepend
$dom = $dom->prepend('<p>Hi!</p>');
Prepend to element, placing the new content directly before it as a sibling.
  # "<div><h1>A</h1><h2>B</h2></div>"
  $dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend('<h1>A</h1>')->root;
prepend_content
$dom = $dom->prepend_content('<p>Hi!</p>');
Prepend to element content.
  # "<div><h2>AB</h2></div>"
  $dom->parse('<div><h2>B</h2></div>')->at('h2')->prepend_content('A')->root;
replace
my $old = $dom->replace('<div>test</div>');
Replace element.
  # "<div><h2>B</h2></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('<h2>B</h2>')->root;

  # "<div></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace('')->root;
replace_content
$dom = $dom->replace_content('test');
Replace element content.
  # "<div><h1>B</h1></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('B')->root;

  # "<div><h1></h1></div>"
  $dom->parse('<div><h1>A</h1></div>')->at('h1')->replace_content('')->root;
root
my $root = $dom->root;
Find root node.
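A short sketch of getting back from a nested element to the whole document, which is how the modification examples on this page render their results. The markup is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="outer"><p id="inner">Hi</p></div>');

# Start deep in the tree, then jump back to the top
my $p    = $dom->at('#inner');
my $root = $p->root;

# Stringifying the root renders the complete document again
say "$root";    # prints '<div id="outer"><p id="inner">Hi</p></div>'
```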
text
  my $trimmed = $dom->text;
  my $untrimmed = $dom->text(0);
Extract text content from this element only (not including child elements). Smart whitespace trimming is enabled by default.
  # "foo baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text;

  # "foo\nbaz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->text(0);
text_after
  my $trimmed = $dom->text_after;
  my $untrimmed = $dom->text_after(0);
Extract text content immediately following this element. Smart whitespace trimming is enabled by default.
  # "baz"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after;

  # "baz\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_after(0);
text_before
  my $trimmed = $dom->text_before;
  my $untrimmed = $dom->text_before(0);
Extract text content immediately preceding this element. Smart whitespace trimming is enabled by default.
  # "foo"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before;

  # "foo\n"
  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->div->p->text_before(0);
to_xml
my $xml = $dom->to_xml;
Render this element and its content to XML. Note that the XML will be encoded if a charset has been defined.
  # "<b>test</b>"
  $dom->parse('<div><b>test</b></div>')->div->b->to_xml;
tree
  my $tree = $dom->tree;
  $dom = $dom->tree(['root', [qw(text lalala)]]);
Alias for "tree" in Mojo::DOM::HTML.
type
  my $type = $dom->type;
  $dom = $dom->type('div');
Element type.
  # List types of child elements
  say $dom->children->pluck('type');
xml
  my $xml = $dom->xml;
  $dom = $dom->xml(1);
Alias for "xml" in Mojo::DOM::HTML.
In addition to the methods above, many child elements are also automatically available as object methods, which return a Mojo::DOM or Mojo::Collection object, depending on the number of children.
  say $dom->p->text;
  say $dom->div->[23]->text;
  say $dom->div->pluck('text');
Direct hash reference access to element attributes is also possible.
  say $dom->{foo};
  say $dom->div->{id};
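The following sketch reads an attribute through the hash reference interface and then sets a new one the same way. That assignments write through to the element is an assumption here, since the text above only mentions access; the markup and the "class" value are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div id="a">Hi</div>');
my $div = $dom->at('div');

# Read an existing attribute
say $div->{id};    # prints "a"

# Assumption: assigning through the hash reference updates the element
$div->{class} = 'note';
say $div->{class};
```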
See also: Mojolicious, Mojolicious::Guides, http://mojolicio.us.