HTML::Object::Element - HTML Element Object
use HTML::Object::Element;
my $this = HTML::Object::Element->new || die( HTML::Object::Element->error, "\n" );
v0.2.0
This interface implements a core element for the HTML::Object parser. An element can be one or more spaces, a text, a tag, a comment, or a document; all of these inherit from this core interface.
For a more elaborate interface and a close implementation of the Web Document Object Model (a.k.a. DOM), see HTML::Object::DOM::Element and the DOM parser.
This method is purely for compatibility with "address" in HTML::Element. Please refer to its documentation for its use.
Returns a hash (not a hash reference) of the element's attributes as key-value pairs.
This is provided for compatibility with HTML::Element.
my %attributes = $e->all_attr;
Returns a list of all the element's attributes in no particular order.
my @attributes = $e->all_attr_names;
This is an alias for "as_string"
Returns a string representation of the current element and its underlying descendants.
If a cached version of that string exists, it is returned instead.
Returns a string representation of the text content of the current element and its descendants.
Returns the value returned by "as_text", with any leading and trailing spaces trimmed.
This is merely an alias for as_string
Provided with an attribute name, this returns its value. If an attribute value is also provided, it sets or replaces the attribute value accordingly. If the attribute value provided is undef, this removes the attribute altogether.
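The get, set and remove behaviour described above can be sketched in plain Perl. This is only an illustrative model under stated assumptions: the %attributes hash and the standalone attr() function are hypothetical stand-ins, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in: a plain hash plays the role of an element's attributes.
my %attributes = ( id => 'main', class => 'box' );

# Mimics the calling convention described above:
#   attr( $name )          returns the current value
#   attr( $name, $value )  sets or replaces the value
#   attr( $name, undef )   removes the attribute
sub attr
{
    my $name = shift;
    return( $attributes{ $name } ) unless( @_ );
    my $value = shift;
    if( defined( $value ) )
    {
        $attributes{ $name } = $value;
    }
    else
    {
        delete( $attributes{ $name } );
    }
    return( $value );
}

attr( class => 'box wide' );   # replace an existing value
attr( id => undef );           # remove the attribute altogether
foreach my $name ( sort( keys( %attributes ) ) )
{
    print( "$name=$attributes{$name}\n" );
}
```

The point of the sketch is the distinction between passing no value (a read) and passing an explicit undef (a removal), which is decided by the number of arguments rather than by the value itself.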
Returns a hash object of all the attribute key-value pairs.
Be careful: this is a 'live' object, and if you change it directly, you could damage the hierarchy or introduce errors.
Returns an array object containing the attribute names in their order of appearance.
Returns the element checksum, used to determine if any change was made.
Returns an array object containing all the element's children.
Returns this element's class, e.g. HTML::Object::Element or HTML::Object::Document
Returns a copy of the current element and, recursively, all of its descendants.
The cloned element that is returned has no parent.
Clones all the element's children and returns a new array object of the cloned children.
This is quite different from the HTML::Element equivalent, which is called as a class method and takes an arbitrary list of elements.
Closes the current tag, if necessary. It returns the current object upon success, or upon error, it returns undef and sets an error.
Set or get a closing element object that is used to close the current element.
Returns the column at which this element was found in the original HTML text string, by the parser.
This is an alias for "children". It returns an array object of the current element's children objects.
In list context, this returns the list of the current element's children, if any, and in scalar context, this returns the number of child elements it contains.
Remove all of its content by calling "delete_content", detach the current object, and destroy the object.
Remove the content, i.e. all the children, of the current element, effectively calling "delete" on each one of them.
It returns the current element.
Does not do anything by design. There is not much value in this method under HTML::Object in the first place.
Returns an integer representing the depth level of the current element in the hierarchy.
Returns an array object of all the element's descendants throughout its hierarchy.
An alias for "delete"
An alias for "delete_content"
This method takes no parameter and removes the current element from its parent's list of child elements, and unsets its parent object value.
It returns the element's parent object.
This method takes no argument and will remove the parent value for each of its children, set the children list for the current element to an empty list and return the list of those children elements thus removed.
my @removed = $e->detach_content;
Prints to STDOUT a representation of the hierarchy of element objects.
Returns the element unique id, which is automatically generated for any element. This is actually a uuid. For example:
my $eid = $e->eid; # e.g.: 971ef725-e99b-4869-b6ac-b245794e84e2
Returns the current object.
Actually, I am not sure this should be here, and rather it should be in HTML::Object::XQuery since it simulates jQuery.
Returns an array object containing hash objects, for each attribute of an element containing a link, with the following properties:
Returns an array object of all the elements (including potentially the current element itself) in the element's hierarchy that have an attribute matching the given attribute name.
my $list = $e->find_by_attribute( 'data-dob' );
Returns an array object of all the elements (including potentially the current element itself) in the element's hierarchy that match any of the specified tag names. Tag names are case insensitive.
my $list = $e->find_by_tag_name( qw( div p span ) );
Returns true if the current element has children, i.e. it contains other elements within itself.
Set or get the id HTML attribute of the element.
Provided with an element object and this will add it to the current element's children.
It returns the current element object.
Returns the internal hash of key-value pairs used internally by this package. This is primarily used to handle the data-* special attributes.
Returns true if the current element has a closing tag that is accessible with "close_tag"
Returns true if this is an element that, by the HTML standard, does not contain any other elements, and false otherwise.
To check if the element has children, use "has_children"
Provided with a list of tag names or element objects, and this will check if the current element is contained in any of the element objects, or elements whose tag name is provided. It returns true if it is contained, or false otherwise.
Example:
say $e->is_inside( qw( span div ), $elem1, 'p', $elem2 ) ? 'yes' : 'no';
Provided with an attribute name and this returns true if it is valid, or false otherwise.
Returns true if, by standard, this tag is void, meaning it does not contain any children. For example: <br />, <link />, or <input />
Returns an array object of all the sibling objects before the current element.
Returns the line at which this element was found in the original HTML text string, by the parser.
Returns an array object of the current element's parent and parent's parent up to the root of the hierarchy
Returns an array object of the current element's parent tag name and parent's parent tag name up to the root of the hierarchy
This is equivalent to:
my $list = $self->lineage->map(sub{ $_->tag });
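As an illustration of what walking up to the root means, the following plain-Perl sketch collects ancestors by following parent references. The hash-based nodes and the lineage() function here are hypothetical stand-ins, and the nearest-first ordering is an assumption drawn from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in nodes: hashes with a tag and an optional parent reference.
my $html = { tag => 'html' };
my $body = { tag => 'body', parent => $html };
my $div  = { tag => 'div',  parent => $body };
my $span = { tag => 'span', parent => $div };

# Collects the ancestors of a node, nearest first, up to the root.
sub lineage
{
    my $node = shift;
    my @ancestors;
    while( my $parent = $node->{parent} )
    {
        push( @ancestors, $parent );
        $node = $parent;
    }
    return( @ancestors );
}

# Mapping each ancestor to its tag gives the list of tag names
my @names = map( $_->{tag}, lineage( $span ) );
print( join( ' > ', @names ), "\n" );
```

In this model, the number of ancestors collected is also the depth of the node in the hierarchy.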
This is the method that does the heavy work for "look_down" and "look_up"
Provided with some criteria, and an optional hash reference of options, and this will crawl down the current element hierarchy to find any matching element.
my $list = $e->look_down( _tag => 'div' ); # returns a Module::Generic::Array object
my $list = $e->look_down( class => qr/\bclass_name\b/, { max_level => 3, max_match => 1 });
The options you can specify are:
max_level: takes an integer that sets the maximum lower or upper level beyond which this will stop searching.
max_match: takes an integer that sets the maximum number of matches after which this will stop recursing and return the result.
There are three kinds of criteria you can specify:
attr_name, attr_value
This is used when you are looking for an element with a particular attribute name and value. For example:
my $list = $e->look_down( id => 'hello' );
This will look for any element whose attribute id has a value of hello
To search for a tag, use the special attribute _tag. For example:
my $list = $e->look_down( _tag => 'div' );
This will return an array object of all the div elements.
Same as above, except the attribute value of the element being checked will be evaluated against this regular expression and if true will be added into the resulting array object.
For example:
my $list = $e->look_down( 'data-dob' => qr/^\d{4}-\d{2}-\d{2}$/ );
This will search for all elements that have a data-dob attribute whose value looks like a date.
Provided with a code reference (i.e. a reference to an existing subroutine, or an anonymous one), and it will be evaluated for each element found. If it returns undef, look_down will interrupt its crawling, and if it returns true, it will signal the need to add the element to the resulting array object of elements.
my $list = $e->look_down( _tag => 'img', class => qr/\bactive\b/, sub { return( $_->attr( 'width' ) > 350 ? 1 : 0 ); } );
When executing the code, the current element being evaluated will be made available via $_
Those criteria are evaluated in the order they are provided. Thus, with the criteria from the example above (a _tag of img, a class matching a regular expression, and a code reference), each element is first checked to see if its tag is img, and discarded if it is not. Then its class attribute is matched against the regular expression provided, and the element is discarded if it does not match. Finally, the code is evaluated.
Thus, the order of the criteria is important.
It returns an array object of all the elements found.
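To make the left-to-right evaluation concrete, here is a minimal plain-Perl sketch. It is a model only, under stated assumptions: elements are represented by hypothetical hashes with tag and attr keys, and matches() is an illustrative helper, not part of HTML::Object. It does not model the options hash or the way a code reference returning undef interrupts the crawl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for elements: plain hashes with a tag and attributes.
my @elements = (
    { tag => 'img', attr => { class => 'active', width => 400 } },
    { tag => 'img', attr => { class => 'active', width => 200 } },
    { tag => 'img', attr => { class => 'hidden', width => 500 } },
    { tag => 'div', attr => { class => 'active' } },
);

# Returns true if the element passes every criterion, checked left to right.
# The first failing criterion discards the element; later ones are not run.
sub matches
{
    my( $elem, @criteria ) = @_;
    while( @criteria )
    {
        my $crit = shift( @criteria );
        if( ref( $crit ) eq 'CODE' )
        {
            # Make the element available as $_ while the code runs
            local $_ = $elem;
            return(0) unless( $crit->( $elem ) );
            next;
        }
        my $want = shift( @criteria );
        my $have = $crit eq '_tag' ? $elem->{tag} : $elem->{attr}->{ $crit };
        return(0) unless( defined( $have ) );
        if( ref( $want ) eq 'Regexp' )
        {
            return(0) unless( $have =~ $want );
        }
        else
        {
            return(0) unless( $have eq $want );
        }
    }
    return(1);
}

my @found;
foreach my $e ( @elements )
{
    push( @found, $e ) if( matches( $e, _tag => 'img', class => qr/\bactive\b/, sub{ $_->{attr}->{width} > 350 } ) );
}
printf( "%d element(s) matched\n", scalar( @found ) );
```

Because the code reference is listed last, it only runs for img elements whose class already matched, so it can safely assume a width attribute is worth inspecting. Placing cheap, selective criteria first keeps the more expensive checks from running on elements that would be discarded anyway.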
This is provided for compatibility with HTML::Element.
Provided with some criteria, and an optional hash reference of options, and this will crawl up the current element's ancestors, starting with its parent, to find any matching element.
The options that can be used are the same as for "look_down", i.e. max_level and max_match
Provided with a string and this returns true if the string starts with an HTML tag, or false otherwise.
Provided with a string and this returns true if the string contains HTML tags, or false otherwise.
Set or get a boolean of whether the element was modified. Actually this is not used.
This returns a DateTime object.
This creates a new HTML::Object::Attribute object passing it any arguments provided, and returns the object thus created, or undef if an error occurred.
This creates a new HTML::Object::Closing object passing it any arguments provided, and returns the object thus created, or undef if an error occurred.
Instantiate a new HTML document, passing it whatever argument was provided, and return the resulting object.
Instantiate a new element, passing it whatever argument was provided, and return the resulting object.
This is a legacy from HTML::Element, but is not actually used.
This recursively constructs a tree of nodes.
It returns an array object of elements.
Instantiate a new parser object, passing it whatever argument was provided, and return the resulting object.
Instantiate a new text object, passing it whatever argument was provided, and return the resulting object.
Checks each of the current element's child elements and concatenates any adjacent text or space elements.
It returns the current object.
Returns the offset value, i.e. the byte position, at which the tag was found in the original HTML data.
Returns the original raw string data as it was captured initially by the parser.
This is an important feature of HTML::Object since, if nothing was changed, HTML::Object will return the element objects in their original text version.
By contrast, other HTML parsers decode all the parsed HTML elements and rebuild them, often badly, even when they have not been changed, which of course incurs a heavy speed penalty.
Returns the current element's parent element, if any. The value returned could very well be empty if, for example, it is the top element or if the element was created independently of any parsing.
This is an alias for "pos"
Read-only.
Returns the position integer of the current element among its parent's children elements.
It returns a smart undef if the element has no parent.
If the current element, somehow, could not be found among its parent's children, this would return undef
Contrary to the HTML::Element equivalent, you cannot manually change this value.
Provided with a list of elements and this will add them right after the current element in its parent's children.
It returns the current element object for chaining upon success, and upon error, it returns undef and sets an error
Provided with a list of elements and this will add them right before the current element in its parent's children.
Provided with a list of elements and this will add them as children to the current element.
Contrary to the HTML::Element equivalent, this requires that only objects be provided, which is easy to do anyhow.
If consecutive text or space objects are provided, they are automatically merged with their immediate previous sibling of the same type, if any.
$e->push_content( $elem1, HTML::Object::Element->new( value => q{some text} ), $elem2 );
And if two consecutive text objects were provided the second one would have its value merged with the previous one.
It returns the current element object for chaining.
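The merging of consecutive text objects can be pictured with a short plain-Perl model. Nodes are hypothetical hashes with a pseudo tag and a value, and push_nodes() is an illustrative helper written for this sketch, not the module's own code. Only text merging is modelled; space objects would follow the same pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in nodes: hashes with a pseudo tag and, for text, a value.
my @children = ( { tag => 'p' } );

# Appends nodes, folding a text node into a directly preceding text node.
sub push_nodes
{
    my( $list, @nodes ) = @_;
    foreach my $node ( @nodes )
    {
        my $last = $list->[-1];
        if( $last && $last->{tag} eq '_text' && $node->{tag} eq '_text' )
        {
            # Merge instead of adding a second adjacent text node
            $last->{value} .= $node->{value};
        }
        else
        {
            push( @$list, $node );
        }
    }
    return( $list );
}

push_nodes( \@children,
    { tag => '_text', value => 'Hello ' },
    { tag => '_text', value => 'world' },
    { tag => 'br' },
);
print( scalar( @children ), " children\n" );
```

Four nodes are handed over in total across the initial list and the call, but the two text nodes collapse into one, so the list never holds two adjacent text nodes.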
Provided with a list of element objects and this will replace the current element in its parent's children with the element objects provided.
This will return an error if the current element has no parent, or if the current element cannot be found among its parent's children elements.
Also, this method will filter out any duplicate objects, and return an error if the element being replaced is also among the objects provided for replacement or if the current element's parent is among the replacement objects.
Each replacement object is detached from its previous parent and re-attached to the current element's parent before being added to its children.
Replaces the current element in its parent's children with its own child elements; in other words, the current element's children are moved up and replace the current element itself.
It returns the current element object, which will then no longer have a parent.
Enable the reset flag for this element, which has the effect of instructing "as_string" to not use its cache.
Returns an array object of all the sibling objects after the current element.
Returns the topmost element in the hierarchy, which usually is an HTML::Object::Document object.
This method will check that 2 element objects are similar, in the sense that they can have different "eid", but have identical structure.
If you want to check if 2 element objects are actually the same, by comparing their eid, you can use the comparison operators that have been overloaded. For example:
say $a eq $b ? 'same' : 'nope';
Calculates and returns the md5 checksum of the current element based on all its attributes.
Provided with an offset, a length, and a list of element objects, this will replace the element's children starting at position offset, for length items, with the list of objects supplied.
If consecutive text or space elements are provided, they will be merged with their immediate previous sibling of the same type.
$e->splice_content( 3, 2, $elem1, $elem2, HTML::Object::Text->new( value => 'Hello world' ) );
It returns an error if the offset or length provided is not a valid integer.
Upon success, it returns the current object for chaining.
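The offset and length arguments read like those of Perl's built-in splice, so a plain-Perl sketch on an ordinary array may help. This is an assumption made for illustration: splice_nodes() is a hypothetical helper, strings stand in for element objects, and the integer check is only one possible way to reject invalid input.

```perl
use strict;
use warnings;

# Strings stand in for child element objects in this model.
my @children = qw( a b c d e f );

# Replaces $length items starting at $offset with the new items.
# Returns the list reference on success, or undef if offset or length
# is not a valid integer.
sub splice_nodes
{
    my( $list, $offset, $length, @new ) = @_;
    foreach my $n ( $offset, $length )
    {
        return unless( defined( $n ) && $n =~ /^-?\d+$/ );
    }
    splice( @$list, $offset, $length, @new );
    return( $list );
}

# Replace 2 items starting at offset 3 with two new items
splice_nodes( \@children, 3, 2, 'X', 'Y' );
print( "@children\n" );
```

Offsets are counted from zero in this model, as with any Perl array, so offset 3 designates the fourth child.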
Returns the tag name of the current element as a scalar object. Be careful with any change you make to it, as it would directly change the element's tag name.
Non-element nodes, such as text or space, have a pseudo tag starting with an underscore ("_"), such as _text and _space
Provided with a reference to an existing subroutine, or an anonymous one, and this will crawl through every element of the descending hierarchy and call the callback code, passing it the element object being evaluated. The local variable $_ is also made available and set to the element being evaluated.
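A callback-driven crawl of this kind can be sketched in plain Perl as a depth-first recursion. The hash-based tree and this standalone traverse() function are hypothetical stand-ins. Whether the real method also visits the starting element, and in which order it visits descendants, is not stated above, so this sketch assumes descendants only, in document order.

```perl
use strict;
use warnings;

# Hypothetical stand-in tree: hashes with a tag and a list of children.
my $tree = {
    tag => 'div',
    children => [
        { tag => 'p', children => [ { tag => '_text', children => [] } ] },
        { tag => 'span', children => [] },
    ],
};

# Calls $code for every descendant, depth first, passing the node as the
# first argument and also making it available as $_.
sub traverse
{
    my( $node, $code ) = @_;
    foreach my $child ( @{$node->{children}} )
    {
        local $_ = $child;
        $code->( $child );
        traverse( $child, $code );
    }
    return;
}

my @seen;
traverse( $tree, sub{ push( @seen, $_[0]->{tag} ); } );
print( "@seen\n" );
```

Using local for $_ restores the caller's value once each callback returns, so the callback can rely on $_ without disturbing code further up the call stack.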
This acts like "push_content", except that instead of appending the elements, this prepends the given elements at the top of the element's children.
Jacques Deguest <jack@deguest.jp>
HTML::Object, HTML::Object::Attribute, HTML::Object::Boolean, HTML::Object::Closing, HTML::Object::Collection, HTML::Object::Comment, HTML::Object::Declaration, HTML::Object::Document, HTML::Object::Element, HTML::Object::Exception, HTML::Object::Literal, HTML::Object::Number, HTML::Object::Root, HTML::Object::Space, HTML::Object::Text, HTML::Object::XQuery
Mozilla Element documentation
Copyright (c) 2021 DEGUEST Pte. Ltd.
All rights reserved
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.