

XML::Smart - A smart, easy and powerful way to access/create XML files/data.


This module offers an easy way to access/create XML data. It is based on the HASH tree built from the XML data, and enables dynamic access to it with the Perl syntax for Hash and Array, without needing to care whether you have a Hash or an Array in the tree. In other words, each point in the tree works as a Hash and an Array at the same time!

You also have extra resources, such as searching for nodes by attribute, selecting an attribute value across multiple nodes, changing the returned format, etc.

The module also automatically handles binary data (encoding/decoding to/from base64), CDATA (like contents with <tags>) and Unicode. It can be used to create XML files, load XML from the Web (just passing a URL as the file path), and it has an easy way to send XML data through a socket, just by adding the length of the data to the <?xml?> header.

You can use XML::Smart with XML::Parser, or with the two standard parsers of XML::Smart, XML::Smart::Parser and XML::Smart::HTMLParser.


XML::Smart::HTMLParser can be used to load/parse wild/bad XML data, or HTML tags.


  ## Create the object and load the file:
  my $XML = XML::Smart->new('file.xml') ;
  ## Force the use of the parser 'XML::Smart::Parser'.
  my $XML = XML::Smart->new('file.xml' , 'XML::Smart::Parser') ;
  ## Get from the web:
  my $XML = XML::Smart->new('') ;

  ## Cut the root:
  $XML = $XML->cut_root ;

  ## Or change the root:
  $XML = $XML->{hosts} ;

  ## Get the address [0] of server [0]:
  my $srv0_addr0 = $XML->{server}[0]{address}[0] ;
  ## ...or...
  my $srv0_addr0 = $XML->{server}{address} ;
  ## Get the server where the attribute 'type' eq 'suse':
  my $server = $XML->{server}('type','eq','suse') ;
  ## Get the address again:
  my $addr1 = $server->{address}[1] ;
  ## ...or...
  my $addr1 = $XML->{server}('type','eq','suse'){address}[1] ;
  ## Get all the addresses of a server:
  my @addrs = @{$XML->{server}{address}} ;
  ## ...or...
  my @addrs = $XML->{server}{address}('@') ;
  ## Get a list of types of all the servers:
  my @types = $XML->{server}('[@]','type') ;
  ## Add a new server node:
  my $newsrv = {
  os      => 'Linux' ,
  type    => 'Mandrake' ,
  version => 8.9 ,
  address => [qw(]
  } ;
  push(@{$XML->{server}} , $newsrv) ;

  ## Get/rebuild the XML data:
  my $xmldata = $XML->data ;
  ## Save in some file:
  $XML->save('newfile.xml') ;
  ## Send through a socket:
  print $socket $XML->data(length => 1) ; ## show the 'length' in the XML header so the
                                          ## socket knows the amount of data to read.
  <?xml version="1.0" encoding="iso-8859-1"?>
    <server os="linux" type="redhat" version="8.0">
    <server os="linux" type="suse" version="7.0">
    <server address="" os="linux" type="conectiva" version="9.0"/>



Create an XML object.



The first argument can be:

  - XML data as string.
  - File path.
  - File Handle (GLOB).
  - URL (Need LWP::UserAgent).

If no argument is passed, a null XML tree is started, in which you create your own XML data and then build/save/send it.

PARSER (optional)

Set the XML parser to use. Options:


XML::Smart::Parser can only handle basic XML data (PCDATA and headers like ENTITY, NOTATION, etc. are not supported), but it is a good choice when you don't want to install big modules to parse XML, since it comes with the main module. It can still handle CDATA and binary data.

** See "PARSING HTML as XML" for XML::Smart::HTMLParser.

Aliases for the options:

  SMART|REGEXP   => XML::Smart::Parser
  HTML           => XML::Smart::HTMLParser


If not set, it will look for XML::Parser and load it. If XML::Parser can't be loaded, it will use XML::Smart::Parser, which is actually a clone of XML::Parser::Lite with some fixes.


Return a copy of the XML::Smart object (pointing to the base).

** This is good when you want to keep 2 versions of the same XML tree in memory, since one object can't change the tree of the other!


Get back to the base of the tree.

Each query to the XML::Smart object returns an object pointing to a different place in the tree (and sharing the same HASH tree). So, you can get the main object again (an object that points to the base):

  my $srv = $XML->{root}{host}{server} ;
  my $addr = $srv->{address} ;
  my $XML2 = $srv->base() ;


Move the pointer back one level in the tree.

** See base().


Cut the root key:

  my $srv = $XML->{rootx}{host}{server} ;
  ## Or if you don't know the root name:
  $XML = $XML->cut_root() ;
  my $srv = $XML->{host}{server} ;

** Note that this will cut the root of the pointer in the tree. So, if you are in some place that has more than one key (multiple roots), the same object will be returned without cutting anything.


Return the content of a node:

  ## Data:
  <foo>my content</foo>
  ## Access:
  my $content = $XML->{foo}->content ;
  print "<<$content>>\n" ; ## show: <<my content>>
  ## or just:
  my $content = $XML->{foo} ;


Return the HASH tree of the XML data.

** Note that the real HASH tree is returned here. All the other ways return an object that works like a HASH/ARRAY through tie.

data (OPTIONS)

Return the data of the XML object (rebuilding it).



If set to true the data isn't indented.


If set to true the data isn't indented and doesn't have spaces between the tags (unless the content has them).


Make the tags lower case.


Make the arguments lower case.


Make the tags upper case.


Make the arguments upper case.


If set to true, add the attribute 'length' with the size of the data to the xml header (<?xml ...?>). This is useful when you send the data through a socket, since the receiving side can then know the total amount of data to read.


Do not add the <?xml ...?> header.


Do not add the meta generator tag: <?meta generator="XML::Smart" ?>


Set the meta tags of the XML document.


    my $meta = {
    build_from => "wxWindows 2.4.0" ,
    file => "wx26.htm" ,
    } ;
    print $XML->data( meta => $meta ) ;
    <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>

Multiple meta:

    my $meta = [
    {build_from => "wxWindows 2.4.0" , file => "wx26.htm" } ,
    {script => "" , ver => "1.0" } ,
    ] ;
    <?meta build_from="wxWindows 2.4.0" file="wx26.htm" ?>
    <?meta script="" ver="1.0" ?>

Or set directly the meta tag:

    my $meta = '<?meta foo="bar" ?>' ;

    ## For multiple:
    my $meta = ['<?meta foo="bar" ?>' , '<?meta x="1" ?>'] ;
    print $XML->data( meta => $meta ) ;

Set the HASH tree to parse. If not set, the tree of the XML::Smart object (tree()) will be used.

data_pointer (OPTIONS)

Build the data from the current point in the XML tree (not from the base, as data() does).

Accepts the same OPTIONS as the method data().


Save the XML data inside a file.

Accepts the same OPTIONS as the method data().


To access the data you use the object in a way similar to HASH and ARRAY:

  my $XML = XML::Smart->new('file.xml') ;
  my $server = $XML->{server} ;

But when you get a key {server}, you are actually accessing the data through tie(), not directly through the HASH tree inside the object (this fixes wrong accesses):

  ## {server} is a normal key, not an ARRAY ref:

  my $server = $XML->{server}[0] ; ## return $XML->{server}
  my $server = $XML->{server}[1] ; ## return UNDEF
  ## {server} has an ARRAY with 2 items:

  my $server = $XML->{server} ;    ## return $XML->{server}[0]
  my $server = $XML->{server}[0] ; ## return $XML->{server}[0]
  my $server = $XML->{server}[1] ; ## return $XML->{server}[1]

To get all the values of a multiple attribute/key:

  ## This works with only a string inside {address}, or with an ARRAY ref:
  my @addresses = @{$XML->{server}{address}} ;

When you don't know the position of the nodes, you can select them by some attribute value:

  my $server = $XML->{server}('type','eq','suse') ; ## return $XML->{server}[1]

Syntax for the select search:


The attribute name in the node (tag).


Can be

  eq  ne  ==  !=  <=  >=  <  >


  =~  !~
  ## Case insensitive:
  =~i !~i

The value.

For REGEX use like this:

  $XML->{server}('type','=~','^s\w+$') ;
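
The selection rule described above (attribute name, operator, value) can be sketched in plain Perl over a list of hashes. This is only an illustration of the idea, not the code used inside XML::Smart. The helper name select_nodes and the %COMPARE table are hypothetical, and the sample data follows the server examples of this document:

```perl
use strict;
use warnings;

# Hypothetical operator table (an assumption for this sketch): maps each
# operator string to a comparison sub that receives (attribute value, value).
my %COMPARE = (
    'eq'  => sub { my ($v, $x) = @_; $v eq $x },
    'ne'  => sub { my ($v, $x) = @_; $v ne $x },
    '=='  => sub { my ($v, $x) = @_; $v == $x },
    '!='  => sub { my ($v, $x) = @_; $v != $x },
    '<='  => sub { my ($v, $x) = @_; $v <= $x },
    '>='  => sub { my ($v, $x) = @_; $v >= $x },
    '<'   => sub { my ($v, $x) = @_; $v <  $x },
    '>'   => sub { my ($v, $x) = @_; $v >  $x },
    '=~'  => sub { my ($v, $re) = @_; $v =~ /$re/ },
    '!~'  => sub { my ($v, $re) = @_; $v !~ /$re/ },
    '=~i' => sub { my ($v, $re) = @_; $v =~ /$re/i },
    '!~i' => sub { my ($v, $re) = @_; $v !~ /$re/i },
);

# Hypothetical helper: return the nodes whose attribute $attr satisfies
# "$attr $op $value". Nodes without the attribute never match.
sub select_nodes {
    my ($nodes, $attr, $op, $value) = @_;
    my $cmp = $COMPARE{$op} or die "Unknown operator '$op'";
    return grep { defined $_->{$attr} && $cmp->($_->{$attr}, $value) } @$nodes;
}

my @servers = (
    { os => 'linux', type => 'redhat',    version => '8.0' },
    { os => 'linux', type => 'suse',      version => '7.0' },
    { os => 'linux', type => 'conectiva', version => '9.0' },
);

my ($suse)  = select_nodes(\@servers, 'type', 'eq', 'suse');
my @s_types = map { $_->{type} } select_nodes(\@servers, 'type', '=~', '^s\w+$');
my @newer   = map { $_->{type} } select_nodes(\@servers, 'version', '>=', 8);

print "suse version: $suse->{version}\n";
print "types starting with s: @s_types\n";
print "version >= 8: @newer\n";
```

A dispatch table keeps each operator in one place, so adding or removing an operator does not touch the selection loop.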

Select attributes in multiple nodes:

You can get the list of values of an attribute looking in all multiple nodes:

  ## Get all the server types:
  my @types = $XML->{server}('[@]','type') ;

Also as:

  my @types = $XML->{server}{type}('<@') ;

Without the resource:

  my @list ;
  my @servers = @{$XML->{server}} ;
  foreach my $servers_i ( @servers ) {
    push(@list , $servers_i->{type} ) ;
  }

Return format

You can change the returned format:



Where TYPE can be:

  $  ## the content
  @  ## an array (list of multiple values)
  %  ## a hash
  $@  ## an array, but with the contents, not objects.
  $%  ## a hash, but the values are the contents, not objects.
  ## The use of $@ and $% is good if you don't want to keep the object
  ## reference (and save memory).
  @keys  ## The keys of the node. Note that if you have a key with
         ## multiple nodes, it will be replicated (this is the
         ## difference from "keys %{$this->{node}}" ).

  <@ ## Return the attribute in the previous node, but looking for
     ## multiple nodes. Example:
  my @names = $this->{method}{wxFrame}{arg}{name}('<@') ;
  #### @names = (parent , id , title) ;
    <wxFrame return="wxFrame">
      <arg name="parent" type="wxWindow" /> 
      <arg name="id" type="wxWindowID" /> 
      <arg name="title" type="wxString" /> 


  ## The content of the name:
  my $name = $XML->{server}{name}('$') ;
  ## ... or:
  my $name = $XML->{server}{name}->content ;
  ## ... or:
  my $name = $XML->{server}{name} ;
  $name = "$name" ;
  ## All the servers
  my @servers = $XML->{server}('@') ;
  ## ... or:
  my @servers = @{$XML->{server}} ;
  ## It still has the object reference:
  $servers[0]->{name} ;
  ## Without the reference:
  my @servers = $XML->{server}('$@') ;


If a {key} has a content you can access it directly from the variable or from the method:

  my $server = $XML->{server} ;

  print "Content: $server\n" ;
  ## ...or...
  print "Content: ". $server->content ."\n" ;

So, if you use the object as a string it works as a string, if you use as an object it works as an object! ;-P


Creating XML data is easy: you just use the object as a normal HASH, and you don't need to care about multiple nodes or ARRAY creation/conversion!

  ## Create a null XML object:
  my $XML = XML::Smart->new() ;
  ## Add a server to the list:
  $XML->{server} = {
  os => 'Linux' ,
  type => 'mandrake' ,
  version => 8.9 ,
  address => '' ,
  } ;
  ## The data now:
  <server address="" os="Linux" type="mandrake" version="8.9"/>
  ## Add a new address to the server. This creates an ARRAY, converting
  ## the previous key to an ARRAY:
  $XML->{server}{address}[1] = '' ;
  ## The data now:
  <server os="Linux" type="mandrake" version="8.9">
  </server>

After creating your XML tree you just save it or get the data:

  ## Get the data:
  my $data = $XML->data ;
  ## Or save it directly:
  $XML->save('newfile.xml') ;
  ## Or send to a socket:
  print $socket $XML->data(length => 1) ;


From version 1.2 XML::Smart can handle binary data and CDATA blocks automatically.

When parsing, binary data will be detected as:

  <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>

Since this is the official format for binary data, the content will be decoded from base64 and saved in the object tree.

CDATA will be parsed as any other content, since CDATA is only a block that won't be parsed.

When creating XML data, like at $XML->data(), the binary format and CDATA are detected using these rules:

  - If it has characters that can't be in XML.

  * Characters accepted:
    \s \w \d
  - If it has tags: <...>
  CONTENT: (<tag>content</tag>)
  - If it has \r\n\t, or ' and " at the same time.

So, this will be a CDATA content:


If binary content is detected, it is converted to base64 and a dt:dt attribute is added to the tag to indicate the format.

  <code dt:dt="binary.base64">f1NPTUUgQklOQVJZIERBVEE=</code>
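
The base64 round trip on the sample content above can be checked with the core module MIME::Base64. This is only a sketch of the encoding step, not of how XML::Smart detects binary data. The variable names are illustrative:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# The content of the <code> tag shown in this document.
my $encoded = 'f1NPTUUgQklOQVJZIERBVEE=';

# Decoding gives the raw bytes: a 0x7F byte followed by plain text.
my $binary = decode_base64($encoded);

# Encoding again; the second argument '' suppresses the trailing newline
# that encode_base64() appends by default.
my $again = encode_base64($binary, '');

printf "%d bytes decoded, round trip %s\n",
    length($binary), ($again eq $encoded ? 'ok' : 'FAILED');
```

The leading 0x7F byte is outside the accepted characters listed above, which is why such content would be treated as binary.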

UNICODE and ASCII-extended (ISO-8859-1)

XML::Smart supports only these 2 encoding types, Unicode (UTF-8) and ASCII-extended (ISO-8859-1), which should be enough. (Note that UTF-8 is only supported on Perl 5.8+.)

When creating XML data, if any UTF-8 character is detected the encoding attribute in the <?xml ...?> header will be set to UTF-8:

  <?xml version="1.0" encoding="utf-8" ?>

If not, the iso-8859-1 is used:

  <?xml version="1.0" encoding="iso-8859-1" ?>

When loading XML data with UTF-8, Perl (5.8+) should do all the work internally.


You can use the special parser XML::Smart::HTMLParser to "use" HTML as XML or not well-formed XML data.

The differences between a normal XML parser and XML::Smart::HTMLParser are:

  - Accept values without quotes:
    <foo bar=x>
  - Accept any data in the values, including <> and &:
    <root><echo sample="echo \"Hello!\">out.txt"></root>
  - Accept URI values without quotes:
    <link url= target=#_blank>
  - Don't need to close the tags adding the '/' before '>':
    <root><foo bar="1"></root>
    ** Note that the parser will try hard to detect the nodes, and
       whether to auto-close them or not.
  - Don't need to have only one root:

So, XML::Smart::HTMLParser is a wild way to load marked-up data (like HTML), or to avoid caring about quotes, end tags, etc. when writing your XML data by hand. So, you can write a bad XML file by hand, load it with XML::Smart::HTMLParser, and rewrite it well by saving it again! ;-P

** Note that <SCRIPT> tags will only be parsed correctly if the content is inside comments <!--...-->, since they can have tags:

  <SCRIPT LANGUAGE="JavaScript"><!--
  document.writeln("some <tag> in the string");


Entities (ENTITY) are handled by the parser. So, if you use XML::Parser it will do the whole job fine. But if you use XML::Smart::Parser or XML::Smart::HTMLParser, only the basic entities (defaults) will be parsed:

  &lt;   => The less than sign (<).
  &gt;   => The greater than sign (>).
  &amp;  => The ampersand (&).
  &apos; => The single quote or apostrophe (').
  &quot; => The double quote (").
  &#ddd;  => An ASCII character or a Unicode character (>255), where ddd is a decimal number.
  &#xHHH; => A Unicode character, where HHH is in hexadecimal.
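
Decoding just this basic set of entities can be sketched with a single substitution in plain Perl. This is an illustration under stated assumptions, not the code of XML::Smart::Parser: the function name decode_basic_entities is hypothetical, and leaving unknown named entities untouched is a choice made for this sketch:

```perl
use strict;
use warnings;

# The five predefined named entities from the list above.
my %ENTITY = (
    lt   => '<',
    gt   => '>',
    amp  => '&',
    apos => "'",
    quot => '"',
);

# Hypothetical helper: decode named, decimal and hexadecimal entities in
# one pass, so that "&amp;lt;" becomes "&lt;" and is not decoded twice.
sub decode_basic_entities {
    my ($text) = @_;
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#(\d+)|(\w+));}{
          defined $1         ? chr(hex $1)
        : defined $2         ? chr($2)
        : exists $ENTITY{$3} ? $ENTITY{$3}
        :                      "&$3;"      # unknown entity: keep as it is
    }ge;
    return $text;
}

my $decoded = decode_basic_entities('a &lt; b &amp;&amp; c &#65;&#x42; &foo;');
print "$decoded\n";
```

Doing everything in one substitution avoids the classic ordering bug where decoding &amp;amp; first creates new entities that a later step decodes again.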

When creating XML data, already existing entities won't be changed, and the characters '<', '&' and '>' will be converted to the appropriate entity.

** Note that if a content has a <tag>, the characters '<' and '>' won't be converted to entities, and this content will be inside a CDATA block.


Everyone who has tried to use Perl HASH and ARRAY to access XML data, like in XML::Simple, has had some problems adding new nodes, or accessing a node when the user doesn't know if it's inside an ARRAY, a HASH or a HASH key. XML::Smart creates around it a very dynamic way to access the data, since at the same time any node/point in the tree can be a HASH and an ARRAY. You also have other extra resources, like a search for nodes by attribute:

  my $server = $XML->{server}('type','eq','suse') ; ## This syntax is not wrong! ;-)

  ## Instead of:
  my $server = $XML->{server}[1] ;
    <server os="linux" type="redhat" version="8.0">
    <server os="linux" type="suse" version="7.0">

The idea for this module came from the problem of accessing a complex structure in XML. You need to know what this structure is, something that is generally done by looking at the XML file (which is wrong). But at the same time it is hard to always check the structure (by code) before accessing it. XML is a good and easy format to declare your data, but extracting it as a tree, at least in my opinion, isn't easy. To fix that, I thought of a way to access the data with some query language, like SQL. The first idea was to access using something like:{arg1}

  X =*

And I saw that this is very similar to Hashes and Arrays in Perl:

  $XML->{foo}{bar}{baz}{arg1} ;
  $X = $XML->{foo}{bar} ;
  $X->{baz}{arg1} ;
  $XML->{hosts}{server}[0]{argx} ;

But the problem of Hash and Array, is not knowing when you have an Array reference or not. For example, in XML::Simple:

  ## This is very different
  $XML->{server}{address} ;
  ## ... from this:
  $XML->{server}{address}[0] ;

So, why not make both ways work? Because you need to make something crazy!

To create XML::Smart, I first created the module Object::MultiType. With it you can have an object that works at the same time as a HASH, ARRAY, SCALAR, CODE & GLOB. So you can do things like this with the same object:

  $obj = Object::MultiType->new() ;
  $obj->{key} ;
  $obj->[0] ;
  $obj->method ;  
  @l = @{$obj} ;
  %h = %{$obj} ;
  &$obj(args) ;
  print $obj "send data\n" ;

It seems crazy, and it can be even more so if you use tie() inside it, and this is what XML::Smart does.

For XML::Smart, access in the Hash and Array way passes through tie(). In other words, you have a tied HASH and a tied ARRAY inside it. This tied Hash and Array work together, so you can access a Hash key as the index 0 of an Array, or access an index 0 as the Hash key:

  %hash = (
  key => ['a','b','c']
  ) ;
  $hash->{key}    ## return $hash{key}[0]
  $hash->{key}[0] ## return $hash{key}[0]  
  $hash->{key}[1] ## return $hash{key}[1]
  ## Inverse:
  %hash = ( key => 'a' ) ;
  $hash->{key}    ## return $hash{key}
  $hash->{key}[0] ## return $hash{key}
  $hash->{key}[1] ## return undef

The best thing about this resource is that it avoids wrong accesses to the data, and the warnings when you try to access a Hash where there is an Array (and the inverse), something that generally makes the script die().
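
The "same value as a string and as index 0" behaviour can be imitated on a small scale with the core overload pragma. This is only a minimal sketch of the idea. It does not use tie() and is not how XML::Smart or Object::MultiType are implemented. The package name DualNode is hypothetical:

```perl
use strict;
use warnings;

package DualNode;   # hypothetical name, for illustration only

use overload
    # As a string, the object is its first value (or '' if there is none).
    '""'     => sub { my $v = $_[0]{values}[0]; defined $v ? $v : '' },
    # As an ARRAY ref, the object gives access to all of its values.
    '@{}'    => sub { $_[0]{values} },
    fallback => 1;

sub new {
    my ($class, @values) = @_;
    return bless { values => [@values] }, $class;
}

package main;

my $multi  = DualNode->new('a', 'b', 'c');   # like a key holding an ARRAY
my $single = DualNode->new('a');             # like a key holding a string

print "multi:  $multi / $multi->[0] / $multi->[1]\n";
print "single: $single / $single->[0]\n";
print "single has no index 1\n" unless defined $single->[1];
```

Both objects answer to string use and to index 0 with the same value, so the caller does not need to know how many values are stored.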

Once having an easy access to the data, you can use the same resource to create data! For example:

  ## Previous data:
    <server address="" os="linux" type="conectiva" version="9.0"/>
  ## Now you have {address} as a normal key with a string inside:
  ## And to add a new address, the key {address} needs to be an ARRAY ref!
  ## So, XML::Smart makes the conversion: ;-P
  $XML->{hosts}{server}{address}[1] = '' ;
  ## Adding to a list that you don't know the size:
  push(@{$XML->{hosts}{server}{address}} , '') ;
  ## The data now:
    <server os="linux" type="conectiva" version="9.0"/>

Then, after changing your XML tree using the Hash and Array resources, you just get the data remade (through the Hash tree inside the object):

  my $xmldata = $XML->data ;

But note that XML::Smart always returns an object, even when you get a final key! So this actually returns another object, pointing (inside it) to the key:

  $addr = $XML->{hosts}{server}{address}[0] ;
  ## Since $addr is an object you can TRY to access more data:
  $addr->{foo}{bar} ; ## This doesn't make warnings! It just returns UNDEF.

  ## But you can use it like a normal SCALAR too:

  print "$addr\n" ;

  $addr .= ':80' ; ## After this $addr isn't an object any more, just a SCALAR!


XML::Parser, XML::Parser::Lite, XML.

Object::MultiType - This is the module that makes everything possible, and was created specially for XML::Smart. ;-P


Graciliano M. P. <>

I will appreciate any type of feedback (including your opinions and/or suggestions). ;-P

Before making this module I disliked using XML, and did everything to avoid it. Now I can use XML fine! ;-P


This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
