Thibault Duponchelle
and 2 contributors

NAME

XML::Minifier - A configurable XML minifier.

WARNING

The API (option names) is almost, but not fully, stable and may therefore still change slightly.

SYNOPSIS

Here is the simplest way to use XML::Minifier:

    use XML::Minifier;

    my $maxi = "<person>   <name>tib   </name>   <level>  42  </level>  <city>   </city>  </person>";
    my $mini = minify($maxi);

A typical use, however, passes some parameters:

    use XML::Minifier qw(minify);

    my $maxi = "<person>   <name>tib   </name>   <level>  42  </level>  <city>   </city>  </person>";
    my $mini = minify($maxi, no_prolog => 1, aggressive => 1);

This produces:

    <person><name>tib</name><level>42</level><city/></person>

aggressive, destructive and insane are shortcuts that each enable a set of parameters.

You can also set parameters individually:

    use XML::Minifier qw(minify);

    my $maxi = "<person>   <name>tib   </name>   <level>  42  </level>  <city>   </city>  </person>";
    my $mini = minify($maxi, no_prolog => 1, aggressive => 1, keep_comments => 1, remove_indent => 1);

The code above means "minify this string with aggressive mode BUT keep comments and in addition remove indent".

Not every parameter has a keep_ or a remove_ variant; see below for the detailed list.

DEFAULT MINIFICATION

The minifier has a predefined set of options enabled by default.

The author chose them as sensible defaults, but you can disable them individually with the keep_ options.

Merge elements when empty.
Remove DTD (configurable).
Remove processing instructions (configurable).
Remove comments (configurable).
Remove CDATA (configurable).

In addition, the minifier drops all blanks between first-level children. Whatever sits between first-level children is not supposed to be meaningful data, so formatting can safely be removed there. For instance, a carriage return between the prolog and a processing instruction (or even inside a DTD) can be removed.

The minifier also smartly removes blanks between tags. "Smartly" means that it does not remove blanks in a leaf (where blanks are more likely to be meaningful) or in a node that contains something that will persist (a comment, CDATA section or PI that is not removed, or a non-empty piece of text). The meaningfulness of blanks can be specified by a DTD. If a DTD is present and protects some nodes, the minifier obviously respects this, but you can change this behaviour with the ignore_dtd option.

If there is no DTD (which is very often the case), we are blind and simply use the approach described above (keep blanks in leaves, remove blanks in nodes if all siblings contain only blanks).
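
The no-DTD heuristic described above can be sketched on a toy tree. This is only an illustration under stated assumptions, not the module's implementation: the node structure (a hash ref with name and children, text children as plain strings) and the drop_blanks helper are hypothetical, and comments, CDATA sections and PIs are ignored.

```perl
use strict;
use warnings;

# Hypothetical toy structure: an element is { name => ..., children => [...] },
# a text child is a plain string. Not the structure used by XML::Minifier.
sub drop_blanks {
    my ($node) = @_;
    my @kids  = @{ $node->{children} };
    my @elems = grep { ref $_ } @kids;      # element children
    my @texts = grep { !ref $_ } @kids;     # text children

    # Number of text children holding at least one non-blank character.
    my $meaningful = grep { /\S/ } @texts;

    # A leaf (no element children) keeps its text untouched.
    # A non-leaf loses its text children only if all of them are blank.
    @kids = @elems if @elems && !$meaningful;

    drop_blanks($_) for @elems;
    $node->{children} = \@kids;
    return $node;
}

my $person = {
    name     => 'person',
    children => [
        "   ", { name => 'name', children => ["tib   "] },
        "   ", { name => 'city', children => ["   "] },
    ],
};
drop_blanks($person);
print scalar @{ $person->{children} }, "\n";   # 2: only <name> and <city> remain
```

Note that the blanks inside the name and city leaves survive this step; removing those is what the extra modes described further down are for.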

Everything listed above is the default and should be perceived as almost lossless minification in terms of semantics (for humans).

It is not completely lossless if you consider these things as data, but in that case you simply cannot minify, since you cannot touch anything ;)

EXTRA MINIFICATION

In addition, you can enable the aggressive, destructive or insane mode to remove characters in text nodes (a sort of "cleaning"):

Aggressive

Remove empty text nodes.
Remove starting blanks (carriage return, line feed, spaces...).
Remove ending blanks (carriage return, line feed, spaces...).

Destructive

Remove indentation.
Remove invisible spaces and tabs at the end of line.

Insane

Remove carriage returns and line feeds everywhere in text nodes.
Remove spaces everywhere in text nodes.

OPTIONS

You can give various options:

expand_entities

Expand entities. An entity looks like:

    &foo;

process_xincludes

Process the xincludes. An xinclude looks like:

    <xi:include href="inc.xml"/>

remove_blanks_start

Remove blanks (spaces, carriage returns, line feeds...) at the start of text nodes.

For instance

    <tag>    foo bar</tag> 

will become

    <tag>foo bar</tag>

It is aggressive and therefore lossy compression.

remove_blanks_end

Remove blanks (spaces, carriage returns, line feeds...) at the end of text nodes.

For instance

    <tag>foo bar    
       </tag> 

will become

    <tag>foo bar</tag>

It is aggressive and therefore lossy compression.
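
As a rough illustration of what remove_blanks_start and remove_blanks_end do to a single text node, here is a minimal sketch with plain regular expressions. The trim_text_node helper is hypothetical and only mirrors the documented effect; it is not how XML::Minifier is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: trims blanks at the start and/or end of ONE text node.
# The option names are reused from this documentation for readability only.
sub trim_text_node {
    my ($text, %opt) = @_;
    $text =~ s/\A\s+// if $opt{remove_blanks_start};   # leading blanks
    $text =~ s/\s+\z// if $opt{remove_blanks_end};     # trailing blanks
    return $text;
}

print '[', trim_text_node("    foo bar", remove_blanks_start => 1), "]\n";      # [foo bar]
print '[', trim_text_node("foo bar    \n   ", remove_blanks_end => 1), "]\n";   # [foo bar]
```

Blanks inside the text ("foo bar") are left alone; only the two ends of the node are touched.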

remove_spaces_line_start or remove_indent

Remove spaces and tabs at the start of each line in text nodes. It's like removing indentation actually.

For instance

    <tag>
           foo 
           bar    
       </tag> 

will become

    <tag>
    foo 
    bar
    </tag>

remove_spaces_line_end

Remove spaces and tabs at the end of each line in text nodes. This removes invisible trailing whitespace.
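
The two line-oriented options can be pictured with multi-line regular expressions. The sketch below is an assumption about the effect on one text node, using a hypothetical clean_lines helper; it does not claim to reproduce the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Minifier.
# With /m, ^ and $ match at the start and end of every line of the text node.
sub clean_lines {
    my ($text, %opt) = @_;
    $text =~ s/^[ \t]+//mg if $opt{remove_spaces_line_start};   # indentation
    $text =~ s/[ \t]+$//mg if $opt{remove_spaces_line_end};     # trailing spaces and tabs
    return $text;
}

my $text  = "\n       foo \n       bar    \n   ";
my $clean = clean_lines($text,
    remove_spaces_line_start => 1,
    remove_spaces_line_end   => 1,
);
# $clean is now "\nfoo\nbar\n": line breaks are kept, the spaces around them are gone.
```

Only spaces and tabs are matched, so carriage returns and line feeds stay in place.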

remove_empty_text

Remove (pseudo) empty text nodes (containing only spaces, carriage returns, line feeds...).

For instance

    <tag>

    </tag>

will become

    <tag/>

remove_cr_lf_everywhere

Remove carriage returns and line feeds everywhere (inside text!).

For instance

    <tag>foo
    bar
    </tag> 

will become

    <tag>foobar</tag>

It is aggressive and therefore lossy compression.
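
To make the lossy nature concrete, the effect on a text node can be imitated with a one-line substitution. The strip_cr_lf helper below is hypothetical and only sketches the documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: deletes every carriage return and line feed
# from a text node, wherever it appears.
sub strip_cr_lf {
    my ($text) = @_;
    $text =~ s/[\r\n]+//g;
    return $text;
}

print strip_cr_lf("foo\nbar\n"), "\n";   # foobar
```

The words on either side of a line break are glued together, which is why this option belongs to the most destructive mode.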

keep_comments

Keep comments; by default they are removed.

A comment looks like:

    <!-- comment -->

keep_cdata

Keep CDATA sections; by default they are removed.

A CDATA section looks like:

    <![CDATA[ my cdata ]]>

keep_pi

Keep processing instructions; by default they are removed.

A processing instruction looks like:

    <?xml-stylesheet href="style.css"?>

keep_dtd

Keep DTD.

ignore_dtd

When set, the minifier ignores information from the DTD (typically the places where blanks are meaningful).

This option can be combined with keep_dtd: you can decide to use the information from the DTD and then remove it (or the contrary).

To repeat: ignore_dtd is NOT the contrary of keep_dtd.

no_prolog

Do not output the prolog (having no prolog is aggressive for XML readers).

The prolog is at the start of the XML file and looks like this:

    <?xml version="1.0" encoding="UTF-8"?>

version

Specify version.

encoding

Specify encoding.

aggressive

Enable aggressive mode. Enables the options remove_blanks_start, remove_blanks_end and remove_empty_text, but only if they are not already defined. Other options keep their value.

destructive

Enable destructive mode. Enables the options remove_spaces_line_start and remove_spaces_line_end, but only if they are not already defined. Also enables aggressive mode. Other options keep their value.

insane

Enable insane mode. Enables the options remove_cr_lf_everywhere and remove_spaces_everywhere, but only if they are not already defined. Also enables destructive mode and aggressive mode. Other options keep their value.
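
The way the three modes cascade and respect explicitly defined options can be sketched as follows. This is a minimal illustration built only from the descriptions above; the expand_modes helper is hypothetical and the module's internals may differ. It needs Perl 5.10 or later for the //= operator.

```perl
use strict;
use warnings;

# Hypothetical helper: expands the mode shortcuts into individual options.
sub expand_modes {
    my (%opt) = @_;

    # insane also enables destructive, destructive also enables aggressive.
    $opt{destructive} = 1 if $opt{insane};
    $opt{aggressive}  = 1 if $opt{destructive};

    my %implied = (
        aggressive  => [qw(remove_blanks_start remove_blanks_end remove_empty_text)],
        destructive => [qw(remove_spaces_line_start remove_spaces_line_end)],
        insane      => [qw(remove_cr_lf_everywhere remove_spaces_everywhere)],
    );

    for my $mode (keys %implied) {
        next unless $opt{$mode};
        # //= assigns only when the option is not defined yet,
        # so an explicit 0 passed by the caller is preserved.
        $opt{$_} //= 1 for @{ $implied{$mode} };
    }
    return %opt;
}

my %opt = expand_modes(destructive => 1, remove_blanks_end => 0);
print "remove_blanks_end: $opt{remove_blanks_end}\n";                 # 0, the explicit value wins
print "remove_spaces_line_start: $opt{remove_spaces_line_start}\n";   # 1, implied by destructive
print "remove_empty_text: $opt{remove_empty_text}\n";                 # 1, implied through aggressive
```

Because only undefined options are filled in, a caller can pick a mode and still opt out of one of its effects by passing that option explicitly.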

LICENSE

Copyright (C) Thibault DUPONCHELLE.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Thibault DUPONCHELLE