XML::DocStats - produce a simple analysis of an XML document
Analyze the XML document on STDIN; the output, written to STDOUT, is HTML:
    use XML::DocStats;
    my $parse = XML::DocStats->new;
    $parse->analyze;
Analyze an in-memory XML document:
    use XML::DocStats;
    my ($xmldata) = @_;
    my $parse = XML::DocStats->new(
        xmlsource => {String => $xmldata},
        BYTES     => length($xmldata),
    );
    $parse->analyze;
Analyze an XML document from an IO stream; the output format is plain text:
    use XML::DocStats;
    use IO::File;
    my $xmlsource = IO::File->new("< document.xml");
    my $parse = XML::DocStats->new(xmlsource => {ByteStream => $xmlsource});
    $parse->format('text');
    $parse->analyze;
XML::DocStats parses an XML document using a SAX handler built on Ken MacLeod's XML::Parser::PerlSAX. It produces a listing indented to show the element hierarchy, and collects counts of various XML components along the way. A summary of the counts is produced when the parse finishes. This is useful for visualizing the structure and content of an XML document.
The output listing is either in plain text or html.
Each XML component is color-coded in the HTML output for easy reading.
Create an XML::DocStats object. Parameters to control the input, output, and analysis format can be passed to new, passed to analyze, or set by invoking the parameter methods. See below.
Parse the xml document and produce the analysis listing.
Parameters to control the input, output, and analysis format can be passed to new, passed to analyze, or set by invoking the parameter methods listed below, e.g. $parse->param('value'). When passing parameters to new or analyze, the form $parse->analyze(param=>'value') is used.
xmlsource - values: the XML::Parser::PerlSAX Source, default: {ByteStream => \*STDIN}. See XML::Parser::PerlSAX.
format - values: html/text, default: html. When format is html, the analysis listing is formatted in HTML; otherwise, plain text is produced.
output - values: print/return, default: print. When output is print, the analysis listing is printed to STDOUT incrementally as the parse progresses; otherwise, the listing is returned as a text string by analyze.
print_htmlpage - values: yes/no, default: yes. When print_htmlpage is yes and format is html, the analysis listing is formatted as a complete XHTML document. Otherwise, if format is html, only the HTML tags necessary to format the listing are included.
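The three ways of setting parameters described above can be combined in one call sequence. The following is a minimal sketch, not taken from the distribution: text_listing_for is a hypothetical helper name, and the sketch assumes that output => 'return' makes analyze return the listing as a string, as the output entry states.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::DocStats): return the plain-text
# analysis listing for the XML read from an open filehandle.
sub text_listing_for {
    my ($fh) = @_;
    require XML::DocStats;    # loaded on first call

    # 1. A parameter passed to new.
    my $parse = XML::DocStats->new(xmlsource => { ByteStream => $fh });

    # 2. A parameter set through its parameter method.
    $parse->format('text');

    # 3. A parameter passed to analyze. With output => 'return' the
    #    listing comes back as a string instead of going to STDOUT.
    return $parse->analyze(output => 'return');
}
```

A caller could then write, for example, print text_listing_for(\*STDIN); to get the same effect as the default print behaviour while keeping the listing available as a string.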
The following parameters control whether the corresponding xml thingy is included in the analysis listing. Setting all print_<item>s to no will produce just the summary statistics.
print_element - values: yes/no, default: yes.
print_text - values: yes/no, default: yes.
print_entity - values: yes/no, default: yes.
print_doctype - values: yes/no, default: yes.
print_xmldcl - values: yes/no, default: yes.
print_comment - values: yes/no, default: yes.
print_pi - values: yes/no, default: yes.
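As noted above, switching off every print_<item> parameter leaves only the summary statistics. A sketch of one way to do that follows; summary_only_params and summary_for_string are hypothetical helper names, not part of the module, and the item names are simply the seven listed above.

```perl
use strict;
use warnings;

# Suffixes of the seven print_<item> parameters listed above.
my @PRINT_ITEMS = qw(element text entity doctype xmldcl comment pi);

# Hypothetical helper: build a parameter list that sets every
# print_<item> parameter to 'no'.
sub summary_only_params {
    return map { ("print_$_" => 'no') } @PRINT_ITEMS;
}

# Hypothetical helper: analyze an in-memory document and return only the
# summary, as plain text. Assumes all of these parameters may be passed
# to new together.
sub summary_for_string {
    my ($xmldata) = @_;
    require XML::DocStats;    # loaded on first call
    my $parse = XML::DocStats->new(
        xmlsource => { String => $xmldata },
        BYTES     => length($xmldata),
        format    => 'text',
        output    => 'return',
        summary_only_params(),
    );
    return $parse->analyze;
}
```

Keeping the item names in one array means the parameter list is built in a single place if the set of print_<item> parameters ever needs adjusting.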
An example command-line script, xmldocstats.pl, is included in the eg directory of the distribution. After installation, you can put this script in your PATH and use it to analyze an XML document:
xmldocstats.pl mydoc.xml
or
xmldocstats.pl < mydoc.xml | less
My web site has an online example, see: "WEB SITE"
A working example of XML::DocStats can be found online at:
XML::Parser::PerlSAX, XML::Parser, Object::_Initializer.