Data::Edit::Xml::Lint - lint xml files in parallel using xmllint and report the failure rate
Create some sample xml files, some with errors, lint them in parallel and retrieve the number of errors and failing files:
    use feature qw(say);
    use Data::Edit::Xml::Lint;

    # $N, $outDir, catalogName, projectName, fileName and addError are
    # defined elsewhere by the caller

    for my $n(1..$N)                                       # Some projects
     {my $x = Data::Edit::Xml::Lint::new();               # New xml file linter

      my $catalog = $x->catalog = catalogName;           # Use catalog if possible
      my $project = $x->project = projectName($n);       # Project name
      my $file    = $x->file    = fileName($n);          # Target file

      $x->source = <<END;                                 # Sample source
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE concept PUBLIC "-//HPE//DTD HPE DITA Concept//EN" "concept.dtd" []>
    <concept id="$project">
      <title>Project $project</title>
      <conbody>
        <p>Body of $project</p>
      </conbody>
    </concept>
    END

      $x->source =~ s/id="\w+?"//gs if addError($n);     # Introduce an error into some projects

      $x->lint(foo=>1);                                   # Write the source to the target file, lint it with xmllint and record the supplied attributes as comments at the end of the target file
     }

    Data::Edit::Xml::Lint::wait;                          # Wait for lints to complete

    say STDERR Data::Edit::Xml::Lint::report($outDir, "xml")->print;  # Report total pass fail rate
Produces:
    50 % success converting 3 projects containing 10 xml files on 2017-07-13 at 17:43:24

    ProjectStatistics
       #  Percent  Pass  Fail  Total  Project
       1  33.3333     1     2      3  aaa
       2  50.0000     2     2      4  bbb
       3  66.6667     2     1      3  ccc

    FailingFiles
       #  Errors  Project  File
       1       1  ccc      out/ccc5.xml
       2       1  aaa      out/aaa9.xml
       3       1  bbb      out/bbb1.xml
       4       1  bbb      out/bbb7.xml
       5       1  aaa      out/aaa3.xml
Once a file has been linted, it can be reread with read to obtain details about the xml, including the ids it defines (see idDefs below) and any labels that refer to these ids (see labelDefs below). Such labels provide additional names for a node that cannot be stored in the xml itself. For example, reading out/bbb1.xml yields:
    { catalog    => "/home/phil/hp/dtd/Dtd_2016_07_12/catalog-hpe.xml",
      definition => "bbb",
      docType    => "<!DOCTYPE concept PUBLIC \"-//HPE//DTD HPE DITA Concept//EN\" \"concept.dtd\" []>",
      errors     => 1,
      file       => "out/bbb1.xml",
      foo        => 1,
      header     => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
      idDefs     => { bbb => 1, c1 => 1 },
      labelDefs  => {
                      bbb      => "bbb",
                      c1       => "c1",
                      conbody1 => "c1",
                      conbody2 => "c1",
                      concept1 => "bbb",
                      concept2 => "bbb",
                    },
      labels     => "bbb concept1 concept2",
      project    => "bbb",
      sha256     => "b00cdebf2e1837fa15140d25315e5558ed59eb735b5fad4bade23969babf9531",
      source     => "..."
    }
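The idDefs and labelDefs maps in this output can be derived from a list of nodes and their labels. The following is a minimal sketch, assuming each node is represented by a hypothetical [id, labels...] array rather than a real parse tree; the helper idAndLabelDefs is illustrative and not part of the module. Fed the concept and conbody nodes of out/bbb1.xml, it reproduces the idDefs and labelDefs values shown above.

```perl
use strict;
use warnings;

# Hypothetical nodes as [id, labels...], modelled on out/bbb1.xml above
my @nodes = ([bbb => qw(concept1 concept2)],    # concept node
             [c1  => qw(conbody1 conbody2)]);   # conbody node

# Return idDefs ({id} = count) and labelDefs ({label or id} = id)
sub idAndLabelDefs
 {my (@nodes) = @_;
  my (%idDefs, %labelDefs);
  for my $node(@nodes)
   {my ($id, @labels) = @$node;
    next unless defined $id;                    # Assumption: labels need an id to resolve to
    $idDefs{$id}++;                             # Count every definition of this id
    $labelDefs{$_} = $id for $id, @labels;      # An id also names its own node
   }
  (\%idDefs, \%labelDefs)
 }

my ($idDefs, $labelDefs) = idAndLabelDefs(@nodes);
print join(" ", map {join "=", $_, $labelDefs->{$_}} sort keys %$labelDefs), "\n";
# bbb=bbb c1=c1 conbody1=c1 conbody2=c1 concept1=bbb concept2=bbb
```

Counting ids rather than just flagging them lets a duplicate id show up as a count greater than 1.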
In order to fix references between files, a list of files can be relinted:
the specified files are read
a map is constructed to locate all the ids and labels defined in the specified files
each file is reparsed
the resulting parse tree and id map are handed to a caller-provided sub that can then traverse the parse tree, fixing attributes that make references between the files
the modified parse trees are written back to their originating files, thus saving the fixes
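The relint steps above can be sketched in plain Perl. This is a sketch under stated assumptions, not the module's implementation: the files are hypothetical in-memory hashes, the caller's sub edits the xml text with a regular expression instead of traversing a parse tree, and the helpers buildIdMap and relintSketch, together with the href-rewriting fixer, are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for files already read: labelDefs mirrors the
# {label or id} = id field shown above, source holds each file's xml.
my %files = (
  'out/aaa1.xml' => {labelDefs => {aaa => 'aaa', concept1 => 'aaa'},
                     source    => '<concept id="aaa"><xref href="c1"/></concept>'},
  'out/bbb1.xml' => {labelDefs => {bbb => 'bbb', c1 => 'c1'},
                     source    => '<concept id="bbb"><p id="c1"/></concept>'},
);

# Locate every label and id: {label or id} = [file, id]
sub buildIdMap
 {my ($files) = @_;
  my %map;
  for my $file(sort keys %$files)
   {my $labels = $files->{$file}{labelDefs};
    $map{$_} = [$file, $labels->{$_}] for keys %$labels;
   }
  \%map
 }

# Hand each file's source and the map to the caller's fixer, then store the
# result back (in memory here; relint writes back to the originating file)
sub relintSketch
 {my ($files, $fixer) = @_;
  my $map = buildIdMap($files);
  for my $file(sort keys %$files)
   {$files->{$file}{source} = $fixer->($files->{$file}{source}, $map);
   }
 }

# Hypothetical fixer: point href="label" at the file and id that define it
my $fixer = sub
 {my ($source, $map) = @_;
  $source =~ s{href="([^"#]+)"}
              {my $t = $map->{$1}; $t ? qq(href="$t->[0]#$t->[1]") : qq(href="$1")}ge;
  $source
 };

relintSketch(\%files, $fixer);
print $files{'out/aaa1.xml'}{source}, "\n";
# <concept id="aaa"><xref href="out/bbb1.xml#c1"/></concept>
```

Keying the map by label as well as by id means a reference can use any name for its target node; the sketch assumes these names are unique across the files being relinted, otherwise the file sorted last wins.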
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Construct a new linter
Create a new xml linter - call this method statically, as in Data::Edit::Xml::Lint::new()
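In Perl, Class::new() and Class->new() pass different arguments: the arrow form prepends the class name to @_. The sketch below uses a hypothetical Hypothetical::Linter class, not the real module, whose constructor treats @_ as key => value pairs, to show how a method-style call would misalign them. Whether this is the reason Data::Edit::Xml::Lint asks for a static call is an assumption.

```perl
use strict;
use warnings;

package Hypothetical::Linter;                   # Stand-in, not Data::Edit::Xml::Lint

# Treats every argument as part of a key => value list
sub new
 {my %attributes = @_;
  bless \%attributes, __PACKAGE__
 }

package main;

my $static = Hypothetical::Linter::new(project => 'aaa');  # @_ = (project, aaa)
my $method = Hypothetical::Linter->new(project => 'aaa');  # @_ = (Hypothetical::Linter, project, aaa)

print scalar(keys %$static), " ", scalar(keys %$method), "\n";   # 1 2
```

The method-style call also triggers Perl's "Odd number of elements in hash assignment" warning, and the project key ends up missing from the object.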
Attributes describing a lint
author: Optional author of the xml - only needed if you want to generate an SDL file map
catalog: Optional catalog file containing the locations of the DTDs used to validate the xml
ditaType: Optional Dita topic type (concept|task|troubleshooting|reference) of the xml - only needed if you want to generate an SDL file map
docType: The second line: the document type extracted from the source
dtds: Optional directory containing the DTDs used to validate the xml
errors: Number of lint errors detected by xmllint
file: File that the xml will be written to and read from by lint, read or relint
fileNumber: File number - assigned early on by the caller to help debug transformations
guid: Guid for the outermost tag - only required if you want to generate an SDL file map
header: The first line: the xml header extracted from the source
idDefs: {id} = count - the number of times this id is defined in the xml contained in this file
labelDefs: {label or id} = id - the id of the node containing a label defined on the xml
labels: Optional parse tree to supply labels for the current source, as the labels are present in the parse tree but not in the string representing the parse tree
linted: Date the lint was performed by lint
preferredSource: Preferred representation of the xml source, used by relint to supply a preferred representation for the source
processes: Maximum number of xmllint processes to run in parallel - 8 by default
project: Optional project name to allow error counts to be aggregated by project and to allow ids and labels to be scoped to the files contained in each project
reusedInProject: List of projects in which this file is reused
sha256: Sha256 hash of the string containing the xml processed by lint or read
source: The source xml to be linted
title: Optional title of the xml - only needed if you want to generate an SDL file map
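Assignments such as my $project = $x->project = projectName($n) in the example at the top require accessors that return an lvalue. A minimal sketch of that pattern using Perl's :lvalue attribute, with a hypothetical Hypothetical::Lint class holding only three of the attributes above; how the module itself defines its accessors is not shown here.

```perl
use strict;
use warnings;

package Hypothetical::Lint;                     # Stand-in, not the real module

sub new {bless {}, __PACKAGE__}

# :lvalue makes each accessor assignable, as in $x->project = "aaa"
sub catalog :lvalue {$_[0]{catalog}}
sub project :lvalue {$_[0]{project}}
sub source  :lvalue {$_[0]{source}}

package main;

my $x       = Hypothetical::Lint::new();
my $project = $x->project = 'aaa';              # Assign, then copy the value
$x->source  = '<concept id="aaa"/>';
$x->source  =~ s/id="\w+?"//gs;                 # Edit the attribute in place

print "$project [", $x->source, "]\n";          # aaa [<concept />]
```

Because the accessor returns the hash element itself, s/// can modify the stored source directly, which is what the example's error-injection line relies on.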
Lint xml files in parallel
Store the xml in a file, apply xmllint in parallel and update the file with the results
       Parameter    Description
    1  $lint        Linter
    2  %attributes  Attributes to be recorded as xml comments
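Running xmllint over many files while keeping at most processes (8 by default) jobs alive can be done with fork and waitpid. This is a generic sketch, not the module's scheduler: runInParallel is a hypothetical helper, and the jobs simply return 1 or 0 where a real job might run xmllint on its file and report whether it failed.

```perl
use strict;
use warnings;

# Run [name, coderef] jobs with at most $processes children alive at once and
# return {name => exit status}. Generic sketch, not the module's scheduler.
sub runInParallel
 {my ($processes, @jobs) = @_;
  my (%running, %status);                            # pid => name, name => status
  my $reap = sub                                     # Wait for any child to finish
   {my $pid = waitpid(-1, 0);
    $status{delete $running{$pid}} = $? >> 8;
   };
  for my $job(@jobs)
   {$reap->() while keys(%running) >= $processes;   # Wait for a free slot
    my ($name, $code) = @$job;
    my $pid = fork();
    die "Cannot fork: $!" unless defined $pid;
    if ($pid == 0) {exit($code->())}                 # Child: exit with the job's result
    $running{$pid} = $name;
   }
  $reap->() while keys(%running);                    # Collect the remaining children
  \%status
 }

# Hypothetical jobs: a real one might run
#   system('xmllint', '--noout', '--valid', $file)
# and return 1 if that command failed
my @jobs   = map {my $n = $_; ["file$n", sub {$n % 3 ? 0 : 1}]} 1..10;
my $status = runInParallel(8, @jobs);
print join(" ", map {join "=", $_, $status->{$_}} sort keys %$status), "\n";
```

Reaping any finished child with waitpid(-1, 0), rather than waiting on a specific pid, frees a slot as soon as the fastest lint completes.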
Store the xml in a file, apply xmllint serially rather than in parallel, and update the file with the results
Store just the attributes in a file so that they can be retrieved later to process non-xml objects referenced in the xml, such as images
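One way to record attributes as xml comments so that they can be read back later is to append one comment per attribute after the root element. The <!--key: value--> format and the helpers appendAttributes and readAttributes below are assumptions for illustration; the module's actual comment layout is not documented here. Since xml comments may not contain -- or end with -, such values are rejected.

```perl
use strict;
use warnings;

# Hypothetical format: one <!--key: value--> comment per attribute appended
# after the root element; the module's actual comment format may differ.
sub appendAttributes
 {my ($xml, %attributes) = @_;
  for my $key(sort keys %attributes)
   {my $value = $attributes{$key};
    die "Unsupported key: $key" unless $key =~ /\A\w+\z/;
    die "Unsafe value for $key" if $value =~ /--|-\z/;   # Not allowed in an xml comment
    $xml .= "\n<!--$key: $value-->";
   }
  $xml
 }

# Recover the attributes from those comments
sub readAttributes
 {my ($xml) = @_;
  my %attributes;
  $attributes{$1} = $2 while $xml =~ m/<!--(\w+): (.*?)-->/gs;
  \%attributes
 }

my $xml        = appendAttributes('<concept id="aaa"/>', foo => 1, project => 'aaa');
my $attributes = readAttributes($xml);
print "$attributes->{project} $attributes->{foo}\n";       # aaa 1
```

Comments after the root element are still well-formed xml, so a file written this way can be linted unchanged.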
Methods for reporting the results of linting several files
Analyse the results of prior lints and return a hash reporting various statistics and a printable report
       Parameter         Description
    1  $outputDirectory  Directory to search
    2  $filter           Optional regular expression to filter files
passRatePercent: Number of passing files as a percentage of all input files
timestamp: Timestamp of the report
numberOfProjects: Number of projects defined - each project can contain zero or more files
numberOfFiles: Number of files encountered
failingFiles: Array of [number of errors, project, file] ordered from least to most errors
failingProjects: [Projects with xmllint errors]
passingProjects: [Projects with no xmllint errors]
totalErrors: Total number of errors
projects: Hash of "project name"=>[project name, pass, fail, total, percent pass]
print: A printable report of the above
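The statistics listed above can be computed from per-file results. The sketch below is hypothetical: it works on an in-memory list of [project, file, errors] rather than reading lint results from $outputDirectory, and the passing file names are invented, though the pass and fail counts per project match the sample report (33.3333, 50.0000 and 66.6667 percent, 50 percent overall). The helpers percent and summarise are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical per-file results: [project, file, number of xmllint errors]
my @results = (
  [aaa => 'out/aaa1.xml', 0], [aaa => 'out/aaa3.xml', 1], [aaa => 'out/aaa9.xml', 1],
  [bbb => 'out/bbb1.xml', 1], [bbb => 'out/bbb2.xml', 0],
  [bbb => 'out/bbb4.xml', 0], [bbb => 'out/bbb7.xml', 1],
  [ccc => 'out/ccc2.xml', 0], [ccc => 'out/ccc5.xml', 1], [ccc => 'out/ccc8.xml', 0],
);

# Pass count as a percentage of all files, to 4 decimal places
sub percent
 {my ($pass, $fail) = @_;
  my $total = $pass + $fail;
  sprintf("%.4f", $total ? 100 * $pass / $total : 0)
 }

sub summarise
 {my (@results) = @_;
  my (%projects, @failingFiles);
  my ($pass, $fail, $errors) = (0, 0, 0);
  for my $result(@results)
   {my ($project, $file, $e) = @$result;
    my $p = $projects{$project} //= [$project, 0, 0, 0, 0]; # name, pass, fail, total, percent
    if ($e) {$p->[2]++; $fail++} else {$p->[1]++; $pass++}
    $p->[3]++;
    $errors += $e;
    push @failingFiles, [$e, $project, $file] if $e;
   }
  $_->[4] = percent($_->[1], $_->[2]) for values %projects;
  return +{
    passRatePercent  => percent($pass, $fail),
    numberOfProjects => scalar(keys %projects),
    numberOfFiles    => scalar(@results),
    failingFiles     => [sort {$a->[0] <=> $b->[0]} @failingFiles],  # Least to most errors
    failingProjects  => [grep { $projects{$_}[2]} sort keys %projects],
    passingProjects  => [grep {!$projects{$_}[2]} sort keys %projects],
    totalErrors      => $errors,
    projects         => \%projects,
  };
 }

my $report = summarise(@results);
printf "%s %% success converting %d projects containing %d xml files\n",
  @$report{qw(passRatePercent numberOfProjects numberOfFiles)};
```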
Store the xml in a file, apply xmllint either in parallel or serially, and update the file with the results
       Parameter    Description
    1  $inParallel  In parallel or not
    2  $lint        Linter
    3  %attributes  Attributes to be recorded as xml comments
Format a fraction as a percentage to 4 decimal places
       Parameter  Description
    1  $p         Pass
    2  $f         Fail
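A minimal sketch of p4, assuming it computes the pass count as a percentage of pass plus fail; returning 0.0000 when both are zero is also an assumption. The sample calls reproduce the Percent column of the report shown earlier.

```perl
use strict;
use warnings;

# Sketch of p4: pass as a percentage of pass + fail, to 4 decimal places.
# Returning 0.0000 when there are no files is an assumption.
sub p4
 {my ($p, $f) = @_;                             # Pass, fail
  my $total = $p + $f;
  return sprintf("%.4f", 0) unless $total;
  sprintf("%.4f", 100 * $p / $total)
 }

print join(" ", p4(1, 2), p4(2, 2), p4(2, 1)), "\n";   # 33.3333 50.0000 66.6667
```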
1 author
2 catalog
3 ditaType
4 docType
5 dtds
6 errors
7 failingFiles
8 failingProjects
9 file
10 fileNumber
11 guid
12 header
13 idDefs
14 labelDefs
15 labels
16 lint
17 linted
18 lintNOP
19 lintOP
20 new
21 nolint
22 numberOfFiles
23 numberOfProjects
24 p4
25 passingProjects
26 passRatePercent
27 preferredSource
28 print
29 processes
30 project
31 projects
32 report
33 reusedInProject
34 sha256
35 source
36 timestamp
37 title
38 totalErrors
This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:
sudo cpan install Data::Edit::Xml::Lint
philiprbrenan@gmail.com
http://www.appaapps.com
Copyright (c) 2016-2018 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.