
Name

Data::Edit::Xml::Lint - Lint xml files in parallel using xmllint and report the failure rate

Synopsis

Create some sample xml files, some with errors, lint them in parallel and retrieve the number of errors and failing files:

  for my $n(1..$N)                                                              # Some projects
   {my $x = Data::Edit::Xml::Lint::new();                                       # New xml file linter

    my $catalog = $x->catalog = catalogName;                                    # Use catalog if possible
    my $project = $x->project = projectName($n);                                # Project name
    my $file    = $x->file    =    fileName($n);                                # Target file

    $x->source = <<END;                                                         # Sample source
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//HPE//DTD HPE DITA Concept//EN" "concept.dtd" []>
<concept id="$project">
 <title>Project $project</title>
 <conbody>
   <p>Body of $project</p>
 </conbody>
</concept>
END

    $x->source =~ s/id="\w+?"//gs if addError($n);                              # Introduce an error into some projects

    $x->lint(foo=>1);                                                           # Write the source to the target file, lint it with xmllint, and record the supplied attributes as comments at the end of the target file
   }

  Data::Edit::Xml::Lint::wait;                                                  # Wait for lints to complete

  for my $n(1..$N)                                                              # Check each linted file
   {my $x = Data::Edit::Xml::Lint::read(fileName($n));                          # Reload the linted file
    ok $x->{foo}   == 1;                                                        # Check the reloaded attributes
    ok $x->project eq projectName($n);                                          # Check project name for file
    ok $x->errors  == addError($n);                                             # Check errors in file
   }

  my $report = Data::Edit::Xml::Lint::report($outDir, "xml");                   # Report total pass fail rate
  ok $report->passRatePercent  == 50;
  ok $report->numberOfProjects ==  3;
  ok $report->numberOfFiles    == $N;
  say STDERR $report->print;                                                    # Print report

Produces:

 50 % success converting 3 projects containing 10 xml files on 2017-07-13 at 17:43:24

 ProjectStatistics
    #  Percent   Pass  Fail  Total  Project
    1  33.3333      1     2      3  aaa
    2  50.0000      2     2      4  bbb
    3  66.6667      2     1      3  ccc

 FailingFiles
    #  Errors  Project       File
    1       1  ccc           out/ccc5.xml
    2       1  aaa           out/aaa9.xml
    3       1  bbb           out/bbb1.xml
    4       1  bbb           out/bbb7.xml
    5       1  aaa           out/aaa3.xml
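
The synopsis calls several helpers ($N, $outDir, catalogName, projectName, fileName, addError) that it does not define. The sketch below shows one possible set of definitions that is consistent with the sample report above. They are assumptions made for illustration, not the author's own code, and the catalog path is a pure placeholder.

```perl
use strict;
use warnings;

my $N      = 10;                                   # Number of files in the sample report
my $outDir = "out";                                # Output directory seen in the report paths

sub projectName { (qw(aaa bbb ccc))[$_[0] % 3] }   # Assumed: project chosen by file number modulo 3
sub fileName    { my ($n) = @_; "$outDir/" . projectName($n) . "$n.xml" }  # For example out/ccc5.xml
sub addError    { $_[0] % 2 }                      # Assumed: odd numbered files receive an error
sub catalogName { "catalog.xml" }                  # Placeholder path, purely illustrative

my %fails;                                         # Count failing files per project
$fails{projectName($_)} += addError($_) for 1..$N;
printf "%s fails %d\n", $_, $fails{$_} for sort keys %fails;
```

With these definitions the failing files are numbers 1, 3, 5, 7 and 9, which matches the FailingFiles table.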

Description

Constructor

Construct a new linter

new

Create a new xml linter - call this method statically as in Data::Edit::Xml::Lint::new()

Attributes

Attributes describing a lint

file :lvalue

File that the xml will be written to and read from

catalog :lvalue

Optional catalog file containing the locations of the DTDs used to validate the xml

dtds :lvalue

Optional directory containing the DTDs used to validate the xml

errors :lvalue

Number of lint errors detected by xmllint

linted :lvalue

Date the lint was performed

project :lvalue

Optional project name to allow error counts to be aggregated by project

processes :lvalue

Maximum number of lint processes to run in parallel - 8 by default

sha256 :lvalue

SHA-256 digest of the string containing the xml written or read

source :lvalue

String containing the xml to be written or the xml read
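
Each attribute above is marked :lvalue, which means the accessor can be assigned to or modified in place, as the synopsis does with $x->source =~ s/.../. A minimal sketch of how such accessors can be written follows. My::Linter is a hypothetical class used only for illustration; it is not the module's implementation.

```perl
use strict;
use warnings;

package My::Linter;                                # Hypothetical class, not the module itself

sub new            { bless {}, shift }
sub file   :lvalue { $_[0]{file}   }               # Return the hash slot itself so it can be assigned to
sub source :lvalue { $_[0]{source} }

package main;

my $x = My::Linter->new;
$x->file   = "out/aaa1.xml";                       # Assign through the accessor
$x->source = qq(<concept id="aaa"/>);
$x->source =~ s/ id="\w+?"//;                      # Modify in place through the accessor
print $x->file, "\n", $x->source, "\n";
```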

Lint

Lint xml files in parallel

lint

Store some xml in a file and apply xmllint in parallel

     Parameter    Description
  1  $lint        Linter
  2  %attributes  Attributes to be recorded as xml comments

read

Reload a linted xml file and extract attributes

     Parameter  Description
  1  $file      File containing xml
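
lint records attributes as xml comments and read extracts them again. The round trip can be sketched as below. The comment format "<!--key: value -->" and the helper names appendAttributes and extractAttributes are assumptions chosen for illustration; the format the module actually writes may differ.

```perl
use strict;
use warnings;

sub appendAttributes                               # Hypothetical helper: append attributes as xml comments
 {my ($xml, %attributes) = @_;
  $xml .= "<!--$_: $attributes{$_} -->\n" for sort keys %attributes;
  $xml
 }

sub extractAttributes                              # Hypothetical helper: recover the attributes from the comments
 {my ($xml) = @_;
  my %attributes = $xml =~ m/<!--(\w+): (.*?) -->/g;
  \%attributes
 }

my $written = appendAttributes("<concept/>\n", foo=>1, project=>"aaa", errors=>0);
my $read    = extractAttributes($written);
print "$_ = $read->{$_}\n" for sort keys %$read;
```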

wait()

Wait for all lints to finish
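
Running lints in parallel up to a fixed number of processes, and then waiting for the remainder, can be sketched with fork and wait from core Perl. This is an illustration of the general technique under assumed names (startJob, waitAll); it is not taken from the module's source, and the child does no real work here.

```perl
use strict;
use warnings;

my $processes = 2;                                 # Cap on concurrent children (the module defaults to 8)
my $running   = 0;                                 # Children currently running
my @finished;                                      # Process ids of children that have exited

sub startJob                                       # Hypothetical helper: run one job in a child process
 {my ($job) = @_;
  if ($running >= $processes)                      # At the cap: block until one child exits
   {push @finished, wait;
    --$running;
   }
  my $pid = fork;
  die "Cannot fork: $!" unless defined $pid;
  if ($pid == 0)                                   # Child: a real linter would run xmllint here
   {exit 0;
   }
  ++$running;                                      # Parent: one more child in flight
 }

sub waitAll                                        # Reap every remaining child
 {while ($running > 0)
   {push @finished, wait;
    --$running;
   }
 }

startJob($_) for 1..5;
waitAll;
print scalar(@finished), " jobs finished\n";
```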

clear

Clear the results of a prior run

     Parameter         Description
  1  $outputDirectory  Directory to clear
  2  @fileExtensions   Extensions of files to remove
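
Removing files by extension from an output directory might look like the sketch below. clearDirectory is a hypothetical stand-in written for illustration and it only looks at the top level of the directory; whether the module's clear descends into subdirectories is not stated here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

sub clearDirectory                                 # Hypothetical stand-in, not the module's clear
 {my ($outputDirectory, @fileExtensions) = @_;
  my $removed = 0;
  for my $extension (@fileExtensions)              # Delete matching files, top level only
   {$removed += unlink glob("$outputDirectory/*.$extension");
   }
  $removed
 }

my $dir = tempdir(CLEANUP => 1);                   # Scratch directory for the demonstration
for my $name (qw(a.xml b.xml c.txt))
 {open(my $f, '>', "$dir/$name") or die "Cannot write $dir/$name: $!";
  close $f;
 }
my $removed = clearDirectory($dir, "xml");
print "$removed files removed\n";
```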

Report

Methods for reporting the results of linting several files

report

Analyse the results of prior lints and return a hash reporting various statistics and a printable report

     Parameter         Description
  1  $outputDirectory  Directory containing the linted files
  2  @fileExtensions   Types of files to analyze

Attributes

passRatePercent :lvalue

Total number of passes as a percentage of all input files

timestamp :lvalue

Timestamp of report

numberOfProjects :lvalue

Number of projects defined - each project can contain zero or more files

numberOfFiles :lvalue

Number of files encountered

failingFiles :lvalue

Array of [number of errors, project, file] ordered from least to most errors

print :lvalue

A printable report of the above
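
The statistics above can be derived from a list of [number of errors, project, file] records. The sketch below does so in plain Perl. The failing entries are taken from the sample report in the synopsis; the passing entries are assumed so that the totals agree with that report. It illustrates the arithmetic only and is not the module's implementation.

```perl
use strict;
use warnings;

my @files =                                        # [number of errors, project, file]
 ([1, "bbb", "out/bbb1.xml"], [0, "ccc", "out/ccc2.xml"],
  [1, "aaa", "out/aaa3.xml"], [0, "bbb", "out/bbb4.xml"],
  [1, "ccc", "out/ccc5.xml"], [0, "aaa", "out/aaa6.xml"],
  [1, "bbb", "out/bbb7.xml"], [0, "ccc", "out/ccc8.xml"],
  [1, "aaa", "out/aaa9.xml"], [0, "bbb", "out/bbb10.xml"]);

my (%pass, %total);                                # Passing and total files per project
for my $f (@files)
 {my ($errors, $project) = @$f;
  $total{$project}++;
  $pass{$project}++ unless $errors;
 }

my $passes          = grep {$_->[0] == 0} @files;  # Number of files with no errors
my $passRatePercent = 100 * $passes / @files;      # Passes as a percentage of all files
my @failingFiles    = sort {$a->[0] <=> $b->[0]}   # Least to most errors
                      grep {$_->[0] >  0} @files;

printf "%d %% success converting %d projects containing %d xml files\n",
  $passRatePercent, scalar(keys %total), scalar(@files);
printf "%-4s %8.4f\n", $_, 100 * ($pass{$_} // 0) / $total{$_} for sort keys %total;
```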

Index

catalog

clear

dtds

errors

failingFiles

file

lint

linted

new

numberOfFiles

numberOfProjects

passRatePercent

print

processes

project

read

report

sha256

source

timestamp

wait()

Installation

This module is written in 100% Pure Perl and is thus easy to read, use, modify and install.

Standard Module::Build process for building and installing modules:

  perl Build.PL
  ./Build
  ./Build test
  ./Build install

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2017 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.