
Name

Data::Edit::Xml::Xref - Cross reference Dita XML, match topics and ameliorate missing references.

Synopsis

Xref scans an entire document corpus looking primarily for problems with references between the files in the corpus. It reports any opportunities for improvement that it finds and, if so requested, changes the corpus to implement these improvements, taking advantage of parallelism wherever possible.

The following example checks the references in a corpus of Dita XML documents held in the folder named by inputFolder:

  use Data::Edit::Xml::Xref;

  my $x = xref(inputFolder   => q(in),
               fixBadRefs    => 1,
               flattenFolder => q(out2),
               matchTopics   => 0.9,
              );

The cross reference analysis can be requested as a status line:

  ok nws($x->statusLine) eq nws(<<END);
Xref: 108 references fixed, 50 bad xrefs, 16 missing image files, 16 missing image references, 13 bad first lines, 13 bad second lines, 9 bad conrefs, 9 duplicate topic ids, 9 files with bad conrefs, 9 files with bad xrefs, 8 duplicate ids, 6 bad topicrefs, 6 files not referenced, 4 invalid guid hrefs, 2 bad book maps, 2 bad tables, 1 External xrefs with no format=html, 1 External xrefs with no scope=external, 1 file failed to parse, 1 href missing
END

Or as a tabular report:

  ok nws($x->statusTable) eq nws(<<END);
Xref:
    Count  Condition
 1    108  references fixed
 2     50  bad xrefs
 3     16  missing image files
 4     16  missing image references
 5     13  bad first lines
 6     13  bad second lines
 7      9  files with bad conrefs
 8      9  bad conrefs
 9      9  files with bad xrefs
10      9  duplicate topic ids
11      8  duplicate ids
12      6  bad topicrefs
13      6  files not referenced
14      4  invalid guid hrefs
15      2  bad book maps
16      2  bad tables
17      1  href missing
18      1  file failed to parse
19      1  External xrefs with no format=html
20      1  External xrefs with no scope=external
END
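The status line above appears to list conditions in descending order of count, with ties ordered by plain string comparison. As a minimal Perl sketch under that assumption, the hypothetical helper statusLineFromCounts (not part of Xref's interface) rebuilds such a line from a hash of counts:

```perl
use strict;
use warnings;

# Hypothetical sketch: build a status line from {condition} = count, omitting
# zero counts and ordering by descending count, then by string comparison.
sub statusLineFromCounts {
  my ($counts) = @_;
  my @conditions = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                   grep { $counts->{$_} } keys %$counts;
  return q(Xref: ).join(q(, ), map { $counts->{$_}.q( ).$_ } @conditions);
}

my $line = statusLineFromCounts({
  q(references fixed)     => 108,
  q(bad xrefs)            => 50,
  q(file failed to parse) => 1,
  q(href missing)         => 1,
  q(bad tables)           => 0});
print "$line\n";
# Xref: 108 references fixed, 50 bad xrefs, 1 file failed to parse, 1 href missing
```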

More detailed reports are produced in the reports folder:

  $x->reports

and are indexed by the report:

  reports/reports.txt

which contains a list of all the reports generated:

    Rows  Title                                                           File
 1     5  Attributes                                                      reports/count/attributes.txt
 2    13  Bad Xml line 1                                                  reports/bad/xmlLine1.txt
 3    13  Bad Xml line 2                                                  reports/bad/xmlLine2.txt
 4     9  Bad conRefs                                                     reports/bad/ConRefs.txt
 5     2  Bad external xrefs                                              reports/bad/externalXrefs.txt
 6    16  Bad image references                                            reports/bad/imageRefs.txt
 7     9  Bad topicrefs                                                   reports/bad/bookMapRefs.txt
 8    50  Bad xRefs                                                       reports/bad/XRefs.txt
 9     2  Bookmaps with errors                                            reports/bad/bookMap.txt
10     2  Document types                                                  reports/count/docTypes.txt
11     8  Duplicate id definitions within files                           reports/bad/idDefinitionsDuplicated.txt
12     3  Duplicate topic id definitions                                  reports/bad/topicIdDefinitionsDuplicated.txt
13     3  File extensions                                                 reports/count/fileExtensions.txt
14     1  Files failed to parse                                           reports/bad/parseFailed.txt
15     0  Files types                                                     reports/count/fileTypes.txt
16    16  Files whose short names are bi-jective with their md5 sums      reports/good/shortNameToMd5Sum.txt
17     0  Files whose short names are not bi-jective with their md5 sums  reports/bad/shortNameToMd5Sum.txt
18   108  Fixes Applied To Failing References                             reports/lists/referencesFixed.txt
19     0  Good bookmaps                                                   reports/good/bookMap.txt
20     9  Good conRefs                                                    reports/good/ConRefs.txt
21     5  Good topicrefs                                                  reports/good/bookMapRefs.txt
22     8  Good xRefs                                                      reports/good/XRefs.txt
23     1  Guid topic definitions                                          reports/lists/guidsToFiles.txt
24     2  Image files                                                     reports/good/imagesFound.txt
25     1  Missing hrefs                                                   reports/bad/missingHrefAttributes.txt
26    16  Missing image references                                        reports/bad/imagesMissing.txt
27     4  Possible improvements                                           reports/improvements.txt
28     2  Resolved GUID hrefs                                             reports/good/guidHrefs.txt
29     2  Tables with errors                                              reports/bad/tables.txt
30    23  Tags                                                            reports/count/tags.txt
31    11  Topic Reuses                                                    reports/lists/topicReuse.txt
32     0  Topic Reuses                                                    reports/lists/similar/byTitle.txt
33    16  Topics                                                          reports/lists/topics.txt
34    15  Topics with similar vocabulary                                  reports/lists/similar/byVocabulary.txt
35     0  Topics with validation errors                                   reports/bad/validationErrors.txt
36     0  Topics without ids                                              reports/bad/topicIdDefinitionsMissing.txt
37     6  Unreferenced files                                              reports/bad/notReferenced.txt
38    11  Unresolved GUID hrefs                                           reports/bad/guidHrefs.txt

Add navigation titles to topic references

If requested, Xref will create or update the navigation titles (navtitle) of topic references (appendix, chapter and topicref tags) in maps, resolving the target by both file name and GUID reference:

  addNavTitles => 1

Reports of successful updates will be written to:

  reports/good/navTitles.txt

Reports of unsuccessful updates will be written to:

  reports/bad/navTitles.txt
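A minimal Perl sketch of the idea, assuming the map's topic references have already been extracted into an array of hashes and the topic titles into a hash keyed by file name. The sub setNavTitles and both data structures are hypothetical simplifications, not Xref's internal representation, and GUID resolution is omitted:

```perl
use strict;
use warnings;

# Hypothetical sketch: set the navtitle of each topic reference to the title of
# the topic it points at, separating resolved from unresolved references.
sub setNavTitles {
  my ($refs, $titles) = @_;                      # [{href => ...}], {file} = title
  my (@good, @bad);
  for my $ref (@$refs) {
    my $file = (split /#/, $ref->{href} // q())[0] // q();   # drop any fragment
    if (defined(my $title = $titles->{$file})) {
      $ref->{navtitle} = $title;
      push @good, [$ref->{href}, $title];
    }
    else {
      push @bad, [$ref->{href}];
    }
  }
  return (\@good, \@bad);
}

my @refs = ({href => q(a.dita)}, {href => q(b.dita#b/p1)}, {href => q(missing.dita)});
my ($good, $bad) = setNavTitles(\@refs, {q(a.dita) => q(Alpha), q(b.dita) => q(Beta)});
print scalar(@$good), " updated, ", scalar(@$bad), " unresolved\n";   # 2 updated, 1 unresolved
```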

Fix bad references

It is often desirable to ameliorate unresolved Dita href attributes so that incomplete content can be loaded into a content management system. The input attribute:

  fixBadRefs => 1

requests that any conref or href attribute be renamed to xtrf if its specification cannot be resolved in the current corpus by the other methods of fixing failing references, such as fixDitaRefs, fixRelocatedRefs or fixXrefsByTitle.

This feature was designed by mim@cpan.org.
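The renaming can be sketched in Perl as below, assuming an element's attributes are held in a hash and the set of references known to resolve is available as another hash. The sub fixBadRef is hypothetical; Xref itself edits the parsed Dita files:

```perl
use strict;
use warnings;

# Hypothetical sketch of the fixBadRefs amelioration: a conref or href whose
# target cannot be resolved is moved to the xtrf attribute so the element no
# longer carries a broken reference.
sub fixBadRef {
  my ($attributes, $resolvable) = @_;   # {attribute} = value, {reference} = 1
  for my $name (qw(conref href)) {
    my $value = $attributes->{$name};
    next unless defined $value and !$resolvable->{$value};
    $attributes->{xtrf} = delete $attributes->{$name};
    return $name;                       # name of the attribute that was moved
  }
  return undef;                         # nothing needed fixing
}

my %xref  = (href => q(gone.dita#t1/p1));
my $moved = fixBadRef(\%xref, {q(here.dita) => 1});
print "$moved moved to xtrf: $xref{xtrf}\n";   # href moved to xtrf: gone.dita#t1/p1
```

If both attributes were bad, this sketch moves only the first, since xtrf can hold only one value.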

Deguidize

Some content management systems use guids to identify content, while others use file names. When moving from a guid-based content management system to one based on file names, it might be necessary to replace the guids representing files with the actual underlying file names. If the

  deguidize => 1

parameter is set to true, Xref will replace any such file guids with the underlying file name if it is present in the content being cross referenced.
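A minimal sketch of deguidizing one reference of the form g1#g2/id, assuming a hash from topic ids (guids) to file names has already been built from the corpus, much like the guidToFile output field. The sub deguidizeReference is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch: in a reference g1#g2/id, replace the file component g1
# with the file that defines topic id g2, if that topic is in the corpus.
sub deguidizeReference {
  my ($reference, $topicIdToFile) = @_;          # reference, {topic id} = file
  my (undef, $fragment) = split /#/, $reference, 2;
  return $reference unless defined $fragment and length $fragment;
  my ($topicId) = split m(/), $fragment, 2;
  my $file = $topicIdToFile->{$topicId};
  return defined $file ? "$file#$fragment" : $reference;   # unknown guid: unchanged
}

print deguidizeReference(q(GUID-1111#GUID-2222/p1),
  {q(GUID-2222) => q(install.dita)}), "\n";      # install.dita#GUID-2222/p1
```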

File flattening

It is often desirable to flatten or reflatten the topic files in a corpus so that they can coexist in a single folder of a content management system without colliding with each other.

The presence of the input attribute:

 flattenFolder => folder-to-flatten-files-into

causes topic files to be flattened into the named folder using the GBStandard to generate the flattened file names. Xref will then update all Dita references to match these new file names. If the flattenFolder folder is the same as the inputFolder then the input files are flattened in place.
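The reference update that follows flattening can be sketched as below. flatName is a hypothetical, simplified stand-in that joins a title-derived prefix to the MD5 sum of the content; it is not the GB Standard algorithm itself. References are assumed to have already been resolved to the same full file names used as keys in the rename map:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Simplified, hypothetical stand-in for GB Standard naming: a prefix taken from
# the title plus the MD5 sum of the content, so identical content always gets
# the same flat name and different content is very unlikely to collide.
sub flatName {
  my ($title, $content, $extension) = @_;
  (my $prefix = $title) =~ s/\W+/_/g;
  return $prefix.q(_).md5_hex($content).$extension;
}

# Rewrite the file component of a reference using {old file} = new file,
# keeping any #topicId/elementId fragment unchanged.
sub updateReference {
  my ($reference, $renamed) = @_;
  my ($file, $fragment) = split /#/, $reference, 2;
  my $new = $renamed->{$file} // $file;
  return defined $fragment ? "$new#$fragment" : $new;
}

my %renamed = (q(in/a/install.dita) => flatName(q(Install), q(<concept id="c1"/>), q(.dita)));
print updateReference(q(in/a/install.dita#c1/p1), \%renamed), "\n";
```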

Locating relocated files

File references in conrefs or hrefs that have a unique valid base file name and an invalid path can be fixed by setting the input attribute:

 fixRelocatedRefs => 1

to a true value, requesting that Xref replace the incorrect path to each such uniquely named file with the correct path.

If coded in conjunction with the fixBadRefs input attribute, this will cause Xref to first try to fix any missing xrefs; any that still fail to resolve will then be ameliorated by moving them to the xtrf attribute.
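A minimal sketch of the relocation rule, assuming the corpus file names are available as a list. The sub relocateReference is hypothetical; it rebuilds a {base}{full name} index shaped like the baseFiles output field on each call for brevity:

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sketch: when the file named by a failing reference has a base
# name that occurs exactly once in the corpus, point the reference at that
# file, keeping any fragment.
sub relocateReference {
  my ($reference, $corpusFiles) = @_;            # reference, [full file names]
  my %baseFiles;                                 # {base}{full name}++
  $baseFiles{basename($_)}{$_}++ for @$corpusFiles;
  my ($file, $fragment) = split /#/, $reference, 2;
  my @candidates = keys %{$baseFiles{basename($file)} // {}};
  return undef unless @candidates == 1;          # missing or ambiguous base name
  return defined $fragment ? "$candidates[0]#$fragment" : $candidates[0];
}

print relocateReference(q(../old/topic.dita#t1/p1),
  [q(out/new/topic.dita), q(out/other.dita)]), "\n";   # out/new/topic.dita#t1/p1
```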

Fix Xrefs by Title

Dita xref tags with broken or missing href attributes can sometimes be fixed by matching the text content of the xref with the titles of topics.

If:

  fixXrefsByTitle => 1

is specified, Xref will locate possible targets for a broken href by matching the white space normalized (see Data::Table::Text::nws) text content of the xref with the similarly normalized title of each topic referenced by any book map that refers to the topic containing the xref.

If a single matching candidate is located then it will be used to update the href attribute of the xref.
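A minimal sketch of title matching, assuming the candidate topics (those referenced by any book map that refers to the topic containing the xref) have already been collected into a hash of titles. normalizeSpace is assumed to behave like Data::Table::Text::nws for this purpose, and both subs are hypothetical:

```perl
use strict;
use warnings;

# Collapse runs of white space to single spaces and trim both ends; assumed
# equivalent to Data::Table::Text::nws for this purpose.
sub normalizeSpace {
  my ($text) = @_;
  $text =~ s/\s+/ /g;
  $text =~ s/\A //;
  $text =~ s/ \z//;
  return $text;
}

# Hypothetical sketch: return the one candidate topic whose normalized title
# equals the normalized xref text, or undef when there are none or several.
sub matchXrefByTitle {
  my ($xrefText, $titles) = @_;                  # text, {topic file} = title
  my $want = normalizeSpace($xrefText);
  my @matches = grep { normalizeSpace($titles->{$_}) eq $want } sort keys %$titles;
  return @matches == 1 ? $matches[0] : undef;
}

my %titles = (q(a.dita) => q(Installing the product), q(b.dita) => q(Overview));
print matchXrefByTitle(qq(  Installing\n the   product ), \%titles), "\n";   # a.dita
```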

Fix References in Dita To Dita Conversions

When converting a Dita input source corpus to Dita the referenced topics are usually renamed and flattened via the GBStandard. If enabled:

  fixDitaRefs => targets/

updates valid Dita references in the input corpus with the latest name for the referenced topic to make links that were valid in the input corpus valid in the output corpus as well.

The targets/ folder should contain the same set of file names as the original input corpus. Each such file should contain the name of a bookmap topic present in the inputFolder whose chapter and topicrefs identify the new names of the files cut out and flattened from the existing input corpus.

The creation of the targets/ folder is usually done by some other piece of software, such as Data::Edit::Xml::To::Dita, as it is too complex and laborious to be performed reliably by hand. No validation of the contents of this folder is performed, as it is assumed to have been created reliably in software.
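A minimal sketch of rewriting one reference, assuming the targets/ folder has already been digested into a {original file}{id} = new file mapping, the shape of the originalSourceFileAndIdToNewFile output field. The sub fixDitaReference is hypothetical and assumes ids survive the cut unchanged:

```perl
use strict;
use warnings;

# Hypothetical sketch: the element id, or the topic id when there is no
# element id, selects the new file that the target ended up in.
sub fixDitaReference {
  my ($reference, $sourceToNew) = @_;            # reference, {file}{id} = new file
  my ($file, $fragment) = split /#/, $reference, 2;
  return $reference unless defined $fragment and length $fragment;
  my ($topicId, $id) = split m(/), $fragment, 2;
  my $new = ($sourceToNew->{$file} // {})->{$id // $topicId};
  return defined $new ? "$new#$fragment" : $reference;   # unknown: unchanged
}

my %sourceToNew = (q(big.dita) => {t1 => q(c_Intro.dita), p2 => q(c_Detail.dita)});
print fixDitaReference(q(big.dita#t1/p2), \%sourceToNew), "\n";   # c_Detail.dita#t1/p2
```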

Topic Matching

Topics can be matched on title and vocabulary to assist authors in finding similar topics by specifying the:

  matchTopics => 0.9

attribute, whose value is the confidence level between 0 and 1.

Topic matching produces the reports:

  reports/lists/similar/byTitle.txt
  reports/lists/similar/byVocabulary.txt

Topic matching might take some time for large input folders.

Title matching

This report can be found at:

  reports/lists/similar/byTitle.txt

Title matching sorts topics by their titles so that topics with similar titles can be easily located:

    Similar  Prefix        Source
 1       14  c_Notices__   c_Notices_5614e96c7a3eaf3dfefc4a455398361b
 2           c_Notices__   c_Notices_14a9f467215dea879d417de884c21e6d
 3           c_Notices__   c_Notices_19011759a2f768d76581dc3bba170a44
 4           c_Notices__   c_Notices_aa741e6223e6cf8bc1a5ebdcf0ba867c
 5           c_Notices__   c_Notices_f0009b28c3c273094efded5fac32b83f
 6           c_Notices__   c_Notices_b1480ac1af812da3945239271c579bb1
 7           c_Notices__   c_Notices_5f3aa15d024f0b6068bd8072d4942f6d
 8           c_Notices__   c_Notices_17c1f39e8d70c765e1fbb6c495bedb03
 9           c_Notices__   c_Notices_7ea35477554f979b3045feb369b69359
10           c_Notices__   c_Notices_4f200259663703065d247b35d5500e0e
11           c_Notices__   c_Notices_e3f2eb03c23491c5e96b08424322e423
12           c_Notices__   c_Notices_06b7e9b0329740fc2b50fedfecbc5a94
13           c_Notices__   c_Notices_550a0d84dfc94982343f58f84d1c11c2
14           c_Notices__   c_Notices_fa7e563d8153668db9ed098d0fe6357b
15        3  c_Overview__  c_Overview_f9e554ee9be499368841260344815f58
16           c_Overview__  c_Overview_f234dc10ea3f4229d0e1ab4ad5e8f5fe
17           c_Overview__  c_Overview_96121d7bcd41cf8be318b96da0049e73
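Because flattened names end in an MD5 sum, names that differ only in that suffix group together. A minimal sketch of such grouping, assuming the suffix is exactly 32 lower case hexadecimal digits; the exact prefix rule Xref uses is not described here and the sub groupBySharedPrefix is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch: group topic names that differ only in a trailing
# 32 digit hexadecimal MD5 sum, largest groups first; singletons are dropped.
sub groupBySharedPrefix {
  my (@names) = @_;
  my %groups;                                    # {prefix}[names]
  for my $name (@names) {
    (my $prefix = $name) =~ s/[0-9a-f]{32}\z//;
    push @{$groups{$prefix}}, $name;
  }
  return map  { [$_, [sort @{$groups{$_}}]] }
         sort { @{$groups{$b}} <=> @{$groups{$a}} or $a cmp $b }
         grep { @{$groups{$_}} > 1 } keys %groups;
}

my @groups = groupBySharedPrefix(
  q(c_Notices_).(q(a) x 32), q(c_Notices_).(q(b) x 32), q(c_Overview_).(q(c) x 32));
printf "%d %s\n", scalar @{$_->[1]}, $_->[0] for @groups;   # 2 c_Notices_
```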

Vocabulary matching

This report can be found at:

  reports/lists/similar/byVocabulary.txt

Vocabulary matching compares the vocabulary of pairs of topics: topics with similar vocabularies within the confidence level specified are reported together:

    Similar  Topic
 1        8  in/1.dita
 2           in/2.dita
 3           in/3.dita
 4           in/4.dita
 5           in/5.dita
 6           in/6.dita
 7           in/7.dita
 8           in/8.dita
 9
10        2  in/map/bookmap.ditamap
11           in/map/bookmap2.ditamap
12
13        2  in/act4.dita
14           in/act5.dita

Url checking

Xref will check urls by fetching their headers with curl if the

  validateUrls=>1

is specified. A list of failing urls will be written to:

  reports/bad/urls.txt

while a corresponding list of passing urls will be written to:

  reports/good/urls.txt
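A minimal sketch of the classification step, assuming urls are gathered per file as {file}{url}++ (the shape of the urls output field) and producing the {url}{file}++ shape of urlsGood and urlsBad. The checker is passed in so it can be replaced in tests; the hypothetical curlChecker uses curl's -I, -o, -w and --max-time options and treats any 2xx or 3xx status as passing, which is an assumption rather than Xref's documented rule:

```perl
use strict;
use warnings;

# Hypothetical sketch: sort urls into good and bad using a checker callback,
# checking each distinct url only once.
sub checkUrls {
  my ($urls, $checker) = @_;                     # {file}{url}++, sub($url) -> bool
  my (%good, %bad, %seen);
  for my $file (sort keys %$urls) {
    for my $url (sort keys %{$urls->{$file}}) {
      my $ok = $seen{$url} //= $checker->($url) ? 1 : 0;
      ($ok ? \%good : \%bad)->{$url}{$file}++;
    }
  }
  return (\%good, \%bad);
}

# Fetch only the headers with curl and accept a 2xx or 3xx status code.
sub curlChecker {
  my ($url) = @_;
  open(my $curl, q(-|), qw(curl -s -o /dev/null -I -w %{http_code} --max-time 10), $url)
    or return 0;
  my $code = <$curl> // q();
  close $curl;
  return $code =~ /\A[23]\d\d\z/ ? 1 : 0;
}
```

In use, checkUrls(\%urls, \&curlChecker) would perform one header request per distinct url.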

Description

Cross reference Dita XML, match topics and ameliorate missing references.

Version 20200202.

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Cross reference

Check the cross references in a set of Dita files and report the results.

xref(%attributes)

Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder. The possible attributes are defined in Data::Edit::Xml::Xref.

     Parameter    Description
  1  %attributes  Cross referencer attribute value pairs

Example:

  # tests, out, reportFolder, targets and outFixed are assumed to be folder-name
  # constants defined by the module's own test suite.
  lll "Test 011";
  clearFolder(tests, 111);
  createSampleInputFilesForFixDitaRefsImproved3(tests);

  my $y = xref(inputFolder => out, reports => reportFolder);    # Check results without fixes
  ok $y->statusLine eq q(Xref: 1 ref);

  my $x = xref                                                   # Check again, fixing references
   (inputFolder => out,
    reports     => reportFolder,
    fixBadRefs  => 1,
    fixDitaRefs => targets,
    fixedFolder => outFixed);

  ok !$x->errors;

Create test data

Create files to test the various capabilities provided by Xref

Data::Edit::Xml::Xref Definition

Attributes used by the Xref cross referencer.

Input fields

addNavTitles - If true, add navtitle to outgoing bookmap references to show the title of the target topic.

changeBadXrefToPh - Change xrefs being placed in M3 by fixBadRefs to ph.

classificationMaps - Create classification maps if true

deguidize - Set true to replace guids in dita references with file names. Given a reference g1#g2/id, convert g1 to a file name by locating the topic with topicId g2. This requires the guids to be genuinely unique. SDL guids are thought to be unique by language code, but the same topic translated into a different language might well have the same guid as the original topic with a different language code: =(de|en|es|fr). If the source is in just one language then the guid uniqueness is a reasonable assumption. If the conversion can be done in phases by language then the uniqueness of guids is again reasonably assured. Data::Edit::Xml::Lint provides an alternative solution to deguidizing by using labels to record the dita reference in the input corpus for each id encountered; these references can then be resolved in the usual manner by Data::Edit::Xml::Lint::relint.

deleteUnusedIds - Delete ids (except on topics) that are not referenced in any reference in the corpus regardless of the file component of any such reference.

fixBadRefs - Fix any remaining bad references after all allowed attempts have been made to fix failing references, by moving the failing reference to the xtrf attribute, i.e. placing it in M3, possibly renaming the tag to ph if changeBadXrefToPh is in effect as well.

fixDitaRefs - Fix references in a corpus of Dita documents that have been converted to the GB Standard and whose target structure has been written to the named folder.

fixRelocatedRefs - Fix references to topics that have been moved around in the out folder structure assuming that all file names are unique which they will be if they have been renamed to the GB Standard.

fixXrefsByTitle - Try to fix invalid xrefs by the Gearhart Title Method enhanced by the Monroe map method if true

fixedFolder - Fixed files are placed in this folder.

fixedFolderTemp - Fixed files are placed in this folder if we are on AWS but not the session leader - this folder is then copied back to fixedFolder on the session leader.

flattenFolder - Files are renamed to the Gearhart standard and placed in this folder if set. References to the unflattened files are updated to references to the flattened files. This option will eventually be deprecated as the Dita::GB::Standard is now fully available allowing files to be easily flattened before being processed by Xref.

getFileUrl - A url to retrieve a specified file from the server running xref used in generating html reports. The complete url is obtained by appending the fully qualified file name to this value.

html - Generate html version of reports in this folder if supplied

indexWords - Index words to topics and topics to words if true.

indexWordsFolder - Folder into which to save words to topic and topics to word indexes if indexWords is true.

inputFolder - A folder containing the dita and ditamap files to be cross referenced.

matchTopics - Match topics by title and by vocabulary to the specified confidence level between 0 and 1. This operation might take some time to complete on a large corpus.

maxZoomIn - Optional hash of names to regular expressions to look for in each file

maximumNumberOfProcesses - Maximum number of processes to run in parallel at any one time with a sensible default.

oxygenProjects - Create oxygen project files for each map - the project file will have an extension of .xpr and the same name and path as the map file or the name returned by your implementation of: Data::Edit::Xml::Xref::xprName($map) if present.

reports - Reports folder: Xref will write text versions of the generated reports to files in this folder.

requestAttributeNameAndValueCounts - Report attribute name and value counts

subjectSchemeMap - Create a subject scheme map in the named file

suppressReferenceChecks - Suppress reference checking - which normally happens by default - but which takes time and might be irrelevant if an earlier xref has already checked all the references.

validateUrls - Validate urls if true by fetching their headers with curl

Output fields

allowUniquePartialMatches - Allow unique partial matches - i.e. ignore the stuff to the right of the # in a reference if doing so produces a unique result. This feature has been explicitly disabled for conrefs (PS2-561) and might need to be disabled for other types of reference as well.

attributeCount - {file}{attribute name} == count of the different xml attributes found in the xml files.

attributeNamesAndValuesCount - {file}{attribute name}{value} = count

author - {file} = author of this file.

badGuidHrefs - Bad conrefs - all.

badNavTitles - Details of nav titles that were not resolved

badReferencesCount - The number of bad references at the start of the run - however depending on what options were chosen Xref might ameliorate these bad references and thereby reduce this count.

badTables - Array of tables that need fixing.

badXml1 - [Files] with a bad xml encoding header on the first line.

badXml2 - [Files] with a bad xml doc type on the second line.

baseFiles - {base of file name}{full file name}++ Current location of the file via uniqueness guaranteed by the GB standard

baseTag - Base Tag for each file

bookMapRefs - {bookmap full file name}{href}{navTitle}++ References from bookmaps to topics via appendix, chapter, bookmapref.

conRefs - {file}{href}{tag}++ : conref source detail

createReports1 - Reports requested before references fixed

createReports2 - Reports requested after references fixed

currentFolder - The current working folder used to make absolute file names from relative ones

docType - {file} == docType: the docType for each xml file.

duplicateIds - [file, id] Duplicate id definitions within each file.

duplicateTopicIds - [topicId, [files]] Files with duplicate topic ids - the id on the outermost tag.

emptyTopics - {file} : topics where the *body is empty.

errors - Number of significant errors as reported in statusLine or 0 if no such errors found

exteriorMaps - {exterior map} : maps that are not referenced by another map

fileExtensions - Default file extensions to load

fixRefs - {file}{ref} where the href or conref target is not valid.

fixedRefsBad - [] hrefs and conrefs from fixRefs which were moved to the "xtrf" attribute as requested by the fixBadRefs attribute because the reference was invalid and could not be improved by deguidization.

fixedRefsGB - [] files fixed to the Gearhart-Brenan file naming standard

fixedRefsGood - [] hrefs and conrefs from fixRefs which were invalid but have been fixed by deguidizing them to a valid file name.

fixedRefsNoAction - [] hrefs and conrefs from fixRefs for which no action was taken.

flattenFiles - {old full file name} = file renamed to Gearhart-Brenan file naming standard

goodImageFiles - {file}++ : number of references to each good image

goodNavTitles - Details of nav titles that were resolved.

guidHrefs - {file}{href} = location where href starts with GUID- and is thus probably a guid.

guidToFile - {topic id which is a guid} = file defining topic id.

hrefUrlEncoding - Hrefs that need url encoding because they contain white space.

idNotReferenced - {file}{id}++ - id in a file that is not referenced

idReferencedCount - {file}{id}++ - the number of times this id in this file is referenced from the rest of the corpus

idTags - {file}{id}[tag] The tags associated with each id in a file - there might be more than one if the id is duplicated

ids - {file}{id} - id definitions across all files.

idsRemoved - {id}++ : Ids removed from all files

images - {file}{href} Count of image references in each file.

imagesReferencedFromBookMaps - {bookmap full file name}{full name of image referenced from topic referenced from bookmap}++

imagesReferencedFromTopics - {topic full file name}{full name of image referenced from topic}++

imagesToRefferingBookMaps - {image full file name}{bookmap full file name}++ : images to referring bookmaps

indexedWords - {word}{full file name of topic the word occurs in}.

inputFileToTargetTopics - {input file}{target file}++ : Tells us the topics an input file was split into

inputFiles - Input files from inputFolder.

inputFolderImages - {full image file name} for all files in input folder thus including any images present

ltgt - {text between &lt; and &gt;}{filename} = count giving the count of text items found between &lt; and &gt;

maxZoomOut - Results from maxZoomIn where {file name}{regular expression key name in maxZoomIn}++

md5Sum - MD5 sum for each input file.

md5SumDuplicates - {md5sum}{file}++ : md5 sums with more than one file

missingImageFiles - [file, href] == Missing images in each file.

missingTopicIds - Missing topic ids.

noHref - Tags that should have an href but do not have one.

notReferenced - {file name} Files in input area that are not referenced by a conref, image, bookmapref or xref tag and are not a bookmap.

olBody - The number of ol under body by file

originalSourceFileAndIdToNewFile - {original file}{id} = new file: Record mapping from original source file and id to the new file containing the id

otherMeta - {original file}{othermeta name}{othermeta content}++ : the contents of the other meta tags

otherMetaBookMapsAfterTopicIncludes - Bookmap othermeta after topic othermeta has been included

otherMetaBookMapsBeforeTopicIncludes - Bookmap othermeta before topic othermeta has been included

otherMetaConsolidated - {Name}{Content}++ : consolidated other meta data across entire corpus

otherMetaDuplicatesCombined - Duplicate othermeta in bookmaps with called topics othermeta included

otherMetaDuplicatesSeparately - Duplicate othermeta in bookmaps and topics considered separately

otherMetaPushToBookMap - Othermeta that can be pushed to the calling book map

otherMetaRemainWithTopic - Othermeta that must stay in the topic

parseFailed - {file} files that failed to parse.

publicId - {file} = Public id on Doctype

references - {file}{reference}++ - the various references encountered

relocatedReferencesFailed - Failing references that were not fixed by relocation

relocatedReferencesFixed - Relocated references fixed

requiredCleanUp - {full file name}{cleanup} = number of required-cleanups

results - Summary of results table.

sourceTopicToTargetBookMap - {input topic cut into multiple pieces} = output bookmap representing pieces

statusLine - Status line summarizing the cross reference.

statusTable - Status table summarizing the cross reference.

tableDimensions - {file}{columns}{rows} == count

tagCount - {file}{tags} == count of the different tag names found in the xml files.

tags - Number of tags encountered

tagsTextsRatio - Ratio of tags to text encountered

targetFolderContent - {file} = bookmap file name : the target folder content which shows us where an input file went

targetTopicToInputFiles - {current file} = the source file from which the current file was obtained

texts - Number of texts encountered

timeEnded - Time the run ended

timeStart - Time the run started

title - {full file name} = title of file.

titleToFile - {title}{file}++ if fixXrefsByTitle is in effect

topicFlattening - {topic}{sources}++ : the source files for each topic that was flattened

topicFlatteningFactor - Topic flattening factor - higher is better

topicIds - {file} = topic id - the id on the outermost tag.

topicsFlattened - Number of topics flattened

topicsNotReferencedFromBookMaps - {topic file not referenced from any bookmap} = 1

topicsReferencedFromBookMaps - {bookmap full file name}{topic full file name}++ : bookmaps to topics

topicsToReferringBookMaps - {topic full file name}{bookmap full file name}++ : topics to referring bookmaps

urls - {topic full file name}{url}++ : urls found in each file

urlsBad - {url}{topic full file name}++ : failing urls found in each file

urlsGood - {url}{topic full file name}++ : passing urls found in each file

validationErrors - True means that Lint detected errors in the xml contained in the file.

vocabulary - The text of each topic shorn of attributes for vocabulary comparison.

xRefs - {file}{href}++ Xrefs references.

xrefBadFormat - External xrefs with no format=html.

xrefBadScope - External xrefs with no scope=external.

xrefsFixedByTitle - Xrefs fixed by locating a matching topic title from their text content.

Private Methods

newXref(%attributes)

Create a new cross referencer

     Parameter    Description
  1  %attributes  Attributes

xref2(%attributes)

Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder. The possible attributes are defined in Data::Edit::Xml::Xref

     Parameter    Description
  1  %attributes  Attributes of cross referencer

createReportsInParallel($xref, @reports)

Create reports in parallel

     Parameter  Description
  1  $xref      Cross referencer
  2  @reports   Reports to be run

createReportsInParallel1()

Create reports in parallel that do not require fixed references

createReportsInParallel2()

Create reports in parallel that require fixed references

countLevels($l, $h)

Count hash elements to the specified number of levels

     Parameter  Description
  1  $l         Levels
  2  $h         Hash

externalReference($reference)

Check for an external reference

     Parameter   Description
  1  $reference  Reference to check

fixingRun($xref)

A fixing run fixes problems where it can and thus induces changes which might make the updated output different from the incoming source. Returns a useful message describing this state of affairs.

     Parameter  Description
  1  $xref      Cross referencer

loadInputFiles($xref)

Load the names of the files to be processed

     Parameter  Description
  1  $xref      Cross referencer

formatTables($xref, $data, %options)

Using cross reference $xref options and an array of arrays $data format a report as a table using %options as described in Data::Table::Text::formatTable and Data::Table::Text::formatHtmlTable.

     Parameter  Description
  1  $xref      Cross referencer
  2  $data      Table to be formatted
  3  %options   Options

hashOfCountsToArray($hash)

Convert a $hash of {key} = count to an array so it can be formatted with formatTables

     Parameter  Description
  1  $hash      Hash to be converted
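As a minimal sketch of such a conversion, assuming rows of [count, key] ordered by descending count (the layout of the Count and Condition columns in the status table), the hypothetical sub countsToRows might read:

```perl
use strict;
use warnings;

# Hypothetical sketch: convert {key} = count into [count, key] rows, highest
# count first and ties in key order, ready to be laid out as a table.
sub countsToRows {
  my ($hash) = @_;
  return map  { [$hash->{$_}, $_] }
         sort { $hash->{$b} <=> $hash->{$a} or $a cmp $b } keys %$hash;
}

printf "%5d  %s\n", @$_ for countsToRows({p => 2, ol => 5, li => 2});
#     5  ol
#     2  li
#     2  p
```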

reportGuidsToFiles($xref)

Map and report guids to files

     Parameter  Description
  1  $xref      Xref results

editXml($in, $out, $source)

Edit an xml file retaining any existing XML headers and lint trailers

     Parameter  Description
  1  $in        Input file
  2  $out       Output file
  3  $source    Source to write

fixReferencesInOneFile($xref, $sourceFile)

Fix one file by moving unresolved references to the xtrf attribute

     Parameter    Description
  1  $xref        Xref results
  2  $sourceFile  Source file to fix

fixReferencesParallel($xref, $file)

Fix the references in one file

     Parameter  Description
  1  $xref      Cross referencer
  2  $file      File to fix

fixReferencesResults($xref, @results)

Consolidate the results of fixing references.

     Parameter  Description
  1  $xref      Cross referencer
  2  @results   Results from fixReferencesInParallel

fixReferences($xref)

Fix just the file containing references using a number of techniques and report those references that cannot be so fixed.

     Parameter  Description
  1  $xref      Xref results

fixOneFileGB($xref, $file)

Fix one file to the Gearhart-Brenan standard

     Parameter  Description
  1  $xref      Xref results
  2  $file      File to fix

fixFilesGB($xref)

Rename files to the GB Standard

     Parameter  Description
  1  $xref      Xref results

analyzeOneFileParallel($Xref, $iFile)

Analyze one input file

     Parameter  Description
  1  $Xref      Xref request
  2  $iFile     File to analyze

analyzeOneFileResults($xref, @x)

Merge a list of cross reference results into the first cross referencer in the list

     Parameter  Description
  1  $xref      Cross referencer to merge into
  2  @x         Other cross referencers

analyzeInputFiles($xref)

Analyze the input files

     Parameter  Description
  1  $xref      Cross referencer

reportIdRefs($xref)

Report the number of times each id is referenced

     Parameter  Description
  1  $xref      Cross referencer

removeUnusedIds($xref)

Remove ids that are not mentioned in any href or conref in the corpus regardless of the file component of any such reference. This is a very conservative approach which acknowledges that writers might be looking for an id if they mention it in a reference.

     Parameter  Description
  1  $xref      Cross referencer
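The conservative rule can be sketched as follows, assuming references use the Dita form file#topicId/elementId. Both subs are hypothetical; idsToKeep returns the ids to retain rather than editing any file:

```perl
use strict;
use warnings;

# Hypothetical sketch: collect every id named in the fragment of any reference
# (topic id or element id), ignoring which file the reference points at.
sub referencedIds {
  my (@references) = @_;                         # e.g. file.dita#topicId/elementId
  my %ids;
  for my $reference (@references) {
    my (undef, $fragment) = split /#/, $reference, 2;
    next unless defined $fragment;
    $ids{$_}++ for grep { length } split m(/), $fragment;
  }
  return \%ids;
}

# Keep the topic's own id and any id that some reference mentions.
sub idsToKeep {
  my ($ids, $topicId, $referenced) = @_;         # [ids in one file], topic id, {id}
  return grep { $_ eq $topicId or $referenced->{$_} } @$ids;
}

my $referenced = referencedIds(q(a.dita#t1/p1), q(b.dita#t2), q(c.dita));
print join(q( ), idsToKeep([qw(t9 p1 p2)], q(t9), $referenced)), "\n";   # t9 p1
```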

reportEmptyTopics($xref)

Report empty topics

     Parameter  Description
  1  $xref      Cross referencer

reportDuplicateIds($xref)

Report duplicate ids

     Parameter  Description
  1  $xref      Cross referencer

reportDuplicateTopicIds($xref)

Report duplicate topic ids

     Parameter  Description
  1  $xref      Cross referencer

reportNoHrefs($xref)

Report locations where an href was expected but not found

     Parameter  Description
  1  $xref      Cross referencer

checkReferences($xref)

Check each reference, report bad references and mark them for fixing.

     Parameter  Description
  1  $xref      Cross referencer

reportGuidHrefs($xref)

Report on guid hrefs

     Parameter  Description
  1  $xref      Cross referencer

reportImages($xref)

Reports on images and references to images

     Parameter  Description
  1  $xref      Cross referencer

reportParseFailed($xref)

Report failed parses

     Parameter  Description
  1  $xref      Cross referencer

reportXml1($xref)

Report bad xml on line 1

     Parameter  Description
  1  $xref      Cross referencer

reportXml2($xref)

Report bad xml on line 2

     Parameter  Description
  1  $xref      Cross referencer

reportDocTypeCount($xref)

Report doc type count

     Parameter  Description
  1  $xref      Cross referencer

reportTagCount($xref)

Report tag counts

     Parameter  Description
  1  $xref      Cross referencer

reportTagsAndTextsCount($xref)

Report tags and texts counts

     Parameter  Description
  1  $xref      Cross referencer

reportLtGt($xref)

Report items found between &lt; and &gt;

     Parameter  Description
  1  $xref      Cross referencer
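
The following is a minimal sketch of locating such items. It assumes the text still contains the literal entity references, as in "&lt;Enter&gt;", and counts each distinct item. itemsBetweenLtGt is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: count items written as escaped markup, e.g. "&lt;Enter&gt;".
# Assumes the entity references are still present literally in the text.
sub itemsBetweenLtGt {
  my ($text) = @_;
  my %items;
  $items{$1}++ while $text =~ m{&lt;(.*?)&gt;}gs;   # non-greedy: one item per pair
  \%items
}

my $items = itemsBetweenLtGt(q(Press &lt;Enter&gt; then &lt;Tab&gt; or &lt;Enter&gt;));
```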

reportAttributeCount($xref)

Report attribute counts

     Parameter  Description
  1  $xref      Cross referencer

reportAttributeNameAndValueCounts($xref)

Report attribute value counts

     Parameter  Description
  1  $xref      Cross referencer

reportValidationErrors($xref)

Report the files known to have validation errors

     Parameter  Description
  1  $xref      Cross referencer

reportTables($xref)

Report on tables that have problems

     Parameter  Description
  1  $xref      Cross referencer

reportFileExtensionCount($xref)

Report file extension counts

     Parameter  Description
  1  $xref      Cross referencer

reportFileTypes($xref)

Report file type counts; this report takes too long to produce in series

     Parameter  Description
  1  $xref      Cross referencer

reportExternalXrefs($xref)

Report external xrefs missing other attributes

     Parameter  Description
  1  $xref      Cross referencer
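
The status line in the Synopsis reports external xrefs with no format=html and external xrefs with no scope=external. A minimal sketch of that check on one xref's attributes follows. Treating an href that starts with http:// or https:// as external is an assumption made for this sketch. externalXrefProblems is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: list what an external xref is missing.
# Assumption: an href starting with http:// or https:// is external.
sub externalXrefProblems {
  my (%attr) = @_;
  my $href = $attr{href} // q();
  return () unless $href =~ m(\Ahttps?://)i;                 # only external xrefs
  my @problems;
  push @problems, q(no format=html)    unless ($attr{format} // q()) eq q(html);
  push @problems, q(no scope=external) unless ($attr{scope}  // q()) eq q(external);
  @problems
}

my @bad  = externalXrefProblems(href => q(https://example.com), scope => q(external));
my @good = externalXrefProblems(href => q(https://example.com), format => q(html), scope => q(external));
my @int  = externalXrefProblems(href => q(b.dita#t1));
```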

reportMaxZoomOut($xref)

Text located via Max Zoom In

     Parameter  Description
  1  $xref      Cross referencer

reportTopicDetails($xref)

Things that occur once in each file

     Parameter  Description
  1  $xref      Cross referencer

reportTopicReuse($xref)

Count how frequently each topic is reused

     Parameter  Description
  1  $xref      Cross referencer

reportFixRefs($xref)

Report of hrefs that need to be fixed

     Parameter  Description
  1  $xref      Cross referencer

reportSourceFiles($xref)

Source file for each topic

     Parameter  Description
  1  $xref      Cross referencer

reportReferencesFromBookMaps($xref)

Topics and images referenced from bookmaps

     Parameter  Description
  1  $xref      Cross referencer

reportExteriorMaps($xref)

Maps that are not referenced by any other map

     Parameter  Description
  1  $xref      Cross referencer
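
Finding exterior maps amounts to a set difference: all maps minus every map that some map references. The sketch below assumes the references between maps have already been collected into a hash. exteriorMaps is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper. $refs: map file => [map files it references]
sub exteriorMaps {
  my ($refs) = @_;
  my %referenced;
  for my $list (values %$refs) {
    $referenced{$_}++ for @$list;                 # every map mentioned by some map
  }
  [sort grep {!$referenced{$_}} keys %$refs]      # maps that no map mentions
}

my $exterior = exteriorMaps({
  q(book.ditamap)     => [qw(chapter1.ditamap chapter2.ditamap)],
  q(chapter1.ditamap) => [],
  q(chapter2.ditamap) => [],
  q(orphan.ditamap)   => [],
});
```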

reportTopicsNotReferencedFromBookMaps($xref)

Topics not referenced from bookmaps

     Parameter  Description
  1  $xref      Cross referencer

reportTableDimensions($xref)

Report table dimensions

     Parameter  Description
  1  $xref      Cross referencer

reportOtherMeta($xref)

Advise on the feasibility of moving othermeta data from topics to bookmaps, assuming that the othermeta data will be applied only at the head of the map rather than individually to each topic in the map.

     Parameter  Description
  1  $xref      Cross referencer
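
One way to judge feasibility is sketched below. An othermeta name can be stated once at the head of a map only if every topic in the map carries it with the same content. The data shapes and the helper liftableOtherMeta are assumptions for illustration. They are not the module's own.

```perl
use strict;
use warnings;

# Hypothetical helper. $otherMeta: topic file => {othermeta name => content}
# Returns the names whose content is identical in every topic of the map.
sub liftableOtherMeta {
  my ($topicsInMap, $otherMeta) = @_;
  my %values;                                     # name => {content => count}
  for my $topic (@$topicsInMap) {
    my $meta = $otherMeta->{$topic} // {};
    $values{$_}{$meta->{$_}}++ for keys %$meta;
  }
  my $n = @$topicsInMap;                          # number of topics in the map
  [sort grep {
    my @contents = keys %{$values{$_}};
    @contents == 1 and $values{$_}{$contents[0]} == $n   # one value, in every topic
  } keys %values]
}

my $lift = liftableOtherMeta([qw(a.dita b.dita)], {
  q(a.dita) => {product => q(X), audience => q(admin)},
  q(b.dita) => {product => q(X), audience => q(user)},
});
```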

createSubjectSchemeMap($xref)

Create a subject scheme map from othermeta

     Parameter  Description
  1  $xref      Cross referencer

writeClassificationHtml($xref, $classification)

Write classification tree as html

     Parameter        Description
  1  $xref            Cross referencer
  2  $classification  {title=>{subject=>{file=>++}}}

createClassificationMap($xref, $bookMap, $classification)

Create a classification map for each bookmap

     Parameter        Description
  1  $xref            Cross referencer
  2  $bookMap         Bookmap to classify
  3  $classification  Classification scheme

createClassificationMaps($xref)

Create classification maps for each bookmap

     Parameter  Description
  1  $xref      Cross referencer

reportSimilarTopicsByTitle($xref)

Report topics likely to be similar on the basis of their titles, as expressed in the non-GUID part of their file names

     Parameter  Description
  1  $xref      Cross referencer
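
The following is a minimal sketch of grouping files by the non-GUID part of their names. It assumes names such as "Install_GUID-<8-4-4-4-12 hex>.dita". Folder, extension and GUID are removed, and case and separators are normalised. The actual file naming convention in a given corpus may differ. similarByTitle is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper. Assumes names like "Install_GUID-<8-4-4-4-12 hex>.dita"
sub similarByTitle {
  my (@files) = @_;
  my %byTitle;
  for my $file (@files) {
    my ($name) = fileparse($file, qr/\.[^.]*/);                        # drop folder and extension
    $name =~ s/(GUID-)?[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}//ig;  # drop the guid
    $name =~ s/[^a-z0-9]+/ /ig;                                        # normalise separators
    $name = lc($name =~ s/\A\s+|\s+\z//gr);                            # trim and fold case
    push @{$byTitle{$name}}, $file;
  }
  # Only titles shared by two or more files suggest similar topics
  +{map {$_ => [sort @{$byTitle{$_}}]} grep {@{$byTitle{$_}} > 1} keys %byTitle}
}

my $similar = similarByTitle(
  q(in/Install_GUID-0a1b2c3d-0000-1111-2222-333344445555.dita),
  q(in/install_GUID-99999999-aaaa-bbbb-cccc-dddddddddddd.dita),
  q(in/Configure_GUID-12345678-1234-1234-1234-123456789abc.dita),
);
```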

reportSimilarTopicsByVocabulary($xref)

Report topics likely to be similar on the basis of their vocabulary

     Parameter  Description
  1  $xref      Cross referencer
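
Vocabulary similarity can be measured in several ways. The sketch below uses the Jaccard similarity of the sets of lower case words in two texts. This is one common choice, and the module may well use a different measure. vocabulary and jaccard are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: the set of lower case words in a text
sub vocabulary {
  my ($text) = @_;
  my %words = map {lc($_) => 1} $text =~ m/(\w+)/g;
  \%words
}

# Hypothetical helper: Jaccard similarity, |intersection| / |union|
sub jaccard {
  my ($x, $y) = @_;
  my $common = grep {$y->{$_}} keys %$x;           # size of the intersection
  my $union  = keys(%$x) + keys(%$y) - $common;    # size of the union
  $union ? $common / $union : 0
}

my $s = jaccard(vocabulary(q(Install the server)), vocabulary(q(install the client)));
```

Pairs of topics whose score exceeds some chosen threshold would then be reported as likely to be similar.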

reportWordsByFile($xref)

Index words to the files they occur in

     Parameter  Description
  1  $xref      Cross referencer

reportMd5Sum($xref)

Report files with identical md5 sums

     Parameter  Description
  1  $xref      Cross referencer
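
Files with identical content can be found by grouping them on the md5 sum of that content and keeping only the groups with more than one member. In the minimal sketch below, content is passed in as a hash so that no files need to be read. duplicateMd5Sums is a hypothetical name. Digest::MD5 is a core Perl module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper. $content: file name => file content
sub duplicateMd5Sums {
  my ($content) = @_;
  my %bySum;                                                 # md5 => [files]
  push @{$bySum{md5_hex($content->{$_})}}, $_ for sort keys %$content;
  +{map {$_ => $bySum{$_}} grep {@{$bySum{$_}} > 1} keys %bySum}
}

my $dups = duplicateMd5Sums({
  q(a.dita) => q(<p>Hi</p>),
  q(b.dita) => q(<p>Hi</p>),
  q(c.dita) => q(<p>Bye</p>),
});
```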

reportOlBody($xref)

Report ol under body, which is indicative of a task

     Parameter  Description
  1  $xref      Cross referencer

reportHrefUrlEncoding($xref)

Report hrefs that need url encoding

     Parameter  Description
  1  $xref      Cross referencer
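
A minimal sketch of the test follows. It flags any href containing a character outside those that RFC 3986 allows to appear literally in a URI: the unreserved and reserved sets, plus "%" for existing escapes. A space is the usual culprit. needsUrlEncoding is a hypothetical name, and the module's own rule may be narrower or broader.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the href contains a character that RFC 3986
# does not allow literally (unreserved + reserved + "%"), e.g. a space.
sub needsUrlEncoding {
  my ($href) = @_;
  $href =~ m{[^A-Za-z0-9\-._~:/?#\[\]\@!\$&'()*+,;=%]} ? 1 : 0
}

my @hrefs = (q(a.dita#t1/p1), q(my file.dita), q(b.dita#t2));
my @bad   = grep {needsUrlEncoding($_)} @hrefs;
```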

reportConRefMatching($xref)

Report conref matching

     Parameter  Description
  1  $xref      Cross referencer

reportPublicIds($xref)

Report public ids in use

     Parameter  Description
  1  $xref      Cross referencer

reportRequiredCleanUps($xref)

Report required clean ups

     Parameter  Description
  1  $xref      Cross referencer

reportUrls($xref)

Report urls that fail to resolve

     Parameter  Description
  1  $xref      Cross referencer

addNavTitlesToOneMap($xref, $file)

Fix navtitles in one map

     Parameter  Description
  1  $xref      Xref results
  2  $file      File to fix

addNavTitlesToMaps($xref)

Add nav titles to files containing maps.

     Parameter  Description
  1  $xref      Xref results

oxygenProjectFileMetaData()

Meta data for the oxygen project files

createOxygenProjectFile($xref, $bm, $xprName)

Create an Oxygen project file for the specified bookmap

     Parameter  Description
  1  $xref      Xref
  2  $bm        Bookmap
  3  $xprName   Xpr name from bookmap

createOxygenProjectMapFiles($xref)

Create Oxygen project files from Xref results

     Parameter  Description
  1  $xref      Cross referencer

oneBadRef($xref, $file, $href)

Check one reference and return the first error encountered, or undef if no errors are encountered. Relies on topicIds to test that the referenced file is present and that its topicId is valid, and on ids to check that the referenced id is valid.

     Parameter  Description
  1  $xref      Cross referencer
  2  $file      File containing reference
  3  $href      Reference
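
The order of these checks can be sketched as follows: file, then topic id, then element id, stopping at the first failure. In the sketch, hypothetical hashes stand in for the cross referencer's topicIds and ids, and the error texts are invented for illustration. A reference is assumed to have the DITA form [file]#topicId[/id], where an empty file part means the file containing the reference. oneBadRefSketch is a hypothetical name, not this module's oneBadRef.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the cross referencer's data:
#   $topicIds: file => id of the topic in that file
#   $ids:      file => {id => count}
sub oneBadRefSketch {
  my ($topicIds, $ids, $file, $href) = @_;
  my ($target, $fragment) = split /#/, $href, 2;
  $target = $file unless length($target // q());           # "#t/p" means this file
  return q(No such file)     unless exists $topicIds->{$target};
  return undef               unless defined $fragment;     # whole file reference
  my ($topicId, $id) = split m(/), $fragment, 2;
  return q(No such topic id) unless $topicIds->{$target} eq $topicId;
  return q(No such id)       if defined $id and !$ids->{$target}{$id};
  undef                                                     # no errors encountered
}

my %topicIds = (q(a.dita) => q(t1), q(b.dita) => q(t2));
my %ids      = (q(a.dita) => {p1 => 1}, q(b.dita) => {p2 => 1});
my @results  = map {oneBadRefSketch(\%topicIds, \%ids, q(a.dita), $_) // q(ok)}
  q(b.dita#t2/p2), q(#t1/p1), q(c.dita), q(b.dita#t9), q(b.dita#t2/p9);
```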

createSampleInputFilesBaseCase($in, $N)

Create sample input files for testing. The attribute inputFolder supplies the name of the folder in which to create the sample files.

     Parameter  Description
  1  $in        Input folder
  2  $N         Number of sample files

createSampleInputFilesFixFolder($in)

Create sample input files for testing fixFolder

     Parameter  Description
  1  $in        Folder to create the files in

createSampleInputFilesLtGt($in)

Create sample input files for testing items between &lt; and &gt;

     Parameter  Description
  1  $in        Folder to create the files in

createSampleInputFilesForFixDitaRefs($in, $targets)

Create sample input files for fixing renamed topic refs

     Parameter  Description
  1  $in        Folder to create the files in
  2  $targets   Targets folder

createSampleInputFilesForFixDitaRefsXref($in)

Create sample input files for fixing references into renamed topics by xref

     Parameter  Description
  1  $in        Folder to create the files in

createSampleConRefs($in)

Create sample input files for fixing a conref

     Parameter  Description
  1  $in        Folder to create the files in

createSampleConRefMatching($in)

Create sample input files for matching conref source and targets

     Parameter  Description
  1  $in        Folder to create the files in

createSampleDuplicateMd5Sum($in)

Create sample input files with duplicate md5 sums

     Parameter  Description
  1  $in        Folder to create the files in

createSampleUnreferencedIds($in)

Create sample input files with unreferenced ids

     Parameter  Description
  1  $in        Folder to create the files in

createEmptyBody($in)

Create sample input files for empty body detection

     Parameter  Description
  1  $in        Folder to create the files in

createClassificationMapsTest($in)

Create sample input files for a classification map

     Parameter  Description
  1  $in        Folder to create the files in

createWordsToFilesTest($in)

Index words to file

     Parameter  Description
  1  $in        Folder to create the files in

createUrlTests($in)

Check urls

     Parameter  Description
  1  $in        Folder to create the files in

changeFolderAndWriteFiles($f, $D)

Change file structure to the current folder and write

     Parameter  Description
  1  $f         Data structure as a string
  2  $D         Target folder

createSampleInputFilesForFixDitaRefsImproved1($folder)

Create sample input files for fixing references via the targets/ folder

     Parameter  Description
  1  $folder    Folder to switch to

createSampleInputFilesForFixDitaRefsImproved2($folder)

Create sample input files for fixing conref references via the targets/ folder

     Parameter  Description
  1  $folder    Folder to switch to

createSampleInputFilesForFixDitaRefsImproved3($folder)

Create sample input files for fixing bookmap references to topics that get cut into multiple pieces

     Parameter  Description
  1  $folder    Folder to switch to

createSampleInputFilesForFixDitaRefsImproved4($folder)

Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces

     Parameter  Description
  1  $folder    Folder to switch to

createSampleImageTest($folder)

Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces

     Parameter  Description
  1  $folder    Folder to switch to

createTestTopicFlattening($folder)

Create sample input files for testing topic flattening ratio reporting

     Parameter  Description
  1  $folder    Folder to switch to

createTestReferencedToFlattenedTopic($folder)

Full reference to a topic that has been flattened

     Parameter  Description
  1  $folder    Folder to switch to

createTestReferenceToCutOutTopic($folder)

References from a topic that has been cut out to a topic that has been cut out

     Parameter  Description
  1  $folder    Folder to switch to

createSampleOtherMeta($out)

Create sample data for othermeta reports

     Parameter  Description
  1  $out       Folder

createTestOneNotRef($folder)

One topic referenced and the other not

     Parameter  Description
  1  $folder    Folder to switch to

createSampleTopicsReferencedFromBookMaps($in)

The number of times a topic is referenced from a bookmap

     Parameter  Description
  1  $in        Folder to create the files in

createSampleImageReferences($in)

Good and bad image references

     Parameter  Description
  1  $in        Folder to create the files in

createRequiredCleanUps($in)

Required clean ups report

     Parameter  Description
  1  $in        Folder to create the files in

createSoftConrefs($in)

Fix file part of conref even if the rest is invalid

     Parameter  Description
  1  $in        Folder to create the files in

checkXrefStructure($x, $field, @folders)

Check an output structure produced by Xref

     Parameter  Description
  1  $x         Cross references
  2  $field     Field to check
  3  @folders   Folders to suppress

writeXrefStructure($x, $field, @folders)

Write the test for an Xref structure

     Parameter  Description
  1  $x         Cross referencer
  2  $field     Field
  3  @folders   Names of the folders to suppress

deleteVariableFields($x)

Remove time and other fields that do not affect the end results

     Parameter  Description
  1  $x         Cross referencer

testReferenceChecking()

Test reference checking

Index

1 addNavTitlesToMaps - Add nav titles to files containing maps.

2 addNavTitlesToOneMap - Fix navtitles in one map

3 analyzeInputFiles - Analyze the input files

4 analyzeOneFileParallel - Analyze one input file

5 analyzeOneFileResults - Merge a list of cross reference results into the first cross referencer in the list

6 changeFolderAndWriteFiles - Change file structure to the current folder and write

7 checkReferences - Check each reference, report bad references and mark them for fixing.

8 checkXrefStructure - Check an output structure produced by Xref

9 countLevels - Count hash elements to the specified number of levels

10 createClassificationMap - Create a classification map for each bookmap

11 createClassificationMaps - Create classification maps for each bookmap

12 createClassificationMapsTest - Create sample input files for a classification map

13 createEmptyBody - Create sample input files for empty body detection

14 createOxygenProjectFile - Create an Oxygen project file for the specified bookmap

15 createOxygenProjectMapFiles - Create Oxygen project files from Xref results

16 createReportsInParallel - Create reports in parallel

17 createReportsInParallel1 - Create reports in parallel that do not require fixed references

18 createReportsInParallel2 - Create reports in parallel that require fixed references

19 createRequiredCleanUps - Required clean ups report

20 createSampleConRefMatching - Create sample input files for matching conref source and targets

21 createSampleConRefs - Create sample input files for fixing a conref

22 createSampleDuplicateMd5Sum - Create sample input files with duplicate md5 sums

23 createSampleImageReferences - Good and bad image references

24 createSampleImageTest - Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces

25 createSampleInputFilesBaseCase - Create sample input files for testing.

26 createSampleInputFilesFixFolder - Create sample input files for testing fixFolder

27 createSampleInputFilesForFixDitaRefs - Create sample input files for fixing renamed topic refs

28 createSampleInputFilesForFixDitaRefsImproved1 - Create sample input files for fixing references via the targets/ folder

29 createSampleInputFilesForFixDitaRefsImproved2 - Create sample input files for fixing conref references via the targets/ folder

30 createSampleInputFilesForFixDitaRefsImproved3 - Create sample input files for fixing bookmap references to topics that get cut into multiple pieces

31 createSampleInputFilesForFixDitaRefsImproved4 - Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces

32 createSampleInputFilesForFixDitaRefsXref - Create sample input files for fixing references into renamed topics by xref

33 createSampleInputFilesLtGt - Create sample input files for testing items between &lt; and &gt;

34 createSampleOtherMeta - Create sample data for othermeta reports

35 createSampleTopicsReferencedFromBookMaps - The number of times a topic is referenced from a bookmap

36 createSampleUnreferencedIds - Create sample input files with unreferenced ids

37 createSoftConrefs - Fix file part of conref even if the rest is invalid

38 createSubjectSchemeMap - Create a subject scheme map from othermeta

39 createTestOneNotRef - One topic referenced and the other not

40 createTestReferencedToFlattenedTopic - Full reference to a topic that has been flattened

41 createTestReferenceToCutOutTopic - References from a topic that has been cut out to a topic that has been cut out

42 createTestTopicFlattening - Create sample input files for testing topic flattening ratio reporting

43 createUrlTests - Check urls

44 createWordsToFilesTest - Index words to file

45 deleteVariableFields - Remove time and other fields that do not affect the end results

46 editXml - Edit an xml file retaining any existing XML headers and lint trailers

47 externalReference - Check for an external reference

48 fixFilesGB - Rename files to the Gearhart-Brenan (GB) standard

49 fixingRun - A fixing run fixes problems where it can and thus induces changes which might make the updated output different from the incoming source.

50 fixOneFileGB - Fix one file to the Gearhart-Brenan standard

51 fixReferences - Fix just the file containing references using a number of techniques and report those references that cannot be so fixed.

52 fixReferencesInOneFile - Fix one file by moving unresolved references to the xtrf attribute

53 fixReferencesParallel - Fix the references in one file

54 fixReferencesResults - Consolidate the results of fixing references.

55 formatTables - Using cross reference $xref options and an array of arrays $data format a report as a table using %options as described in Data::Table::Text::formatTable and Data::Table::Text::formatHtmlTable.

56 hashOfCountsToArray - Convert a $hash of {key} = count to an array so it can be formatted with formatTables

57 loadInputFiles - Load the names of the files to be processed

58 newXref - Create a new cross referencer

59 oneBadRef - Check one reference and return the first error encountered, or undef if no errors are encountered.

60 oxygenProjectFileMetaData - Meta data for the oxygen project files

61 removeUnusedIds - Remove ids that are not mentioned in any href or conref in the corpus, regardless of the file component of any such reference.

62 reportAttributeCount - Report attribute counts

63 reportAttributeNameAndValueCounts - Report attribute value counts

64 reportConRefMatching - Report conref matching

65 reportDocTypeCount - Report doc type count

66 reportDuplicateIds - Report duplicate ids

67 reportDuplicateTopicIds - Report duplicate topic ids

68 reportEmptyTopics - Report empty topics

69 reportExteriorMaps - Maps that are not referenced by any other map

70 reportExternalXrefs - Report external xrefs missing other attributes

71 reportFileExtensionCount - Report file extension counts

72 reportFileTypes - Report file type counts; this report takes too long to produce in series

73 reportFixRefs - Report of hrefs that need to be fixed

74 reportGuidHrefs - Report on guid hrefs

75 reportGuidsToFiles - Map and report guids to files

76 reportHrefUrlEncoding - Report hrefs that need url encoding

77 reportIdRefs - Report the number of times each id is referenced

78 reportImages - Reports on images and references to images

79 reportLtGt - Report items found between &lt; and &gt;

80 reportMaxZoomOut - Text located via Max Zoom In

81 reportMd5Sum - Report files with identical md5 sums

82 reportNoHrefs - Report locations where an href was expected but not found

83 reportOlBody - Report ol under body, which is indicative of a task

84 reportOtherMeta - Advise on the feasibility of moving othermeta data from topics to bookmaps, assuming that the othermeta data will be applied only at the head of the map rather than individually to each topic in the map.

85 reportParseFailed - Report failed parses

86 reportPublicIds - Report public ids in use

87 reportReferencesFromBookMaps - Topics and images referenced from bookmaps

88 reportRequiredCleanUps - Report required clean ups

89 reportSimilarTopicsByTitle - Report topics likely to be similar on the basis of their titles, as expressed in the non-GUID part of their file names

90 reportSimilarTopicsByVocabulary - Report topics likely to be similar on the basis of their vocabulary

91 reportSourceFiles - Source file for each topic

92 reportTableDimensions - Report table dimensions

93 reportTables - Report on tables that have problems

94 reportTagCount - Report tag counts

95 reportTagsAndTextsCount - Report tags and texts counts

96 reportTopicDetails - Things that occur once in each file

97 reportTopicReuse - Count how frequently each topic is reused

98 reportTopicsNotReferencedFromBookMaps - Topics not referenced from bookmaps

99 reportUrls - Report urls that fail to resolve

100 reportValidationErrors - Report the files known to have validation errors

101 reportWordsByFile - Index words to the files they occur in

102 reportXml1 - Report bad xml on line 1

103 reportXml2 - Report bad xml on line 2

104 testReferenceChecking - Test reference checking

105 writeClassificationHtml - Write classification tree as html

106 writeXrefStructure - Write the test for an Xref structure

107 xref - Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder.

108 xref2 - Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder.

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:

  sudo cpan install Data::Edit::Xml::Xref

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2019 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.