Data::Edit::Xml::Xref - Cross reference Dita XML, match topics and ameliorate missing references.
Check the references in a large corpus of Dita XML documents held in folder inputFolder, running processes in parallel wherever possible to take advantage of multi-cpu computers:

  use Data::Edit::Xml::Xref;

  my $x = xref(inputFolder              => q(in),
               maximumNumberOfProcesses => 512,
               fixBadRefs               => 1,
               flattenFolder            => q(out2),
               matchTopics              => 0.9,
              );
The cross reference analysis can be requested as a status line:
  ok nws($x->statusLine) eq nws(<<END);
  Xref: 108 references fixed, 50 bad xrefs, 16 missing image files, 16 missing image references, 13 bad first lines, 13 bad second lines, 9 bad conrefs, 9 duplicate topic ids, 9 files with bad conrefs, 9 files with bad xrefs, 8 duplicate ids, 6 bad topicrefs, 6 files not referenced, 4 invalid guid hrefs, 2 bad book maps, 2 bad tables, 1 External xrefs with no format=html, 1 External xrefs with no scope=external, 1 file failed to parse, 1 href missing
  END
Or as a tabular report:
  ok nws($x->statusTable) eq nws(<<END);
  Xref:
      Count  Condition
   1    108  references fixed
   2     50  bad xrefs
   3     16  missing image files
   4     16  missing image references
   5     13  bad first lines
   6     13  bad second lines
   7      9  files with bad conrefs
   8      9  bad conrefs
   9      9  files with bad xrefs
  10      9  duplicate topic ids
  11      8  duplicate ids
  12      6  bad topicrefs
  13      6  files not referenced
  14      4  invalid guid hrefs
  15      2  bad book maps
  16      2  bad tables
  17      1  href missing
  18      1  file failed to parse
  19      1  External xrefs with no format=html
  20      1  External xrefs with no scope=external
  END
More detailed reports are produced in the reports folder:
$x->reports
and indexed by the reports report:
reports/reports.txt
which contains a list of all the reports generated:
    Rows  Title                                                           File
 1     5  Attributes                                                      reports/count/attributes.txt
 2    13  Bad Xml line 1                                                  reports/bad/xmlLine1.txt
 3    13  Bad Xml line 2                                                  reports/bad/xmlLine2.txt
 4     9  Bad conRefs                                                     reports/bad/ConRefs.txt
 5     2  Bad external xrefs                                              reports/bad/externalXrefs.txt
 6    16  Bad image references                                            reports/bad/imageRefs.txt
 7     9  Bad topicrefs                                                   reports/bad/bookMapRefs.txt
 8    50  Bad xRefs                                                       reports/bad/XRefs.txt
 9     2  Bookmaps with errors                                            reports/bad/bookMap.txt
10     2  Document types                                                  reports/count/docTypes.txt
11     8  Duplicate id definitions within files                           reports/bad/idDefinitionsDuplicated.txt
12     3  Duplicate topic id definitions                                  reports/bad/topicIdDefinitionsDuplicated.txt
13     3  File extensions                                                 reports/count/fileExtensions.txt
14     1  Files failed to parse                                           reports/bad/parseFailed.txt
15     0  Files types                                                     reports/count/fileTypes.txt
16    16  Files whose short names are bi-jective with their md5 sums      reports/good/shortNameToMd5Sum.txt
17     0  Files whose short names are not bi-jective with their md5 sums  reports/bad/shortNameToMd5Sum.txt
18   108  Fixes Applied To Failing References                             reports/lists/referencesFixed.txt
19     0  Good bookmaps                                                   reports/good/bookMap.txt
20     9  Good conRefs                                                    reports/good/ConRefs.txt
21     5  Good topicrefs                                                  reports/good/bookMapRefs.txt
22     8  Good xRefs                                                      reports/good/XRefs.txt
23     1  Guid topic definitions                                          reports/lists/guidsToFiles.txt
24     2  Image files                                                     reports/good/imagesFound.txt
25     1  Missing hrefs                                                   reports/bad/missingHrefAttributes.txt
26    16  Missing image references                                        reports/bad/imagesMissing.txt
27     4  Possible improvements                                           reports/improvements.txt
28     2  Resolved GUID hrefs                                             reports/good/guidHrefs.txt
29     2  Tables with errors                                              reports/bad/tables.txt
30    23  Tags                                                            reports/count/tags.txt
31    11  Topic Reuses                                                    reports/lists/topicReuse.txt
32     0  Topic Reuses                                                    reports/lists/similar/byTitle.txt
33    16  Topics                                                          reports/lists/topics.txt
34    15  Topics with similar vocabulary                                  reports/lists/similar/byVocabulary.txt
35     0  Topics with validation errors                                   reports/bad/validationErrors.txt
36     0  Topics without ids                                              reports/bad/topicIdDefinitionsMissing.txt
37     6  Unreferenced files                                              reports/bad/notReferenced.txt
38    11  Unresolved GUID hrefs                                           reports/bad/guidHrefs.txt
If requested, Xref will create or update the navigation titles (navtitles) of topic references (appendix|chapter|topicref) in maps, for references made by file name as well as by GUID:
addNavTitles => 1
Reports of successful updates will be written to:
reports/good/navTitles.txt
Reports of unsuccessful updates will be written to:
reports/bad/navTitles.txt
It is often desirable to ameliorate unresolved Dita href attributes so that incomplete content can be loaded into a content management system. The:
fixBadRefs => 1
attribute requests that the:
conref and href
attributes be renamed to:
xtrf
if the conref or href attribute specification cannot be resolved in the current corpus.
If the fixedFolder attribute is set, the fixed files are written into this folder, else they are written back into the inputFolder. Two reports are generated by this action:
reports/bad/fixedRefs.txt
reports/bad/fixedRefsNoAction.txt
This feature was designed by mim@cpan.org.
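The renaming described above can be sketched in a few lines. This is only an illustration and not the code used by Xref: the sub ameliorateReferences, the attribute hash and the %known list of resolvable targets are all hypothetical, and the sketch assumes that a node carries at most one of conref or href.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Xref: move unresolved references to xtrf.
#   $attributes - hash ref of attribute name => value for one node
#   $resolves   - code ref returning true if a reference can be resolved
# Assumes a node has at most one of conref or href, otherwise the second
# failing reference would overwrite the first in xtrf.
sub ameliorateReferences {
    my ($attributes, $resolves) = @_;
    for my $name (qw(conref href)) {
        my $value = $attributes->{$name};
        next unless defined $value;                        # Attribute not present
        next if $resolves->($value);                       # Reference is good, leave it alone
        $attributes->{xtrf} = delete $attributes->{$name}; # Rename the attribute to xtrf
    }
    return $attributes;
}

# Hypothetical set of targets known to exist in the corpus
my %known = map { $_ => 1 } qw(a.dita b.dita);

my $node = ameliorateReferences({ href => q(missing.dita) }, sub { $known{$_[0]} });
print join(q(, ), map { "$_=$node->{$_}" } sort keys %$node), "\n";
```

The failing href ends up under xtrf, while a reference to a.dita or b.dita would be left untouched.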
Some content management systems use guids, some content management systems use file names as their means of identifying content. When moving from a guid to a file name content management system it might be necessary to replace the guids representing file names with the actual underlying file names. If the
deguidize => 1
parameter is set to true, Xref will replace any such file guids with the underlying file name if it is present in the content being cross referenced.
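As a rough sketch of the idea, assuming references of the form g1#g2/id in which g2 is the topic id: the table %guidToFile, its contents and the sub deguidizeReference below are hypothetical and only illustrate the lookup, they are not taken from Xref.

```perl
use strict;
use warnings;

# Hypothetical table: topic id (a guid) => file defining that topic
my %guidToFile = (
    q(GUID-2222) => q(c_overview.dita),
);

# Replace the file guid g1 in a reference g1#g2/id with the file name
# found by looking up the topic id g2. Unknown references are returned unchanged.
sub deguidizeReference {
    my ($reference) = @_;
    my ($g1, $rest) = split /#/, $reference, 2;
    return $reference unless defined $rest and length $rest; # No topic id to look up
    my ($g2) = split m{/}, $rest, 2;                          # Topic id precedes the first /
    my $file = $guidToFile{$g2};
    return $reference unless defined $file;                   # Topic not in this corpus
    return "$file#$rest";
}

print deguidizeReference(q(GUID-1111#GUID-2222/p1)), "\n";    # Topic id is known
print deguidizeReference(q(GUID-1111#GUID-9999/p1)), "\n";    # Topic id is not known
```

The first reference has its file guid replaced by c_overview.dita; the second is returned as it was because no topic with that id is known.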
It is often desirable to flatten or reflatten the topic files in a corpus so that they can coexist in a single folder of a content management system without colliding with each other.
The presence of the input attribute:
flattenFolder => folder-to-flatten-files-into
causes topic files to be flattened into the named folder using the GBStandard to generate the flattened file names. Xref will then update all Dita references to match these new file names. If the flattenFolder folder is the same as the inputFolder then the input files are flattened in place.
File references in conref or hrefs that have a unique valid base file name and an invalid path can be fixed by setting the input attribute:
fixRelocatedRefs => 1
to a true value to request that Xref replace the incorrect paths to the unique base file names with the correct path.
If coded in conjunction with the fixBadRefs input attribute, this will cause Xref to first try to fix any missing xrefs by relocation. Any references that still fail to resolve will then be ameliorated by moving them to the xtrf attribute.
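The relocation step can be pictured as a lookup from base file name to full path that is only trusted when it is unique. The following is a minimal sketch under that assumption: the file list and the sub relocate are hypothetical, and the sketch returns the path as held in the corpus rather than a path relative to the referencing file.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical corpus: setup.dita occurs twice so its base name is not unique
my @files = qw(in/a/intro.dita in/b/setup.dita in/c/setup.dita);

# Index the corpus by base file name
my %byBase;
push @{$byBase{basename($_)}}, $_ for @files;

# Return the corrected reference if the base name is unique, else undef
sub relocate {
    my ($reference) = @_;
    my ($path, $fragment) = split /#/, $reference, 2;
    return undef unless defined $path and length $path;    # Nothing to relocate
    my $candidates = $byBase{basename($path)} or return undef;
    return undef unless @$candidates == 1;                 # Ambiguous base name
    my $fixed = $candidates->[0];
    return defined $fragment ? "$fixed#$fragment" : $fixed;
}

for my $reference (q(old/place/intro.dita#t1), q(old/place/setup.dita)) {
    my $fixed = relocate($reference);
    print "$reference => ", defined $fixed ? $fixed : q(not fixed), "\n";
}
```

The reference to intro.dita is fixed because only one file has that base name; the reference to setup.dita is left for fixBadRefs to deal with.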
Dita xref tags with broken or missing href attributes can sometimes be fixed by matching the text content of the xref with the titles of topics.
If:
fixXrefsByTitle => 1
is specified, Xref will locate possible targets for a broken href by matching the white space normalized Data::Table::Text::nws of the text content of the xref with the similarly normalized title of each topic. If a single matching candidate is located then it will be used to update the href attribute of the xref.
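A minimal sketch of this kind of matching follows. The sub normalizeWhiteSpace is a local stand-in that approximates what nws is described as doing (trim the ends, collapse interior white space); the titles, file names and the sub targetByTitle are hypothetical.

```perl
use strict;
use warnings;

# Local stand-in for white space normalization: trim and collapse runs of white space
sub normalizeWhiteSpace {
    my ($text) = @_;
    $text =~ s/\A\s+|\s+\z//g;
    $text =~ s/\s+/ /g;
    return $text;
}

# Hypothetical topics: two of them share the title Overview
my %fileToTitle = (
    q(c_install.dita)  => qq(Installing  the\n product),
    q(c_overview.dita) => q(Overview),
    q(c_summary.dita)  => q(Overview),
);

# Index files by normalized title
my %titleToFiles;
push @{$titleToFiles{normalizeWhiteSpace($fileToTitle{$_})}}, $_
    for sort keys %fileToTitle;

# Return the single file whose title matches the xref text, else undef
sub targetByTitle {
    my ($xrefText) = @_;
    my $files = $titleToFiles{normalizeWhiteSpace($xrefText)} or return undef;
    return @$files == 1 ? $files->[0] : undef;
}

for my $text (q( Installing the product ), q(Overview)) {
    my $target = targetByTitle($text);
    print defined $target ? $target : q(no unique match), "\n";
}
```

The first text matches exactly one title once white space is normalized; the second matches two topics, so no href would be chosen for it.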
When converting a Dita input source corpus to Dita the referenced topics are usually renamed and flattened via the GBStandard. If enabled:
fixDitaRefs => targets/
updates valid Dita references in the input corpus with the latest name for the referenced topic to make links that were valid in the input corpus valid in the output corpus as well.
The targets/ folder should contain the same set of file names as the original input corpus. Each such file should contain the name of a bookmap topic present in the inputFolder whose chapter and topicrefs identify the new names of the files cut out and flattened from the existing input corpus.
The creation of the targets/ folder is usually done by some other piece of software such as Data::Edit::Xml::To::Dita as it is too complex and laborious to be performed reliably by hand. No validation of the contents of this folder is performed as it is assumed that it has been created reliably in software.
Topics can be matched on title and vocabulary to assist authors in finding similar topics by specifying the:
matchTopics => 0.9
attribute where the value of this attribute is the confidence level between 0 and 1.
Topic matching produces the reports:
reports/lists/similar/byTitle.txt
reports/lists/similar/byVocabulary.txt
Topic matching might take some time for large input folders.
This report can be found at:
reports/lists/similar/byTitle.txt
Title matching sorts topics by their titles so that topics with similar titles can be easily located:

    Similar  Prefix        Source
 1       14  c_Notices__   c_Notices_5614e96c7a3eaf3dfefc4a455398361b
 2           c_Notices__   c_Notices_14a9f467215dea879d417de884c21e6d
 3           c_Notices__   c_Notices_19011759a2f768d76581dc3bba170a44
 4           c_Notices__   c_Notices_aa741e6223e6cf8bc1a5ebdcf0ba867c
 5           c_Notices__   c_Notices_f0009b28c3c273094efded5fac32b83f
 6           c_Notices__   c_Notices_b1480ac1af812da3945239271c579bb1
 7           c_Notices__   c_Notices_5f3aa15d024f0b6068bd8072d4942f6d
 8           c_Notices__   c_Notices_17c1f39e8d70c765e1fbb6c495bedb03
 9           c_Notices__   c_Notices_7ea35477554f979b3045feb369b69359
10           c_Notices__   c_Notices_4f200259663703065d247b35d5500e0e
11           c_Notices__   c_Notices_e3f2eb03c23491c5e96b08424322e423
12           c_Notices__   c_Notices_06b7e9b0329740fc2b50fedfecbc5a94
13           c_Notices__   c_Notices_550a0d84dfc94982343f58f84d1c11c2
14           c_Notices__   c_Notices_fa7e563d8153668db9ed098d0fe6357b
15        3  c_Overview__  c_Overview_f9e554ee9be499368841260344815f58
16           c_Overview__  c_Overview_f234dc10ea3f4229d0e1ab4ad5e8f5fe
17           c_Overview__  c_Overview_96121d7bcd41cf8be318b96da0049e73
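The grouping behind a report like the one above can be approximated by stripping the trailing md5 sum from each file name and collecting the names that share what is left. This is a sketch only: the sub groupByPrefix is hypothetical and makes no attempt to reproduce the exact form of the Prefix column.

```perl
use strict;
use warnings;

# Group file names by the part that precedes a trailing 32 hex digit md5 sum
sub groupByPrefix {
    my (@files) = @_;
    my %byPrefix;
    for my $file (@files) {
        (my $prefix = $file) =~ s/_?[0-9a-f]{32}\z//;      # Remove the md5 sum suffix
        push @{$byPrefix{$prefix}}, $file;
    }
    return \%byPrefix;
}

# A few of the names shown in the report above
my $groups = groupByPrefix(qw(
    c_Notices_5614e96c7a3eaf3dfefc4a455398361b
    c_Notices_14a9f467215dea879d417de884c21e6d
    c_Overview_f9e554ee9be499368841260344815f58
));

# Print the number of similar files followed by the shared prefix
for my $prefix (sort keys %$groups) {
    printf "%d %s\n", scalar @{$groups->{$prefix}}, $prefix;
}
```

For these three names the sketch prints a count of 2 for c_Notices and 1 for c_Overview.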
reports/lists/similar/byVocabulary.txt
Vocabulary matching compares the vocabulary of pairs of topics: topics with similar vocabularies within the confidence level specified are reported together:
    Similar  Topic
 1        8  in/1.dita
 2           in/2.dita
 3           in/3.dita
 4           in/4.dita
 5           in/5.dita
 6           in/6.dita
 7           in/7.dita
 8           in/8.dita
 9
10        2  in/map/bookmap.ditamap
11           in/map/bookmap2.ditamap
12
13        2  in/act4.dita
14           in/act5.dita
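The measure Xref uses to compare vocabularies is not described here, so the following sketch substitutes a Jaccard index (shared words divided by all distinct words) purely to show how a confidence level between 0 and 1 can act as a threshold. The subs vocabulary and similarity, the sample texts and the threshold of 0.6 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# The set of lower cased words in a text as a hash ref
sub vocabulary {
    my ($text) = @_;
    my %words = map { (lc($_) => 1) } $text =~ /(\w+)/g;
    return \%words;
}

# Jaccard index of two vocabularies: a stand-in measure, not necessarily Xref's
sub similarity {
    my ($v1, $v2) = @_;
    my %union  = (%$v1, %$v2);
    my $shared = grep { $v2->{$_} } keys %$v1;             # Words present in both
    my $total  = keys %union;                              # Distinct words overall
    return $total ? $shared / $total : 0;
}

# Hypothetical topics
my %topics = (
    q(in/1.dita) => q(Install the product on Linux),
    q(in/2.dita) => q(Install the product on Windows),
    q(in/3.dita) => q(Notices and trademarks),
);

my $confidence = 0.6;                                      # Hypothetical threshold
my @names = sort keys %topics;
for my $i (0 .. $#names) {
    for my $j ($i + 1 .. $#names) {
        my $s = similarity(map { vocabulary($topics{$_}) } @names[$i, $j]);
        printf "%s %s %.2f\n", @names[$i, $j], $s if $s >= $confidence;
    }
}
```

Only the two installation topics share enough words (4 out of 6 distinct words) to be reported together at this threshold.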
Cross reference Dita XML, match topics and ameliorate missing references.
Version 20190712.
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Check the cross references in a set of Dita files and report the results.
Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder. The possible attributes are defined in Data::Edit::Xml::Xref
Parameter Description 1 {my $xref = newXref(@_); Create the cross referencer
Example:
  if (1)                                                                        # References from a topic that has been cut out to a topic that has been cut out
   {clearFolder(tests, 111);
    createTestReferenceToCutOutTopic(tests);

    my $x = xref(inputFolder => out, fixDitaRefs => targets);
    ok $x->statusLine eq q(Xref: 1 ref);

    is_deeply checkXrefStructure($x, q(inputFileToTargetTopics), in, targets),
     {"a.xml" =>
       {"c_aaaa_121939eab89cd7d2c3eb4c4189772a1f.dita"      => 1,
        "c_aaaa_bbbb_55baefe9258538b26a95b0015a8d5a2b.dita" => 1,
        "c_aaaa_cccc_a91633094220d068c453eecae1726eff.dita" => 1,
        "c_aaaa_dddd_914b8e11993908497768c50d992ea0f0.dita" => 1,
       },
      "b.xml" =>
       {"c_bbbb_6100b51ca1f789836cd4f31893ed67d2.dita"      => 1,
        "c_bbbb_aaaa_cfd3a140e06a914fc8469583ad87829d.dita" => 1,
        "c_bbbb_bbbb_c90ebf976073b2a3f7a8dc27a3c8254b.dita" => 1,
        "c_bbbb_cccc_d1c80714275637cde524bdfa1304a8f3.dita" => 1,
       },
     };

    is_deeply checkXrefStructure($x, q(targetTopicToInputFiles), in, targets),
     {"c_aaaa_121939eab89cd7d2c3eb4c4189772a1f.dita"      => { "a.xml" => 1, },
      "c_aaaa_bbbb_55baefe9258538b26a95b0015a8d5a2b.dita" => { "a.xml" => 1, },
      "c_aaaa_cccc_a91633094220d068c453eecae1726eff.dita" => { "a.xml" => 1, },
      "c_aaaa_dddd_914b8e11993908497768c50d992ea0f0.dita" => { "a.xml" => 1, },
      "c_bbbb_6100b51ca1f789836cd4f31893ed67d2.dita"      => { "b.xml" => 1, },
      "c_bbbb_aaaa_cfd3a140e06a914fc8469583ad87829d.dita" => { "b.xml" => 1, },
      "c_bbbb_bbbb_c90ebf976073b2a3f7a8dc27a3c8254b.dita" => { "b.xml" => 1, },
      "c_bbbb_cccc_d1c80714275637cde524bdfa1304a8f3.dita" => { "b.xml" => 1, },
     };

    is_deeply checkXrefStructure($x, q(sourceTopicToTargetBookMap), in, targets),
     {"a.xml" => bless({
        source        => "a.xml",
        sourceDocType => "concept",
        target        => "bm_a_9d0a9f8e0ac234de9e22c19054b6e455.ditamap",
        targetType    => "bookmap",
       }, "Bookmap"),
      "b.xml" => bless({
        source        => "b.xml",
        sourceDocType => "concept",
        target        => "bm_b_d2806ba589f908da1106574afd9db642.ditamap",
        targetType    => "bookmap",
       }, "Bookmap"),
     };

    is_deeply checkXrefStructure($x, q(topicFlattening), in, targets), {};

    is_deeply checkXrefStructure($x, q(originalSourceFileAndIdToNewFile), in, targets),
     {"a.xml" =>
       {"GUID-400c2c59-95e1-7bf3-4647-3a135281bfaf" => "c_aaaa_cccc_a91633094220d068c453eecae1726eff.dita",
        "GUID-68822563-d568-f418-38ae-f1c62cb4ac8d" => "c_aaaa_dddd_914b8e11993908497768c50d992ea0f0.dita",
        "GUID-c67821ef-3da2-c89f-0fc9-9fba3937f368" => "c_aaaa_121939eab89cd7d2c3eb4c4189772a1f.dita",
        "GUID-f0c0e170-8128-10ef-045d-97602fdde76f" => "c_aaaa_bbbb_55baefe9258538b26a95b0015a8d5a2b.dita",
       },
      "b.xml" =>
       {"GUID-2b6aab4f-9328-e326-f55f-160771a8c3dd" => "c_bbbb_cccc_d1c80714275637cde524bdfa1304a8f3.dita",
        "GUID-86a684b0-1a0b-4c30-6da9-24c74ff1f0cc" => "c_bbbb_aaaa_cfd3a140e06a914fc8469583ad87829d.dita",
        "GUID-96a20d7f-bbaf-deef-55ef-e09a0a059251" => "c_bbbb_6100b51ca1f789836cd4f31893ed67d2.dita",
        "GUID-cfe7cb3d-05e7-a147-db10-dcbacaeecef7" => "c_bbbb_bbbb_c90ebf976073b2a3f7a8dc27a3c8254b.dita",
        "p1"                                        => "c_bbbb_6100b51ca1f789836cd4f31893ed67d2.dita",
        "p2"                                        => "c_bbbb_bbbb_c90ebf976073b2a3f7a8dc27a3c8254b.dita",
        "p3"                                        => "c_bbbb_cccc_d1c80714275637cde524bdfa1304a8f3.dita",
       },
     };
   }
Create files to test the various capabilities provided by Xref
Attributes used by the Xref cross referencer.
addNavTitles - If true, add navtitle to outgoing bookmap references to show the title of the target topic.
changeBadXrefToPh - Change xrefs being placed in M3 by fixBadRefs to ph.
deguidize - Set true to replace guids in dita references with file name. Given reference g1#g2/id convert g1 to a file name by locating the topic with topicId g2. This requires the guids to be genuinely unique. SDL guids are thought to be unique by language code but the same topic, translated to a different language might well have the same guid as the original topic with a different language code: =(de|en|es|fr). If the source is in just one language then the guid uniqueness is a reasonable assumption. If the conversion can be done in phases by language then the uniqueness of guids is again reasonably assured. Data::Edit::Xml::Lint provides an alternative solution to deguidizing by using labels to record the dita reference in the input corpus for each id encountered, these references can then be resolved in the usual manner by Data::Edit::Xml::Lint::relint.
fixBadRefs - Try to fix bad references in these files where possible, either by changing a guid to a file name (assuming the right file is present in the corpus being scanned and deguidize has been set true) or, failing that, by moving the failing reference to the xtrf attribute, i.e. placing it in M3, possibly renaming the tag to ph if changeBadXrefToPh is in effect.
fixDitaRefs - Fix references in a corpus of Dita documents that have been converted to the GB Standard and whose target structure has been written to the named folder.
fixRelocatedRefs - Fix references to topics that have been moved around in the out folder structure assuming that all file names are unique.
fixXrefsByTitle - Try to fix invalid xrefs by the Gearhart Title Method if true
flattenFolder - Files are renamed to the Gearhart standard and placed in this folder if set. References to the unflattened files are updated to references to the flattened files. This option will eventually be deprecated as the Dita::GB::Standard is now fully available allowing files to be easily flattened before being processed by Xref.
inputFolder - A folder containing the dita and ditamap files to be cross referenced.
matchTopics - Match topics by title and by vocabulary to the specified confidence level between 0 and 1. This operation might take some time to complete on a large corpus.
maxZoomIn - Optional hash of names to regular expressions to look for in each file
maximumNumberOfProcesses - Maximum number of processes to run in parallel at any one time with a sensible default.
printSummaryLine - Print the summary line if true - on by default.
reports - Reports folder: the cross referencer will write reports to files in this folder.
requestAttributeNameAndValueCounts - Report attribute name and value counts
allowUniquePartialMatches - Allow unique partial matches, i.e. ignore the stuff to the right of the # in a reference if doing so produces a unique result. This feature has been explicitly disabled for conrefs (PS2-561) and might need to be disabled for other types of reference as well.
attributeCount - {file}{attribute name} == count of the different xml attributes found in the xml files.
attributeNamesAndValuesCount - {file}{attribute name}{value} = count
author - {file} = author of this file.
badGuidHrefs - Bad conrefs - all.
badNavTitles - Details of nav titles that were not resolved
badReferencesCount - The number of bad references encountered
badTables - Array of tables that need fixing.
badXml1 - [Files] with a bad xml encoding header on the first line.
badXml2 - [Files] with a bad xml doc type on the second line.
baseTag - Base Tag for each file
bookMapRefs - {bookmap full file name}{href}{navTitle}++ References from bookmaps to topics via appendix, chapter, bookmapref.
conRefs - {file}{href} Count of conref definitions in each file.
currentFolder - The current working folder used to make absolute file names from relative ones
docType - {file} == docType: the docType for each xml file.
duplicateIds - [file, id] Duplicate id definitions within each file.
duplicateTopicIds - [topicId, [files]] Files with duplicate topic ids - the id on the outermost tag.
fileExtensions - Default file extensions to load
fixRefs - {file}{ref} where the href or conref target is not valid.
fixedFolder - Fixed files are placed in this folder if fixBadRefs has been specified.
fixedRefs - [] hrefs and conrefs from fixRefs which were invalid but have been fixed by deguidizing them to a valid file name.
fixedRefsFailed - [] hrefs and conrefs from fixRefs which were moved to the "xtrf" attribute as requested by the fixBadRefs attribute because the reference was invalid and could not be improved by deguidization.
fixedRefsGB - [] files fixed to the Gearhart-Brenan file naming standard
fixedRefsNoAction - [] hrefs and conrefs from fixRefs for which no action was taken.
flattenFiles - {old full file name} = file renamed to Gearhart-Brenan file naming standard
goodNavTitles - Details of nav titles that were resolved
guidHrefs - {file}{href} = location where href starts with GUID- and is thus probably a guid.
guidToFile - {topic id which is a guid} = file defining topic id.
hrefUrlEncoding - Hrefs that need url encoding because they contain white space
ids - {file}{id} Id definitions across all files.
images - {file}{href} Count of image references in each file.
imagesReferencedFromBookMaps - {bookmap full file name}{full name of image referenced from topic referenced from bookmap}++
imagesReferencedFromTopics - {topic full file name}{full name of image referenced from topic}++
improvements - Suggested improvements - a list of improvements that might be made.
inputFileToTargetTopics - {input file}{target file}++ : Tells us the topics an input file was split into
inputFiles - Input files from inputFolder.
inputFolderImages - {full image file name} for all files in input folder thus including any images present
ltgt - {text between < and >}{filename} = count giving the count of text items found between < and >
maxZoomOut - Results from maxZoomIn where {file name}{regular expression key name in maxZoomIn}++
md5Sum - MD5 sum for each input file.
missingImageFiles - [file, href] == Missing images in each file.
missingTopicIds - Missing topic ids.
noHref - Tags that should have an href but do not have one.
notReferenced - {file name} Files in input area that are not referenced by a conref, image, bookmapref or xref tag and are not a bookmap.
olBody - The number of ol under body by file
originalSourceFileAndIdToNewFile - {original file}{id} = new file: Record mapping from original source file and id to the new file containing the id
otherMeta - {original file}{othermeta name}{othermeta content}++ : the contents of the other meta tags
otherMetaBookMapsAfterTopicIncludes - Bookmap othermeta after topic othermeta has been included
otherMetaBookMapsBeforeTopicIncludes - Bookmap othermeta before topic othermeta has been included
otherMetaDuplicatesCombined - Duplicate othermeta in bookmaps with called topics othermeta included
otherMetaDuplicatesSeparately - Duplicate othermeta in bookmaps and topics considered separately
otherMetaPushToBookMap - Othermeta that can be pushed to the calling book map
otherMetaRemainWithTopic - Othermeta that must stay in the topic
parseFailed - {file} files that failed to parse.
references - {file}{reference}++ - the various references encountered
relocatedReferencesFailed - Failing references that were not fixed by relocation
relocatedReferencesFixed - Relocated references fixed
results - Summary of results table.
sourceFile - The source file from which this structure was generated.
sourceTopicToTargetBookMap - {input topic cut into multiple pieces} = output bookmap representing pieces
statusLine - Status line summarizing the cross reference.
statusTable - Status table summarizing the cross reference.
tagCount - {file}{tags} == count of the different tag names found in the xml files.
tags - Number of tags encountered
tagsTextsRatio - Ratio of tags to text encountered
targetFolderContent - {file} = bookmap file name : the target folder content which shows us where an input file went
targetTopicToInputFiles - {current file} = the source file from which the current file was obtained
texts - Number of texts encountered
timeEnded - Time the run ended
timeStart - Time the run started
title - {file} = title of file.
titleToFile - {title}{file}++ if fixXrefsByTitle is in effect
topicFlattening - {topic}{sources}++ : the source files for each topic that was flattened
topicFlatteningFactor - Topic flattening factor - higher is better
topicIds - {file} = topic id - the id on the outermost tag.
topicsFlattened - Number of topics flattened
topicsReferencedFromBookMaps - {bookmap file, file name}{topic full file name}++
validationErrors - True means that Lint detected errors in the xml contained in the file.
vocabulary - The text of each topic shorn of attributes for vocabulary comparison.
xRefs - {file}{href}++ Xrefs references.
xrefBadFormat - External xrefs with no format=html.
xrefBadScope - External xrefs with no scope=external.
xrefsFixedByTitle - Xrefs fixed by locating a matching topic title from their text content.
Create a new cross referencer
     Parameter    Description
  1  %attributes  Attributes

Count hash elements to the specified number of levels

     Parameter  Description
  1  $l         Levels
  2  $h         Hash
Check for an external reference
     Parameter   Description
  1  $reference  Reference to check
Load the names of the files to be processed
     Parameter  Description
  1  $xref      Cross referencer
Analyze one input file
     Parameter  Description
  1  $Xref      Xref request
  2  $iFile     File to analyze
Map and report guids to files
     Parameter  Description
  1  $xref      Xref results
Edit an xml file retaining any existing XML headers and lint trailers
     Parameter  Description
  1  $in        Input file
  2  $out       Output file
  3  $source    Source to write
Fix one file by moving unresolved references to the xtrf attribute
     Parameter    Description
  1  $xref        Xref results
  2  $sourceFile  Source file to fix
Fix just the file containing references using a number of techniques and report those references that cannot be so fixed.
Fix one file to the Gearhart-Brenan standard
     Parameter  Description
  1  $xref      Xref results
  2  $file      File to fix
Rename files to the GB Standard
Analyze the input files
Report duplicate ids
Report duplicate topic ids
Report locations where an href was expected but not found
Check each reference, report bad references and mark them for fixing.
Report on guid hrefs
Reports on images and references to images
Report failed parses
Report bad xml on line 1
Report bad xml on line 2
Report doc type count
Report tag counts
Report tags and texts counts
Report items found between < and >
Report attribute counts
Report attribute value counts
Report the files known to have validation errors
Report on tables that have problems
Report file extension counts
Report file type counts - takes too long in series
Report external xrefs missing other attributes
Report improvements possible
Text located via Max Zoom In
Things that occur once in each file
Count how frequently each topic is reused
Report of hrefs that need to be fixed
Source file for each topic
Topics and images referenced from bookmaps
Report topics likely to be similar on the basis of their titles as expressed in the non Guid part of their file names
Report topics likely to be similar on the basis of their vocabulary
Good files have short names which uniquely represent their content and thus can be used instead of their md5sum to generate unique names
ol under body - indicative of a task
href needs url encoding
Fix navtitles in one map
Add nav titles to files containing maps.
Check one reference and return the first error encountered or undef if no errors encountered. Relies on topicIds to test files present and test the topicId is valid, relies on ids to check that the referenced id is valid.
     Parameter  Description
  1  $xref      Cross referencer
  2  $file      File containing reference
  3  $href      Reference
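The order of checks described above (file present, then topic id, then id within the topic) might look roughly like the sketch below. It is not the implementation of oneBadRef: the sub firstError, the error messages and the sample tables are hypothetical, and the sketch assumes that file names in references have already been made absolute and does not handle references within the same file.

```perl
use strict;
use warnings;

# Hypothetical tables modelled on the topicIds and ids attributes
my %topicIds = (q(a.dita) => q(ta));                       # {file} = topic id
my %ids      = (q(a.dita) => { ta => 1, p1 => 1 });        # {file}{id}

# Return the first error found in a reference of the form file#topicId/id, else undef
sub firstError {
    my ($href) = @_;
    my ($file, $rest) = split /#/, $href, 2;
    return q(No such file) unless defined $file and exists $topicIds{$file};
    return undef           unless defined $rest and length $rest; # File only reference
    my ($topicId, $id) = split m{/}, $rest, 2;
    return q(Topic id does not match) unless $topicId eq $topicIds{$file};
    return q(No such id in topic)     if defined $id and !$ids{$file}{$id};
    return undef;                                          # No errors encountered
}

for my $href (qw(a.dita#ta/p1 b.dita a.dita#tb a.dita#ta/p9)) {
    my $error = firstError($href);
    print "$href: ", defined $error ? $error : q(ok), "\n";
}
```

Returning at the first failure mirrors the description: a reference to a missing file is reported as such without going on to examine its topic id or id.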
Create sample input files for testing. The attribute inputFolder supplies the name of the folder in which to create the sample files.
     Parameter  Description
  1  $in        Input folder
  2  $N         Number of sample files
Create sample input files for testing fixFolder
     Parameter  Description
  1  $in        Folder to create the files in
Create sample input files for testing items between < and >
Create sample input files for fixing renamed topic refs
     Parameter  Description
  1  $in        Folder to create the files in
  2  $targets   Targets folder
Create sample input files for fixing references into renamed topics by xref
Change file structure to the current folder and write
     Parameter  Description
  1  $f         Data structure as a string
  2  $D         Target folder
Create sample input files for fixing references via the targets/ folder
     Parameter  Description
  1  $folder    Folder to switch to
Create sample input files for fixing conref references via the targets/ folder
Create sample input files for fixing bookmap references to topics that get cut into multiple pieces
Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces
Create sample input files for testing topic flattening ratio reporting
Full reference to a topic that has been flattened
References from a topic that has been cut out to a topic that has been cut out
Create sample data for othermeta reports
     Parameter  Description
  1  $out       Folder

Check an output structure produced by Xref

     Parameter  Description
  1  $x         Cross references
  2  $field     Field to check
  3  @folders   Folders to suppress
Write the test for an Xref structure
     Parameter  Description
  1  $x         Cross referencer
  2  $field     Field
  3  @folders   Names of the folders to suppress
Test reference checking
1 addNavTitlesToMaps - Add nav titles to files containing maps.
2 addNavTitlesToOneMap - Fix navtitles in one map
3 analyzeInputFiles - Analyze the input files
4 analyzeOneFile - Analyze one input file
5 changeFolderAndWriteFiles - Change file structure to the current folder and write
6 checkReferences - Check each reference, report bad references and mark them for fixing.
7 checkXrefStructure - Check an output structure produced by Xref
8 countLevels - Count hash elements to the specified number of levels
9 createSampleImageTest - Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces
10 createSampleInputFiles - Create sample input files for testing.
11 createSampleInputFilesFixFolder - Create sample input files for testing fixFolder
12 createSampleInputFilesForFixDitaRefs - Create sample input files for fixing renamed topic refs
13 createSampleInputFilesForFixDitaRefsImproved1 - Create sample input files for fixing references via the targets/ folder
14 createSampleInputFilesForFixDitaRefsImproved2 - Create sample input files for fixing conref references via the targets/ folder
15 createSampleInputFilesForFixDitaRefsImproved3 - Create sample input files for fixing bookmap references to topics that get cut into multiple pieces
16 createSampleInputFilesForFixDitaRefsImproved4 - Create sample input files for fixing bookmap reference to a topic that did not get cut into multiple pieces
17 createSampleInputFilesForFixDitaRefsXref - Create sample input files for fixing references into renamed topics by xref
18 createSampleInputFilesLtGt - Create sample input files for testing items between < and >
19 createSampleOtherMeta - Create sample data for othermeta reports
20 createTestReferencedToFlattenedTopic - Full reference to a topic that has been flattened
21 createTestReferenceToCutOutTopic - References from a topic that has been cut out to a topic that has been cut out
22 createTestTopicFlattening - Create sample input files for testing topic flattening ratio reporting
23 editXml - Edit an xml file retaining any existing XML headers and lint trailers
24 externalReference - Check for an external reference
25 fixFilesGB - Rename files to the GB Standard
26 fixOneFileGB - Fix one file to the Gearhart-Brenan standard
27 fixReferences - Fix just the file containing references using a number of techniques and report those references that cannot be so fixed.
28 fixReferencesInOneFile - Fix one file by moving unresolved references to the xtrf attribute
29 loadInputFiles - Load the names of the files to be processed
30 newXref - Create a new cross referencer
31 oneBadRef - Check one reference and return the first error encountered or undef if no errors encountered.
32 reportAttributeCount - Report attribute counts
33 reportAttributeNameAndValueCounts - Report attribute value counts
34 reportDocTypeCount - Report doc type count
35 reportDuplicateIds - Report duplicate ids
36 reportDuplicateTopicIds - Report duplicate topic ids
37 reportExternalXrefs - Report external xrefs missing other attributes
38 reportFileExtensionCount - Report file extension counts
39 reportFileTypes - Report file type counts - takes too long in series
40 reportFixRefs - Report of hrefs that need to be fixed
41 reportGuidHrefs - Report on guid hrefs
42 reportGuidsToFiles - Map and report guids to files
43 reportHrefUrlEncoding - href needs url encoding
44 reportImages - Reports on images and references to images
45 reportLtGt - Report items found between < and >
46 reportMaxZoomOut - Text located via Max Zoom In
47 reportMd5Sum - Good files have short names which uniquely represent their content and thus can be used instead of their md5sum to generate unique names
48 reportNoHrefs - Report locations where an href was expected but not found
49 reportOlBody - ol under body - indicative of a task
50 reportParseFailed - Report failed parses
51 reportPossibleImprovements - Report improvements possible
52 reportReferencesFromBookMaps - Topics and images referenced from bookmaps
53 reportSimilarTopicsByTitle - Report topics likely to be similar on the basis of their titles as expressed in the non Guid part of their file names
54 reportSimilarTopicsByVocabulary - Report topics likely to be similar on the basis of their vocabulary
55 reportSourceFiles - Source file for each topic
56 reportTables - Report on tables that have problems
57 reportTagCount - Report tag counts
58 reportTagsAndTextsCount - Report tags and texts counts
59 reportTopicDetails - Things that occur once in each file
60 reportTopicReuse - Count how frequently each topic is reused
61 reportValidationErrors - Report the files known to have validation errors
62 reportXml1 - Report bad xml on line 1
63 reportXml2 - Report bad xml on line 2
64 testReferenceChecking - Test reference checking
65 writeXrefStructureTest - Write the test for an Xref structure
66 xref - Check the cross references in a set of Dita files held in inputFolder and report the results in the reports folder.
This module is written in 100% Pure Perl and, thus, it is easy to read, comprehend, use, modify and install via cpan:
sudo cpan install Data::Edit::Xml::Xref
philiprbrenan@gmail.com
http://www.appaapps.com
Copyright (c) 2016-2019 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.