NAME
XML::Similarity - Calculate the structural similarity between two XML documents
SYNOPSIS
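    use XML::Similarity;

    my $hs = XML::Similarity->new;

    # Two XML documents to compare, given as strings.
    # ($a and $b are reserved for sort, so other names are used here.)
    my $doc_a = "<html><body></body></html>";
    my $doc_b = "<html><body><h1>HOMEPAGE</h1><h2>Details</h2></body></html>";

    my $score = $hs->calculate_similarity($doc_a, $doc_b);
    print "Similarity: $score\n";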
DESCRIPTION
This module is a small tool for calculating the structural similarity between any two XML documents. The underlying algorithm is simple: it serializes each XML tree into an array of its nodes' tag names, then finds the longest common subsequence (LCS) of the two arrays.
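The similarity is measured with the formula (2 * length of the LCS) / (length of tree A's array + length of tree B's array). The score therefore ranges from 0 (no tag names in common) to 1 (identical tag sequences).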
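The serialize-then-LCS procedure and the formula can be sketched in plain Perl as follows. This is only an illustration under stated assumptions, not the module's actual code: the helper names tag_sequence, lcs_length and similarity are hypothetical, and the tag names are pulled out with a regular expression rather than a real XML parser, so CDATA sections and similar edge cases are not handled. The module may also count nodes differently, so its score for the same input is not guaranteed to match.

```perl
use strict;
use warnings;

# Hypothetical helper: list element names in document order by matching
# start tags with a regex. Closing tags, comments and processing
# instructions are skipped because they do not begin with a name character.
sub tag_sequence {
    my ($xml) = @_;
    my @tags = $xml =~ /<([A-Za-z_][\w.:-]*)/g;
    return \@tags;
}

# Length of the longest common subsequence of two array refs,
# computed row by row with the standard dynamic-programming recurrence.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (@$y + 1);
    for my $i (1 .. @$x) {
        my @curr = (0);
        for my $j (1 .. @$y) {
            $curr[$j] = $x->[$i - 1] eq $y->[$j - 1]
                ? $prev[$j - 1] + 1
                : ($prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1]);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Score from the formula in the text: 2 * |LCS| / (|A| + |B|).
sub similarity {
    my ($xml_a, $xml_b) = @_;
    my ($seq_a, $seq_b) = (tag_sequence($xml_a), tag_sequence($xml_b));
    my $total = @$seq_a + @$seq_b;
    return 0 unless $total;    # assumption: two empty inputs score 0
    return 2 * lcs_length($seq_a, $seq_b) / $total;
}

# The two documents from the SYNOPSIS: sequences (html, body) and
# (html, body, h1, h2), LCS length 2, so 2 * 2 / (2 + 4).
printf "%.3f\n", similarity(
    '<html><body></body></html>',
    '<html><body><h1>HOMEPAGE</h1><h2>Details</h2></body></html>',
);    # prints 0.667
```

Keeping only the previous row of the table holds memory proportional to the shorter dimension of the comparison, which matters little for small documents but helps when many large trees are compared for clustering.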
Structural similarity can be useful for XML document classification and clustering.
PREREQUISITE
COPYRIGHT
Copyright (c) 2011 Yung-chung Lin.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.