The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Data::Cmp - Compare two data structures, return -1/0/1 like cmp

VERSION

This document describes version 0.010 of Data::Cmp (from Perl distribution Data-Cmp), released on 2021-04-12.

SYNOPSIS

 use Data::Cmp qw(cmp_data);

 cmp_data(["one", "two", "three"],
          ["one", "two", "three"]); # => 0

 cmp_data(["one", "two" , "three"],
          ["one", "two2", "three"]); # => -1

 cmp_data(["one", "two", "three"],
          ["one", "TWO", "three"]); # => 1

 # hash/array is not "comparable" with scalar
 cmp_data(["one", "two", {}],
          ["one", "two", "three"]); # => 2

Sort data structures (of similar structures):

 my @arrays = (["c"], ["b"], ["a", "b"], ["a"], ["a","c"]);
 my @sorted = sort { cmp_data($a, $b) } @arrays; # => (["a"], ["a","b"], ["a","c"], ["b"], ["c"])

DESCRIPTION

This relatively lightweight (no non-core dependencies, under 100 lines of code) module offers the cmp_data function that, like Perl's cmp, returns -1/0/1 value. cmp_data differs from cmp in that it can compare two data of different types and compare data items recursively, with pretty sensible semantics. In addition to returning -1/0/1, cmp_data can also return 2 if two data differ but not comparable: there is no sensible notion of which one is "greater than" the other. An example is empty hash {} vs empty array []).

This module can handle circular structure.

The following are the rules of comparison used by cmp_data():

  • Two undefs are the same

     cmp_data(undef, undef); # 0
  • A defined value is greater than undef

     cmp_data(undef, 0); # -1
  • Two non-reference scalars are compared string-wise using Perl's cmp

     cmp_data("a", "A"); # 1
     cmp_data(10, 9);    # -1
  • A reference and non-reference are different and not comparable

     cmp_data([], 0); # 2
  • Two references that are of different types are different and not comparable

     cmp_data([], {}); # 2
  • Blessed references that are blessed into different packages are different and not comparable

     cmp_data(bless([], "foo"), bless([], "bar")); # 2
     cmp_data(bless([], "foo"), bless([], "foo")); # 0
  • Two array references are compared element by element (unless at least one of the arrayref has been seen, in which case see last rule)

     cmp_data(["a","b","c"], ["a","b","c"]); #  0
     cmp_data(["a","b","c"], ["a","b","d"]); # -1
     cmp_data(["a","d","c"], ["a","b","e"]); #  1
  • A longer arrayref is greater than its shorter subset

     cmp_data(["a","b"], ["a"]); # 1
  • Two hash references are compared key by key (unless at least one of the hashref has been seen, in which case see last rule)

     cmp_data({k1=>"a", k2=>"b", k3=>"c"}, {k1=>"a", k2=>"b", k3=>"c"}); # 0
     cmp_data({k1=>"a", k2=>"b", k3=>"c"}, {k1=>"a", k2=>"b", k3=>"d"}); # 1
  • When two hash references share a common subset of pairs but have non-common pairs, the greater hashref is the one that has more non-common pairs

    If the number of non-common pairs are the same, they are just different and not comparable:

     cmp_data({k1=>"", k2=>"", k3=>""}, {k1=>"", k5=>""});                #  1 (hash1 has 2 non-common keys: k2 & k3; hash2 only has 1: k5)
     cmp_data({k1=>"", k2=>"", k3=>""}, {k1=>"", k5=>"", k6=>", k7=>""}); # -1 (hash1 has 2 non-common keys: k2 & k3; hash2 has 3 non-common pairs: k5, k6, k7)
     cmp_data({k1=>"", k2=>"", k3=>""}, {k1=>"", k5=>"", k6=>"});         #  2 (both hashes have 2 non-common pairs)
  • All other types of references (i.e. non-hash, non-array) are the same only if their address is the same; otherwise they are different and not comparable

     cmp_data(\1, \1); # 2
     my $ref = \1; cmp_data($ref, $ref); # 0
  • A seen (hash or array) reference is no longer recursed, it's compared by address (see previous rule)

     my $ary1 = [1]; push @$ary1, $ary1;
     my $ary2 = [1]; push @$ary2, $ary2;
     my $ary3 = [1]; push @$ary3, $ary1;
     cmp_data($ary1, $ary2); # 2
     cmp_data($ary1, $ary3); # 0

FUNCTIONS

cmp_data

Usage:

 cmp_data($d1, $d2) => -1/0/1/2

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/Data-Cmp.

SOURCE

Source repository is at https://github.com/perlancar/perl-Data-Cmp.

BUGS

Please report any bugs or feature requests on the bugtracker website https://github.com/perlancar/perl-Data-Cmp/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

SEE ALSO

Data comparison

Other variants of Data::Cmp: Data::Cmp::Numeric, Data::Cmp::StrOrNumeric, Data::Cmp::Custom (allows custom actions and comparison routines), Data::Cmp::Diff (generates diff structure instead of just returning -1/0/1/2), Data::Cmp::Diff::Perl (generates diff in the form of Perl code).

Modules that just return boolean result ("same or different"): Data::Compare, Test::Deep::NoTest (offers flexibility or approximate or custom comparison).

Modules that return some kind of "diff" data: Data::Comparator, Data::Diff.

Of course, to check whether two structures are the same you can also serialize each one then compare the serialized strings/bytes. There are many modules for serialization: JSON, YAML, Sereal, Data::Dumper, Storable, Data::Dmp, just to name a few.

Test modules that do data structure comparison: Test::DataCmp (test module based on Data::Cmp::Custom), Test::More (is_deeply()), Test::Deep, Test2::Tools::Compare.

Others

Scalar::Cmp which employs roughly the same rules as Data::Cmp but does not recurse into arrays/hashes and is meant to compare two scalar values.

AUTHOR

perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2021, 2019, 2018 by perlancar@cpan.org.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.