
NAME

Data::Cmp - Compare two data structures, return -1/0/1 like cmp

VERSION

This document describes version 0.001 of Data::Cmp (from Perl distribution Data-Cmp), released on 2018-08-10.

SYNOPSIS

 use Data::Cmp qw(cmp_data);

 cmp_data(["one", "two", "three"],
          ["one", "two", "three"]); # => 0

 cmp_data(["one", "two" , "three"],
          ["one", "two2", "three"]); # => -1

 cmp_data(["one", "two", "three"],
          ["one", "TWO", "three"]); # => 1

 # case insensitive string comparison
 cmp_data(["one", "two", "three"],
          ["one", "TWO", "three"], {ci=>1}); # => 0

 # approximate number comparison
 cmp_data([1, 1.5    , 1.6],
          [1, 1.49999, 1.6], {epsilon=>1e-4}); # => 0

 cmp_data(["one", "two", {}],
          ["one", "TWO", "three"]); # => 1

 # hash/array is not "comparable" with scalar
 cmp_data(["one", "two", {}],
          ["one", "two", "three"]); # => 2

 # likewise, a hash and an array are not comparable
 cmp_data([],
          {}); # => 2

 # custom comparison function: always report the elements as the same
 cmp_data(["one" , "two", "three"],
          ["satu", "dua", 3], {elem_cmp=>sub {0}}); # => 0

 # custom comparison function: compare length ("satu" is longer than "one")
 cmp_data(["one" , "two", "three"],
          ["satu", "dua", "tiga" ], {elem_cmp=>sub { length $_[0] <=> length $_[1] }}); # => -1

DESCRIPTION

This module offers the cmp_data function, which compares two data structures in a flexible manner. Like Perl's cmp and <=> operators, it returns -1, 0, or 1; it can also return a fourth value, 2, if the two data structures differ but there is no sensible notion of which one is larger than the other.

This module can handle circular structures.

This module offers an alternative to Test::Deep (specifically, Test::Deep::NoTest's eq_deeply()). Test::Deep allows customizing comparison at specific points in a data structure, while Data::Cmp's cmp_data() is geared more towards customizing comparison behavior across all points in a data structure. Depending on your needs, one might be more convenient than the other.

For basic customization, you can turn on case-insensitive matching or numeric tolerance. For more advanced customization, you can provide coderefs that perform the comparison of data items yourself.
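The two basic options mostly change how two defined scalars are compared. The following is a minimal sketch of that idea, not the module's implementation: the helper name sketch_scalar_cmp is hypothetical, it assumes ci folds case (here with lc) before a cmp, and it assumes that with tolerance two numbers whose difference is no larger than the given value compare as the same (0). The option names ci and tolerance are the ones listed among cmp_data's known options.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of Data::Cmp): compares two defined scalars,
# honoring the two basic options.
#   ci        - fold case (here with lc) before a string comparison
#   tolerance - assumed: numbers whose difference is within it compare as 0
sub sketch_scalar_cmp {
    my ($x, $y, $opts) = @_;
    $opts //= {};

    if (looks_like_number($x) && looks_like_number($y)) {
        my $tol = $opts->{tolerance};
        return 0 if defined $tol && abs($x - $y) <= $tol;
        return $x <=> $y;
    }
    return lc($x) cmp lc($y) if $opts->{ci};
    return $x cmp $y;
}

print sketch_scalar_cmp("two", "TWO"), "\n";                        # 1
print sketch_scalar_cmp("two", "TWO", {ci => 1}), "\n";             # 0
print sketch_scalar_cmp(1.5, 1.49999, {tolerance => 1e-4}), "\n";   # 0
```

Numbers are checked first, so the ci option never affects numeric comparison in this sketch.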

FUNCTIONS

cmp_data

Usage:

 cmp_data($d1, $d2 [ , \%opts ]) => -1|0|1|2

Compare two data structures $d1 and $d2 recursively. Like the cmp operator, it returns 0 if the two structures are equivalent, -1 if $d1 is "less than" $d2, and 1 if $d1 is "greater than" $d2. Unlike the cmp operator, it can also return 2 if $d1 and $d2 differ but there is no sensible notion of which one is "greater than" the other.

Can detect recursive references.
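One common way to make a recursive comparison survive circular references is to remember which pairs of references have already been entered and to treat a pair met again as "same so far". The sketch below assumes that approach; it is not the module's own code, and sketch_same is a hypothetical name. It only answers same/different and compares scalars with eq for brevity. Because every difference makes it return immediately, a pair found in the seen-table is either still in progress (a cycle) or already known to be equal, so returning 1 for it is safe.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical sketch (not Data::Cmp's code): recursive same/different check
# that records each (ref1, ref2) pair it enters, so cycles terminate.
# Scalars are compared with eq for brevity.
sub sketch_same {
    my ($x, $y, $seen) = @_;
    $seen //= {};

    return 1 if !defined $x && !defined $y;
    return 0 if !defined $x || !defined $y;

    my ($rx, $ry) = (ref($x), ref($y));
    return ($x eq $y ? 1 : 0) if !$rx && !$ry;
    return 0 if $rx ne $ry;

    # Pair seen before: either a cycle in progress or already found equal
    my $pair = refaddr($x) . ',' . refaddr($y);
    return 1 if $seen->{$pair};
    $seen->{$pair} = 1;

    if ($rx eq 'ARRAY') {
        return 0 if @$x != @$y;
        for my $i (0 .. $#$x) {
            return 0 unless sketch_same($x->[$i], $y->[$i], $seen);
        }
        return 1;
    }
    if ($rx eq 'HASH') {
        return 0 if scalar(keys %$x) != scalar(keys %$y);
        for my $k (keys %$x) {
            return 0 unless exists($y->{$k})
                && sketch_same($x->{$k}, $y->{$k}, $seen);
        }
        return 1;
    }
    return refaddr($x) == refaddr($y) ? 1 : 0;    # other refs: identity
}

# Two self-referencing arrays with the same contents
my $r1 = [1, 2]; push @$r1, $r1;
my $r2 = [1, 2]; push @$r2, $r2;
print sketch_same($r1, $r2) ? "same\n" : "different\n";    # same
```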

Default behavior when comparing different types of data:

  • Two undef values are the same (0)

  • A defined value is greater than an undefined value

     cmp_data(undef, 0); # -1

  • Two numbers will be compared using Perl's <=> operator

    Whether data is a number will be determined using Scalar::Util's looks_like_number.

     cmp_data("10", 9); # 1

  • Two strings, or a number and a string, will be compared using Perl's cmp operator

     cmp_data("a", "2b"); # 1

  • Two arrays will be compared element by element

    If all elements are the same until the last element of the shorter array, the longer array is greater than the shorter one.

     cmp_data([1,2,3], [1,3,2]); # -1
    
     cmp_data([1,2,3], [1,2]); # 1
     cmp_data([1,2,3], [1,2,3,0]); # -1

  • Two hashes will be compared key by key (keys sorted in ASCII order)

    If, after all common keys are compared, all their values are the same, then: if neither hash has extra keys, they are the same (0); if both have the same number of extra keys, they are different (2); otherwise the hash with more extra keys is greater than the other one.

     cmp_data({a=>1, b=>2}, {a=>1, b=>2}); # 0
     cmp_data({a=>1, b=>2}, {a=>1, b=>3}); # -1
    
     cmp_data({a=>1, b=>2}, {a=>1}); # 1
     cmp_data({a=>1, b=>2}, {a=>1, c=>1}); # 2
    
     cmp_data({a=>1, b=>2}, {a=>1, c=>1, d=>1}); # -1

  • All other combinations will result in either 0 (same) or 2 (different)
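To make the rules above concrete, here is a hypothetical re-implementation; sketch_cmp is an illustrative name, not a Data::Cmp function. It is a sketch that follows only the listed rules: it has no options and no circular-reference detection, and it assumes that any other pair of same-type references is the same only if they are the very same reference.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch of the default rules (not the module's actual code):
# no options, no circular-reference detection.
sub sketch_cmp {
    my ($d1, $d2) = @_;

    # undef: two undefs are the same; a defined value is greater
    if (!defined $d1) { return defined $d2 ? -1 : 0 }
    return 1 if !defined $d2;

    my ($r1, $r2) = (ref($d1), ref($d2));

    if (!$r1 && !$r2) {    # two plain scalars
        return $d1 <=> $d2
            if looks_like_number($d1) && looks_like_number($d2);
        return $d1 cmp $d2;
    }
    return 2 if $r1 ne $r2;    # scalar vs ref, array vs hash, ...

    if ($r1 eq 'ARRAY') {
        my $n = @$d1 < @$d2 ? scalar(@$d1) : scalar(@$d2);
        for my $i (0 .. $n - 1) {
            my $res = sketch_cmp($d1->[$i], $d2->[$i]);
            return $res if $res;    # first difference decides (-1, 1 or 2)
        }
        return @$d1 <=> @$d2;       # equal prefix: longer array is greater
    }
    if ($r1 eq 'HASH') {
        my @common = grep { exists $d2->{$_} } keys %$d1;
        for my $k (sort @common) {  # common keys in ASCII order
            my $res = sketch_cmp($d1->{$k}, $d2->{$k});
            return $res if $res;
        }
        my $extra1 = grep { !exists $d2->{$_} } keys %$d1;
        my $extra2 = grep { !exists $d1->{$_} } keys %$d2;
        return 0 if !$extra1 && !$extra2;
        return 2 if $extra1 == $extra2;
        return $extra1 <=> $extra2;  # more extra keys is greater
    }
    # other reference types: same only if it is the very same reference
    return $d1 == $d2 ? 0 : 2;
}

print sketch_cmp([1, 2, 3], [1, 2]), "\n";                  # 1
print sketch_cmp({a => 1, b => 2}, {a => 1, c => 1}), "\n"; # 2
```

With these rules, each example in the list above yields the value given in its comment.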

Known options:

  • ci

    Boolean. Can be set to true to turn on case-insensitive string comparison.

  • tolerance

    Float. Can be set to perform numeric comparison with some tolerance, so that numbers whose difference falls within this tolerance are considered the same.

  • cmp

    Coderef. Can be set to provide custom comparison routine.

    The coderef will be called for every data item (including containers such as hashes and arrays, before descending into their items) and given these arguments:

     ($item1, $item2, \%context)

    The context hash contains these keys: depth (int, 0 at the topmost level).

    Must return 0, -1, 1, or 2. You can also return undef if you want to decline doing comparison. In that case, cmp_data() will use its default comparison logic.

    When using this option, the ci and tolerance options do not take effect.

  • elem_cmp

    Coderef. Just like the cmp option, except this routine will only be consulted for array elements or hash pair values.

  • num_cmp

    Coderef. Just like the cmp option, except this routine will only be consulted when comparing two defined numbers.

  • str_cmp

    Coderef. Just like the cmp option, except this routine will only be consulted when comparing two defined strings.
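The callback protocol of the cmp option (the three arguments, the depth context key, and returning undef to decline) can be illustrated with a small driver. This is a sketch under stated assumptions, not the module's code: sketch_cmp_hooked and the two example hooks are hypothetical, only scalars and arrays are handled, and depth is assumed to grow by one per level of nesting. The hooks have the shape the cmp option describes, so they are meant to be usable as cmp_data($d1, $d2, {cmp => $by_length}), although this sketch does not call the module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical driver (not Data::Cmp's code) for the callback protocol:
# the hook sees every item, containers before their elements, as
# ($item1, $item2, {depth => ...}); a defined return value is the result,
# undef falls back to the default logic. Handles scalars and arrays only.
sub sketch_cmp_hooked {
    my ($d1, $d2, $hook, $depth) = @_;
    $depth //= 0;

    my $res = $hook->($d1, $d2, {depth => $depth});
    return $res if defined $res;    # the hook made the decision

    if (ref($d1) eq 'ARRAY' && ref($d2) eq 'ARRAY') {
        my $n = @$d1 < @$d2 ? scalar(@$d1) : scalar(@$d2);
        for my $i (0 .. $n - 1) {
            my $r = sketch_cmp_hooked($d1->[$i], $d2->[$i], $hook, $depth + 1);
            return $r if $r;
        }
        return @$d1 <=> @$d2;
    }
    return 2 if ref($d1) || ref($d2);
    return $d1 <=> $d2 if looks_like_number($d1) && looks_like_number($d2);
    return $d1 cmp $d2;
}

# Hook 1: compare scalars by length, decline (undef) for containers
my $by_length = sub {
    my ($x, $y, $ctx) = @_;
    return undef if ref($x) || ref($y);
    return length($x) <=> length($y);
};

# Hook 2: use the context to treat anything deeper than depth 1 as the same
my $shallow = sub {
    my ($x, $y, $ctx) = @_;
    return 0 if $ctx->{depth} > 1;
    return undef;
};

print sketch_cmp_hooked(["one", "two", "three"],
                        ["satu", "dua", "tiga"], $by_length), "\n";  # -1
print sketch_cmp_hooked([1, [2, [3]]], [1, [2, [4]]], $shallow), "\n"; # 0
```

The first call mirrors the length-comparison example in the SYNOPSIS ("one" is shorter than "satu"); the second shows a hook using depth to stop comparing below a chosen level.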

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/Data-Cmp.

SOURCE

Source repository is at https://github.com/perlancar/perl-Data-Cmp.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Data-Cmp

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

SEE ALSO

Modules that just return a boolean result ("same or different"): Data::Compare, Test::Deep::NoTest (offers flexibility such as approximate or custom comparison).

Modules that return some kind of "diff" data: Data::Comparator, Data::Diff.

Of course, to check whether two structures are the same you can also serialize each one then compare serialized strings/bytes. There are many modules for serialization: JSON, YAML, Sereal, Data::Dumper, Storable, Data::Dmp, just to name a few.
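As an example of the serialization approach, the sketch below uses the core module Data::Dumper with sorted hash keys so that key order does not matter. same_by_dump is a hypothetical helper name. It only tells whether two structures are the same, and the outcome depends on how the serializer formats each value.

```perl
use strict;
use warnings;
use Data::Dumper;

# Equality check by serialization: dump both structures in a canonical form
# and compare the strings. This answers only "same or different"; it cannot
# tell which structure is "greater" the way cmp_data() does.
sub same_by_dump {
    my ($d1, $d2) = @_;
    local $Data::Dumper::Sortkeys = 1;   # hash keys in sorted order
    local $Data::Dumper::Indent   = 0;   # single-line output
    local $Data::Dumper::Terse    = 1;   # omit the leading '$VAR1 = '
    return Dumper($d1) eq Dumper($d2) ? 1 : 0;
}

print same_by_dump({a => 1, b => [1, 2]}, {b => [1, 2], a => 1}), "\n";   # 1
print same_by_dump({a => 1}, {a => 2}), "\n";                             # 0
```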

AUTHOR

perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by perlancar@cpan.org.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.