The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Data::Match - Complex data structure pattern matching

SYNOPSIS

  use Data::Match qw(:all);
  my ($match, $results) = match($structure, $pattern);

  use Data::Match;
  my $obj = new Data::Match;
  my ($match, $results) = $obj->execute($structure, $pattern);

DESCRIPTION

Data::Match provides extensible complex Perl data structure searching and matching.

EXPORT

None are exported by default. :func exports match and matches, :pat exports all the pattern element generators below, :all exports :func and :pat.

PATTERNS

A data pattern is a complex data structure that possibly matches another complex data structure. For example:

  matches([ 1, 2 ], [ 1, 2 ]); # TRUE

  matches([ 1, 2, 3 ], [ 1, ANY, 3 ]); # TRUE

  matches([ 1, 2, 3 ], [ 1, ANY, 2 ]); # FALSE: 3 != 2

ANY matches anything, including an undefined value.

  my $results = matches([ 1, 2, 1 ], [ BIND('x'), ANY, BIND('x') ]); # TRUE

BIND($name) matches anything and remembers each match and its position with every BIND($name) in $result-{'BIND'}{$name}>. If BIND($name) is not the same as the first value bound to BIND($name) it does not match. For example:

  my $results = matches([ 1, 2, 3 ], [ BIND('x'), 2, BIND('x') ]); # FALSE: 3 != 1

COLLECT($name) is similar to BIND but does not compare first bound values.

REST matches all remaining elements of an array or hash.

  matches([ 1, 2, 3 ], [ 1, REST() ]); # TRUE
  matches({ 'a'=>1, 'b'=>1 }, { 'b'=>1, REST() => REST() }); # TRUE

FIND searches at all depths for matching subpatterns.

  matches([ 1, [ 1, 2 ], 3], FIND(COLLECT('x', [ 1, REST() ])); # is true.

See the test script t/t1.t in the package distribution for more pattern examples.

MATCH COLLECTIONS

When a BIND or COLLECT matches a datum, an entry is collected in $result-{BIND}> and $result-{COLLECT}>, respectively. (This might change in the future)

Each entry for the binding name is a hash containing 'v', 'p' and 'ps' lists. 'v' is a list of the value at each match. 'p' is a list describing where the corresponding match was found based on the root of the searchat each match. 'ps' is a list of code strings that describes where the match was for each match.

SUBPATTERNS

All patterns can have sub-patterns. Most patterns match the ANDed results of their sub-patterns and their own behavior, first trying the sub-patterns before attempting to match the intrinsic behavior. However, OR and ANY match any sub-patterns;

For example:

  match([ ['a', 1 ], ['b', 2], ['a', 3] ], EACH(COLLECT('x', ['a', ANY())))) # TRUE

The above pattern means:

    For EACH element in the root structure (an array):

      COLLECT each element, in 'x', that is,

        An ARRAY of length 2 that starts with 'a'.

On the other hand.

  match( [ ['a', 1 ], ['b', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) ) 
  # IS FALSE

Because the second root element (an array) does not start with 'a'. But,

  match( [ ['a', 1 ], ['a', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) ) 
  # IS TRUE

The pattern below essentially does a flatten of the nested array:

  match(
    [ 1, 'x', 
      [ 2, 'x', 
        [ 3, 'x'], 
        [ 4, 
           [ 5, 
             [ 'x' ] 
           ],
          6
        ] 
      ] 
    ], 
    FIND(COLLECT('x', EXPR(q{! ref}))), 
    { 'no_collect_path' => 1 }
  );

no_collect_path causes COLLECT and BIND to not collect any paths.

MATCH SLICES

Match slices are objects that contain slices of matched portions of a data structure. This is useful for inflicting change into substructures matched by patterns like REST.

For example:

  do {
    my $a = [ 1, 2, 3, 4 ];
    my $p = [ 1, ANY, REST(BIND('s')) ];
    my $r = matches($a, $p); # 
    ok($r); # TRUE
    ok(Compare($r->{'BIND'}{'s'}{'v'}[0], [ 3, 4 ])); # TRUE
    $p->[0] = 'x';
    matches($a, [ 1, 2, 'x', 4 ]); # TRUE
  }

Hash match slices are generate for each key-value pair for a hash matched by EACH and ALL. Each of these match slices can be matched as a hash with a single key->value pair.

Match slices are useful for search and replace missions.

VISITATION ADAPTERS

By default Data::Match is blind to Perl object interfaces. To instruct Data::Match to not traverse object implementation containers and honor object interfaces you must provide a visitation adaptor. A visitation adaptor tells Data::Match what to traverse through an object interface and how to keep track of how it got through.

For example:

  package Foo;
  sub new
  {
    my ($cls, %opts) = @_;
    bless \%opts, $cls;
  }
  sub x { shift->{x}; }
  sub parent { shift->{parent}; }
  sub children { shift->{children}; }
  sub add_child { 
    my $self = shift; 
    for my $c ( @_ ) { 
      $c->{parent} = $self;
    }
    push(@{$self->{children}}, @_);
  }


  my $foos = [ map(new Foo('x' => $_), 1 .. 10) ];
  for my $f ( @$foos ) { $f->add_child($foos->[rand($#$foo)); }

  my $pat = FIND(COLLECT('Foo', ISA('Foo', { 'parent' => $foos->[0], REST() => REST() })));
  $match->match($foos, $pat);

The problem with the above example is: FIND will not honor the interface of class Foo by default and will eventually find a Foo where $_-parent eq $foos->[0]> through all the parent and child links in the objects' implementation container. To force Data::Match to honor an interface (or a subset of an interface) during FIND traversal we create a 'find' adapter sub that will do the right thing.

  my $opts = {
    'find' => {
       'Foo' => sub {
         my ($self, $visitor) = @_;

         # Always do 'x'.
         $visitor->($self->x, 'METHOD', 'x');

         # Optional children traversal.
         if ( $match->{'Foo_traverse_children'} ) {
           $visitor->($self->children, 'METHOD', 'children');
         }

         # Optional parent traversal.
         if ( $match->{'Foo_traverse_parent'} ) {
           $visitor->($self->parent, 'METHOD', 'parent');
         }
       }
     }
  }
  my $match = new Data::Match($opts);
  $match = $match->match($foos, $pat);

See t/t4.t for more examples of visitation adaptors.

DESIGN

Data::Match employs a mostly-functional external interface since this module was inspired by a Lisp tutorial ("The Little Lisper", maybe) I read too many years ago; besides, pattern matching is largely recursively functional. The optional control hashes and traverse adapter interfaces are better represented by an object interface so implemented a functional veneer over the core object interface.

Internally, objects are used to represent the pattern primitives because most of the pattern primitives have common behavior. There are a few design patterns that are particularly applicable in Data::Match: Visitor and Adapter. Adapter is used to provide the extensiblity for the traversal of blessed structures such that Data::Match can honor the external interfaces of a class and not blindly violate encapsulation. Visitor is the basis for some of the FIND pattern implementation. The Data::Match::Slice classes that provide the match slices are probably a Veneer on the array and hash types through the tie meta-behaviors.

CAVEATS

  • Does not have regex-like operators like '?', '*', '+'.

  • Should probably have more interfaces with Data::DRef and Data::Walker.

  • The visitor adaptors do not use UNIVERSAL::isa to search for the adapter is uses ref.

  • Since hash keys do not retain blessedness (what was Larry thinking?) it is difficult to have patterns match keys without resorting to some bizarre regex instead of using isa.

STATUS

If you find this to be useful please contact the author. This is alpha software; all APIs, semantics and behavors are subject to change.

INTERFACE

This section describes the external interface of this module.

%match_opts

Default options for match.

_match_pre

Initialize the match object before pattern traversal.

_match_post

Initialize the match object before pattern traversal.

execute

Matches a structure against a pattern. In a list context, returns both the match success and results; in a scalar context returns the results hash if match succeeded or undef.

  use Data::Match;
  my $obj = new Data::Match();
  my $matched = $obj->execute($thing, $pattern);

match

   match($thing, $pattern, @opts)

it same as:

   Data::Match->new(@opts)->execute($thing, $pattern);

matches

Same as match in scalar context.

match_path_str

Returns a perl expression that will generate code to point to the element of the path.

match_path_DRef_path

Returns a string suitable for Data::DRef.

match_path_get

Returns the value pointing to the location for the match path in the root.

  my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
  my $x = match_path_get($thing, $results->{'BIND'}{'x'}{'p'}[0]);

The above example returns the first array that begins with 'x'.

match_path_set

Returns the value pointing to the location for the match path in the root.

  my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
  match_path_set($thing, $results->{'BIND'}{'x'}{'p'}[0], 'y');

The above example replaces the first array found that starts with 'x' with 'y';

match_path_ref

Returns a scalar ref pointing to the location for the match path in the root.

  my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
  my $ref = match_path_ref($thing, $results->{'BIND'}{'x'}{'p'}[0]);
  $$ref = 'y';

The above example replaces the first array that starts with 'x' with 'y';

VERSION

Version 0.03, $Revision: 1.8 $.

AUTHOR

Kurt A. Stephens <kurtstephens@acm.org>

COPYRIGHT

Copyright (c) 2001, Kurt A. Stephens and ION, INC.

SEE ALSO

perl, Array::PatternMatcher, Data::Compare, Data::Dumper, Data::DRef, Data::Walker.