NAME

Data::DRef - Delimited-key access to complex data structures

SYNOPSIS

  use Data::DRef qw( :dref_access );
  my $hash = { 'items' => [ 'first' ] };
  print get_value_for_dref($hash, 'items.0');
  set_value_for_dref( $hash, 'items.1', 'second' );
  
  set_value_for_root_dref( 'myhash', $hash );    
  print get_value_for_root_dref('myhash.items.0');

  use Data::DRef qw( :select );
  matching_keys($target, %filter_criteria) : $key or @keys
  matching_values($target, %filter_criteria) : $item or @items

  use Data::DRef qw( :index );
  index_by_drefs($target, @drefs) : $index
  unique_index_by_drefs($target, @drefs) : $index
  ordered_index_by_drefs( $target, $index_dref ) : $entry_ary
  
  use Data::DRef qw( :leaf );
  leaf_drefs($target) : @drefs
  leaf_values( $target ) : @values
  leaf_drefs_and_values( $target ) : %dref_value_pairs

DESCRIPTION

Data::DRef provides a streamlined interface for accessing values within nested Perl data structures. These structures are generally networks of hashes and arrays, some of which may be blessed into various classes, containing a mix of simple scalar values and references to other items in the structure.

The Data::DRef functions allow you to use delimited key strings to set and retrieve values at desired nodes within these structures. These functions are slower than direct variable access, but provide additional flexibility for high-level scripting and other late-binding behaviour. For example, a web-based application could use DRefs to simplify customization, allowing the user to refer to arguments processed by CGI.pm in a fairly readable way, such as query.param.foo.

A suite of utility functions, previously maintained in a separate Data::Collection module, performs a variety of operations across nested data structures. Because the Data::DRef abstraction layer is used, these functions should work equally well with arrays, hashes, or objects that provide their own key-value interface.

REFERENCE

Value-For-Key Interface

The first set of functions defines our core key-value interface and provides its implementation for references to Perl arrays and hashes. For example, direct access to array and hash keys usually looks like this:

    print $employee->[3];
    $person->{'name'} = 'Joe';

Using these functions, you could replace the above statements with:

    print get_value_for_key( $employee, 3 );
    set_value_for_key( $person, 'name', 'Joe' );

Each of these functions checks for object methods as described below.

get_keys($target) : @keys

Returns a list of keys for which this item would be able to provide a value. For hash refs, returns the hash keys; for array refs, returns a list of numbers from 0 to the last index ($#array); otherwise returns nothing.

get_values($target) : @values

Returns a list of values for this item. For hash refs, returns the hash values; for array refs, returns the array contents; otherwise returns nothing.

get_value_for_key($target, $key) : $value

Returns the value associated with this key. For hash refs, returns the value at this key, if present; for array refs, returns the value at this index, or complains if it's not numeric.

set_value_for_key($target, $key, $value)

Sets the value associated with this key. For hash refs, adds or overwrites the entry for this key; for array refs, sets the value at this index, or complains if it's not numeric.

get_or_create_value_for_key($target, $key) : $value

Gets the value associated with this key using get_value_for_key, or if that value is undefined, sets the value to refer to a new anonymous hash using set_value_for_key and returns that reference.

get_reference_for_key($target, $key) : $value_reference

Returns a reference to the scalar which is used to hold the value associated with this key.
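
As a rough illustration of the behavior described above, the dispatch on hash and array references can be sketched in plain Perl. The sketch_* names are hypothetical stand-ins, not the module's functions; this version ignores blessed objects and the "m_" method overrides.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-ins for the documented functions; plain refs only.
sub sketch_get_value_for_key {
    my ($target, $key) = @_;
    if (ref $target eq 'HASH') {
        return exists $target->{$key} ? $target->{$key} : undef;
    }
    if (ref $target eq 'ARRAY') {
        croak "Non-numeric array index '$key'" unless $key =~ /\A-?\d+\z/;
        return $target->[$key];
    }
    return;    # anything else provides no value
}

sub sketch_set_value_for_key {
    my ($target, $key, $value) = @_;
    if (ref $target eq 'HASH') {
        $target->{$key} = $value;
    }
    elsif (ref $target eq 'ARRAY') {
        croak "Non-numeric array index '$key'" unless $key =~ /\A-?\d+\z/;
        $target->[$key] = $value;
    }
    return $value;
}

# Fill an undefined slot with a new anonymous hash, then return it.
sub sketch_get_or_create_value_for_key {
    my ($target, $key) = @_;
    my $value = sketch_get_value_for_key($target, $key);
    $value = sketch_set_value_for_key($target, $key, {}) unless defined $value;
    return $value;
}

my $employee = [ 'Ann', 'Bob', 'Cy', 'Dee' ];
print sketch_get_value_for_key($employee, 3), "\n";    # Dee
```

Croaking on a non-numeric array index is one reading of "complains"; the module may report the problem differently.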

Multiple-Key Chaining

Frequently we wish to access values at some remove within a structure by chaining through a list of references. Programmatic access to these values within Perl usually looks something like this:

    print $report->{'employees'}[3]{'id'};
    $report->{'employees'}[3]{'name'} = 'Joe';

Using these functions, you could replace the above statements with:

    print get_value_for_keys( $report, 'employees', 3, 'id' );
    set_value_for_keys( $report, 'Joe', 'employees', 3, 'name' );

These functions also support the "m_*" method delegation described below under Object Overrides.

get_value_for_keys($target, @keys) : $value

Starting at the target, look up each of the provided keys sequentially from the results of the previous one, returning the final value. Return value is undefined if at any time we find a key for which no value is present.

set_value_for_keys($target, $value, @keys)

Starting at the target, look up each of the provided keys sequentially from the results of the previous one; when we reach the final key, use set_value_for_key to make the assignment. If an intermediate value is undefined, replaces it with an empty hash to hold the next key-value pair.

get_or_create_value_for_keys($target, @keys) : $value

The multiple-key counterpart of get_or_create_value_for_key; the keys are chained as above.

get_reference_for_keys($target, @keys) : $val_ref

The multiple-key counterpart of get_reference_for_key; the keys are chained as above.
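
The chaining described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's code: the sketch_* names are hypothetical, and only plain hash and array refs are handled. Note the argument order of the setter, which takes the value before the list of keys.

```perl
use strict;
use warnings;

# Hypothetical sketch: follow each key in turn, giving up on a missing value.
sub sketch_get_value_for_keys {
    my ($target, @keys) = @_;
    for my $key (@keys) {
        if    (ref $target eq 'HASH')  { $target = $target->{$key} }
        elsif (ref $target eq 'ARRAY') { $target = $target->[$key] }
        else                           { return undef }
        return undef unless defined $target;
    }
    return $target;
}

# Hypothetical sketch: undefined intermediate values become empty hashes.
sub sketch_set_value_for_keys {
    my ($target, $value, @keys) = @_;
    my $last = pop @keys;
    for my $key (@keys) {
        my $slot = ref $target eq 'ARRAY' ? \$target->[$key] : \$target->{$key};
        $$slot = {} unless defined $$slot;
        $target = $$slot;
    }
    if (ref $target eq 'ARRAY') { $target->[$last] = $value }
    else                        { $target->{$last} = $value }
    return $value;
}

my $report = { employees => [] };
sketch_set_value_for_keys($report, 'Joe', 'employees', 3, 'name');
print sketch_get_value_for_keys($report, 'employees', 3, 'name'), "\n";    # Joe
```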

Object Overrides

Each of the value-for-key and multiple-key functions first checks for a method with a similar name preceded by "m_" and, if present, uses that implementation. For example, callers can consistently request get_value_for_key($foo, $key), but in cases where $foo supports a method named m_get_value_for_key, its results will be returned instead.

Classes that wish to provide alternate DRef-like behavior or generate values on demand should implement these methods in their packages. A Data::DRef::MethodBased class is provided for use by objects which use methods to get and set attributes. By making your package a subclass of MethodBased you'll inherit m_get_value_for_key and m_set_value_for_key methods which treat the key as a method name to invoke.
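
The override check might look roughly like the sketch below. Everything here is hypothetical and for illustration only: the dispatcher sketch_get_value_for_key, and the classes Sketch::MethodBased and Sketch::Circle, are assumed names and do not reproduce the code of Data::DRef or Data::DRef::MethodBased.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical dispatcher: prefer an object's m_get_value_for_key method,
# otherwise fall back to plain hash or array access.
sub sketch_get_value_for_key {
    my ($target, $key) = @_;
    my $method = blessed($target) && $target->can('m_get_value_for_key');
    return $target->$method($key) if $method;
    return $target->{$key} if ref $target eq 'HASH';
    return $target->[$key] if ref $target eq 'ARRAY';
    return;
}

# Hypothetical base class in the spirit of Data::DRef::MethodBased:
# the key is treated as the name of a method to invoke.
package Sketch::MethodBased;

sub m_get_value_for_key {
    my ($self, $key) = @_;
    my $method = $self->can($key) or return;
    return $self->$method();
}

package Sketch::Circle;
our @ISA = ('Sketch::MethodBased');

sub new    { my ($class, %args) = @_; return bless {%args}, $class }
sub radius { return $_[0]{radius} }

package main;

my $circle = Sketch::Circle->new( radius => 7 );
print sketch_get_value_for_key($circle, 'radius'), "\n";    # 7
```

Callers use the same function for plain structures and for objects; only the object's class decides how a key is resolved.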

DRef Syntax

In order to simplify expression of the lists of keys used above, we define a string format in which they may be represented. A DRef string is composed of a series of simple scalar keys, each escaped with String::Escape's printable() function, joined with the $Separator character, '.'.

$Separator

The multiple-key delimiter character, by default ., the period character.

get_key_drefs($target) : @drefs

Uses get_keys to determine the available keys for this target, and then returns an appropriately-escaped version of each of them.

dref_from_keys( @keys ) : $dref

Escapes and joins the provided keys to create a dref string.

keys_from_dref( $dref ) : @keys

Splits and unescapes a dref string to its constituent keys.

join_drefs( @drefs ) : $dref

Joins already-escaped dref strings into a single dref.

unshift_dref_key( $dref, $key )

Modify the provided dref string by escaping and prepending the provided key. Note that the original $dref variable is altered.

shift_dref_key( $dref ) : $key

Modify the provided dref string by removing and unescaping the first key. Note that the original $dref variable is altered, and set to '' when the last key is removed.
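
The string handling described in this section can be approximated as below. This is a simplified sketch with hypothetical sketch_* names: it joins and splits on the separator only and omits the String::Escape escaping step entirely, so keys containing the separator are not handled.

```perl
use strict;
use warnings;

our $Separator = '.';    # mirrors the documented default

# Hypothetical sketches; escaping of special characters is omitted.
sub sketch_dref_from_keys { return join $Separator, @_ }
sub sketch_keys_from_dref { return split /\Q$Separator\E/, $_[0] }

# Both functions below alter the caller's variable through the @_ alias.
sub sketch_unshift_dref_key {
    $_[0] = length($_[0]) ? join($Separator, $_[1], $_[0]) : $_[1];
    return;
}

sub sketch_shift_dref_key {
    my ($key, $rest) = split /\Q$Separator\E/, $_[0], 2;
    $_[0] = defined $rest ? $rest : '';
    return $key;
}

my $dref  = sketch_dref_from_keys('employees', 3, 'name');    # 'employees.3.name'
my $first = sketch_shift_dref_key($dref);    # 'employees'; $dref is now '3.name'
sketch_unshift_dref_key($dref, 'staff');     # $dref is now 'staff.3.name'
print "$first $dref\n";                      # employees staff.3.name
```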

DRef Pragmas

Several types of parenthesized expressions are supported as extension mechanisms for dref strings. Nested parentheses are supported, with the innermost parentheses resolved first.

Continuing the above example, one could write:

    set_value_for_root_dref('empl_number', 3);
    ...
    print get_value_for_dref($report, 'employees.(#empl_number).name');

resolve_pragmas( $dref_with_embedded_parens ) : $dref

resolve_pragmas( $dref_with_embedded_parens ) : ($dref, %options)

Calling resolve_pragmas() causes these expressions to be evaluated, and an expanded version of the dref is returned. In a list context, also returns a list of key-value pairs that may contain pragma information.

(#dref)

Parenthesized expressions beginning with $DRefPrefix, the "#" character by default, are replaced with the Root-relative value for that dref using get_value_for_root_dref().

(!flag)

A flag indicating some optional or accessory behavior. Removed from the string. Sets $options{flag} to 1.
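
One way to picture the innermost-first expansion is the sketch below. It is an assumption-laden illustration, not the module's resolve_pragmas: the names sketch_resolve_pragmas and sketch_root_value are hypothetical, the $Root contents are made up for the example, and unknown pragma types simply stop the expansion.

```perl
use strict;
use warnings;

our $Root       = { empl_number => 3, which => 'empl_number' };    # sample data
our $DRefPrefix = '#';

# Hypothetical root lookup: follows dot-separated keys through hashes and arrays.
sub sketch_root_value {
    my $node = $Root;
    for my $key (split /\./, $_[0]) {
        if    (ref $node eq 'HASH')  { $node = $node->{$key} }
        elsif (ref $node eq 'ARRAY') { $node = $node->[$key] }
        else                         { return '' }
    }
    return defined $node ? $node : '';
}

# Hypothetical resolver: a pair of parentheses with no parentheses inside it
# is always an innermost expression, so it is expanded first.
sub sketch_resolve_pragmas {
    my ($dref) = @_;
    my %options;
    while ($dref =~ /\(([^()]*)\)/) {
        my ($start, $end, $body) = ($-[0], $+[0], $1);
        my $replacement;
        if ($body =~ /\A\Q$DRefPrefix\E(.*)\z/s) {
            my $inner = $1;
            $replacement = sketch_root_value($inner);
        }
        elsif ($body =~ /\A!(.*)\z/s) {
            $options{$1} = 1;      # record the flag, drop it from the string
            $replacement = '';
        }
        else { last }              # unknown pragma: leave the rest untouched
        substr($dref, $start, $end - $start, $replacement);
    }
    return wantarray ? ($dref, %options) : $dref;
}

print scalar sketch_resolve_pragmas('employees.(#empl_number).name'), "\n";
# employees.3.name
print scalar sketch_resolve_pragmas('employees.(#(#which)).name'), "\n";
# employees.3.name
```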

DRef Access

These functions provide the main public interface for dref-based access to values in nested data structures. They invoke the equivalent ..._value_for_keys() function after expanding and splitting the provided drefs.

Using these functions, you could replace the statements from the Multiple-Key Chaining example with:

    print get_value_for_dref( $report, 'employees.3.id' );
    set_value_for_dref( $report, 'employees.3.name', 'Joe' );

get_value_for_dref($target, $dref) : $value

Resolve pragmas and split the provided dref, then use get_value_for_keys to look those keys up starting with target.

set_value_for_dref($target, $dref, $value)

Resolve pragmas and split the provided dref, then use set_value_for_keys.

Shared Data Graph Entry

Data::DRef also provides a common point-of-entry data structure, referred to as $Root. Objects or structures accessible through $Root can be referred to identically from any package using the get_value_for_root_dref and set_value_for_root_dref functions. Here's another example:

    set_value_for_root_dref('report', $report);
    print get_value_for_root_dref('report.employees.3.name');

$Root

The data graph entry point, by default a reference to an anonymous hash.

get_value_for_root_dref($dref) : $value

Returns the value for the provided dref, starting at the root.

set_value_for_root_dref($dref, $value) : $value

Sets the value for the provided dref, starting at the root.

get_value_for_optional_dref($literal_or_prefixed_dref) : $value

If the argument begins with $DRefPrefix, the "#" character by default, the remainder is passed through get_value_for_root_dref(); otherwise it is returned unchanged.
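
The root-relative functions and the optional-dref rule can be sketched together. The sketch_* names are hypothetical, the setter creates hashes only, and pragma expansion and escaping are left out; treat this as an illustration of the documented behavior rather than the module's code.

```perl
use strict;
use warnings;

our $Root       = {};     # shared entry point, as in the documented default
our $DRefPrefix = '#';

# Hypothetical setter: creates intermediate hashes as needed.
sub sketch_set_value_for_root_dref {
    my ($dref, $value) = @_;
    my @keys = split /\./, $dref;
    my $last = pop @keys;
    my $node = $Root;
    $node = $node->{$_} ||= {} for @keys;
    return $node->{$last} = $value;
}

# Hypothetical getter: walks hashes and arrays starting at $Root.
sub sketch_get_value_for_root_dref {
    my $node = $Root;
    for my $key (split /\./, $_[0]) {
        if    (ref $node eq 'HASH')  { $node = $node->{$key} }
        elsif (ref $node eq 'ARRAY') { $node = $node->[$key] }
        else                         { return undef }
    }
    return $node;
}

# A prefixed argument is looked up from the root; anything else is a literal.
sub sketch_get_value_for_optional_dref {
    my ($arg) = @_;
    if ($arg =~ /\A\Q$DRefPrefix\E(.*)\z/s) {
        my $dref = $1;
        return sketch_get_value_for_root_dref($dref);
    }
    return $arg;
}

sketch_set_value_for_root_dref('report.title', 'Payroll');
print sketch_get_value_for_optional_dref('#report.title'), "\n";    # Payroll
print sketch_get_value_for_optional_dref('report.title'), "\n";     # report.title
```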

Select by DRefs

The selection functions extract and return elements of a collection by evaluating them against a provided hash of criteria. When called in a scalar context, they will return the first successful match; in a list context, they will return all successful matches.

The keys in the criteria hash are drefs to check for each candidate; a match is successful if for each of the provided drefs, the candidate returns the same value that is associated with that dref in the criteria hash. To check the value itself, rather than looking up a dref, use undef as the hash key.

matching_keys($target, %dref_value_criteria_pairs) : $key or @keys

Returns keys of the target whose corresponding values match the provided criteria.

matching_values($target, %dref_value_criteria_pairs) : $item or @items

Returns values of the target which match the provided criteria.
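
The matching rule can be sketched for the common case of an array of hashes. This is an illustrative sketch only: the sketch_* names are hypothetical, the target must be an array ref, values are compared as strings, and the undef-key form is not covered.

```perl
use strict;
use warnings;

# Hypothetical lookup of a dot-separated dref within one candidate.
sub sketch_lookup {
    my ($node, $dref) = @_;
    for my $key (split /\./, $dref) {
        if    (ref $node eq 'HASH')  { $node = $node->{$key} }
        elsif (ref $node eq 'ARRAY') { $node = $node->[$key] }
        else                         { return undef }
    }
    return $node;
}

# A candidate matches only if every criterion agrees.
sub sketch_matching_keys {
    my ($target, %criteria) = @_;
    my @found;
    CANDIDATE: for my $index (0 .. $#$target) {
        for my $dref (keys %criteria) {
            my $got = sketch_lookup($target->[$index], $dref);
            next CANDIDATE unless defined $got && $got eq $criteria{$dref};
        }
        push @found, $index;
    }
    return wantarray ? @found : $found[0];    # scalar context: first match
}

sub sketch_matching_values {
    my ($target, %criteria) = @_;
    my @items = map { $target->[$_] } sketch_matching_keys($target, %criteria);
    return wantarray ? @items : $items[0];
}

my $produce = [
    { type => 'tubers', color => 'red',    size => [ 2, 3, 5 ] },
    { type => 'fruit',  color => 'red',    size => [ 2, 2, 2 ] },
    { type => 'fruit',  color => 'orange', size => [ 1, 1, 1 ] },
];
print join(',', sketch_matching_keys($produce, type => 'fruit')), "\n";    # 1,2
my $first_red = sketch_matching_values($produce, color => 'red');
print $first_red->{type}, "\n";                                            # tubers
```

Because the criteria keys are drefs, a nested value can be tested the same way, for example 'size.0' => 2.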

Index by DRefs

The indexing functions extract the values from some target structure, then return a new structure containing references to those same values.

index_by_drefs($target, @drefs) : $index

Generates a hash, or series of nested hashes, of arrays containing values from the target. A single dref argument produces a single-level index: a hash which maps each value obtained for that dref to an array of the items that returned it. Multiple dref arguments create nested hashes.

unique_index_by_drefs($target, @drefs) : $index

Similar to index_by_drefs, except that only the most-recently visited single value is stored at each point in the index, rather than an array.

ordered_index_by_drefs( $target, $index_dref ) : $entry_ary

Constructs a single-level index while preserving the order in which top-level index keys are discovered. An array of hashes is returned, each containing one of the index keys and the array of associated values.
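
The three index shapes can be compared in a small sketch. The sketch_* names and the sample data are hypothetical, each function takes a single plain hash key rather than a dref, and the target is assumed to be an array of hashes.

```perl
use strict;
use warnings;

# Hypothetical single-level index: value => [ items that returned it ].
sub sketch_index_by {
    my ($target, $field) = @_;
    my %index;
    push @{ $index{ $_->{$field} } }, $_ for @$target;
    return \%index;
}

# Hypothetical unique index: the most recently visited item wins.
sub sketch_unique_index_by {
    my ($target, $field) = @_;
    my %index;
    $index{ $_->{$field} } = $_ for @$target;
    return \%index;
}

# Hypothetical ordered index: entries appear in order of first discovery.
sub sketch_ordered_index_by {
    my ($target, $field) = @_;
    my (%seen, @order);
    for my $item (@$target) {
        my $value = $item->{$field};
        push @order, $seen{$value} = { value => $value, items => [] }
            unless $seen{$value};
        push @{ $seen{$value}{items} }, $item;
    }
    return \@order;
}

my $staff = [
    { name => 'Ann', dept => 'ops' },
    { name => 'Bob', dept => 'dev' },
    { name => 'Cy',  dept => 'ops' },
];
my $ordered = sketch_ordered_index_by($staff, 'dept');
print join(',', map { $_->{value} } @$ordered), "\n";           # ops,dev
print sketch_unique_index_by($staff, 'dept')->{ops}{name}, "\n"; # Cy
```

In all three cases the index holds references to the original items, not copies.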

DRefs to Leaf nodes

These functions explore all of the references in the network of structures accessible from some starting point, and provide access to the outermost (non-reference) items. For a tree structure, this is equivalent to listing the leaf nodes, but these functions can also be used in structures with circular references.

leaf_drefs($target) : @drefs

Returns a list of drefs to the outermost values.

leaf_values( $target ) : @values

Returns a list of the outermost values.

leaf_drefs_and_values( $target ) : %dref_value_pairs

Returns a flat hash of the outermost drefs and values.
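
A recursive walk with a guard against circular references is one way to picture these functions. The function sketch_leaf_pairs is a hypothetical name; this sketch sorts hash keys to give repeatable output, which the module may not do, and it only descends into plain hash and array refs.

```perl
use strict;
use warnings;

# Hypothetical walker: returns dref => value pairs for non-reference leaves.
# %$seen records containers already visited, so a cycle is followed only once.
sub sketch_leaf_pairs {
    my ($node, $prefix, $seen) = @_;
    $seen ||= {};
    return ($prefix => $node) unless ref $node eq 'HASH' or ref $node eq 'ARRAY';
    return if $seen->{$node}++;
    my @keys = ref $node eq 'HASH' ? sort keys %$node : (0 .. $#$node);
    return map {
        my $child = ref $node eq 'HASH' ? $node->{$_} : $node->[$_];
        sketch_leaf_pairs($child, defined $prefix ? "$prefix.$_" : $_, $seen);
    } @keys;
}

my $spud = { type => 'tubers', name => 'potatoes', size => [ 2, 3, 5 ] };
$spud->{self} = $spud;    # a circular reference, skipped on the second visit

my %pairs = sketch_leaf_pairs($spud);
print join(' ', map { "$_=$pairs{$_}" } sort keys %pairs), "\n";
# name=potatoes size.0=2 size.1=3 size.2=5 type=tubers
```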

Compatibility

To provide compatibility with earlier versions of this module, many of the functions above are also accessible through an alias with the old name.

EXAMPLES

Here is a sample data structure which will be used to illustrate various example function calls. Note that the individual hashes shown below are only referred to in the following example results, not completely copied.

  $spud : { 
    'type'=>'tubers', 'name'=>'potatoes', 'color'=>'red', 'size'=>[2,3,5] 
  } 
  $apple : { 
    'type'=>'fruit', 'name'=>'apples', 'color'=>'red', 'size'=>[2,2,2] 
  }
  $orange : {
    'type'=>'fruit', 'name'=>'oranges', 'color'=>'orange', 'size'=>[1,1,1] 
  }
  
  $produce_info : [ $spud, $apple, $orange, ];

Select by DRefs

  matching_keys($produce_info, 'type'=>'tubers') : ( 0 )
  matching_keys($produce_info, 'type'=>'fruit') : ( 1, 2 )
  matching_keys($produce_info, 'type'=>'fruit', 'color'=>'red' ) : ( 1 )
  matching_keys($produce_info, 'type'=>'tubers', 'color'=>'orange' ) : ( )

  matching_values($produce_info, 'type'=>'fruit') : ( $apple, $orange )
  matching_values($produce_info, 'type'=>'fruit', 'color'=>'red' ) : ( $apple )

Index by DRefs

  index_by_drefs($produce_info, 'type') : { 
    'fruit' =>  [ $apple, $orange ],
    'tubers' => [ $spud ],
  }
  
  index_by_drefs($produce_info, 'color', 'type') : {
    'red' => { 
      'fruit' => [ $apple ],
      'tubers' => [ $spud ],
    },
    'orange' => { 
      'fruit' => [ $orange ],
    },
  }

  unique_index_by_drefs($produce_info, 'type') : { 
    'fruit' => $orange,
    'tubers' => $spud,
  }

  ordered_index_by_drefs($produce_info, 'type') : [
    {
      'value' => 'tubers',
      'items' => [ $spud ],
    },
    {
      'value' => 'fruit',
      'items' => [ $apple, $orange ],
    },
  ]

DRefs to Leaf nodes

  leaf_drefs($spud) : ( 'type', 'name', 'color', 'size.0', 'size.1', 'size.2' )

  leaf_values($spud) : ( 'tubers', 'potatoes', 'red', '2', '3', '5' )

  leaf_drefs_and_values($spud) : ( 
    'type' => 'tubers', 'name' => 'potatoes', 'color' => 'red', 
    'size.0' => 2, 'size.1' => 3, 'size.2' => 5
  )

Object Overrides

Here's an m_get_value_for_key method for an object which provides a calculated timestamp value:

    package Clock;
    
    sub new { my $class = shift; bless { @_ }, $class; }
    
    sub m_get_value_for_key {
      my ($self, $key) = @_;
      return time() if ( $key eq 'timestamp' );
      return $self->{ $key };
    }
    
    package main;
    
    set_value_for_root_dref( 'clock', Clock->new( name => "Clock 1" ) );
    ...
    print get_value_for_root_dref('clock.timestamp');

STATUS AND SUPPORT

This release of Data::DRef is intended for public review and feedback. It is the most recent version of code that has been used for several years and thoroughly tested; however, the interface has recently been overhauled, and it should be considered "alpha" pending that feedback.

  Name            DSLI  Description
  --------------  ----  ---------------------------------------------
  Data::
  ::DRef          adph  Nested data access using delimited strings

You will also need the String::Escape module from CPAN or www.evoscript.com.

Further information and support for this module is available at <www.evoscript.com>.

Please report bugs or other problems to <bugs@evoscript.com>.

There is one known bug in this version:

  • We don't always properly escape and unescape special characters within DRef strings or protect $Separators embedded within a subkey. This is expected to change soon.

There is one major change under consideration:

  • Perhaps a minimal method-based implementation similar to that used in Data::DRef::MethodBased should be exported to UNIVERSAL, rather than requiring all sorts of unrelated classes to establish a dependency on this module. Prototype checking might prove to be useful here.

AUTHORS AND COPYRIGHT

Copyright 1996, 1997, 1998, 1999 Evolution Online Systems, Inc. <www.evolution.com>

You may use this software for free under the terms of the Artistic License.

Contributors: M. Simon Cavalletto <simonm@evolution.com>, E. J. Evans <piglet@evolution.com>