NAME

Data::Dump::Streamer - Stream a highly accurate breadth first data dump in perl code form to a var or file.

SYNOPSIS

  use Data::Dump::Streamer;

  Dump($x,$y);                       # Prints to STDOUT
  Dump($x,$y)->Out();                #   "          "

  my $o=Data::Dump::Streamer->new(); # Returns a new ...
  my $o=Dump();                      # ... uninitialized object.

  my $o=Dump($x,$y);                 # Returns an initialized object
  my $s=Dump($x,$y)->Out();          #  "  a string of the dumped obj
  my @l=Dump($x,$y);                 #  "  a list of code fragments
  my @l=Dump($x,$y)->Out();          #  "  a list of code fragments

  Dump($x,$y)->To(\*STDERR)->Out();  # Prints to STDERR

  Dump($x,$y)->Names('foo','bar')    # Specify Names
             ->Out();

  Dump($x,$y)->Indent(0)->Out();     # No indent

  Dump($x,$y)->To(\*STDERR)          # Output to STDERR
             ->Indent(0)             # ... no indent
             ->Names('foo','bar')    # ... specify Names
             ->Out();                # Print...

  $o->Data($x,$y);                   # OO form of what Dump($x,$y) does.
  $o->Names('Foo','Names');          #  ...
  $o->Out();                         #  ...

DESCRIPTION

Converts a data structure into a sequence of perl statements sufficient for recreating the original via eval. This module is very similar in concept to Data::Dumper and Data::Dump, with the major differences being that this module is designed to output to a stream instead of constructing its output in memory, and that the traversal over the data structure is effectively breadth first versus the depth first traversal done by the others.

In fact the data structure is scanned twice, first in breadth first mode to perform structural analysis, and then in depth first mode to actually produce the output, but obeying the depth relationships of the first pass.

Usage

While Data::Dump::Streamer is at heart an object oriented module, it is expected (based on experience with using Data::Dumper) that the common case will not exploit these features. Nevertheless the method based approach is convenient and accordingly a compromise hybrid approach has been provided via the Dump() subroutine.

All attribute methods are designed to be chained together. This means that when used as set attribute (called with arguments) they return the object they were called against. When used as get attributes (called without arguments) they return the value of the attribute.
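
The get/set convention described above can be sketched in plain Perl. This is only an illustration of the calling pattern, not the module's own code; the class My::Style and its two accessors are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only; not part of Data::Dump::Streamer.
package My::Style;

sub new { return bless { indent => 1, verbose => 1 }, shift }

# Combined get/set accessor.
sub Indent {
    my $self = shift;
    if (@_) {
        $self->{indent} = shift;
        return $self;           # set: return the object so calls can be chained
    }
    return $self->{indent};     # get: return the stored value
}

sub Verbose {
    my $self = shift;
    if (@_) {
        $self->{verbose} = shift;
        return $self;
    }
    return $self->{verbose};
}

package main;

my $style  = My::Style->new->Indent(0)->Verbose(1);   # setters chain
my $indent = $style->Indent;                          # getter returns the value
```

Because every setter returns the object, the final call in a chain decides what the whole expression evaluates to.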

From an OO point of view the key methods are the Data() and Out() methods. These correspond to the breadth first and depth first traversal, and need to be called in this order. Some attributes must be set prior to the Data() phase and some need only be set before the Out() phase.

Attributes once set last the lifetime of the object, unless explicitly reset.

Controlling Hash Traversal and Display Order

Data::Dump::Streamer supports a number of ways to control the traversal order of hashes. This functionality is controlled via the SortKeys() and HashKeys() accessor methods. SortKeys() is used to specify the generic ordering of all hashes, and HashKeys() is for specifying the ordering for a specific hashreference, or for all hashes of a given class. SortKeys() takes only a single parameter, and HashKeys() takes a list of pairs. See their documentation for more detail.

By default the traversal of hashes is in 'smart' order, which is something like a dictionary order, and is suitable for numeric keys, text keys, or a mix of both. Two other standard orders are provided: 'alpha'-betical (also spelled 'lex'-icographical) and 'num'-eric. You may also request Perl's native ordering of the hash by specifying a false but defined value, or have it fall back to a more general ordering rule by specifying undef.

In addition to these preprogrammed orderings you may also provide an ARRAY ref containing a list of keys (and implicitly their order), or a HASH ref used to determine which keys are displayed, and if they are always shown (key=>1) or only shown if they exist (key=>0).

For extremely fine tuning you can provide a CODE ref. It is passed the hash reference being dumped and the number of the pass on which it is being dumped, and is expected to return one of the above values. Thus you can say:

  ->SortKeys('lex') # use lex by default
  ->HashKeys('Foo::Bar::Baz'=>sub{
                  my $hash=shift;
                  if ($hash == $special) {return [qw(a b c)]}
                  elsif (UNIVERSAL::isa($hash,'Foo::Bar')) { return 'smart' }
                  elsif (scalar keys %$hash>1000) { return 0 } # force each() use
                  else {return undef} #fallback
                })

And have it all work out as expected. (Well, that is if you really need to apply such a crazy rule :-)

The order in which the rules are applied is:

  1. Object Specific via HashKeys() settings
  2. Class Specific via HashKeys() settings
  3. Generic via SortKeys() settings
  4. Use Perl's internal hash ordering.

Controlling Object Representation (Freeze/Thaw)

This module provides hooks for specially handling objects: Freeze/Thaw for generic handling, and FreezeClass/ThawClass for class specific handling. These hooks work as follows. (Below, Freeze() refers to both Freeze() and FreezeClass(), and Thaw() refers to both Thaw() and ThawClass().)

If a Freeze() hook is specified then it is called on the object during the Data() phase prior to traversing the object. The freeze hook may perform whatever duties it needs and change its internal structure, _or_ it may alter $_[0] providing a substitute reference to be dumped instead (note that this will not alter the data structure being dumped). This reference may even be a totally different type!

If a Thaw() hook is specified then the dump will include code to rebless the reference and then call the hook on the newly created object. If the object was originally frozen (not replaced) then the method will also be called on the object during the Out() phase of the dump to unfreeze it, leaving the structure unmodified after the dump. If the object was replaced by the freeze hook this doesn't occur, as it is assumed the data structure has not changed. A special rule applies to Thaw() hooks: if they include the prefix "->" then they are not executed inline (where they would be expected to return the object) but as an independent statement after the object has been created, and the return value of the statement is ignored. Thus a method that simply changes the internal state of the object but doesn't return an object reference may be used as a Thaw() handler.

For now these options are specified as string values representing the method names. It's possible a later version will extend this to also handle coderefs.

Note that the Freeze/Thaw methods will NOT be executed on objects that don't support those methods. The setting in this case will be silently ignored.
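
The "alter $_[0]" style of freeze hook relies on the fact that Perl aliases the invocant in @_. The following plain Perl sketch shows that mechanism in isolation. The class My::Conn, its fields and its freeze method are hypothetical, and the explicit copy merely stands in for whatever the module does internally, which is an assumption here.

```perl
use strict;
use warnings;

# Hypothetical class holding something that cannot sensibly be dumped.
package My::Conn;

sub new { return bless { host => 'db1', socket => 'LIVE HANDLE' }, shift }

# Freeze hook that substitutes a plain hash for the object by
# assigning to $_[0], which is an alias to the caller's variable.
sub freeze { $_[0] = { host => $_[0]{host} } }

package main;

my $obj  = My::Conn->new;
my $copy = $obj;     # stand-in for the reference the dumper works on
$copy->freeze;       # replaces $copy only; $obj is untouched
```

After the call $copy holds an unblessed hash with only the host key, while $obj is still a complete My::Conn object. A hook that instead edits the object's contents in place would change $obj too, which is why such a hook is normally paired with a Thaw() hook.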

Data::Dumper Compatibility

For drop in compatibility with the Dumper() usage of Data::Dumper, you may request that the Dumper method is exported. It will not be exported by default. In addition the standard Data::Dumper::Dumper() may be exported on request as 'DDumper'. If you provide the tag ':Dumper' then both will be exported.

Dumper
Dumper LIST

A synonym for scalar Dump(LIST)->Out, provided for usage compatibility with Data::Dumper.

DDumper
DDumper LIST

A secondary export of the actual Data::Dumper::Dumper subroutine.

Constructors

new

Creates a new Data::Dump::Streamer object. Currently takes no arguments and simply returns the new object with a default style configuration.

See Dump() for a better way to do things.

Dump
Dump VALUES

Smart non method based constructor.

This routine behaves very differently depending on the context it is called in and whether arguments are provided.

If called with no arguments it is exactly equivalent to calling

  Data::Dump::Streamer->new()

which means it returns an object reference.

If called with arguments and in scalar context it is equivalent to calling

  Data::Dump::Streamer->new()->Data(@vals)

except that the actual traversal of the data is delayed until Out() is called. This means that options that must be provided before the Data() phase can be provided after the call to Dump(). Again, it returns an object reference.

If called with arguments and in void or list context it is equivalent to calling

  Data::Dump::Streamer->new()->Data(@vals)->Out()

This is also true in list context so that print Dump(...),"\n"; does the right thing. Combined with method chaining, it also means options can be added or removed as required quite easily and naturally.

In short:

  my $obj=Dump($x,$y);         # Returns an object
  my $str=Dump($x,$y)->Out();  # Returns a string of the dump.
  my @code=Dump($x,$y);        # Returns a list of code fragments.

  Dump($x,$y);                 # prints the dump.
  print Dump($x,$y);           # prints the dump.
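
Context sensitive behaviour of this kind is normally built on Perl's wantarray. The sketch below is a hypothetical function, not the module's code, that simply records the context it was called in; the last call mirrors the print Dump(...),"\n" case.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical function: records and returns the calling context.
sub note_context {
    my $want = wantarray;    # undef in void, false in scalar, true in list context
    my $ctx  = !defined($want) ? 'void' : $want ? 'list' : 'scalar';
    push @seen, $ctx;
    return $ctx;
}

note_context();                      # void context
my $s = note_context();              # scalar context
my @l = (note_context(), "\n");      # list context, as inside print's argument list
```

A method call on the return value, as in Dump($x,$y)->Out(), also evaluates the function in scalar context, which is why that form yields an object to call Out() on.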

Methods

Data
Data LIST

Analyzes a list of variables in breadth first order.

If called with arguments then the internal object state is reset before scanning the list of arguments provided.

If called with no arguments then whatever arguments were provided to Dump() will be scanned.

Returns $self.

Out
Out VALUES

Prints out a set of values to the appropriate location. If provided a list of values then the values are first scanned with Data() and then printed. If called with no values then whatever was last scanned with Data() or Dump() is printed.

If the To() attribute was provided then Out() dumps to whatever object was specified there (any object that supports a print() method, including filehandles) and always returns $self.

If the To() attribute was not provided then Out() uses an internal printing object, returning a list in list context or a string in scalar context, or printing to STDOUT in void context.

This routine is virtually always called without arguments as the last method in the method chain.

 Dump->Arguments(1)->Out(@vars);
 $obj->Data(@vars)->Out();
 Dump(@vars)->Out;
 Data::Dump::Streamer->Out(@vars);

All should DWIM.

Names
Names LIST
Names ARRAYREF

Takes a list of strings or a reference to an array of strings to use for var names for the objects dumped. The names may be prefixed by a * indicating the variable is to be dumped as its dereferenced type if it is an array, hash or code ref. Otherwise the star is ignored. Other sigils may be prefixed but they will be silently converted to *'s.

If no names are provided then names are generated automatically based on the type of object being dumped, with abbreviations applied to compound class names.

If called with arguments then returns the object itself, otherwise in list context returns the list of names in use, or in scalar context a reference or undef. In void context with no arguments the names are cleared.

NOTE: Must be called before Data() is called.

To
To STREAMER

Specifies the object to print to. Data::Dump::Streamer can stream its output to any object supporting the print method. This is primarily meant for streaming to a filehandle, however any object that supports the method will do.

If a filehandle is specified then it is used until it is explicitly changed, or the object is destroyed.
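
As a rough sketch of what "any object supporting the print method" can look like, here is a minimal collector class. My::Sink is hypothetical, and the two strings printed to it are stand-ins rather than real dumper output.

```perl
use strict;
use warnings;

# Hypothetical sink class: the only requirement assumed here is a print() method.
package My::Sink;

sub new { return bless { chunks => [] }, shift }

# Collect each fragment instead of writing it anywhere.
sub print {
    my $self = shift;
    push @{ $self->{chunks} }, @_;
    return 1;
}

# Return everything collected so far as one string.
sub text { return join '', @{ $_[0]{chunks} } }

package main;

my $sink = My::Sink->new;
$sink->print('$HASH1 = {', "\n");    # stand-in for fragments the dumper would send
$sink->print('};', "\n");
my $collected = $sink->text;

# With the module installed, the same object would be handed over as:
#   Dump($data)->To($sink)->Out();
```

A sink like this is handy in tests, or when the dump has to be post-processed before it is written anywhere.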

Declare
Declare BOOL

If Declare is True then each object is dumped with 'my' declarations included, and all rules that follow from that are obeyed (i.e., no reference is made to an undeclared variable). If Declare is False then all objects are expected to be previously defined and references to top level objects can be made at any time.

Defaults to False.

Indent
Indent BOOL

If Indent is True then data is output in an indented and fairly neat fashion, with hash key/value pairs and array values each on their own line.

If Indent is False then no indentation is done.

Defaults to True.

Newlines are appended to each statement regardless of this value.

Indentkeys
Indentkeys BOOL

If Indent() and Indentkeys are True then hashes with more than one key value pair are dumped such that the keys and values line up. Note however this means each key has to be quoted twice. Not advised for very large data structures. Additional logic may enhance this feature soon.

Defaults to True.

NOTE: Must be set before Data() is called.

SortKeys
SortKeys TYPE_OR_CODE
Sortkeys
Sortkeys TYPE_OR_CODE

If False then hashes are iterated using each(), and are output in whatever order your particular instance of perl provides, which varies across OS, architecture and version. This requires considerably less memory and time.

If True then hashes are sorted before dumping. If the value matches /alph|lex/i then a lexicographical sort order is imposed. If the value matches /num/i then a numeric sort order is imposed, and if the value matches /smart/i then a sort order akin to a dictionary sort is imposed. This order is the default and probably will do the right thing for most key sets.

A user may also provide a CODE ref to be used for sorting and prefiltering the hash keys. The hash to be sorted will be passed by reference to the sub, and the sub is expected to return a reference to an array of keys to dump, a string like above, or false for Perl's ordering. Note that this subroutine will be called twice per hash per dump, with the number of the pass (0 or 1) as the second parameter. The behaviour of returning different values on each pass is not well defined, but it is likely that returning fewer keys (but the same ordering) on the second pass will be viable. Returning more keys or a different ordering probably won't be.
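
A callback of the kind described above might look like the following. The rule itself (hide keys beginning with an underscore, leave very large hashes in Perl's own order) is invented for illustration. The sub is called directly here so the sketch runs without the module, and the commented line shows the assumed usage.

```perl
use strict;
use warnings;

# Hypothetical rule: receives the hash ref and the pass number (0 or 1) and
# returns an array ref of keys, an ordering name, or a false value.
my $key_rule = sub {
    my ($hash, $pass) = @_;
    return 0 if keys(%$hash) > 1000;              # large hash: Perl's own order
    my @public = grep { !/^_/ } keys %$hash;      # prefilter "_private" keys
    return 'smart' if @public == keys %$hash;     # nothing hidden: named ordering
    return [ sort @public ];                      # explicit list, same on both passes
};

my $picked = $key_rule->({ name => 'x', _cache => 1, id => 7 }, 0);
my $named  = $key_rule->({ a => 1, b => 2 }, 1);

# Assumed usage with the module installed:
#   Dump($data)->SortKeys($key_rule)->Out();
```

The sub ignores $pass, so it returns the same answer on both passes, which sidesteps the poorly defined case mentioned above.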

See "Controlling Hash Traversal and Display Order" for more details.

Note that Sortkeys() is a synonym for SortKeys() for compatibility with expectations formed by Data::Dumper. Data::Dumper provides the former, but the latter is consistent with the method naming scheme in this module. So in the spirit of TIMTOWTDI you can use either. :-)

HashKeys
HashKeys LIST
Hashkeys
Hashkeys LIST

In addition to SortKeys it is possible to further fine tune the traversal and ordering of hashes by using HashKeys(). Using this method you may specify either a specific ordering as in SortKeys, or a coderef similar to that used in SortKeys, based on the hashref's specific identity or its class. The only difference between the returns of the coderefs between the two methods is that if a HashKeys() rule returns undef then a fallback occurs to the SortKeys() rule. However if defined but false is returned then Perl's internal ordering is used.

If provided a list it expects either $hash_reference=>VALUE pairs or 'CLASS::NAME'=>VALUE pairs, and returns $self. If called with no parameters in list or scalar context it returns the options currently set, and if called with no parameters in void context it clears all HashKeys() settings.

See "Controlling Hash Traversal and Display Order" and "SortKeys" for more details.

Note that Hashkeys() is a synonym for HashKeys() for compatibility with expectations formed by Data::Dumper with regard to the method Sortkeys(). See SortKeys for details of this method and the reason behind the synonym.

Verbose
Verbose BOOL

If Verbose is True then, when references that cannot be resolved in a single statement are encountered, the reference is replaced by a descriptive tag saying what type of forward reference it is and what is being referenced. The type is provided through a prefix ("R:" for a reference, "A:" for an alias and "V:" for a value), followed by the name of the var in a string. Automatically generated var names are also reduced to the shortest possible unique abbreviation, with some tricks thrown in for Long::Class::Names::Like::This (which would most likely abbreviate to LCNLT1).

If Verbose is False then a simple placeholder saying 'A' or 'R' is provided. (In most situations perl requires a placeholder, and as such one is always provided, even if technically it could be omitted.)

Setting Verbose to False does not change the followup statements that fix up the structure and does not result in a loss of accuracy; it just makes the dump a little harder to read. OTOH, it means dumps can be quite a bit smaller and less noisy.

Defaults to True.

NOTE: Must be set before Data() is called.

DumpGlob
DumpGlob BOOL

If True then globs will be followed and fully defined, otherwise the globs will still be referenced but their current value will not be set.

Defaults to True.

NOTE: Must be set before Data() is called.

Deparse
Deparse BOOL

If True then CODE refs will be deparsed using B::Deparse and included in the dump. If it is False then a stub subroutine reference will be output as per the setting of CodeStub().

Caveat Emptor, dumping subroutine references is hardly a secure act, and it is provided here only for convenience.

DeparseOpts
DeparseOpts LIST
DeparseOpts ARRAY

If Deparse is True then these options will be passed to B::Deparse->new() when dumping a CODE ref. If passed a list of scalars the list is used as the arguments. If passed an array reference then this array is assumed to contain a list of arguments. If no arguments are provided it returns an array ref of arguments in scalar context, and a list of arguments in list context.

CodeStub
CodeStub STRING

If Deparse is False then this string will be used in place of CODE references. It is the user's responsibility to make sure it is compilable and blessable.

Defaults to 'sub { Carp::confess "Dumped code stub!" }'

FormatStub
FormatStub STRING

If Deparse is False then this string will be used in place of FORMAT references. It is the user's responsibility to make sure it is compilable and blessable.

Defaults to 'do{ local *F; eval "format F =\nFormat Stub\n.\n"; *F{FORMAT} }'

DeparseGlob
DeparseGlob BOOL

If Deparse is True then this style attribute will determine whether subroutines and FORMATs contained in globs that are dumped will be deparsed or not.

Defaults to True.

Rle
Rle BOOL

If True then arrays will be run length encoded using the x operator. What this means is that if an array contains repeated elements then instead of outputting each and every one a list multiplier will be output. This means that considerably less space is taken to dump redundant data.
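
As a plain Perl illustration of the x operator applied to lists, the two arrays below hold the same elements. The exact text the module emits for such an array is not shown here; this only demonstrates the construct the dump relies on.

```perl
use strict;
use warnings;

# Written out in full: nine elements.
my @plain = (0, 0, 0, 0, 0, 1, 'a', 'a', 'a');

# The same list using list repetition, the general shape a run length
# encoded dump takes.
my @rle = ((0) x 5, 1, ('a') x 3);

my $same = "@plain" eq "@rle";    # true: both arrays hold the same elements
```

Note the parentheses around the repeated value: (0) x 5 repeats a list, whereas 0 x 5 would build the string "00000".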

Freeze
Freeze METHOD

If set to a string then this method will be called on ALL objects before they are dumped. This method may either change the internal contents of the reference to something suitable for dumping, or alter $_[0] and have that used _instead_ of the real object reference.

NOTE: Must be set before Data() is called.

Thaw
Thaw METHOD

If set to a string then this method will be called on ALL objects after they are dumped.

NOTE: Must be set before Data() is called.

FreezeClass
FreezeClass CLASS
FreezeClass CLASS, METHOD
FreezeClass LIST

Defines methods to be used to freeze specific classes. These settings override Freeze. If one argument is provided then it returns the method for that class. If two arguments are provided then it sets the dump method for the given class. If more than two arguments are provided then it is assumed to be a list of CLASS, METHOD pairs and sets the entire list, discarding any existing settings. Called with no arguments in void context it clears the overall set of CLASS/METHOD pairs. Called with no arguments in list context it returns all CLASS/METHOD pairs. Called with no arguments in scalar context it returns a reference to the hash.

NOTE: Must be set before Data() is called.

ThawClass
ThawClass CLASS
ThawClass CLASS, METHOD
ThawClass LIST

Similar to FreezeClass, but called when evaling the data structure back into existence. Has the same calling semantics as FreezeClass.

NOTE: Must be set before Data() is called.

FreezeThaw CLASS, FREEZE_METHOD, THAW_METHOD
FreezeThaw LIST

FreezeThaw merges the features of FreezeClass and ThawClass into a single method. It takes a list of triplets and then calls those methods as necessary. Purely a bit of syntactic sugar because I realized the original interface was a bit clunky to use.

FreezeThaw does not currently support 'get' semantics and cannot be used to clear both options. This will probably come in a later release.

NOTE: Must be set before Data() is called.

IgnoreClass
IgnoreClass CLASS
IgnoreClass CLASS, METHOD
IgnoreClass LIST

Similar to FreezeClass, but instead of changing how the object is dumped, causes the object to be ignored outright if it is an instance of a barred class. The position in the data structure will be filled with a string containing the name of the class ignored. Has the same calling semantics as FreezeClass.

NOTE: Must be set before Data() is called.

Reading the Output

As mentioned in Verbose there is a notation used to make understanding the output easier. However at first glance it can probably be a bit confusing. Take the following example:

    my $x=1;
    my $y=[];
    my $array=sub{\@_ }->( $x,$x,$y );
    push @$array,$y,1;
    unshift @$array,\$array->[-1];
    Dump($array);

Which prints (without the comments of course):

    $ARRAY1 = [
                'R: $ARRAY1->[5]',        # resolved by fix 1
                1,
                'A: $ARRAY1->[1]',        # resolved by fix 2
                [],
                'V: $ARRAY1->[3]',        # resolved by fix 3
                1
              ];
    $ARRAY1->[0] = \$ARRAY1->[5];         # fix 1
    alias_av(@$ARRAY1, 2, $ARRAY1->[1]);  # fix 2
    $ARRAY1->[4] = $ARRAY1->[3];          # fix 3

The first entry, 'R: $ARRAY1->[5]', indicates that this slot in the array holds a reference to the currently undefined $ARRAY1->[5], and as such the value will have to be provided later in what the author calls 'fix' statements.

The third entry, 'A: $ARRAY1->[1]', indicates that this element of the array is in fact the exact same scalar as exists in $ARRAY1->[1], or in other words, an alias to that variable. Again, this cannot be expressed in a single statement and so generates another, different, fix statement.

The fifth entry, 'V: $ARRAY1->[3]', indicates that this slot holds a value (actually a reference value) that is identical to one elsewhere, but is currently undefined. In this case it is because the value it needs is the reference returned by the anonymous array constructor in the fourth element ($ARRAY1->[3]). Again this results in yet another different fix statement.

If Verbose() is off then only an 'R', 'A' or 'V' tag is emitted, since a marker of some form is necessary.
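
The effect of the 'R' and 'V' fix statements can be checked in plain Perl without the module. The sketch below starts from the placeholder form and applies fixes 1 and 3 by hand. Fix 2 is left out because alias_av() comes from the module itself, so its slot keeps the placeholder.

```perl
use strict;
use warnings;

# Placeholders stand in for the values that cannot be written inline.
my $ARRAY1 = [ 'R', 1, 'A', [], 'V', 1 ];

$ARRAY1->[0] = \$ARRAY1->[5];    # fix 1: reference to a later element
$ARRAY1->[4] = $ARRAY1->[3];     # fix 3: second copy of the same array ref

# Fix 2 (the alias) needs alias_av() and is omitted in this sketch.

my $ref_ok   = ${ $ARRAY1->[0] } == 1;          # element 0 points at element 5
my $share_ok = $ARRAY1->[4] == $ARRAY1->[3];    # both slots hold the same array
```

Neither fix could have been written inside the array constructor, because $ARRAY1 does not exist until the constructor has finished.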

In a later version I'll try to expand this section with more examples.

A Note About Speed

For smaller data structures Data::Dumper is far faster than this module. For larger ones, however, Data::Dumper may not even be able to complete where Data::Dump::Streamer will, especially if writing to a filehandle. Tests on the author's machine indicate that a binary tree of 4096 nodes will cause Data::Dumper to exhaust all RAM. Data::Dump::Streamer on the other hand scales much further. It is worth remembering that what you lose in speed for smaller structures you gain in readability and in accuracy for all of them.

EXPORT

By default exports the Dump() command, which may also be exported on request under the name Stream(). A Data::Dumper::Dumper compatibility routine is provided via requesting Dumper, and access to the real Data::Dumper::Dumper routine is provided via DDumper. The latter two are exported together with the :Dumper tag.

Additionally there is a set of internally used routines that are exposed. These are mostly direct copies of routines from Array::RefElem, Lexical::Alias and Scalar::Util; however some, where marked, have had their semantics slightly changed, returning defined but false instead of undef for negative checks, or throwing errors on failure.

The following XS subs (and tagnames for various groupings) are exportable on request.

  :Dumper
        Dumper
        DDumper

  :undump          # Collection of routines needed to undump something
        alias_av
        alias_hv
        alias_ref
        make_ro

  :alias           # all croak on failure
     alias_av(@Array,$index,$var);
     alias_hv(%hash,$key,$var);
     alias_ref(\$var1,\$var2);
     push_alias(@array,$var);

  :util
     blessed($var)           #undef or a class name.
     reftype($var)           #the underlying type or false but defined.
     refaddr($var)           #a references address
     refcount($var)          #the number of times a reference is referenced
     sv_refcount($var)       #the number of times a scalar is referenced.
     looks_like_number($var) #if perl will think this is a number.

     regex($var)     # In list context returns the pattern and the modifiers,
                     # in scalar context returns the pattern in (?msix:) form.
                     # If not a regex returns false.
     readonly($var)  # returns whether the $var is readonly
     make_ro($var)   # causes $var to become readonly
     reftype_or_glob # returns the reftype of a reference, or if its not
                     # a reference but a glob then the globs name
     refaddr_or_glob # similar to reftype_or_glob but returns an address
                     # in the case of a reference.
     globname        # returns an evalable string to represent a glob, or
                     # the empty string if not a glob.
  :all               # (Dump() and Stream() and Dumper() and DDumper()
                     #  and all of the XS)
  :bin               # (not Dump() but all of the rest of the XS)

By default exports only the Dump() subroutine. Tags are provided for exporting 'all' subroutines, as well as 'bin' (not Dump()), 'util' (only introspection utilities) and 'alias' for the aliasing utilities. If you need to ensure that you can eval the results (undump) then use the 'undump' tag.

BUGS

Code with this many debug statements is certain to have errors. :-)

Please report them with as much of the error output as possible.

Be aware that to a certain extent this module is subject to the whimsies of your local perl. The same code may not produce the same dump on two different installs and versions. Luckily these don't seem to pop up often.

AUTHOR AND COPYRIGHT

Yves Orton, <demerphq at hotmail dot com>

Copyright (C) 2003 Yves Orton

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Contains code derived from works by Gisle Aas, Graham Barr, Jeff Pinyan, Richard Clamp, and Gurusamy Sarathy.

Thanks to Dan Brook (broquaint) for testing and moral support. Without his encouragement the 1.0 release would never have been written.

Thanks to Yitzchak Scott-Thoennes for the format dumping code.

SEE ALSO

perl. Perlmonks
