NAME

PDL::Internals - description of the current internals

DESCRIPTION

Intro

This document explains various aspects of the current implementation of PDL. If you just want to use PDL for something, you definitely do not need to read this. Even if you want to interface your C routines to PDL or create new PDL::PP functions, you do not need to read this (though it may be informative). This document is primarily intended for people interested in debugging or changing the internals of PDL. Reading it requires a good understanding of the C language and of programming and data structures in general, as well as some understanding of perl. If you read through this document, understand all of it, are able to point to what any part of it refers to in the PDL core sources, and additionally struggle to understand PDL::PP, you will be awarded the title "PDL Guru" (of course, the current version of this document is so incomplete that this is not yet the case).

Warning: If it seems that this document has gotten out of date, please inform the PerlDL developers email list (address in the README file) about it. This may well happen.

Piddles

Currently, a pdl data object is a hash ref which contains the element PDL, which is a pointer to a pdl structure, as well as some other fields. The file Core.pm uses some of these fields and the file pdlhash.c converts these to C when necessary.

The pdl struct is defined in pdl.h. The meanings of its fields are:

magicno

A magic number, used to check whether something really is a piddle when debugging.

state

Various flags about the state of the pdl, such as whether the parents of this pdl have been altered at some point.

trans

Where this pdl was obtained from. This pointer may be null, in which case this pdl is not getting any dataflow from anywhere. Note, however, that being non-null does not mean that data is flowing:

 $a = pdl 2,3,4; 
 $b = pdl 4,5,6;
 $c = $a + $b;     # Note: no dataflow (not asked for)

Here, the trans field in $c contains a pointer to a transformation. The transformation is destroyed and the field cleared only when $a or $b is changed. To see whether data is flowing, check the flags field of the trans struct.

vafftrans

This is intended for speeding up e.g. the chaining of affine transformations. See pdlapi.c for the code handling this. slices.pd also defines some things that work with, or exist for, this field.

sv

Pointer to the hash object. May be null if this pdl does not have a perl counterpart.

datasv, data

The field datasv is a pointer to the perl SV containing the data string. Both datasv and data may be null before the pdl is finally physicalized.

nvals

How many values there are in data.

datatype

The type of the data stored in the data vector.

dims, ndims

The dimensions of this pdl. Remember to physicalize the pdl before using them.

dimincs

As an optimization, an increment for each dimension is stored here. These are required to correspond exactly to dims. If you want to optimize for affine transformations, use the trans or vtrans.
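As an illustration of how increments can correspond to dims, here is a minimal sketch. It is not code from the PDL sources. It assumes that the first dimension varies fastest, so each increment is the product of the sizes of all preceding dimensions. The names default_dimincs and element_offset, and the use of long for sizes, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill incs[] so that incs[0] = 1 and incs[i] = incs[i-1] * dims[i-1].
 * Assumption: the first dimension varies fastest in memory. */
void default_dimincs(const long *dims, int ndims, long *incs)
{
    long inc = 1;
    int i;

    assert(dims != NULL && incs != NULL && ndims >= 0);
    for (i = 0; i < ndims; i++) {
        incs[i] = inc;
        inc *= dims[i];
    }
}

/* Offset (in elements, not bytes) of the element at position pos[]. */
long element_offset(const long *pos, const long *incs, int ndims)
{
    long off = 0;
    int i;

    assert(pos != NULL && incs != NULL && ndims >= 0);
    for (i = 0; i < ndims; i++)
        off += pos[i] * incs[i];
    return off;
}
```

Under these assumptions, dims (2,3,4,5) give the increments (1,2,6,24), and the last element, at position (1,2,3,4), has offset 119.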

threadids, nthreadids

This is where the threading tags are stored. The way this works is that ndims and dims hold all dimensions of the pdl, including threaded dimensions. The real dimensions of the pdl extend from 0 to threadids[0]-1, the thread dimensions with id 0 extend from threadids[0] to threadids[1]-1 and the thread dimensions with the last id extend from threadids[nthreadids-1] to threadids[nthreadids]-1. For example, if a pdl has dimensions (2,3,4,5) (= 120 elements) and nthreadids==2 and threadids={1,3,4}, there is one "real" dimension with size 2, two dimensions with threadid 0 (3 and 4) and the dimension with size 5 has threadid 1.
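The index ranges described above can be sketched as a small lookup. This is an illustration only, not code from pdlthread.c. The function name thread_id_of_dim and the convention of returning -1 for a "real" dimension are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Classify dimension d using the layout described in the text:
 *   0 .. threadids[0]-1                  are "real" dimensions
 *   threadids[t] .. threadids[t+1]-1     carry thread id t
 * threadids[] must hold nthreadids + 1 entries.
 * Returns -1 for a real dimension, otherwise the thread id. */
int thread_id_of_dim(int d, const int *threadids, int nthreadids)
{
    int t;

    assert(threadids != NULL && nthreadids >= 0);
    assert(d >= 0 && d < threadids[nthreadids]);

    if (d < threadids[0])
        return -1;                    /* real dimension */

    for (t = 0; t < nthreadids; t++) {
        if (d < threadids[t + 1])
            return t;                 /* threadids[t] <= d < threadids[t+1] */
    }
    return nthreadids - 1;            /* only reachable if assertions are disabled */
}
```

With threadids={1,3,4} and nthreadids==2, as in the example in the text, dimension 0 is real, dimensions 1 and 2 have thread id 0, and dimension 3 has thread id 1.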

progenitor, future_me

See the section on families below.

children

The children of this pdl, i.e. where data is flowing to from this pdl.

living_for

XXX Not quite clear right now. Has to do with families

def_*

To avoid mallocs, there is a suitable amount of space already allocated for each pointer in this pdl, with the ideology that if you have more than six-dimensional data you must be willing to settle for a little more overhead.
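The idea of preallocated default storage can be sketched as follows. This is a simplified illustration, not the layout of the real pdl struct. The type sketch_pdl, the constant SKETCH_NDIMS_DEF and the three functions are hypothetical, and the value 6 is taken only from the "six-dimensional" remark above.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_NDIMS_DEF 6   /* hypothetical size of the built-in storage */

typedef struct {
    long *dims;                        /* points at def_dims or at malloc'd memory */
    int   ndims;
    long  def_dims[SKETCH_NDIMS_DEF];  /* storage that needs no malloc */
} sketch_pdl;

void sketch_init(sketch_pdl *p)
{
    assert(p != NULL);
    p->dims = p->def_dims;
    p->ndims = 0;
}

/* Make p->dims large enough for ndims entries (old contents are not kept).
 * Returns 0 on success and -1 if allocation fails. */
int sketch_reserve_dims(sketch_pdl *p, int ndims)
{
    long *fresh;

    assert(p != NULL && ndims >= 0);
    if (ndims <= SKETCH_NDIMS_DEF) {
        fresh = p->def_dims;           /* common case: no malloc at all */
    } else {
        fresh = malloc((size_t)ndims * sizeof *fresh);
        if (fresh == NULL)
            return -1;
    }
    if (p->dims != p->def_dims)
        free(p->dims);                 /* drop an earlier heap allocation */
    p->dims = fresh;
    p->ndims = ndims;
    return 0;
}

void sketch_release(sketch_pdl *p)
{
    assert(p != NULL);
    if (p->dims != p->def_dims)
        free(p->dims);
    p->dims = p->def_dims;
    p->ndims = 0;
}
```

In this sketch only data with more than SKETCH_NDIMS_DEF dimensions pays for a heap allocation, which is the trade-off the paragraph above describes.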

magic

If this pdl is magical (e.g. if it is bound to something), this pointer is non-null and you must call the appropriate magic-handling routines when using the pdl.

hdrsv

A "header" SV * that can be set and accessed from outside. Can be used to include any perl object in a piddle.

Transformations

Each transformation has a virtual table which contains various information about that transformation. Usually transformations are generated with PDL::PP so it's better to see that documentation.

Freeing

Currently, not much is freed, especially when dataflow is done. This is bound to change pretty soon.

Threading

The file pdlthread.c handles most of the threading matters. The threading is encapsulated in the structure defined in pdlthread.h.

Accessing children and parents of a piddle

The file Basic/Core/pdlapi.h.PL contains useful routines for manipulating the pdl structure (it's probably easier to read Basic/Core/pdlapi.h once you've performed a build of PDL).

An example of processing the children of a piddle is provided by the baddata method of PDL::Bad (only available if you have compiled PDL with the WITH_BADVAL option set to 1, but still useful as an example!).

Consider the following situation:

 perldl> $a = rvals(7,7,Centre=>[3,4]);
 perldl> $b = $a->slice('2:4,3:5');
 perldl> ? vars
 PDL variables in package main::

 Name         Type   Dimension       Flow  State          Mem
 ----------------------------------------------------------------
 $a           Double D [7,7]                P            0.38Kb 
 $b           Double D [3,3]                VC           0.00Kb 

Now, if I suddenly decide that $a should be flagged as possibly containing bad values, using

 perldl> $a->baddata(1)

then I want the state of $b - its child - to be changed as well, so that I get a 'B' in the State field:

 perldl> ? vars                    
 PDL variables in package main::

 Name         Type   Dimension       Flow  State          Mem
 ----------------------------------------------------------------
 $a           Double D [7,7]                PB           0.38Kb 
 $b           Double D [3,3]                VCB          0.00Kb 

This bit of magic is performed by the propogate_badflag function, which is listed below:

 /* newval = 1 means set flag, 0 means clear it */
 /* thanks to Christian Soeller for this */

 void propogate_badflag( pdl *it, int newval ) {
    PDL_DECL_CHILDLOOP(it)
    PDL_START_CHILDLOOP(it)
    {
        pdl_trans *trans = PDL_CHILDLOOP_THISCHILD(it);
        int i;
        for( i = trans->vtable->nparents;
             i < trans->vtable->npdls;
             i++ ) {
            pdl *child = trans->pdls[i];

            if ( newval ) child->state |=  PDL_BADVAL;
            else          child->state &= ~PDL_BADVAL;

            /* make sure we propagate to grandchildren, etc */
            propogate_badflag( child, newval );

        } /* for: i */
    }
    PDL_END_CHILDLOOP(it)
 } /* propogate_badflag */

Given a piddle (pdl *it), the routine loops through each pdl_trans structure, where access to this structure is provided by the PDL_CHILDLOOP_THISCHILD macro. The children of the piddle are stored in the pdls array, after the parents, hence the loop from i = ...nparents to i = ...npdls - 1. Once we have the pointer to the child piddle, we can do what we want to it (here we change the value of the state variable, but the details are unimportant). What is important is that we call propogate_badflag on this piddle, to ensure we loop through its children. This recursion ensures we get to all the offspring of a particular piddle.
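To experiment with this traversal outside of a PDL build, the same recursion can be sketched with simplified stand-in structures. Everything below is hypothetical: the types sk_pdl and sk_trans, the fixed-size arrays, and the SK_BADVAL bit are invented for illustration. A plain for loop stands in for the PDL_*_CHILDLOOP macros, and nparents and npdls are stored directly in the transformation rather than in a vtable.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BADVAL   0x1   /* hypothetical flag bit, not the real PDL_BADVAL */
#define SK_MAXCHILD 4
#define SK_MAXPDLS  4

struct sk_trans;

typedef struct sk_pdl {
    int              state;
    struct sk_trans *children[SK_MAXCHILD];  /* transformations fed by this pdl */
    int              nchildren;
} sk_pdl;

typedef struct sk_trans {
    int     nparents;          /* pdls[0 .. nparents-1] are the parents    */
    int     npdls;             /* pdls[nparents .. npdls-1] are the children */
    sk_pdl *pdls[SK_MAXPDLS];
} sk_trans;

/* newval = 1 means set the flag, 0 means clear it */
void sk_propagate_badflag(sk_pdl *it, int newval)
{
    int c, i;

    assert(it != NULL);
    for (c = 0; c < it->nchildren; c++) {
        sk_trans *trans = it->children[c];

        for (i = trans->nparents; i < trans->npdls; i++) {
            sk_pdl *child = trans->pdls[i];

            if (newval) child->state |=  SK_BADVAL;
            else        child->state &= ~SK_BADVAL;

            /* recurse so that grandchildren etc. are reached too */
            sk_propagate_badflag(child, newval);
        }
    }
}
```

As in the listing above, the flag of the starting piddle itself is left to the caller; only its descendants are touched.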

THE FOLLOWING NEEDS TO BE CHECKED.

Access to parents is similar, with the for loop replaced by:

        for( i = 0;
             i < trans->vtable->nparents;
             i++ ) {

AUTHOR

Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu), 2000 Doug Burke (burke@ifa.hawaii.edu).

Redistribution in the same form is allowed but reprinting requires a permission from the author.