The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

PDL::PP - Generate PDL routines from concise descriptions

SYNOPSIS

        use PDL::PP qw/Modulename Packagename Prefix/;

        addhdr('#include "hdr.h"');

        addpm('sub foo {}');

        defpdl(
                'inner_product',
                'a(x); b(x); c(); TYPES:BSULFD;',
                'double tmp = 0;
                 loop(x) %{
                        tmp += $a() * $b();
                        $TFD(do_when_float, do_when_double);
                %}
                 $c() = tmp;',
                 PP::Add_Call_Parameters('int foo'),
        );

        done();

DESCRIPTION

This module defines the routine defpdl that generates xsub code from a short description such as the inner_product function above. done() automatically writes the files Prefix.xs and Prefix.pm.

The idea is that since this concise description encodes in itself (better than C code, which would be difficult to interpret) what is necessary to do, this code can be compiled to C in many different ways. Also, the resulting C code can be easily made to do the right thing in many situations: for example, in the above code, the matrix b is a destination matrix so the code can check whether b exists and has the right dimensions or die or alternatively create a new b in that case.

Of course, a human can also code all the intelligent code, but if there are tens of different routines, it gets very dull after a while. And to think about reuse: in the above code, the line

        tmp += $a() * $b();

is interpreted by the routine. At some hypothetical future time, if PDL starts supporting sparse matrices, this might still be made to work. Also, this code could be used in a wildly different environment from PDL, achieving a kind of universality. Alternatively, the compiler could, for debugging, place bounds checking at each access to a and b (because they are stored in memory sequentially, this would be far superior to the usual gcc bounds checking).

Indexing

PDL::PP makes it possible for the caller to use a powerful indexing system. Each index has an associated increment in the flat data space so it becomes trivial to exchange two indices or so on.

Threading

A single call to a PDL::PP-defined xsub can be a loop over several dimensions. For example, if a function expects a 2D argument and is called with a pdl with dimensions (5,4,3) the call is automatically translated into

        func($a(:,:,0)); func($a(:,:,1)); func($a(:,:,2)); 

except that it is faster. This is called implicit threading.

If, in the previous example, you had wanted to thread over the second dimension instead of the last (which is the default for implicit threading), you need to do either

        func($a->xchg(1,2));

Which exchanges dimensions 1 and 2, resulting in (5,3,4) or

        func($a->thread(1));

Which means that we want to thread over dimension 1 (the second dimension, of width 4).

For functions of several variables, this gets slightly more involved. Implicit threading takes the extra dimensions on all the variables and tuples them up so that the first extra dimension on each variable is one index to be threaded over (so all the first extra dimensions must be of the same size) and so on for the second extra dimension and so on.

An extra dimension of size 1 is interpreted to have the same size as any other dimension but with increment 0 (all values of the index point to the same data).

For example, if func now expects to have data as

        f(a(x,y),b(x,y,z),c(x));

then we can call (now, the numbers in parentheses are dimension sizes):

        f(a(5,3,10,11),b(5,3,2,10,1,12),c(5,1,11,12));

The meaning of which should be clear from the numbers.

Explicit threading, in the case of several variables is different: here it is demanded that each thread have the same number of arguments, the first arguments on each call corresponding to each other, similarly for the second etc.

PDL variables

The second argument to defpdl is either a ref to an array of strings of the form

        typeoption [options]name(indices)

or a concatenation of strings like this with semicolons between them. Options is a comma-separated list which can at the moment contain

o

This pdl is used only for output and is therefore liable to be necessary to create at runtime. In this case, all of its indices need to have a defined value. If threading is done when this pdl is created, then this it will have all the threadindices threaded over in it.

t

This pdl is used only as a temporary and is therefore liable to be necessary to create at runtime. In this case, all of its indices need to have a defined value. On creation, no threadindices will be included to a temp.

int

This pdl is of type integer and is not to be coerced to the same type as everything else.

The name is a lowercase alphanumeric name for the variable. One of the names can be preceded by ">" which means that is the function is called like $a = f($b) instead of f($a,$b) then this argument is the output (THIS IS NOT YET IMPLEMENTED). The indices part is a comma-separated list of lowercase index names. If necessary, the indices can have predetermined sizes with the syntax

        a(x,y=2);

One of the strings in the second argument can be of the type

        TYPES:whichtypes

where whichtypes consists of a combination of the letters BSULFD meaning for which datatypes this function should be build for. The default is all but if you are interfacing to a function library, for instance, the function may only be defined for some data types.

Indices

defpdl uses named indices. In the first example, there were two named indices, x and y and a "rest" index, X. Each index name is unique so the x in both the definitions of a and b are interpreted to mean the same number of elements and a runtime check is made of this.

Loops

In the C code, it is possible to automatically create loops. In the example, the line

        loop(x,y) 

Makes loops over the indices x and y. If all your dimensions mean different things, then this is usually sufficient but if you have some square matrices, for example correlation or so, you need to use the syntax

        loop(x0,x1)

which starts two loops over the same size. Currently, to make it easier to program, the loops use the sequences %{ and %} (like yacc) to start and end. In the future, this may change.

As a point of interest, there is an actual parser and context manager with stack and all in the code. Perl makes these things very easy to do.

Array access

defpdl attempts to make the defaults do the right thing in a wide variety of cases without the need to specify the indices explicitly. However, special cases always arise and for those, the syntax

        loop(x1,x2) %{ a() = b(x => x1) * c(x => x2) %};

may be used (here the sizes could be [qw/[o]a(x,x) b(x) c(x)/], in which case this sets a to the outer product of b and c.

There is also the special case pointer access $P(x) which gives you a pointer to the first element of x in the current rest dimension indexes. Note that if you are using this and someone starts mapping indices, you are in big trouble. This is for use only when you are passing arguments to outside functions.

Naming

For user access, there are some standard naming conventions. All loop variables have just the name inside the loop declaration. Index sizes have the name of the index followed by _size. The same name is used if it is necessary to specify the dimension of an output variable as a parameter.

Type coercion

Usually, you don't have to worry about type coercion. What happens is that PP converts all pdls to which you have not specified a type to the highest common denominator.

Currently, it is only possible to specify pdls to be integer or the generic type.

Often, when interfacing with fortran, there are two different versions of a subroutine for singles and doubles which are called relatively similarly. $T[typechars](typeactions) provides a way to do this: typechars is a sequence of the characters BSULFD and typeactions a comma-separated list.

Customization

Because this kind of code generation will be fairly widely used, it is necessary to be able to customize many steps of the process. For example, we really want to be able to do any kinds of strange things. The current solution is to add "hooks" into many places of the code generation. This is not the way I really would like to do it, this is not object-oriented but it works for now.

And it is possible to retain almost the same user interface later.

After the C code, there can be any number of Hook objects, for example in the above SYNOPSIS, the row

         PP::Add_Call_Parameters->new('int foo'),

creates a hook object that defpdl will call when it is making up the parameter list for the xsub. The hook has the possibility to modify the parameter list in any way it chooses but this particular hook only adds a new parameter "int foo".

In the most general form, to be implemented later, the whole PDL::PP defpdl routine simply calls a predefined sequence of hooks and the user has the possibility to insert his/her own hooks in any place. This would be a kind of a "blackboard" approach, allowing interesting pre- and postprocessing at any phase of the operation.

One important application of hooks is when it is necessary to do some time or space-consuming operations outside the implicit and datatype loops, for example the implementation of binary ops for PDL will work this way.

INFLUENCES

The ideas here have been influenced by the language Yorick as well as matlab and scilab.

BUGS

Uncountably.

When using GCC, it would be much faster to just declare an array with variable number of indices than to use pdl_malloc. With other compilers, it would also be a lot faster to use a huge largest N_DIM (16, for example, or if you want to be *ABSOLUTELY* certain, 50) and be done with it. Then it will be on the stack, and allocated and accessed rapidly.

The run-time error messages the code generates are really awful and uninformative.

An important issue is whether this version puts C too far from us. It is possible to use normal C loops instead of the loop() syntax and so on, but I think it may come in handy pretty often.

The code is not very readable at the moment. It is fairly modular, however.

The generated code is relatively inefficient, especially at access times. The outer loops should update pointers to the data accessed inside to be efficient. However, the comfort of writing code like this is very nice.

The documentation is written by the author :(

AUTHOR

Copyright (C) 1996 Tuomas J. Lukka (lukka@fas.harvard.edu)