
NAME

PDL::PP - Generate PDL routines from concise descriptions

SYNOPSIS

        use PDL::PP qw/Modulename Packagename Prefix/;

        addhdr('#include "hdr.h"');

        addpm('sub foo {}');

        defpdl(
                'Transpose',
                'a(x,y,X); int [o]b(y,x,X)',
                'int c',
                'loop(x,y) %{
                        $b() = $a();
                %}'
        );

        done();

DESCRIPTION

This module defines the routine defpdl, which generates xsub code from a short description such as the transpose function above. Calling done then writes the files Prefix.xs and Prefix.pm.

The idea is that this concise description itself encodes what needs to be done, and does so better than C code, which would be difficult to interpret mechanically. The description can therefore be compiled to C in many different ways. The resulting C code can also easily be made to do the right thing in many situations. For example, in the above code the matrix b is a destination matrix, so the generated code can check whether b exists and has the right dimensions, and otherwise either die or create a new b.
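The check-or-create behaviour described above for a destination matrix can be sketched in plain C. This is only an illustration: the matrix struct and the function ensure_output are hypothetical names, not part of PDL or of the code this module generates.

```c
#include <stdlib.h>

/* Hypothetical 2-D array type; not the PDL internal structure. */
typedef struct {
    double *data;    /* NULL means "not yet created" */
    size_t dims[2];
} matrix;

/* Make sure dst has shape (n0, n1), assuming n0 and n1 are nonzero.
 * Returns 0 on success, -1 on a shape mismatch or allocation failure. */
int ensure_output(matrix *dst, size_t n0, size_t n1)
{
    if (dst->data == NULL) {
        /* Destination does not exist yet: create it. */
        dst->data = calloc(n0 * n1, sizeof *dst->data);
        if (dst->data == NULL)
            return -1;
        dst->dims[0] = n0;
        dst->dims[1] = n1;
        return 0;
    }
    /* Destination exists: accept it only if the shape matches. */
    return (dst->dims[0] == n0 && dst->dims[1] == n1) ? 0 : -1;
}
```

A caller that prefers to die on a mismatch would treat the -1 return as fatal; a caller that prefers to recreate the destination would free it and call the function again.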

Of course, a human can write all of this logic by hand, but with tens of different routines that gets very dull after a while. There is also reuse to consider: in the above code, the line

        $b() = $a();

is interpreted by the routine. At some hypothetical future time, if PDL starts supporting sparse matrices, this might still be made to work. Also, this code could be used in a wildly different environment from PDL, achieving a kind of universality. Alternatively, the compiler could, for debugging, place bounds checking at each access to a and b (because they are stored in memory sequentially, this would be far superior to the usual gcc bounds checking).
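As a rough sketch of what per-access bounds checking could look like, the accessor below checks each index against its own dimension. The function name at_checked and the storage order (first index varying fastest) are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical checked accessor: returns a pointer to element (x, y) of an
 * nx-by-ny array stored with x varying fastest, or NULL if either index is
 * out of range for its own dimension. */
double *at_checked(double *data, size_t nx, size_t ny, size_t x, size_t y)
{
    if (x >= nx || y >= ny)
        return NULL;
    return &data[y * nx + x];
}
```

Note that an access such as x = 3, y = 0 in a 3-by-2 array still lands inside the allocated block, so a check on the flat address alone would not catch it, while the per-index check does.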

PDL variables

The second argument to defpdl is either a ref to an array of strings of the form

        typeoption [options]name(indices,X)

or a concatenation of strings like this with semicolons between them. Options is a comma-separated list which can at the moment contain

o

This pdl is used only for output, so it may need to be created at runtime. In that case, all of its indices must have a defined value.

int

This pdl is of type integer and is not to be coerced to the same type as everything else.

The name is a lowercase alphanumeric name for the variable. One of the names can be preceded by ">", which means that if the function is called like $a = f($b) instead of f($a,$b), then this argument is the output. The indices part is a comma-separated list of lowercase index names, or "...", or an uppercase index name for a "rest" index.

Indices

defpdl uses named indices. In the first example, there were two named indices, x and y, and a "rest" index, X. Each index name is unique, so the x in the definitions of both a and b is interpreted to mean the same number of elements, and this is checked at runtime.

The "rest" index is a special case: it may stand for several indices, which must currently be in the same order. The idea is that the code will be automatically looped over this set of indices. In the future, it may be possible to have several different "rest" indices for different sets of variables.
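To make the automatic looping over the "rest" index concrete, here is a hand-written C sketch of the transpose from the SYNOPSIS. It is not the code this module emits. It assumes that the first index varies fastest in memory and that all dimensions collected by X have been flattened into a single count, nrest; the function name transpose_rest is hypothetical.

```c
#include <stddef.h>

/* b(y,x,X) = a(x,y,X), with the first index varying fastest in memory.
 * nrest is the product of all dimensions collected by the rest index X. */
void transpose_rest(const double *a, double *b,
                    size_t nx, size_t ny, size_t nrest)
{
    for (size_t r = 0; r < nrest; r++) {       /* implicit loop over X */
        const double *a2 = a + r * nx * ny;    /* current slice of a */
        double *b2 = b + r * nx * ny;          /* current slice of b */
        for (size_t y = 0; y < ny; y++)        /* loop(x,y) */
            for (size_t x = 0; x < nx; x++)
                b2[x * ny + y] = a2[y * nx + x];
    }
}
```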

Loops

In the C code, it is possible to automatically create loops. In the example, the line

        loop(x,y) 

makes loops over the indices x and y. If all your dimensions mean different things, this is usually sufficient, but if you have square matrices (a correlation matrix, for example), you need to use the syntax

        loop(x0,x1)

which starts two loops over the same size. Currently, to make it easier to program, the loops use the sequences %{ and %} (like yacc) to start and end. In the future, this may change.
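For illustration, two loops over the same size might expand to something like the following C, here applied to symmetrizing a square matrix. The function symmetrize and the choice of operation are invented for this sketch; only the pairing of the loop variables x0 and x1 over one common size n reflects the text above.

```c
#include <stddef.h>

/* s(x0,x1) = (a(x0,x1) + a(x1,x0)) / 2 for an n-by-n matrix,
 * with the first index varying fastest in memory. */
void symmetrize(const double *a, double *s, size_t n)
{
    for (size_t x1 = 0; x1 < n; x1++)       /* both loops run over the */
        for (size_t x0 = 0; x0 < n; x0++)   /* same index size n       */
            s[x1 * n + x0] = 0.5 * (a[x1 * n + x0] + a[x0 * n + x1]);
}
```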

As a point of interest, there is an actual parser and context manager with stack and all in the code. Perl makes these things very easy to do.

Array access

defpdl attempts to make the defaults do the right thing in a wide variety of cases without the need to specify the indices explicitly. However, special cases always arise and for those, the syntax

        loop(x1,x2) %{ a() = b(x => x1) * c(x => x2) %};

may be used. Here the variables could be declared as [qw/[o]a(x,x) b(x) c(x)/], in which case this sets a to the outer product of b and c.
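Written out by hand, the outer product amounts to the C below. This is a sketch of the intended result, not of the generated code; the function name outer is hypothetical, and a is assumed to be stored with its first index varying fastest.

```c
#include <stddef.h>

/* a(x1,x2) = b(x1) * c(x2) for vectors b and c of length n.
 * a is n-by-n, with x1 varying fastest in memory. */
void outer(const double *b, const double *c, double *a, size_t n)
{
    for (size_t x2 = 0; x2 < n; x2++)
        for (size_t x1 = 0; x1 < n; x1++)
            a[x2 * n + x1] = b[x1] * c[x2];
}
```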

Naming

For user access, there are some standard naming conventions. All loop variables have just the name inside the loop declaration. Index sizes have the name of the index followed by _size. The same name is used if it is necessary to specify the dimension of an output variable as a parameter.

INFLUENCES

The ideas here have been influenced by the language Yorick as well as MATLAB and Scilab.

BUGS

Uncountably.

When using GCC, it would be much faster to just declare an array with variable number of indices than to use pdl_malloc. With other compilers, it would also be a lot faster to use a huge largest N_DIM (16, for example, or if you want to be *ABSOLUTELY* certain, 50) and be done with it. Then it will be on the stack, and allocated and accessed rapidly.
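The fixed-size alternative can be sketched as follows. The work array lives on the stack, so there is no allocation call and nothing to free. The constant MAX_NDIM and the function flat_offset are hypothetical names chosen for this example, which computes the flat offset of one element with the first index varying fastest.

```c
#include <stddef.h>

#define MAX_NDIM 16   /* assumed upper bound on the number of dimensions */

/* Returns the flat offset of element idx[] in an array of shape dims[],
 * first index fastest, or -1 if ndim is outside 0..MAX_NDIM. */
long flat_offset(const size_t *dims, const size_t *idx, int ndim)
{
    size_t stride[MAX_NDIM];   /* on the stack: no malloc, no free */
    size_t s = 1;
    long off = 0;

    if (ndim < 0 || ndim > MAX_NDIM)
        return -1;
    for (int i = 0; i < ndim; i++) {   /* stride of each dimension */
        stride[i] = s;
        s *= dims[i];
    }
    for (int i = 0; i < ndim; i++)     /* dot product of idx and stride */
        off += (long)(idx[i] * stride[i]);
    return off;
}
```

The cost of this design is the hard limit: an array with more than MAX_NDIM dimensions has to be rejected, as the -1 return does here.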

At the moment, the code does not create output pdls that are missing or have the wrong size. However, the change is fairly trivial.

The run-time error messages the code generates are really awful and uninformative.

An important issue is whether this version puts C too far from us. It is possible to use normal C loops instead of the loop() syntax and so on, but I think it may come in handy pretty often.

The code is not very readable at the moment. It is fairly modular, however.

The generated code is relatively inefficient, especially at access times. The outer loops should update pointers to the data accessed inside to be efficient. However, the comfort of writing code like this is very nice.
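The difference between recomputing an offset at every access and updating a pointer in the outer loop can be sketched as below. Both functions are hypothetical and compute the same sum; whether the second is actually faster depends on the compiler, which may well optimize the first into the same form.

```c
#include <stddef.h>

/* Recomputes the full offset y * nx + x for every access. */
double sum_indexed(const double *a, size_t nx, size_t ny)
{
    double total = 0.0;
    for (size_t y = 0; y < ny; y++)
        for (size_t x = 0; x < nx; x++)
            total += a[y * nx + x];
    return total;
}

/* Same result: the outer loop advances a row pointer once per row,
 * and the inner loop only indexes relative to it. */
double sum_pointer(const double *a, size_t nx, size_t ny)
{
    double total = 0.0;
    const double *row = a;
    for (size_t y = 0; y < ny; y++, row += nx)
        for (size_t x = 0; x < nx; x++)
            total += row[x];
    return total;
}
```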

The current type coercion is not good.

AUTHOR

Copyright (C) 1995 Tuomas J. Lukka (Tuomas.Lukka@helsinki.fi)