Numeric::LL_Array - Perl extension for low level operations over numeric arrays.


  use Numeric::LL_Array;
  blah blah blah


One of the principal overheads of using Perl for numeric vector calculations is the need to constantly create and destroy a large amount of Perl values (there is no such overhead in calculations over scalars, since Perl knows how to reuse implicit temporary variables).

Thus, in this package, we provide a way to manipulate vectors "in place" without creation of new Perl values. Additionally, the calculations over slots in a vector are performed in C, thus significantly reducing overhead of Perl over C (which, for literal translation from C to Perl, is of order 80..200 times).

One of the design goals is that many vectors should be able to use the same C array (e.g., of type double[]) - possibly with offset and a stride, so performing an operation over a subarray does not require a new allocation. The C array is stored in PVX of a Perl variable, so it may be (eventually) refcounted in the usual Perlish way.

Playgrounds and layout of strides

A playground is just (a region in) a Perl string, with the buffer propertly alighed to keep a vector of a certain C type. This way, one gets, e.g., unsigned char-playground, or long double-playground, etc.; we call this C type the flavor of the playground. One describes a certain position in the playground in units of the size of the corresponding C type; e.g., position 3 in double array would typically be at offset of 24 bytes from the start of playground.

A multi-dimensional array of size SIZE1 x SIZE2 x ... x SIZEn may be placed into a playground in many different ways; we allow modification of the start position (the position of the element with index (0,0,...,0)), and of strides. There is one stride per dimension of array; the stride describes how the position of the element changes when one increments the index by 1 in the particular coordinate in the array.

For example, the array 0..4 with start 3 and stride 2 occupies the following positions in the playground:

 content   * * * 0 * 1 * 2 * 3 * 4 ...
 position  0 1 2 3 4 5 6 7 8 9 .......

Note that when plotting this picture one does not care about the flavor of the playground.

Likewise, the 2-dimensional array with contents

  11 12 13 14
  21 22 23 24

and start 1, horizontal stride 2, and vertical stride 3 takes these positions;

  * 11 * 12 21 13 22 14 23 * 24 ....

and with start 12, horizontal stride -1, and vertical stride -5 takes

  *  *  *  * 24 23 22 21  * 14 13 12 11 ...
  0  1  2  3  4  5  6  7  8  9 10 11 12 ...

The major advantage of this flexibility is that one can work with subarrays of the given array, with the transposed array, and with the reflected array without reshuffling the content of the array, but with only changes in the start position and in the strides. For example, to access the transposed 2-dimensional array, one just accesses the same data with 2 strides interchanged. Likewise, by decreasing dimension, one can assess columns and row of the matrix without moving the contents.

Other examples: one can fit N x M x K 0-array into playground of size 1 using strides 0,0,0. And one can fit N x N identity matrix into a playground of size 2N-1 by using strides 1,-1.

For other examples, see "Using extra, "fake" dimensions".

Encoding format of an array

To completely specify an array to the API of this module one needs to provide the C type, the playground of the array, the start offset, its dimension (e.g., 1 for vectors, 2 for matrices etc), and its format, which is the list of the sizes of the array in each particular direction, and the corresponding strides, in the order STRIDE1 COUNT1 STRIDE2 COUNT2 .... To avoid confusion, it makes sense to use the name arity for the number dimension, and use word dimensions for the list COUNT1 COUNT2 ... COUNTn (here n is the "arity").

The format may be encoded as a list reference, or as a packed list of C values of type ptrdiff_t. It is not required that the list contains exaclty 2*dimension elements; it may contain more, and the rest is ignored. This way, one can access rows of matrices without copying the content of the matrix, and without touching the format of the matrix - only by decreasing dimension and modifying the starting position.

Note that the flavor of the array is not a part of the format. For this low-level API, each function is designed to work only with a certain flavor an array (e.g., if operating on 3 arrays, it assumes 3 particular flavors, one for each of 3 arguments); so given a particular function, the required flavor of the argument is fully described by its position in the argument list. (There is no error-checking in this regard, it is the caller's responsibility to follow flavors correctly.) So the flavors are, essentially, encoded in the name of the function.

For example, a function for high-level semantic add_assign($target, $source) (implementing operation $target += $source), can take arguments

  source_playground, target_playground, dim, start_source, start_target,
       format_source, format_target

E.g., for $target of type double, and source of type unsigned short, the name of the function may encode letters "d" and "S" (in analogy to Perl pack formats). (The actual name used is S2d1_add_assign, see "Naming conventions for handlers".)

The semantic of this module is that it is the target dimensions which are assumed to be used; so from the source format only the strides is used, and the COUNTn positions are ignored. Likewise, the arity dim is assumed to be common for arrays, and both formats should contain at least 2*dimension elements.

The accessors

Accessor handlers take the following arguments (with defaults indicated):

 @a = access_T($p, $offset = 0, $dim = 0, $format = undef,
               $in = undef, $keep = FALSE);

extracts a slice of playground $p into Perlish data structure (using references to arrays of references etc. as deep as specified by $dim).

The playground $p, and offset/dim/format arguments have the same sense as for other handlers. If $in is not defined, the "external" layer of the extracted data is put into elements of array @a (so if $dim is 1, elements of @a are numbers; if it is 2, elements are references to arrays of numbers, etc); if $in is TRUE, the returned value is a scalar containing an array reference (so if dim is 1, $a[0] contains a reference to an array of numbers, etc).

If $in is an array reference, then instead of putting the "external" layer of Perlish array into @a, it is put into the referenced array. The fortune of existing elements of the referenced array is governed by $keep; if FALSE, the existing content is removed; if TRUE; the returned data is appended after the end of existing data.

Naming conventions for handlers

There are 4 types of handlers: accessors, and modifying handlers (with 0,1,2 sources). For modifying handlers, the argument which is write-only or read-write is called target, and the read-only arguments are sources.

Perl functions for conversion from Numeric::LL_Array arrays to Perl arrays are called access_T; here T is the letter of pack() specifier corresponding to native C type (e.g., to access native C signed long one uses pack() specifier "l!"; since ! means native, we drop it, and use access_l).

Perl functions which modify one array and take no source are named <T0_type>; here T is a letter encoding the flavor, and type is the identifier describing the semantic of the function (the corresponding C or Perl operator is put below, when it is different):

  negate flip_sign bit_complement incr decr 0   1   2  m1
  !      -         ~              ++   --   0   1   2  -1

  abs cos sin tan acos asin atan exp log log10 sqrt cbrt ceil floor trunc rint

(If C operation takes an argument, we do target = OP(target), otherwise target = OP.) For example, to increment signed char array, one uses the function named c0_incr; to assign -1 to all elements of long double array, use D0_m1. (The operations in the second row (except abs) are implemented only for floating point types.)

Perl functions which use one array ("source") to modify another ("target") are named <S2T1_type>; here T is a letter encoding the target flavor, and S encodes the source flavor. type is the identifier describing the semantic of the function:

  cos sin tan acos asin atan exp

(only for floating point types, and only with target type equal to source type), or assign for assignment (possibly with type conversion), or

  plus_assign   minus_assign   mult_assign   div_assign   remainder_assign
  +=            -=             *=            /=           %=

  lshift_assign rshift_assign  pow_assign   bitand_assign bit(x)or_assign
  <<=           >>=            **=          &=            |=  ^=

  min_assing       max_assign        negate flip_sign bit_complement ne0
  $t = min($t,$s)  $t = max($t,$s)

  abs ceil floor trunc rint log log10 sqrt cbrt

(The ceil floor trunc rint are supported only for floating-point source types; the last two only if your C compiler defines them [most do].) ne0 checks for a value to be non-0. Semantic of function names abs ceil floor trunc rint log log10 sqrt cbrt frexp modf coincides with one of the corresponding C functions.

For example, to convert unsigned long array to a long double array, one uses the function named L2D1_assign.

Likewise, Perl functions which use two arrays ("source1" and "source2") to modify another ("target") are named <sS2T2_type>; here T is a letter encoding the target flavor, s and S encode the source1 and source2 flavors correspondingly. type is the identifier describing the semantic of the function:

  plus minus mult div remainder pow lt gt le ge eq ne lshift rshift
  bitand bitor bitxor min max sproduct

Operations lt gt le ge eq ne work as "in mathematics", not "as in C": a negative number is less than a non-negative, even if one type is signed, another unsigned. (However, currently a comparison of long and double goes through automatic C conversion long -> double. On some machines long has more bits than than mantissa of a double, so this includes rounding a long to the nearest fitting double.)

sproduct performs the operation target += source1 * source2.

The target type must coincide with one of the source types (but see for exceptions in "Extra targets for 2-arg handlers").

If source1 or target are floating point, lshift and rshift operations accept negative arguments; additionally, no truncation is performed (including negative lshift and positive rshift). Likewise for lshift_assign, rshift_assign.

For example, to add signed short array and unsigned int and write the result to a unsigned long array, one uses the function named sI2L2_add. (Read this as signed short and unsigned integer TO unsigned long: add[ition] [with 2 sources].)

In short: the number before underscore is the number of "source" arrays, and the flavors of source arrays preceed the (first) number 2 in the name.

Exceptional handlers: 2 additional handlers are provided (for frexp modf); they take 1 source and 2 targets. They should be called exactly as handlers with 2 sources and 1 target, only the second target takes place of the second source.

Order of operations

When an operation is performed, the cycle is run over the elements of the array; the innermost loop is w.r.t. the first index (i.e., the first of strides and the first of limits), and outermost is w.r.t. the last index.

The start element of the array is processed first, then the order of elements processed is governed by the strides. For example, suppose that $arr with 1 index consists of 0s; let $arr_f be format of $arr but with 1 less element, $off is the offset of the next element of $arr (=stride), and $ones consists of 1s (with format $ones_f), then

  sS2s2_add($arr, $ones, $arr, 0, 0, $offset, $dim, $arr_f, $ones_f, $arr_f);

will initialize $arr so that nth element is equal to n. Indeed, the target is $arr with offset 1; so the first addition will assign 0th element + 1 to the 1st element of $arr; after this the first 2 elements of $arr are "correctly" initialized to 0 and 1. The second addition will assign 1st element + 1 to the 2nd element, so it would become 2, etc.

Extra targets for 2-arg handlers

For the comparison handlers lt gt le ge eq ne, in addition to types of the sources, the target can be of any integer type.

For multiplication handlers mult and sproduct, in addition to types of the sources, the target can be of any wider type. In more details: if one of arguments if floating point, wider means the storage size in bytes is larger than of any of sources. If both arguments are of integer types, wider means one of the following: either in the sense of storage size, or a floating point type, or an unsigned size of the same storage size as the largest of the sources.

For lshift with integer type sources, wider unsigned integer type targets are allowed.

Type casts

Usually the generated C code for handlers contains no C type casts. There are exceptions.

For lshift with integer type sources, and wider integer target, the sources are first cast to the target type.

For mult and sproduct, when a wider type is allowed, the sources are first cast to the target type (with an extra exception of integer type sources and a floating point target which is not of larger bitwidth than sources; then, instead of the target type, the sources are first cast to the next wider integer type - provided such a type exists).


None by default.

All handlers are exportable. So are constants for Perl pack() "type letter" for each flavor of the array:

  packId_format    packId_star_format
  packId_T         packId_star_T

here T is a flavor specifier for a type used by this module.

The functions on the left return the letter, on the right return a letter followed by *. For example, packId_L() would return L! on newer Perls, and a suitable substitute on the older ones. The functions on top return the pack() letter(s) for pack()ing the offsets and strides in the array.

To simplify developing code which works with different numeric types, one can create aliases for types by putting :T=t into import list. After this word, the imported symbols may have T put instead of t. E.g.,

  use Numeric::LL_Array qw( :X=d access_X XX2X2_add packId_X );

(After this, use only the type X to access doubles. If you want to replace doubles by floats, all you need to do is to change one letter: make alias into :X=f.)

Additionally, functions packId($t), packId_star($t) are available; they take type letter (or 'format') as a parameter.

Build methodology

C functions implementing the handlers of this module are collected into four dictionaries: for accessors, and for 0-, 1- and 2-sources modifiers. Each dictionary is compiled from the C code in one minuscule C file, code_*.h; this file is loaded multiple times, once per handler, with a handful of macros describing the operation to perform on each array element, and C types of array elements.

Each dictionary corresponds to one constant wrapper C file, driver_*.c; this wrapper contains no C code, only a few preprocessor directives to include the necessary headers, and the autogenerated loader, driver_*.h. The loaders are generated by a small Perl script,; for each handler, a loader defines necessary macros, and includes the corresponding file code_*.h.

The memory overhead, and the initial slowdown to define thousands of XSUBs wrapping the C handlers would be quite measurable. To avoid this, at start no handler XSUBs are defined. What is defined is four XSUB interfaces; they are "closure XSUBs" (or "interfaces"): to make them into a real, callable, XSUB, one needs to attach to the interface the corresponding dictionary entry. So one extra XSUB is defined which does such an attachment.

So one can define a handler XSUB for by either calling a convenient Perl routine create_handler() (which wrapps low-level attacher XSUB), or by just import()ing the handler (the import()er would call create_handler() as needed) as in

  use Numeric::LL_Array qw( access_i  c2S1_assign );

So the build architecture consists of:

  an import()er which creates handler XSUBs on the fly;

  an attacher which converts interface-XSUBs into handler XSUBs;

  interface-XSUBs which call entries in the dictionary;

  dictionaries created by F<> from code in F<code_*.h>.

So the total complexity of this module is about 200 lines of C code, 450 lines of Perl, and 400 lines of XSUB (including some code for testing and developing).


Avoiding extraneous copying

This module deals with large memory buffers. In Perl, passing arguments to subroutines does not involve copying the memory buffers of string arguments. However, the common construction

  my $arg1 = shift;

does involve copying of memory buffers.

So there are two ways to create Perl subroutines which do not do extraneous copying: first (ugly) way: access arguments as $_[N]. Second way: change the signature of Perl subroutines: they should take playgrounds as references, and dereference them only when passing to handlers.

  sub foo ($) { my $pg = shift; ...  d0_sin $$pg, ...

  foo \$playground;

Using extra, "fake" dimensions

Many "typical" operations over arrays can be simplified a lot by introducing new "fake" dimensions to the array. By the same reason why the size of a vector with the stride 0 does not depend on its dimension, introducing new dimensions with stride 0 does not chenge the memory region occupied by an array. Hence with such dimensions added, still the same data is accessed. The benefit is that many operations may be reduced to just sproduct using this approach.

Consider multiplication of a matrix R by a vector v. If we want to put the result into array w with initial contents 0, one could do

  w[k] += R[k,l] * v[l] for k in 0..K-1, l in 0..L-1;

However, if one could add an extra "fake" second dimension to w, and an extra "fake" first dimension to v, this becomes

  W[k,l] += R[k,l] * V[k,l] for k in 0..K-1, l in 0..L-1;

(this assumes that W[k,l] accesses the same data as w[k], and V[k,l] accesses the same data as v[l]). Now it is an operation of sproduct with target W and sources R and V.

If w is accessed with stride Sw, and dimension K, then W should be accessed with strides Sw, 0, and dimensions K,L. Likewise, if v is accessed with stride Sv, and dimension L, then V should be accessed with strides 0, Sv, and dimensions K,L.

Similarly, of one wants to multiply matrices P, Q of dimensions K,L, and L, M into a resulting matrix R of dimensions K,M (initially 0), one should add extra "fake" dimensions to make them all of arity 3 and dimensions K,L,M, and do sproduct. The new strides are sP1,sP2,0 for P, 0,sQ1,sQ2 for Q, and sR1,0,sR2 for R (here we assume that omitting 0s gives old strides for P,Q,R).

Consider a problem of convolving two arrays A, B of sizes a and b with a > b; the result is an array of size a+b-1 which has RES[k] = SUM A[k-t] B[t] (here k changes between b-1 and a-1, and summation runs over t between 0 and b-1). To conform to API of this module, change indices of the result to 0..a-b.

Then one can calculate this by

  Assign 0 to RES
  Do sproduct() with target RES', and sources A', B'

Here RES', A', B' are arrays RES, A, B with an added "fake" dimension. Assume that strides of A, B, RES are 1; then the strides of A' are 1,-1, of B' are 0,1, and of RES' are 1,0, and the start position of A' is shifted to correspond to A[b-1].

One can immediately modify this to handle multi-dimensional convolution. Moreover, sometimes one can modify this to handle some sparse array configurations. For example, suppose that B is the Laplace kernel

  0  1 0
  1 -4 1
  0  1 0

and A is an array of size l x m. Then the result is of size l-2 x m-2, and can be calculated as:

  Make an array mFOUR of size 1 with content -4.
    Make an array mFOUR' of size l x m with the same buffer, and strides 0,0
  Make an array ONE of size 1 with content 1.
    Make an array ONE' of size l x m x 2 x 2 with the same buffer,
    and strides 0,0,0,0
  RES = A * mFOUR       (pointwise multiplication)
  Do sproduct() with target RES', and sources A', ONE'

Here RES' has strides 1,l,0,0 (we assume that A has strides 1,l), size l-2 x m-2 x 2 x 2, while A' has strides 1,l,-l+1,l+1, and starts at position of index [l,0] of A.

Essentially, we treat the middle 4 of B separately, and note that the remaining 1s form a square. (In fact, a parallelogram instead of a square would be fine too.) When "positioned" on top of A, this square has one side going up-right (which gives a stride -l+1), another down-right (which gives a stride l+1) - these are strides of the "extra" two dimensions of A'.

Note that to calculate contribution of 1s into convolution, one needs to do 4(a-2)(b-2) multiplications, and this is exactly the total size of A'. So the calculation above makes no unneeded multiplications (e.g., 0s in corners of B are completely ignored). (Of course, by breaking the steps into a-2 x b-2 x 2 x 2, we do some extra store/retrieve operations - comparing to doing straightforward convolution. But these operations are done in C, where they are much cheaper than in Perl.)


 NEED: min/max; with signed/vs/unsigned solved ???  min_assign??? argmin() ?
 NEED: How to find first elt which breaks conditions (as in a[n+1] == a[n]+1???
 NEED: more intelligent choice of accessors for q/Q and D (newsvpv?)...
 NEED: accessor to long double max-aligned (to 16 when size is between 8 and 16)
 NEED: long vs double comparison? char-vs-quad comparison? cmp?
 NEED: pseudo-flavor: k-th coordinate of the index (or a linear combination?)
 NEED: All flavors of FFT
 NEED: Indirect access (use value of one array as index in another)
 NEED: use long double C functions if argument is long long (???)
 MAYBE: finish uquad2double()ization of handlers (for broken M$ compilers)

BSD misses many long double APIs (elementary functions, trunc(), rint() shifts, and **). So when deciding whether one wants to do operations over long doubles, one should either check for presence of the needed functions (by doing eval "use Numeric::LL_Array 'D0_sin'; 1" or some such), or check return value of elementary_D_missing().

Some C environments miss trunc() (and maybe rint()? Did not see it missing). In such cases the corresponding handlers are not defined. If you know how to work around, but want to use trunc() if present, one can check for this using eval(), as above, with handlers d0_trunc, d0_rint.

In C Integer Conversion to signed type and floating to integer conversion are implementation-specific unless the (truncated) value can be represented exactly. Hence, e.g., operations with unsigned int and signed int targets need different C implementations (they do not necessarily differ by integer conversion, which is a NOP on in-memory representations). Which means that we need C code to all all the flavors, and our (autogenerated) C code is quite bulky and takes a lot of time to compile.

On some machines (notably "s390 or alpha) double may coincide with long double (at least in size). In such a situation, we assume that these types are completely identical, and do not even try to compile functions with long double arguments. (We use the corresponding functions with double argments instead.)"

Some M$ compilers are broken and do not support conversion of 64bit unsigned integers to double. In such cases, we do not implement handlers mixing Q and floating point types (but transcendental functions applied to Q are supported). For introspection: inspect result of have_uquad2double().


Ilya Zakharevich <>


Copyright (C) 2005 by Ilya Zakharevich <>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

Possible future

What is needed is hide-the-flavors higher-level wrapper which automatically chooses handlers basing on flavors of arguments. Yet another hiding level would create an overloaded-operation

Possible layout of an object: RV: playground IV: flavor, dim PV: format

Need also a region of playground one can write when targetting this value.

This may give a significant slowdown in the case of smallish arrays. Then maybe a "precompile" step may be useful: another class; operations over this class do not do calculations, but just name/type resolution, and bound checks. While operations are performed, this information is stored in a "program buffer". Then one can invoke the "program buffer" (assumingly inside a cycle, otherwise there is no point in doing preprocessing in advance); this should suit iterative methods...

Tentative example of how it might be accessed:

  my $prog = new Numeric::LL_Program_Array;
  my $playground = ...;
  my $p = new Numeric::Array_for_Program $prog, $playground, 'd', $min, $max,
                                         $start, $dims = [1, 2*$N+1, 1];
  my $one = $p->subarray(0, $dims = [1, $N, 0]); # dim=1, off=0, stride = 0
  my $a   = $p->subarray(1, $dims = [1, $N, 1]); # dim=1, off=1, stride = 1
  my $tmp = $p->subarray($N+1, $dims = [1, $N, 1]); # after $a

  my $one1 = $one->subarray(0, $dims = [1, 1,  0]); # occupies the same place
  $one1->assign_1;              # assigns $one too

  my $a00 = $a->subarray(0, $dims = [1, 1, 1]); # The first element
  my $a0 = $a->subarray(0, $dims = [1, $N-1, 1]); # all but last element
  my $a1 = $a->subarray(1, $dims = [1, $N-1, 1]); # all but first element
  $a1->assign_add($one, $a0);   # Now $a has elements 0..$N-1

  $prog->invoke_and_clean;      # Execute once, forget, start accumulate again


  $prog->invoke for 0..19;      # run 20 iterations of x = x - tan(x) 

(need also flavors of subarray which modify $min/$max?). Note that subarray() does not touch the playground, only the dim/offsets/strides data.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 689:

Unterminated L<...> sequence