
TITLE

C Structure Class

STATUS

Proposal.

AUTHOR

Leopold Toetsch

ABSTRACT

The ParrotClass PMC is the default implementation (and the meta class) of parrot's HLL classes. It provides attribute access and (TODO) introspection of attribute names. It also handles method dispatch and inheritance.

C structures used all over parrot (PMCs) and user-visible C structures provided by the {Un,}ManagedStruct PMC don't have this flexibility.

The proposed CStruct PMC is trying to bridge this gap.

DESCRIPTION

The CStruct PMC is the class PMC for classes that are based not on PMC-only attributes but on the general case of a C structure. That is, CStruct is actually the parent class of ParrotClass, which is the PMC-only special case. It is also the theoretical ancestor class of all PMCs (including itself :).

The relationship of CStruct to other PMCs is like this:

                PASM/PIR code         C code
  Class         ParrotClass           CStruct
  Object        ParrotObject          *ManagedStruct
                                      (other PMCs) 

That is, it is the missing piece among the already existing PMCs. The current *ManagedStruct PMCs provide the class and object functionality in one and the same PMC (as, BTW, all other existing PMCs do). But this totally prevents proper inheritance and reuse of such PMCs.

The CStruct class provides the necessary abstract backings to get rid of current limitations.

SYNTAX BITS

Constructing a CStruct

A typical C structure:

  struct foo {
    int a;
    char b;
  };

could be created in PIR with:

  cs = subclass 'CStruct', 'foo'   # or maybe  cs = new_c_class 'foo'
  addattribute cs, 'a'
  addattribute cs, 'b'

The semantics of a C structure are the same as those of a Parrot class. But we need the types of the attributes too:

Handwavingly TBD 1)

with ad-hoc existing syntax:

  .include "datatypes.pasm"
  cs['a'] = .DATATYPE_INT
  cs['b'] = .DATATYPE_CHAR

Handwavingly TBD 2)

with new variants of the addattribute opcode:

  addattribute cs, 'a', .DATATYPE_INT
  addattribute cs, 'b', .DATATYPE_CHAR

Probably desired and with not much effort TBD 3):

  addattribute(s) cs, <<'DEF'
    int a;
    char b;
  DEF   

The possible plural in the opcode name would match semantics, but it is not necessary. The syntax is just using Parrot's here documents to define all the attributes and types.

  addattribute(s) cs, <<'DEF'
    int "a";
    char "b";
  DEF   

The generalization of quoted attribute names would of course be possible too, but isn't likely needed.

Syntax variant

  cs = subclass 'CStruct', <<'DEF'
    struct foo {
      int a;
      char b;
    };
  DEF

I.e. create everything in one big step.

Object creation and attribute usage

This is straightforward and conforms to current ParrotObjects:

  o = new 'foo'                 # a ManagedStruct instance
  setattribute o, 'a', 4711
  setattribute o, 'b', 22
  ...

The only needed extension would be {get,set}attribute variants with natural types.

Even (with nice to have IMCC syntax sugar):

  o.a = 4711    # setattribute
  o.b = 22
  $I0 = o.a     # getattribute

Nested Structures

  foo_cs = subclass 'CStruct', 'foo'
  addattribute(s) foo_cs, <<'DEF'
    int a;
    char b;
  DEF   
  bar_cs = subclass 'CStruct', 'bar'
  addattribute(s) bar_cs, <<'DEF'
    double x;
    foo cfoo;              # contained foo structure
    foo *fptr;             # a pointer to a foo struct
  DEF   
  o = new 'bar'
  setattribute o, 'x', 3.14                   # C-ish equivalent:
  setattribute o, ['cfoo'; 'a'], 4711         # o.cfoo.a = 4711
  setattribute o, ['fptr'; 'b'], 255          # o.fptr->b = 255

Attribute access is similar to current *ManagedStruct's hash syntax but with a syntax matching ParrotObjects.
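
What the two keyed accesses above would have to resolve to can be sketched in plain C. This is only an illustration of the offset arithmetic, not Parrot code; the helper names bar_cfoo_a and bar_fptr_b are made up for this sketch. A contained structure adds its offset to the member offset, while a pointer member needs one load before the second offset is applied.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int  a;
    char b;
};

struct bar {
    double      x;
    struct foo  cfoo;   /* contained foo structure */
    struct foo *fptr;   /* a pointer to a foo struct */
};

/* Key ['cfoo'; 'a']: the nested struct lives inside bar, so the two
 * offsets simply add up (hypothetical helper). */
int *bar_cfoo_a(struct bar *o)
{
    assert(o != NULL);
    return (int *)((char *)o + offsetof(struct bar, cfoo)
                             + offsetof(struct foo, a));
}

/* Key ['fptr'; 'b']: follow the pointer first, then apply the member
 * offset. Returns NULL if fptr is not set (hypothetical helper). */
char *bar_fptr_b(struct bar *o)
{
    assert(o != NULL);
    if (o->fptr == NULL)
        return NULL;
    return (char *)o->fptr + offsetof(struct foo, b);
}
```

A real implementation would presumably throw an exception on a NULL fptr instead of returning NULL; that choice is left open here.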

Array Structure Elements

  foo_cs = subclass 'CStruct', 'foo'
  addattribute(s) foo_cs, <<'DEF'
    int a;
    char b[100];
  DEF   

Access to array elements automatically does bounds checking.
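
A minimal C sketch of such a checked element store, assuming the element count is taken from the declared array type. The name foo_set_b and the error-code convention are assumptions of this sketch; inside parrot an out-of-range index would more likely raise an exception.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int  a;
    char b[100];
};

/* Element count of foo.b, derived from the declaration itself. */
#define FOO_B_ELEMS \
    (sizeof(((struct foo *)0)->b) / sizeof(((struct foo *)0)->b[0]))

/* Store value in s->b[index]. Returns 0 on success and -1 if index
 * is outside 0 .. FOO_B_ELEMS-1 (hypothetical helper). */
int foo_set_b(struct foo *s, long index, char value)
{
    assert(s != NULL);
    if (index < 0 || (size_t)index >= FOO_B_ELEMS)
        return -1;
    s->b[index] = value;
    return 0;
}
```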

Possible future extensions

  cs = subclass 'CStruct', 'todo'
  addattribute(s) cs, <<'DEF'
    union {              # union keyword
      int a;
      double b;
    } u;  
    char b[100]  :ro;    # attributes like r/o     
  DEF   

Managed vs. Unmanaged Structs

The term "managed" in current structure usage defines the owner of the structure memory. ManagedStruct means that parrot is the owner of the memory and that GC will eventually free the structure memory. This is typically used when C structures are created in parrot and passed into external C code.

UnManagedStruct means that there's some external owner of the structure memory. Such structures are typically the return values of external code.

E.g.:

  $P0 = some_c_func()          # UnManagedStruct result
  assign $P0, foo_cs           # assign a structure class to it 

  o = new 'foo'                # ManagedStruct instance
  setattribute o, 'a', 100
  setattribute o, ['b'; 99], 255  # set last elem
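
The ownership distinction can be sketched in C as a handle that remembers whether it owns the memory. The struct_handle type and its functions are hypothetical and only illustrate the rule that managed memory gets freed while unmanaged memory is left to its external owner; they are not the *ManagedStruct implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handle: a pointer plus an ownership flag. */
typedef struct struct_handle {
    void *data;
    int   managed;   /* nonzero: we own data and must free it */
} struct_handle;

/* "Managed": allocate zeroed memory that the handle owns. */
struct_handle handle_new_managed(size_t size)
{
    struct_handle h;
    h.data    = calloc(1, size);
    h.managed = (h.data != NULL);
    return h;
}

/* "Unmanaged": wrap memory that some external code owns. */
struct_handle handle_wrap_unmanaged(void *external)
{
    struct_handle h;
    h.data    = external;
    h.managed = 0;
    return h;
}

/* Stand-in for what GC would do when the object dies. Returns 1 if
 * the memory was freed, 0 if it was left to its external owner. */
int handle_destroy(struct_handle *h)
{
    int freed = 0;
    assert(h != NULL);
    if (h->managed && h->data != NULL) {
        free(h->data);
        freed = 1;
    }
    h->data    = NULL;
    h->managed = 0;
    return freed;
}
```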

RATIONALE

Parrot as the planned interpreter glue language should have access to all possible C libraries and structures. It has to abstract the low-level bindings in an HLL-independent way and should still be able to communicate all information "upstairs" to the HLL users.

But it's not only about HLL usage: parrot itself already suffers from a lack of abstraction at the PMC level.

Inheritance

I've implemented an OO-ified HTTP server named httpd2.pir. The HTTP::Connection class ought to be a subclass of ParrotIO (we don't have a base socket class, but ParrotIO would do for now). This kind of inheritance isn't possible. The implementation is now: a connection hasa ParrotIO, instead of isa ParrotIO. With that it of course loses all inheritance, which leads to delegation code and workarounds.

The same workarounds are all over the SDL/* classes. There are layout helpers and raw structure accessors and what not. Please read the code. It's really not a problem of the implementation (which is totally fine); it's just the lack of usability of parrot (when it comes to native structures (or PMCs)).

All these experiments to use a C structure or a PMC as a base class end with a hasa relationship instead of the natural isa. Any useful OO-ish abstraction is lost, which leads to clumsy code, and - no - implementing interfaces/traits/mixins can't help here, as these are all based on the abstraction described here.

Inheritance and attribute access

This proposal alone doesn't solve all inheritance problems. It also requires that the memory layout of PMCs and of ParrotObjects deriving from PMCs be the same. E.g.

  cl = subclass 'Integer', 'MyInt'

The int_val attribute of the core Integer type is located in the cache union of the PMC. The integer item in the subclass hangs off the data array of attributes and, worse, it is a PMC too, not a natural int. This not only causes additional indirections (see deleg_pmc.pmc) but also negatively impacts Integer PMCs, as all access to the int_val has to be indirected through get_integer() or set_integer_native() to be able to deal with subclassed integers.

Again, the implementation of the above is: MyInt hasa Integer, instead of the desired isa Integer.

With the abstraction of a CStruct describing the Integer PMC and with differently sized PMCs, we can create an object layout, where the int_val attribute of Integer and MyInt are at the same location and of the same type.

Given this (internal) definition of the Integer PMC:

  intpmc_cl = subclass 'CStruct', 'Integer'
  addattribute(s) intpmc_cl, <<'DEF'
    INTVAL int_val;            # PMC internals are hidden
  DEF   

we can transparently subclass it as MyInt, as all the needed information is present in the CStruct intpmc_cl class.
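
The layout idea can be illustrated in C, where a structure that starts with its base structure shares the base's member offsets. The type names below (INTVAL_T, integer_body, myint_body) and the extra attribute are stand-ins invented for this sketch and do not reflect parrot's actual PMC layout.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL_T;            /* stand-in for parrot's INTVAL */

/* Body of the (hypothetical) Integer layout. */
struct integer_body {
    INTVAL_T int_val;
};

/* Body of the (hypothetical) MyInt layout: the base comes first, so
 * int_val sits at the same offset and has the same type as in Integer. */
struct myint_body {
    struct integer_body base;
    int                 extra;    /* an attribute added by the subclass */
};

/* Code written for Integer reads int_val directly, with no delegation. */
INTVAL_T integer_get(const struct integer_body *p)
{
    assert(p != NULL);
    return p->int_val;
}
```

Because C guarantees that a pointer to a structure also points to its first member, integer_get() works unchanged on a myint_body, which is the isa relationship in terms of memory layout.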

Introspection, PMCs and more

  cc = subclass 'CStruct', 'Complex'
  addattribute(s) cc, <<'DEF'
    FLOATVAL re; 
    FLOATVAL im; 
  DEF   

This is the (hypothetical) description of a Complex PMC class. An equivalent syntax can be translated by the PMC compiler to achieve the same result.

This definition of the attributes of that PMC automagically provides access to all the information stored in the PMC. All such access is currently hand-crafted in complex.pmc. Not only could this accessor code be abandoned (and unified with common syntax), but all possible classes inheriting from that PMC could use this information too.
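
As a rough C sketch of what such generated access could look like, a table of attribute names and offsets is enough to replace one hand-written accessor per attribute. The names complex_body, float_attr, complex_attrs and complex_attr_ptr are hypothetical, and FLOATVAL_T merely stands in for parrot's FLOATVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double FLOATVAL_T;        /* stand-in for parrot's FLOATVAL */

struct complex_body {
    FLOATVAL_T re;
    FLOATVAL_T im;
};

/* One table entry per attribute; all attributes here are FLOATVAL_T. */
typedef struct float_attr {
    const char *name;
    size_t      offset;
} float_attr;

static const float_attr complex_attrs[] = {
    { "re", offsetof(struct complex_body, re) },
    { "im", offsetof(struct complex_body, im) },
};

/* Generic lookup by name. Returns NULL for an unknown attribute. */
FLOATVAL_T *complex_attr_ptr(struct complex_body *c, const char *name)
{
    size_t i;
    assert(c != NULL && name != NULL);
    for (i = 0; i < sizeof complex_attrs / sizeof complex_attrs[0]; i++) {
        if (strcmp(complex_attrs[i].name, name) == 0)
            return (FLOATVAL_T *)((char *)c + complex_attrs[i].offset);
    }
    return NULL;
}
```

The same table doubles as introspection data: iterating over it yields the attribute names without any knowledge of the structure itself.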

Implementation

CStruct is basically yet another PMC and can be implemented and put to use without any interference with existing code. It is also orthogonal to possible PMC layout changes.

The internals of CStruct can vastly reuse code from src/objects.c to deal with inheritance or object instantiation. The main difference is that each attribute additionally has a type attached to it, and consequently the attribute offsets are calculated differently depending on type, alignment, and padding. These calculations are already done in unmanagedstruct.pmc.
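
The offset calculation itself is small. The following C sketch shows the usual rule (round each offset up to the attribute's alignment, then pad the total size to the strictest alignment); it is written from scratch for illustration and is not the code in unmanagedstruct.pmc. The attr_desc type and the function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute description. */
typedef struct attr_desc {
    const char *name;     /* attribute name, e.g. "a" */
    size_t      size;     /* size of the attribute's C type */
    size_t      align;    /* alignment requirement, a power of two */
    size_t      offset;   /* filled in by layout_attrs() */
} attr_desc;

/* Round n up to the next multiple of align (a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Assign offsets in declaration order. Returns the total structure
 * size, including trailing padding up to the strictest alignment. */
size_t layout_attrs(attr_desc *attrs, size_t n)
{
    size_t offset = 0, max_align = 1, i;
    for (i = 0; i < n; i++) {
        offset          = round_up(offset, attrs[i].align);
        attrs[i].offset = offset;
        offset         += attrs[i].size;
        if (attrs[i].align > max_align)
            max_align = attrs[i].align;
    }
    return round_up(offset, max_align);
}
```

For a 4-byte int followed by a 1-byte char this yields offsets 0 and 4 and a total size of 8, which matches what common C ABIs produce for struct foo above.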

CStruct classes can be attached to existing PMCs gradually (and by far not all PMCs need that abstract backing). But think e.g. of the Sub PMC. Attaching a CStruct to it would instantly give access to all its attributes and vastly simplify introspection.

Only the final step ("Inheritance and attribute access") needs all parts to play together.

All together now

Differently sized PMCs

Provide the flexible PMC layout.

CStruct classes

Are describing the structure of PMCs (or any C structure).

R/O vtables

Prohibit modification of readonly PMCs like the Sub PMC. These are already coded within the STM project.

SEE ALSO

pddXX_pmc.pod (proposal for a flexible PMC layout)