NAME

Base - SMOP basic structures

REVISION

$Id$

SMOP__Object

In SMOP, every single value must be binary-compatible with the SMOP__Object struct. This even includes core level constructs such as the interpreter and the native types. This idea comes directly from how perl5 works, with the SV struct.

Unlike p5, however, the SMOP__Object struct is absolutely minimalist: it defines no type, no flags, and no introspection information. It defines only that every SMOP__Object has a "responder interface" (.RI), so the structure is merely:

struct SMOP__Object {
  SMOP__ResponderInterface* RI;
  /* Maybe there is something here, maybe there is nothing here.
   * Only the responder interface knows.
   */
};

The value in the .RI member is not unique to the object. For all but singleton classes, one responder interface will be used by multiple object structs. As such, the object is identified only by the memory address at which the struct SMOP__Object is stored.

This means that you can't really do anything to the object yourself; you can only talk to its responder interface. The object serves as both a way to find the correct responder interface, and a way to tell the responder interface which instance data to operate on -- and that is all.

There may be additional data below the .RI member, but if so, only the responder interface knows how to use it. The data for the object instance may, in fact, not be stored in the structure at all -- it could be looked up using the object's address in a completely separate data store.

As such, it is incorrect to attempt to copy or move a SMOP__Object struct using a simple memory copy like C's memcpy(). Even if you happened to copy all the data in the object, you would have changed its address, and it would not be the same object anymore. This point is especially important to note if an object may exist in multiple address spaces -- only one address will be valid without special handling.
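A minimal sketch of what binary compatibility and address identity mean in practice. The demo_counter type and both demo_ functions are hypothetical and exist only for this illustration; they are not part of SMOP.

```c
#include <stddef.h>
#include <string.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object {
  SMOP__ResponderInterface* RI;
};

/* Hypothetical object type. It is binary-compatible with SMOP__Object
 * because its first member is the RI pointer. */
typedef struct demo_counter {
  SMOP__ResponderInterface* RI;
  int count; /* private data; only the responder interface should use it */
} demo_counter;

/* Two pointers name the same object only if the addresses are equal. */
static int demo_same_object(SMOP__Object* a, SMOP__Object* b) {
  return a == b;
}

/* Returns 1 if a memcpy'd copy would count as the same object, else 0. */
static int demo_copy_is_same_object(void) {
  demo_counter original = { NULL, 7 };
  demo_counter copy;
  memcpy(&copy, &original, sizeof copy); /* what NOT to do */
  return demo_same_object((SMOP__Object*)&original, (SMOP__Object*)&copy);
}
```

The copy carries the same bytes but lives at a different address, so any code that identifies objects by address treats it as a different object.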

SMOP__ResponderInterface

The responder interface (which, of course, is also binary-compatible with SMOP__Object) implements the low-level part of the meta object protocol. It is through the responder interface that you can perform actions on the object.

Using the responder interface, arbitrary methods may be invoked on the object. It's important to realize that this method invocation happens at the same level that any high-level language might call. This means that there's no distinction between native operators and high-level operators, nor between native values and high-level values.

The structure of a responder interface is as follows:

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)  (SMOP__Object* interpreter,
                             SMOP__ResponderInterface* self,
                             SMOP__Object* identifier,
                             SMOP__Object* capture);
  SMOP__Object* (*REFERENCE)(SMOP__Object* interpreter,
                             SMOP__ResponderInterface* self,
                             SMOP__Object* object);
  SMOP__Object* (*RELEASE)  (SMOP__Object* interpreter,
                             SMOP__ResponderInterface* self,
                             SMOP__Object* object);
  SMOP__Object* (*WEAKREF)  (SMOP__Object* interpreter,
                             SMOP__ResponderInterface* self,
                             SMOP__Object* object);
  char* id;
  /* Maybe there is something here, maybe there is nothing here.
   * Only the responder interface in member .RI knows.
   */
};

However, the SMOP base defines a few macros that should be used when interacting with SMOP Objects. While in theory, the use of those macros is optional, it's strongly advised that you stick with them, to make transitions to newer versions easier.

As such, each of the function hooks defined in the above structure will be described along with the macros which should be used to access them.

macro SMOP_DISPATCH
SMOP_DISPATCH(interpreter, object, identifier, capture)

This macro (and all its parameters) correspond with the MESSAGE function hook member. This is the function that handles method invocation for the objects which this responder interface oversees:

SMOP__Object* (*MESSAGE)  (
    SMOP__Object* interpreter,      /* gets interpreter */
    SMOP__ResponderInterface* self, /* gets (responder) object */
    SMOP__Object* identifier,       /* gets identifier */
    SMOP__Object* capture           /* gets capture (instance object inside) */
);

As you might have noticed, it receives objects as arguments and returns, of course, an object.

SMOP_DISPATCH uses the .MESSAGE function in the responder found at object to invoke the method named by identifier. It invokes that method in the context of the interpreter found at interpreter, using the capture found at capture to pass data to the method's parameters.

Each of these macro arguments is expanded upon in other documentation. However, you may notice that something appears to be missing. Methods usually have an "invocant" -- which would be a SMOP__Object that was used to find the responder that is being pointed to in object above. If there is one, it is tucked away inside the capture.
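The following sketch shows one way a dispatch could look from the caller's side. The macro bodies are stand-ins written so the example compiles on its own; the real definitions come from the SMOP headers and may differ. The demo_ident and demo_capture types, and the method name "self", are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*,
                           SMOP__Object*, SMOP__Object*);
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  char* id;
};

/* Stand-in expansions (assumptions, not the real SMOP definitions). */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_DISPATCH(interpreter, object, identifier, capture) \
  (((SMOP__ResponderInterface*)(object))->MESSAGE( \
      (interpreter), (SMOP__ResponderInterface*)(object), (identifier), (capture)))

/* Hypothetical identifier and capture objects for this sketch only. */
typedef struct demo_ident { SMOP__ResponderInterface* RI; const char* name; } demo_ident;
typedef struct demo_capture { SMOP__ResponderInterface* RI; SMOP__Object* invocant; } demo_capture;

/* MESSAGE hook: the method "self" returns the invocant found in the capture. */
static SMOP__Object* demo_message(SMOP__Object* interpreter,
                                  SMOP__ResponderInterface* self,
                                  SMOP__Object* identifier,
                                  SMOP__Object* capture) {
  (void)interpreter; (void)self;
  if (strcmp(((demo_ident*)identifier)->name, "self") == 0)
    return ((demo_capture*)capture)->invocant;
  return NULL; /* a real responder would signal the failure properly */
}

static SMOP__ResponderInterface demo_ri = { NULL, demo_message, NULL, NULL, NULL, "demo" };

/* Returns 1 if dispatching "self" hands back the invocant. */
static int demo_dispatch_returns_invocant(void) {
  SMOP__Object thing = { &demo_ri };
  demo_ident ident = { NULL, "self" };
  demo_capture capture = { NULL, &thing };
  SMOP__Object* result = SMOP_DISPATCH(NULL, SMOP_RI(&thing),
                                       (SMOP__Object*)&ident,
                                       (SMOP__Object*)&capture);
  return result == &thing;
}
```

Note that the second macro argument is the responder, obtained here with SMOP_RI, while the invocant itself travels only inside the capture.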

macros SMOP_REFERENCE and SMOP_RELEASE
SMOP_REFERENCE(interpreter, object)
SMOP_RELEASE(interpreter, object)

SMOP_REFERENCE and SMOP_RELEASE call, respectively, the .REFERENCE and .RELEASE functions in a responder interface. The responder interface used is the one that is pointed to by the .RI member of the object structure pointed to by object. The object pointer itself is also passed to the REFERENCE or RELEASE function:

SMOP__Object* (*REFERENCE)(
    SMOP__Object* interpreter,       /* gets interpreter */
    SMOP__ResponderInterface* self,  /* gets the RI member found at object */
    SMOP__Object* object             /* gets object itself */
);

These functions increment or decrement the reference count of object in the context of interpreter. The reference count is used to handle automatic cleanup of objects when they are no longer needed -- more on this subject later.

The macros both return the same value that was passed into object, so you can use the macro in most places where you would use an object pointer, much like you would use i++ to postincrement an integer in place. This keeps code terse, but take care: never write anything like SMOP_RELEASE(interp,current++) or SMOP_RELEASE(interp,current)++ when working with arrays of objects.
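A sketch of a reference-counting responder and of a safe way to release an array of objects. The macro bodies are assumed stand-ins, not the real SMOP definitions, and the demo_counted type is hypothetical. With stand-ins like these, the object argument is expanded several times, which is one reason an argument with side effects such as current++ would misbehave.

```c
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*,
                           SMOP__Object*, SMOP__Object*);
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  char* id;
};

/* Stand-in expansions (assumptions). Note that `object` appears more than once. */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_REFERENCE(interp, object) \
  (SMOP_RI(object)->REFERENCE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interp, object) \
  (SMOP_RI(object)->RELEASE((interp), SMOP_RI(object), (SMOP__Object*)(object)))

/* Hypothetical counted object. */
typedef struct demo_counted {
  SMOP__ResponderInterface* RI;
  int refcount;
  int destroyed; /* set when the count reaches zero; real code would free */
} demo_counted;

static SMOP__Object* demo_reference(SMOP__Object* in, SMOP__ResponderInterface* self,
                                    SMOP__Object* o) {
  (void)in; (void)self;
  ((demo_counted*)o)->refcount++;
  return o;
}

static SMOP__Object* demo_release(SMOP__Object* in, SMOP__ResponderInterface* self,
                                  SMOP__Object* o) {
  demo_counted* c = (demo_counted*)o;
  (void)in; (void)self;
  if (--c->refcount == 0)
    c->destroyed = 1;
  return o;
}

static SMOP__ResponderInterface demo_ri =
  { NULL, NULL, demo_reference, demo_release, NULL, "demo counted" };

/* Returns 1 if the counts end up as expected, -1 on an unexpected pointer. */
static int demo_refcounts(void) {
  demo_counted items[2] = { { &demo_ri, 1, 0 }, { &demo_ri, 1, 0 } };
  size_t i;
  SMOP__Object* same = SMOP_REFERENCE(NULL, &items[0]); /* extra stake in items[0] */
  if (same != (SMOP__Object*)&items[0])
    return -1;
  for (i = 0; i < 2; i++)            /* advance the index outside the macro */
    SMOP_RELEASE(NULL, &items[i]);
  return items[0].refcount == 1 && !items[0].destroyed
      && items[1].refcount == 0 && items[1].destroyed;
}
```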

macro SMOP_WEAKREF
SMOP_WEAKREF(interpreter, object)

SMOP_WEAKREF calls the .WEAKREF function in a responder interface. It works much the same way as the SMOP_REFERENCE macro, above.

SMOP__Object* (*WEAKREF)  (
    SMOP__Object* interpreter,       /* gets interpreter */
    SMOP__ResponderInterface* self,  /* gets the RI member found at object */
    SMOP__Object* object             /* gets object itself */
);

SMOP_WEAKREF can be used wherever you would normally use SMOP_REFERENCE to obtain a "weak reference" instead. This call is allowed to return you a different object than the one you point to with object, and you are supposed to use that as a proxy. Weak references do not count as a reference against the original object for the purposes of garbage collection.

This means that the original object may be freed before the weak reference itself is destroyed. If this happens, the weak reference will start to refer to some appropriate constant (like False) instead of the now-dead object.

The implementation of the weak-reference is private to each responder interface's implementation, so the exact behavior may vary depending on the kind of objects you are working with. Especially, note that if an object does not actually need to be reference counted, a weak reference may end up returning the original object, so you are not allowed to assume the macro will always return a different pointer than the one passed via object.

Note that a weak reference is itself an object. So you do still need to call SMOP_RELEASE on it when you are done with it. (It isn't provided just to help us be lazy.) However, all SMOP_REFERENCE and SMOP_RELEASE calls on the weak reference object count references to the proxy object, not the original object.

That makes weak references a handy way to break circular dependencies between objects and code.
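One possible shape for a weak-reference proxy is sketched below. Everything here is an assumption made for illustration: the macro bodies are stand-ins, the demo_target and demo_weak types are hypothetical, a small static pool replaces a real allocator, and a NULL target stands in for "some appropriate constant". Actual responder interfaces are free to implement weak references quite differently.

```c
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*,
                           SMOP__Object*, SMOP__Object*);
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  char* id;
};

/* Stand-in expansions (assumptions, not the real SMOP definitions). */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_RELEASE(interp, object) \
  (SMOP_RI(object)->RELEASE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_WEAKREF(interp, object) \
  (SMOP_RI(object)->WEAKREF((interp), SMOP_RI(object), (SMOP__Object*)(object)))

struct demo_weak;
typedef struct demo_target {
  SMOP__ResponderInterface* RI;
  int refcount;
  struct demo_weak* proxy;  /* at most one proxy per target in this sketch */
} demo_target;
typedef struct demo_weak {
  SMOP__ResponderInterface* RI;
  int refcount;             /* counts stakes in the proxy, not in the target */
  demo_target* target;      /* NULL once the original object is gone */
} demo_weak;

static SMOP__Object* demo_weak_release(SMOP__Object* in, SMOP__ResponderInterface* self,
                                       SMOP__Object* o) {
  demo_weak* w = (demo_weak*)o;
  (void)in; (void)self;
  if (--w->refcount == 0 && w->target != NULL)
    w->target->proxy = NULL; /* real code would also free the proxy */
  return o;
}
static SMOP__ResponderInterface demo_weak_ri =
  { NULL, NULL, NULL, demo_weak_release, NULL, "demo weak" };

static demo_weak demo_pool[4];       /* stand-in for a real allocator */
static size_t demo_pool_used = 0;

static SMOP__Object* demo_target_weakref(SMOP__Object* in, SMOP__ResponderInterface* self,
                                         SMOP__Object* o) {
  demo_target* t = (demo_target*)o;
  (void)in; (void)self;
  if (t->proxy == NULL) {
    if (demo_pool_used == 4) return NULL;
    t->proxy = &demo_pool[demo_pool_used++];
    t->proxy->RI = &demo_weak_ri;
    t->proxy->refcount = 0;
    t->proxy->target = t;
  }
  t->proxy->refcount++;      /* the caller gets one stake in the proxy */
  return (SMOP__Object*)t->proxy;
}
static SMOP__Object* demo_target_release(SMOP__Object* in, SMOP__ResponderInterface* self,
                                         SMOP__Object* o) {
  demo_target* t = (demo_target*)o;
  (void)in; (void)self;
  if (--t->refcount == 0 && t->proxy != NULL)
    t->proxy->target = NULL; /* the proxy outlives the original */
  return o;
}
static SMOP__ResponderInterface demo_target_ri =
  { NULL, NULL, NULL, demo_target_release, demo_target_weakref, "demo target" };

/* Returns 1 if the proxy tracks the target and survives its destruction. */
static int demo_weakref(void) {
  demo_target t = { &demo_target_ri, 1, NULL };
  SMOP__Object* w = SMOP_WEAKREF(NULL, &t);
  if (w == NULL) return -1;
  int before = ((demo_weak*)w)->target == &t && t.refcount == 1;
  SMOP_RELEASE(NULL, &t);            /* last stake in the original */
  int after = ((demo_weak*)w)->target == NULL;
  SMOP_RELEASE(NULL, w);             /* the proxy still needs releasing */
  return before && after && ((demo_weak*)w)->refcount == 0;
}
```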

Other Macros

macro SMOP__Object__BASE

This macro defines the top members present in every SMOP Object, basically defining the members documented in the section above. Currently that is just the .RI member, but should members be added in future versions, they will appear in this list. It should be used when declaring new types of objects.

macro SMOP__ResponderInterface__BASE

Like the above macro, except that this defines the members present in all responder interface objects, as documented further above. Note that this does not include SMOP__Object__BASE: such macros are best left un-nested, so that each remains reusable in compound types.

macro SMOP_RI(value)

Shorthand to dereference the .RI member of a SMOP__Object structure given the address of the SMOP__Object structure.
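A sketch of how these three macros might be defined and used when declaring new types. The expansions shown are assumptions derived from the structures documented above, not copies of the SMOP headers, and the demo_point types are hypothetical.

```c
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

/* Assumed expansions; the real headers may add members in later versions. */
#define SMOP__Object__BASE \
  SMOP__ResponderInterface* RI;
#define SMOP__ResponderInterface__BASE \
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*, \
                           SMOP__Object*, SMOP__Object*); \
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*); \
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*); \
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*); \
  char* id;
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)

struct SMOP__Object { SMOP__Object__BASE };
struct SMOP__ResponderInterface { SMOP__Object__BASE SMOP__ResponderInterface__BASE };

/* Hypothetical object type: base members first, private data after. */
typedef struct demo_point {
  SMOP__Object__BASE
  int x;
  int y;
} demo_point;

/* Hypothetical responder interface type: both macros, listed separately. */
typedef struct demo_point_ri {
  SMOP__Object__BASE
  SMOP__ResponderInterface__BASE
  int instances;
} demo_point_ri;

/* Returns 1 if the custom types line up with the base structs. */
static int demo_layout_matches(void) {
  return offsetof(demo_point, RI) == 0
      && offsetof(demo_point_ri, RI) == 0
      && offsetof(demo_point_ri, MESSAGE) == offsetof(SMOP__ResponderInterface, MESSAGE)
      && offsetof(demo_point_ri, id) == offsetof(SMOP__ResponderInterface, id);
}

/* Returns 1 if SMOP_RI finds the responder interface of an object. */
static int demo_ri_lookup(void) {
  static demo_point_ri ri; /* zero-initialised; enough for a pointer check */
  demo_point p = { (SMOP__ResponderInterface*)&ri, 1, 2 };
  return SMOP_RI(&p) == (SMOP__ResponderInterface*)&ri;
}
```

Because the base members always come first and in the same order, a pointer to either demo type can be handed to code that only knows about SMOP__Object or SMOP__ResponderInterface.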

Talking Trash (Garbage Collection)

SMOP uses reference counting garbage collection conventions, as you probably can tell from the above documentation for SMOP_REFERENCE and SMOP_RELEASE.

In the initial implementation, a reference counting garbage collector was selected since this type of garbage collector is considerably simpler to implement (even if considerably harder to debug and maintain.) However, when design goals expanded to include interoperability with perl5, it became evident that following reference counting conventions would be a necessity in making SMOP and perl5 work together.

One thing that might not be obvious from the above technical notes is that it's up to each responder interface to implement its own garbage collector. This means that we can have several garbage collectors coexisting within the same process. For instance, the SMOP default low-level and the perl5 garbage collectors could both manage different sets of objects. In addition, objects that do not do any garbage collection at all may be present. Even in this case, all objects at least pretend to implement the mechanisms that make reference counting possible.

That is why the .REFERENCE, .RELEASE and .WEAKREF functions are included at the base level. Relatively few objects should be responder interfaces, so it is better for them just to carry vestigial members than make the code complex by trying to do without them. This set of functions should be sufficient to interact with the majority of reference counting garbage collectors.
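As an illustration of an object that only pretends to be reference counted, the sketch below wires all three hooks to a function that does nothing. The macro bodies are assumed stand-ins, and demo_false is a hypothetical constant object, not SMOP's actual False.

```c
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*,
                           SMOP__Object*, SMOP__Object*);
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  char* id;
};

/* Stand-in expansions (assumptions, not the real SMOP definitions). */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_REFERENCE(interp, object) \
  (SMOP_RI(object)->REFERENCE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interp, object) \
  (SMOP_RI(object)->RELEASE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_WEAKREF(interp, object) \
  (SMOP_RI(object)->WEAKREF((interp), SMOP_RI(object), (SMOP__Object*)(object)))

/* No bookkeeping at all: the object lives for the whole process. */
static SMOP__Object* demo_noop(SMOP__Object* in, SMOP__ResponderInterface* self,
                               SMOP__Object* o) {
  (void)in; (void)self;
  return o;
}

static SMOP__ResponderInterface demo_constant_ri =
  { NULL, NULL, demo_noop, demo_noop, demo_noop, "demo constant" };

static SMOP__Object demo_false = { &demo_constant_ri };

/* Callers follow the usual conventions; the responder simply ignores them. */
static int demo_constant_survives(void) {
  SMOP__Object* weak = SMOP_WEAKREF(NULL, &demo_false);
  SMOP_REFERENCE(NULL, &demo_false);
  SMOP_RELEASE(NULL, &demo_false);
  SMOP_RELEASE(NULL, weak);
  return weak == &demo_false; /* the weak reference is the object itself */
}
```

Calling code cannot tell the difference, which is the point: it follows the same REFERENCE/RELEASE discipline regardless of how, or whether, the responder collects garbage.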

Who owns an object?

This is the most important question: when to call SMOP_REFERENCE and when to call SMOP_RELEASE. The following documents the policy that must be followed to correctly garbage collect SMOP objects.

The below will refer to ownership "stakes" which belong to either sections of code, or other objects -- an ownership stake is a concept, not a solid object residing in memory somewhere. One stake in an object is merely an obligation by the owner to call SMOP_RELEASE once on the object, or to transfer the stake by ensuring that some other code will call SMOP_RELEASE on the object when appropriate.

There is also an obligation never to call SMOP_RELEASE on an object in which you have no ownership stakes.

REFERENCE/RELEASE conventions:

  • When an object is created, it becomes owned by that code which called the method that created it. The code has one ownership stake in the newly created object after the creation is complete.

  • Code that calls SMOP_REFERENCE assumes an additional ownership stake in the object. Since it is so easy to give away stakes, SMOP_REFERENCE is an important tool for keeping objects alive.

    As such, code may have more than one stake in a single object, even though there is no way to distinguish between the results of object creation or the results of any of the calls to SMOP_REFERENCE. It is up to the developer to keep count of the number of stakes.

    Code that has more than one stake in an object needs to SMOP_RELEASE (or transfer) the reference as many times as it has stakes, and only that many times.

  • Installing an object in a capture implies transferring one stake in the object to the capture object (or more than one, if the object is installed more than once in the capture.)

    As such, the code installing the object in the capture is no longer responsible for calling SMOP_RELEASE for this one ownership stake. If it has other ownership stakes, it must still call release for each of those.

    Note that this means to install an object in a capture more than once, you should have obtained more than one stake in the object, because the capture will call SMOP_RELEASE more than once.

    Also note that, as long as the capture is around to own the object, the original code may still use references to the object, without acquiring a new one. However, this may not be advisable for code legibility and maintainability.

  • References owned by capture objects will be automatically SMOP_RELEASEd when the capture object itself is destroyed. Capture objects automatically fulfill their obligations to ownership stakes (as long as the ownership stakes to the aggregate capture object itself are correctly fulfilled.)

    Again note, if an object is in a capture more than one time, the capture is going to call SMOP_RELEASE on the object more than one time when it is destroyed.

  • Once an object is installed in a capture, getting a new reference to the individual object requires the use of a special direct-access API that bypasses the normal .MESSAGE method calling interface. This procedure will be documented elsewhere -- the important thing to know is that the capture will automatically call SMOP_REFERENCE on any object extracted from it.

    This stake is owned by the code that extracted the value.

  • When a capture is passed to a SMOP_DISPATCH/.MESSAGE as the capture parameter, the code receiving the capture assumes one ownership stake in the capture object from the caller. That is, the caller has one less ownership stake in the capture after passing it on. Thus, the receiving code should SMOP_RELEASE the capture before returning (or pass it on somewhere else.)

    In this scenario, the capture is still the owner of the objects inside it.

  • TODO: ownership behavior of return.

  • A call to SMOP_RELEASE implies that this owner no longer wants one of its ownership stakes in the object. The owner will still retain any other ownership stakes.

  • Passing an object to the interpreter, object/self, or identifier parameters of a SMOP_DISPATCH/.MESSAGE does not transfer the ownership stake in that object, unlike the capture parameter.

  • If a SMOP_RELEASE or SMOP_REFERENCE happens inside a subroutine, and the subroutine returns with a net gain or loss of ownership stakes, then the code that called the subroutine will gain or lose that many ownership stakes. There is no requirement to keep all ownership stake manipulation within the same block of C code.

    However, from a good coding practice standpoint, it is advisable to balance ownership stakes where possible, or otherwise, to fully comment and document the behavior.

  • SMOP_WEAKREF is used to return a weak reference to an object. It may return a different pointer, to an entirely new object, owned by the code that called it. Calling SMOP_WEAKREF doesn't change the ownership stake in the original object (at least, never when it matters.)

    However, since it may create a new object, the weakref itself should still be SMOP_RELEASEd.

Summary

Most reference counting will happen around SMOP_DISPATCH/.MESSAGE method invocations.

In general, the caller can "fire and forget" and the callee has to clean up the mess. From the caller side, the only tricky part is remembering to take an extra SMOP_REFERENCE when installing one object into a capture more than once, or if the object is to be used after a capture it is inside has been destroyed.

The callee, on the other hand, must remember to SMOP_RELEASE any objects it extracted from the capture (once for every time that object is extracted) and after that, to SMOP_RELEASE the capture itself, before returning. Alternatively it may dispose of the ownership stakes by transferring them to other code or captures, like, for example, inside its result.
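The caller and callee duties can be traced in a small sketch. The macro bodies are assumed stand-ins, and demo_obj, demo_capture, demo_install and demo_extract are hypothetical; in particular, the two helpers merely imitate the capture API that is documented elsewhere.

```c
#include <stddef.h>

typedef struct SMOP__Object SMOP__Object;
typedef struct SMOP__ResponderInterface SMOP__ResponderInterface;

struct SMOP__Object { SMOP__ResponderInterface* RI; };

struct SMOP__ResponderInterface {
  SMOP__ResponderInterface* RI;
  SMOP__Object* (*MESSAGE)(SMOP__Object*, SMOP__ResponderInterface*,
                           SMOP__Object*, SMOP__Object*);
  SMOP__Object* (*REFERENCE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*RELEASE)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  SMOP__Object* (*WEAKREF)(SMOP__Object*, SMOP__ResponderInterface*, SMOP__Object*);
  char* id;
};

/* Stand-in expansions (assumptions, not the real SMOP definitions). */
#define SMOP_RI(object) (((SMOP__Object*)(object))->RI)
#define SMOP_REFERENCE(interp, object) \
  (SMOP_RI(object)->REFERENCE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_RELEASE(interp, object) \
  (SMOP_RI(object)->RELEASE((interp), SMOP_RI(object), (SMOP__Object*)(object)))
#define SMOP_DISPATCH(interp, ri, identifier, capture) \
  (((SMOP__ResponderInterface*)(ri))->MESSAGE( \
      (interp), (SMOP__ResponderInterface*)(ri), (identifier), (capture)))

typedef struct demo_obj { SMOP__ResponderInterface* RI; int refcount; } demo_obj;
typedef struct demo_capture {
  SMOP__ResponderInterface* RI;
  int refcount;
  SMOP__Object* slots[4];
  size_t used;
} demo_capture;

static SMOP__Object* demo_obj_reference(SMOP__Object* in, SMOP__ResponderInterface* self,
                                        SMOP__Object* o) {
  (void)in; (void)self;
  ((demo_obj*)o)->refcount++;
  return o;
}
static SMOP__Object* demo_obj_release(SMOP__Object* in, SMOP__ResponderInterface* self,
                                      SMOP__Object* o) {
  (void)in; (void)self;
  ((demo_obj*)o)->refcount--; /* real code would destroy the object at zero */
  return o;
}

/* Destroying the capture releases one stake per installed slot. */
static SMOP__Object* demo_capture_release(SMOP__Object* in, SMOP__ResponderInterface* self,
                                          SMOP__Object* o) {
  demo_capture* c = (demo_capture*)o;
  size_t i;
  (void)self;
  if (--c->refcount == 0)
    for (i = 0; i < c->used; i++)
      SMOP_RELEASE(in, c->slots[i]);
  return o;
}
static SMOP__ResponderInterface demo_capture_ri =
  { NULL, NULL, NULL, demo_capture_release, NULL, "demo capture" };

/* Installing hands one of the caller's stakes to the capture. */
static void demo_install(demo_capture* c, SMOP__Object* o) { c->slots[c->used++] = o; }
/* Extracting gives the extracting code a fresh stake. */
static SMOP__Object* demo_extract(SMOP__Object* in, demo_capture* c, size_t i) {
  return SMOP_REFERENCE(in, c->slots[i]);
}

/* Callee: releases what it extracted, then the capture it was handed. */
static SMOP__Object* demo_callee(SMOP__Object* in, SMOP__ResponderInterface* self,
                                 SMOP__Object* identifier, SMOP__Object* capture) {
  SMOP__Object* arg = demo_extract(in, (demo_capture*)capture, 0);
  (void)self; (void)identifier;
  SMOP_RELEASE(in, arg);
  SMOP_RELEASE(in, capture);
  return NULL;
}
static SMOP__ResponderInterface demo_obj_ri =
  { NULL, demo_callee, demo_obj_reference, demo_obj_release, NULL, "demo object" };

/* Caller: returns 1 if every stake was accounted for. */
static int demo_caller(void) {
  demo_obj value = { &demo_obj_ri, 1 };               /* creation: one stake */
  demo_capture cap = { &demo_capture_ri, 1, { NULL }, 0 };
  SMOP_REFERENCE(NULL, &value);                       /* keep one for after the call */
  demo_install(&cap, (SMOP__Object*)&value);          /* one stake moves to the capture */
  SMOP_DISPATCH(NULL, SMOP_RI(&value), NULL, (SMOP__Object*)&cap); /* fire and forget */
  int still_ours = value.refcount;                    /* our remaining stake */
  SMOP_RELEASE(NULL, &value);
  return still_ours == 1 && value.refcount == 0 && cap.refcount == 0;
}
```

The caller takes its extra SMOP_REFERENCE before installing the value because it wants to use the value after the capture has been destroyed by the callee.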

IMPORTANT SPEC NOTICE

This document describes everything that you can assume about an arbitrary object. This means that you can only introspect in more detail by either calling a method, or via special knowledge of the internals of the responder interface of the given object (for example, inside the code of the responder interface itself.)

It is erroneous to assume anything about the internal structure of any object, even responder interface objects, beyond what is described in this document.