Subroutines

Code reuse has become a cornerstone of modern software engineering. Common tasks are routinely packaged as libraries for later reuse by other developers. The most basic building block of code reuse is the "function" or "subroutine". A calculation like "the factorial of a number", for example, may be used several times in a large program. Subroutines allow this kind of functionality to be abstracted into a single stand-alone unit for reuse. PIR is a subroutine-based language in that all code in PIR must exist in a subroutine. Execution starts, as we have seen, in the :main subroutine, and others can be called to perform the tasks of a program. From subroutines we can construct more elaborate units of code reuse: methods and objects. In this chapter we will talk about how subroutines work in PIR, and how they can be used by developers to create programs for Parrot.

Parrot supports multiple high-level languages, and each language uses a different syntax for defining and calling subroutines. The goal of PIR is not to be a high-level language in itself, but to provide the basic tools that other languages can use to implement their own subroutines. For this reason, PIR's syntax for subroutines may seem very primitive.

Parrot Calling Conventions

The way that Parrot calls a subroutine--by passing arguments, altering control flow, and returning results--is called the "Parrot Calling Conventions", or PCC. The details of PCC are generally hidden from the programmer, being implemented partly in C and partly in PASM. PIR has several constructs to gloss over these details, and the average programmer will not need to worry about them. PCC uses the Continuation Passing Style (CPS) to pass control to subroutines and back again. Again, the details can be largely ignored by developers who don't need them, but the power of this approach can be harnessed by those who do. We'll talk more about PCC and CPS in this and in later chapters.

Subroutine Calls

PIR's simplest subroutine call syntax looks much like a subroutine call from a high-level language. This example calls the subroutine fact with two arguments and assigns the result to $I0:

  $I0 = 'fact'(count, product)

This simple statement hides a great deal of complexity. It generates a subroutine PMC object, creates a continuation PMC object to return control flow after the subroutine, passes the arguments, looks up the subroutine by name (and by signature if it's been overloaded), calls the subroutine, and finally assigns the results of the call to the given register variables. This is quite a lot of work for a single statement, and that's ignoring the computational logic that the subroutine itself implements.

Expanded Subroutine Syntax

The single line subroutine call is incredibly convenient, but it isn't always flexible enough. So PIR also has a more verbose call syntax that is still more convenient than manual calls. This example pulls the subroutine fact out of the global symbol table into a PMC register and calls it:

  $P1 = get_global "fact"

  .begin_call
    .arg count
    .arg product
    .call $P1
    .result $I0
  .end_call

The whole chunk of code from .begin_call to .end_call acts as a single unit. The .arg directive sets up and passes arguments to the call. The .call directive calls the subroutine and returns control flow to this point after the subroutine has completed. The .result directive retrieves returned values from the call.

Subroutine Declarations

In addition to syntax for subroutine calls, PIR provides syntax for subroutine definitions. Subroutines are defined with the .sub directive, and end with the .end directive. We've already seen this syntax in our earlier examples. The .param defines input parameters and creates local named variables for them:

  .param int c

The .return directive allows the subroutine to return control flow to the calling subroutine, and optionally returns result output values.

Here's a complete code example that implements the factorial algorithm. The subroutine fact is a separate compilation unit, assembled and processed after the main function. Parrot resolves global symbols like the fact label between different units.

  # factorial.pir
  .sub main
     .local int count
     .local int product
     count = 5
     product = 1

     $I0 = fact(count, product)

     print $I0
     print "\n"
     end
  .end

  .sub fact
     .param int c
     .param int p

  loop:
     if c <= 1 goto fin
     p = c * p
     dec c
     branch loop
  fin:
     .return (p)
  .end

This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the fact subroutine, passing the two variables as arguments. In the call, the two arguments are assigned to consecutive integer registers, because they're stored in typed integer variables. The fact subroutine uses the .param and .return directives for retrieving parameters and returning results. The final printed result is 120.

Execution of the program starts at the :main subroutine or, if no subroutines are declared with :main, at the first subroutine in the file. If multiple subroutines are declared with :main, the last of them is treated as the starting point. Eventually, declaring multiple subroutines with :main might cause a syntax error or some other bad behavior, so it's not a good idea to rely on it now.

Named Parameters

Parameters that are passed in a strict order like we've seen above are called positional arguments. Positional arguments are differentiated from one another by their position in the function call. Putting positional arguments in a different order will produce different effects, or may cause errors. Parrot supports a second type of parameter, a named parameter. Instead of passing parameters by their position in the argument list, parameters are passed by name and can be in any order. Here's an example:

 .sub 'MySub'
    .param int yrs :named("age")
    .param string call :named("name")
    $S0 = "Hello " . call
    $S2 = yrs                      # convert the integer to a string
    $S1 = "You are " . $S2
    $S1 = $S1 . " years old"
    say $S0
    say $S1
 .end

 .sub main :main
    'MySub'("age" => 42, "name" => "Bob")
 .end

In the example above, we could have easily reversed the order too:

 .sub main :main
    'MySub'("name" => "Bob", "age" => 42)    # Same!
 .end

Named arguments can be a big help because you don't have to worry about the exact order of variables, especially as argument lists get very long.

Optional Parameters

Sometimes a parameter doesn't always need to be passed, or should take a default value when no value has been explicitly provided. Parrot allows parameters to be declared optional, so that no error is raised if the argument is missing. Parrot also provides a flag value that can be tested to determine whether an optional parameter has been provided, so a default value can be supplied.

Optional parameters are actually treated like two parameters: The value that may or may not be passed, and the flag value to determine if it has been or not. Here's an example declaration of an optional parameter:

  .param string name :optional
  .param int has_name :opt_flag

The :optional flag specifies that the given parameter is optional and does not necessarily need to be provided. The :opt_flag specifies that an integer parameter contains a boolean flag. This flag is true if the value was passed, and false otherwise. This means we can use logic like this to provide a default value:

  .param string name :optional
  .param int has_name :opt_flag
  if has_name goto we_have_a_name
    name = "Default value"
  we_have_a_name:

Optional parameters can be positional or named parameters. When using them with positional parameters, they must appear at the end of the list of positional parameters. Also, the :opt_flag parameter must always appear directly after the :optional parameter.

  .sub 'Foo'
    .param int optvalue :optional
    .param int hasvalue :opt_flag
    .param pmc notoptional          # WRONG!
    ...

  .sub 'Bar'
     .param int hasvalue :opt_flag
     .param int optvalue :optional  # WRONG!
     ...

  .sub 'Baz'
    .param int optvalue :optional
    .param pmc notoptional
    .param int hasvalue :opt_flag   # WRONG!
    ...

Optional parameters can also be mixed with named parameters:

  .sub 'MySub'
    .param int value :named("answer") :optional
    .param int has_value :opt_flag
    ...

This could be called in two ways:

  'MySub'("answer" => 42)  # with a value
  'MySub'()                # without
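
The pieces of this section can be put together in one small program. The following is only a sketch: the subroutine name 'greet' and the default string are made up for illustration, and it assumes a Parrot recent enough to provide the say opcode. The optional positional parameter is followed directly by its :opt_flag, and the flag is tested to supply a default value:

```pir
.sub 'main' :main
    'greet'("Bob")          # the optional argument is provided
    'greet'()               # the optional argument is omitted
.end

# 'greet' is a hypothetical name used only for this illustration
.sub 'greet'
    .param string name     :optional
    .param int    has_name :opt_flag   # must directly follow the :optional parameter

    if has_name goto have_name
    name = "stranger"       # default used when no argument was passed
  have_name:
    $S0 = "Hello, " . name
    say $S0
.end
```

The first call greets the name that was passed in, and the second call falls back to the default.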

Sub PMCs

Subroutines are a PMC type in Parrot, and references to them can be stored in PMC registers and manipulated like other PMC types. You can get a subroutine in the current namespace with the get_global opcode:

  $P0 = get_global "MySubName"

Or, if you want to find a subroutine from a different namespace, you need to first select the namespace PMC and then pass that to get_global:

  $P0 = get_namespace ["MyNamespace"]
  $P1 = get_global $P0, "MySubName"

With a Sub PMC, there are lots of things you can do. You can obviously invoke it:

  $P0(1, 2, 3)

You can get its name or change its name:

  $S0 = $P0               # Get the current name
  $P0 = "MyNewSubName"    # Set a new name

You can get a hash of the complete metadata for the subroutine:

  $P1 = inspect $P0

The metadata fields in this hash are:

  • pos_required

    The number of required positional parameters to the Sub

  • pos_optional

    The number of optional positional parameters to the Sub

  • named_required

    The number of required named parameters to the Sub

  • named_optional

    The number of optional named parameters to the Sub

  • pos_slurpy

    Returns 1 if the sub has a slurpy parameter to eat up extra positional args

  • named_slurpy

    Returns 1 if the sub has a slurpy parameter to eat up extra named args

Instead of getting the whole inspection hash, you can look for individual data items that you want:

  $P1 = inspect $P0, "pos_required"

If you want to get the total number of defined parameters to the Sub, you can call the arity method:

  $I0 = $P0.'arity'()

To get the namespace PMC that the Sub was defined into, you can call the get_namespace method:

  $P1 = $P0.'get_namespace'()

Subroutine PMCs are very useful things, and we will show more of their uses throughout this chapter.
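
As a short illustration of the first few operations, here is a sketch that fetches a Sub PMC, invokes it through the register, and reads its name back. The subroutine 'add' is a hypothetical helper written for this example, and the sketch assumes both subroutines live in the same namespace:

```pir
.sub 'main' :main
    $P0 = get_global 'add'  # look up the Sub PMC in the current namespace

    $I0 = $P0(2, 3)         # invoke it through the PMC register
    say $I0

    $S0 = $P0               # a Sub PMC in string context gives its name
    say $S0
.end

# 'add' is a hypothetical helper, not part of Parrot
.sub 'add'
    .param int a
    .param int b
    $I0 = a + b
    .return ($I0)
.end
```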

The Commandline

Programs written in Parrot have access to arguments passed on the command line like any other program would. The arguments are delivered as a parameter to the :main subroutine:

  .sub MyMain :main
    .param pmc all_args :slurpy
    ...
  .end

Continuation Passing Style

Continuations are snapshots, a frozen image of the current execution state of the VM. Once we have a continuation we can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program. There's actually no magic involved, just a lot of interesting ideas and involved code.

Continuations are not a new concept; they've been boggling the minds of Lisp and Scheme programmers for many years. However, despite all their power and flexibility they haven't been well-utilized in most modern programming languages or in their underlying libraries and virtual machines. Parrot aims to change that: In Parrot, almost every control flow manipulation, including all subroutine, method, and coroutine calls, is performed using continuations. This mechanism is mostly hidden from developers who build applications on top of Parrot. The power and flexibility is available if people want to use it, but it's hidden behind more familiar constructs if not.

Doing all sorts of flow control using continuations is called Continuation Passing Style (CPS). CPS allows Parrot to offer all sorts of neat features, such as tail-call optimizations and lexical subroutines.

Tailcalls

In many cases, a subroutine will set up and call another subroutine, and then return the result of the second call directly. This is called a tailcall, and is an important opportunity for optimization. Here's a contrived example in pseudocode:

  call add_two(5)

  subroutine add_two(value)
    value = add_one(value)
    return add_one(value)

In this example, the subroutine add_two makes two calls to add_one. The second call to add_one is used as the return value: add_one is called and its result is immediately returned to the caller of add_two, without ever being stored in a local register or variable in add_two. We can optimize this situation if we realize that the second call to add_one is returning to the same place that add_two is, and therefore can utilize the same return continuation as add_two uses. The two subroutine calls can share a return continuation, instead of having to create a new continuation for each call.

In PIR code, we use the .tailcall directive to make a tailcall like this, instead of the .return directive. .tailcall performs this optimization by reusing the return continuation of the parent function to make the tailcall. In PIR, we can write this example:

  .sub main :main
      .local int value
      value = add_two(5)
      say value
  .end

  .sub add_two
      .param int value
      .local int val2
      val2 = add_one(value)
      .tailcall add_one(val2)
  .end

  .sub add_one
      .param int a
      .local int b
      b = a + 1
      .return (b)
  .end

The example above prints the correct value, 7.
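
Tailcalls matter most in recursive code, where every non-tail call would otherwise need its own return continuation. As a sketch, here is the factorial routine from earlier in the chapter rewritten so that the recursive call is a tailcall. The parameter names n and acc are chosen for this example, and the sketch assumes the say opcode is available:

```pir
.sub 'main' :main
    $I0 = 'fact'(5, 1)      # 5 is the counter, 1 the starting product
    say $I0                 # prints 120
.end

.sub 'fact'
    .param int n
    .param int acc          # running product, passed along on each call

    if n > 1 goto recurse
    .return (acc)           # base case: hand back the accumulated product
  recurse:
    acc = acc * n
    dec n
    .tailcall 'fact'(n, acc)   # reuses fact's own return continuation
.end
```

Because the result of the inner call is returned unchanged, each level of the recursion can hand its return continuation to the next one instead of creating a new one.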

Creating and Using Continuations

Most often continuations are used implicitly by the other control-flow operations in Parrot. However, they can also be created and used explicitly when required. Continuations are like any other PMC, and can be created using the new keyword:

  $P0 = new 'Continuation'

The new continuation starts off in an undefined state. Attempting to invoke a continuation in this state, before a destination has been assigned to it, will raise an exception. To prepare the continuation for use, a destination label must be assigned to it with the set_addr opcode:

    $P0 = new 'Continuation'
    set_addr $P0, my_label

  my_label:
    ...

To jump to the continuation's stored label and return the context to the state it was in when the continuation was created, use the invoke opcode or the () notation:

  invoke $P0  # Explicit using "invoke" opcode
  $P0()       # Same, but nicer syntax

Notice that even though you can use the subroutine notation $P0() to invoke the continuation, it doesn't make any sense to try and pass arguments to it or to try and return values from it:

  $P0 = new 'Continuation'
  set_addr $P0, my_label

  $P0(1, 2)      # WRONG!

  $P1 = $P0()    # WRONG!

Lexical Subroutines

As we've mentioned above, Parrot offers support for lexical subroutines. What this means is that we can define a subroutine by name inside a larger subroutine, and our "inner" subroutine is only visible and callable from the "outer" one. The "inner" subroutine inherits all the lexical variables from the outer subroutine, but is able to define its own lexical variables that cannot be seen or modified by the outer subroutine. This is important because PIR doesn't have anything corresponding to blocks or nested scopes like some other languages have. Lexical subroutines play the role of nested scopes when they are needed.

If the subroutine is lexical, you can get its :outer with the get_outer method on the Sub PMC:

  $P1 = $P0.'get_outer'()

If there is no :outer PMC, this returns a NULL PMC. Conversely, you can set the outer sub:

  $P0.'set_outer'($P1)

Scope and HLLs

Let us digress for a minute and start looking forward at the idea of High Level Languages (HLLs) such as Perl, Python, and Ruby. All of these languages allow nested scopes, or blocks within blocks that can have their own lexical variables. Let's look back at the C programming language, where this kind of construct is not uncommon:

  {
      int x = 0;
      int y = 1;
      {
          int z = 2;
          // x, y, and z are all visible here
      }
      // only x and y are visible here
  }

The code above illustrates this idea perfectly without having to get into a detailed and convoluted example: In the inner block, we define the variable z which is only visible inside that block. The outer block has no knowledge of z at all. However, the inner block does have access to the variables x and y. This is an example of nested scopes where the visibility of different data items can change in a single subroutine. As we've discussed above, Parrot doesn't have any direct analog for this situation: If we tried to write the code above directly, we would end up with this PIR code:

  .local int x
  .local int y
  .local int z
  x = 0
  y = 1
  z = 2
  ...

This PIR code is similar, but the handling of the variable z is different: z is visible throughout the entire current subroutine, whereas in the C code it is visible only inside the inner block. To help approximate this effect, PIR supplies lexical subroutines to create nested lexical scopes.

PIR Scoping

In PIR, there is only one structure that supports scoping like this: the subroutine and objects that inherit from subroutines, such as methods, coroutines, and multisubs, which we will discuss later. There are no blocks in PIR that have their own scope besides subroutines. Fortunately, we can use these lexical subroutines to simulate this behavior that HLLs require:

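  .sub 'MyOuter'
      .local pmc x
      .local pmc y
      x = box 0
      y = box 1
      .lex 'x', x               # lexicals are PMCs bound to a name
      .lex 'y', y
      'MyInner'()
      # only x and y are visible here
  .end

  .sub 'MyInner' :outer('MyOuter')
      .local pmc z
      z = box 2
      .lex 'z', z
      # x, y, and z are all "visible" here
  .end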

In the example above we put the word "visible" in quotes. This is because lexically-defined variables need to be accessed with the find_lex and store_lex opcodes. These two opcodes don't just access the value of a register, where the value is stored while it's being used, but they also make sure to interact with the LexPad PMC that's storing the data. If a value isn't properly stored in the LexPad, then it won't be available in nested inner subroutines, or from :outer subroutines either.
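
To make the role of these opcodes concrete, here is a minimal sketch of an inner subroutine reading and then rebinding a lexical that belongs to its outer subroutine. The names 'outer_sub' and 'inner_sub' are invented for this example, and the sketch assumes the .lex form that binds a quoted name to a PMC register, together with the box and say opcodes:

```pir
.sub 'outer_sub' :main
    .local pmc x
    x = box 10
    .lex 'x', x             # the register x now backs the lexical named 'x'

    'inner_sub'()

    $P0 = find_lex 'x'      # read the lexical again after the inner call
    say $P0
.end

.sub 'inner_sub' :outer('outer_sub')
    $P0 = find_lex 'x'      # not found locally, so the outer LexPad is searched
    say $P0

    $P1 = box 20
    store_lex 'x', $P1      # rebind 'x' in the LexPad where it was found
.end
```

The inner subroutine never declares x itself; it reaches the variable only through find_lex and store_lex, which is why the word "visible" deserves its quotes.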

Lexical Variables

As we have seen above, we can declare a new subroutine to be a nested inner subroutine of an existing outer subroutine using the :outer flag. The :outer flag is used to specify the name of the outer subroutine. Where there may be multiple subroutines with the same name, as is the case with multisubs, which we will discuss soon, we can use the :subid flag on the outer subroutine to give it a different--and unique--name that the lexical subroutines can reference in their :outer declarations. Within lexical subroutines, the .lex command defines a local variable that follows these scoping rules.

LexPad and LexInfo PMCs

Information about lexical variables in a subroutine is stored in two different types of PMCs: the LexPad PMC that we already mentioned briefly, and the LexInfo PMC, which we haven't. Neither of these PMC types is really usable from PIR code; they are instead used by Parrot internally to store information about lexical variables.

LexInfo PMCs are used to store information about lexical variables at compile time. This is read-only information that is generated during compilation to represent what is known about lexical variables. Not all subroutines get a LexInfo PMC by default; you need to indicate to Parrot somehow that you require a LexInfo PMC to be created. One way to do this is with the .lex directive that we looked at above. Of course, the .lex directive only works for languages where the names of lexical variables are all known at compile time. For languages where this information isn't known, the subroutine can be flagged with :lex instead.

LexPad PMCs are used to store run-time information about lexical variables. This includes their current values and their type information. LexPad PMCs are created at runtime for subs that have a LexInfo PMC already. A new LexPad is created each time the subroutine is invoked, which allows for recursive subroutine calls without one invocation overwriting the variables of another.

With a Subroutine PMC, you can get access to the associated LexInfo PMC by calling the 'get_lexinfo' method:

  $P0 = get_global "MySubroutine"
  $P1 = $P0.'get_lexinfo'()

Once you have the LexInfo PMC, there are a limited number of operations that you can call with it:

  $I0 = elements $P1    # Get the number of lexical variables from it
  $P0 = $P1["name"]     # Get the entry for lexical variable "name"

There really isn't much else useful to do with LexInfo PMCs; they're mostly used by Parrot internally and aren't helpful to the PIR programmer.

There is no easy way to get a reference to the current LexPad PMC in a given subroutine, but like LexInfo PMCs that doesn't matter because they aren't useful from PIR anyway. Remember that subroutines themselves can be lexical and that therefore the lexical environment of a given variable can extend to multiple subroutines and therefore multiple LexPads. The opcodes find_lex and store_lex automatically search through nested LexPads recursively to find the proper environment information about the given variables.

Compilation Units Revisited

The term "compilation unit" is one that's been bandied about throughout the chapter and it's worth some amount of explanation here. A compilation unit is a section of code that forms a single unit. In some instances the term can be used to describe an entire file. In most other cases, it's used to describe a single subroutine. Our earlier example which created a 'fact' subroutine for calculating factorials could be considered to have used two separate compilation units: The main subroutine and the fact subroutine. Here is a way to rewrite that algorithm using only a single subroutine instead:

  .sub main
      $I1 = 5           # counter
      call fact         # same as "bsr fact"
      print $I0
      print "\n"
      $I1 = 6           # counter
      call fact
      print $I0
      print "\n"
      end

  fact:
      $I0 = 1           # product
  L1:
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto L1
      ret
  .end

The unit of code from the fact label definition to ret is a reusable routine, but is only usable from within the main subroutine. There are several problems with this simple approach. In terms of the interface, the caller has to know to pass the argument to fact in $I1 and to get the result from $I0. This is different from how subroutines are normally invoked in PIR.

Another disadvantage of this approach is that main and fact share the same compilation unit, so they're parsed and processed as one piece of code. They share registers. They would also share LexInfo and LexPad PMCs, if any were needed by main. The fact routine is also not easily usable from outside the main subroutine, so other parts of your code won't have access to it. This is a problem when trying to follow normal encapsulation guidelines.

Namespaces, Methods, and VTABLES

PIR provides syntax to simplify writing methods and method calls for object-oriented programming. We've seen some method calls in the examples above, especially when we were talking about the interfaces to certain PMC types. We've also seen a little bit of information about classes and objects in the previous chapter. PIR allows you to define your own classes, and with those classes you can define method interfaces to them. Method calls follow the same Parrot calling conventions that we have seen above, including all the various parameter configurations, lexical scoping, and other aspects we have already talked about.

Classes can be defined in two ways: in C and compiled to machine code, and in PIR. The former is how the built-in PMC types are defined, like ResizablePMCArray, or Integer. These PMC types are either built with Parrot at compile time, or are compiled into a shared library called a dynpmc and loaded into Parrot at runtime. We will talk about writing PMCs in C, and dealing with dynpmcs in chapter 11.

The second type of class can be defined in PIR at runtime. We saw some examples of this in the last chapter using the newclass and subclass opcodes. We also talked about class attribute values. Now we're going to talk about associating subroutines with these classes; such subroutines are called methods. Methods are just like other normal subroutines with two major changes: they are marked with the :method flag, and they exist in a namespace. Before we can talk about methods, we need to discuss namespaces first.

Namespaces

Namespaces provide a mechanism where names can be reused. This may not sound like much, but in large complicated systems, or systems with many included libraries, it can be very handy. Each namespace gets its own area for function names and global variables. This way you can have multiple functions named create or new or convert, for instance, without having to use Multi-Method Dispatch (MMD), which we will describe later. Namespaces are also vital for defining classes and their methods, which we already mentioned. We'll talk about all those uses here.

Namespaces are specified with the .namespace [] directive. The brackets are not optional, but the keys inside them are. Here are some examples:

  .namespace [ ]               # The root namespace
  .namespace [ "Foo" ]         # The namespace "Foo"
  .namespace [ "Foo" ; "Bar" ] # Namespace Foo::Bar
  .namespace                   # WRONG! The [] are needed

Using semicolons, namespaces can be nested to any arbitrary depth. Namespaces are special types of PMC, so we can access them and manipulate them just like other data objects. We can get the PMC for the root namespace using the get_root_namespace opcode:

  $P0 = get_root_namespace

The current namespace, which might be different from the root namespace, can be retrieved with the get_namespace opcode:

  $P0 = get_namespace             # get current namespace PMC
  $P0 = get_namespace ["Foo"]     # get PMC for namespace "Foo"

Namespaces are arranged into a large n-ary tree. There is the root namespace at the top of the tree, and in the root namespace are various special HLL namespaces. Each HLL compiler gets its own HLL namespace where it can store its data during compilation and runtime. Each HLL namespace may have a large hierarchy of other namespaces. We'll talk more about HLL namespaces and their significance in chapter 10.

The root namespace is a busy place. Everybody could be lazy and use it to store all their subroutines and global variables, and then we would run into all sorts of collisions. One library would define a function "Foo", and then another library could try to create another subroutine with the same name. This is called namespace pollution, because everybody is trying to put things into the root namespace, and those things are all unrelated to each other. Best practice requires that namespaces be used to hold private information away from public information, and to keep like things together.

As an example, the namespace Integers could be used to store subroutines that deal with integers. The namespace images could be used to store subroutines that deal with creating and manipulating images. That way, when we have a subroutine that adds two numbers together, and a subroutine that performs additive image composition, we can name them both add without any conflict or confusion. And within the images namespace we could have sub-namespaces for jpeg and MRI and schematics, and each of these could have an add method without getting in each other's way.

The short version is this: use namespaces. There aren't any penalties to them, and they do a lot of work to keep things organized and separated.

Namespace PMC

The .namespace directive that we've seen sets the current namespace. In PIR code, we have multiple ways to address a namespace:

  # Get namespace "a/b/c" starting at the root namespace
  $P0 = get_root_namespace ["a" ; "b" ; "c"]

  # Get namespace "a/b/c" starting in the current HLL namespace.
  $P0 = get_hll_namespace ["a" ; "b" ; "c"]
  # Same
  $P0 = get_root_namespace ["hll" ; "a" ; "b" ; "c"]

  # Get namespace "a/b/c" starting in the current namespace
  $P0 = get_namespace ["a" ; "b" ; "c"]

Once we have a namespace PMC we can retrieve global variables and subroutine PMCs from it using the following functions:

  $P1 = get_global $S0            # Get global in current namespace
  $P1 = get_global ["Foo"], $S0   # Get global in namespace "Foo"
  $P1 = get_global $P0, $S0       # Get global in $P0 namespace PMC
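
The following sketch ties these lookups back to the earlier discussion of name reuse: two unrelated subroutines share the name add, each in its own namespace, and the main subroutine picks one or the other with a keyed get_global. The namespace names "Integers" and "Strings" and both subroutine bodies are assumptions made up for this example:

```pir
.namespace [ "Integers" ]

.sub 'add'                  # integer addition
    .param int a
    .param int b
    $I0 = a + b
    .return ($I0)
.end

.namespace [ "Strings" ]

.sub 'add'                  # same name, different namespace: no collision
    .param string a
    .param string b
    $S0 = a . b
    .return ($S0)
.end

.namespace [ ]

.sub 'main' :main
    $P0 = get_global ["Integers"], "add"
    $I0 = $P0(2, 3)
    say $I0                 # prints 5

    $P1 = get_global ["Strings"], "add"
    $S0 = $P1("foo", "bar")
    say $S0                 # prints foobar
.end
```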

Operations on the Namespace PMC

We've seen above how to find a Namespace PMC. Once you have it, there are a few things you can do with it. You can find methods and variables that are stored in the namespace, or you can add new ones:

  $P0 = get_namespace
  $P0.'add_namespace'($P1)      # Add Namespace $P1 to $P0
  $P1 = $P0.'find_namespace'("MyOtherNamespace")

  # Find namespace "MyNamespace" in $P0, create it if it
  #    doesn't exist
  $P1 = $P0.'make_namespace'("MyNamespace")

  $P0.'add_sub'("MySub", $P2)   # Add Sub PMC $P2 to the namespace
  $P1 = $P0.'find_sub'("MySub") # Find it

  $P0.'add_var'("MyVar", $P3)   # Add variable "MyVar" in $P3
  $P1 = $P0.'find_var'("MyVar") # Find it

  # Return the name of Namespace $P0 as a ResizableStringArray
  $P3 = $P0.'get_name'()

  # Find the parent namespace that contains this one:
  $P5 = $P0.'get_parent'()

  # Get the Class PMC associated with this namespace:
  $P6 = $P0.'get_class'()

There are a few other operations that can be done on Namespaces, but none as interesting as these. We'll talk about Namespaces throughout the rest of this chapter.

Calling Methods

Now that we've discussed namespaces, we can start to discuss all the interesting things that namespaces enable, like object-oriented programming and method calls. Methods are just like subroutines, except they are invoked on an object PMC, and that PMC is passed as the self parameter.

The basic syntax for a method call is similar to the single line subroutine call above. It takes a variable for the invocant PMC and a string with the name of the method:

  object."methodname"(arguments)

Notice that the name of the method must be contained in quotes. If the name of the method is not contained in quotes, it's treated as a named variable that contains the method name. Here's an example:

  .local string methname
  methname = "Foo"
  object.methname()               # Same as object."Foo"()
  object."Foo"()                  # Same

The invocant can be a variable or register, and the method name can be a literal string, string variable, or method object PMC.

Defining Methods

Methods are defined like any other subroutine except with two major differences: They must be inside a namespace named after the class they are a part of, and they must use the :method flag.

  .namespace [ "MyClass" ]

  .sub "MyMethod" :method
    ...

Inside the method, the invocant object can be accessed using the self keyword. self isn't the only name you can call this value, however. You can also use the :invocant flag to define a new name for the invocant object:

  .sub "MyMethod" :method
    $S0 = self                    # Already defined as "self"
    say $S0
  .end

  .sub "MyMethod2" :method
    .param pmc item :invocant     # "self" is now called "item"
    $S0 = item
    say $S0
  .end

This example defines two methods in the Foo class. It calls one from the main subroutine and the other from within the first method:

  .sub main
    .local pmc class
    .local pmc obj
    newclass class, "Foo"       # create a new Foo class
    new obj, "Foo"              # instantiate a Foo object
    obj."meth"()                # call obj."meth" which is actually
    print "done\n"              # in the "Foo" namespace
    end
  .end

  .namespace [ "Foo" ]          # start namespace "Foo"

  .sub meth :method             # define Foo::meth global
     print "in meth\n"
     $S0 = "other_meth"         # method names can be in a register too
     self.$S0()                 # self is the invocant
  .end

  .sub other_meth :method       # define another method
     print "in other_meth\n"    # no explicit return is needed: Parrot
  .end                          # returns when it reaches .end

Each method call looks up the method name in the object's class namespace. The .sub directive automatically makes a symbol table entry for the subroutine in the current namespace.

When a .sub is declared as a :method, it automatically creates a local variable named self and assigns it the object passed in P2. You don't need to write .param pmc self to get it; it comes free with the method.

You can pass multiple arguments to a method and retrieve multiple return values just like a single line subroutine call:

  (res1, res2) = obj."method"(arg1, arg2)

VTABLEs

PMCs all subscribe to a common interface of functions called VTABLEs. Every PMC implements the same set of these interfaces, which perform very specific low-level tasks on the PMC. The term VTABLE was originally a shortened form of the name "virtual function table", although that name isn't used any more by the developers or in any of the documentation. In fact, if you say "virtual function table" to one of the developers, they probably won't know what you are talking about. The functions in the VTABLE are called VTABLE interfaces; in casual conversation they are occasionally called "VTABLE functions", "VTABLE methods", or even "VTABLE entries".

VTABLE interfaces resemble ordinary functions and methods in some respects, but they are not really subroutines or methods in the way that those terms have been used throughout the rest of Parrot. Like methods on an object, VTABLE interfaces are defined for a specific class of PMC, and can be invoked on any member of that class. Likewise, in a VTABLE interface declaration, the self keyword refers to the object on which it is invoked. That's where the similarities end, however. Unlike ordinary subroutines or methods, VTABLE interfaces cannot be invoked directly, and they are not inherited through class hierarchies the way methods are. With all this terminology out of the way, we can start talking about what VTABLEs are and how they are used in Parrot.

VTABLE interfaces are the primary way that data in a PMC is accessed and modified. VTABLEs also provide a way to invoke the PMC if it's a subroutine or subroutine-like PMC. VTABLE interfaces are not called directly from PIR code, but are instead called internally by Parrot to implement specific opcodes and behaviors. For instance, the invoke opcode calls the invoke VTABLE interface of the subroutine PMC, while the inc opcode on a PMC calls the increment VTABLE interface on that PMC. In essence, overriding a VTABLE interface lets the programmer change how Parrot accesses a PMC's data at the most fundamental level, and therefore how the opcodes act on that data.
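
The relationship between opcodes and VTABLE interfaces can be pictured with a small model. What follows is a sketch in Python, not Parrot's implementation: the PMC class, the op_inc function, and the dictionary standing in for the table are all hypothetical names chosen for illustration. It shows only the idea that an opcode looks up a named slot on the PMC and calls whatever is stored there, so replacing the slot changes what the opcode does.

```python
class PMC:
    """Hypothetical stand-in for a PMC: some data plus a table of named slots."""

    def __init__(self, value, vtable):
        self.value = value
        self.vtable = vtable


def increment_default(pmc):
    # Default behaviour for the "increment" slot.
    pmc.value += 1


def increment_by_ten(pmc):
    # An override: same slot name, different behaviour.
    pmc.value += 10


def op_inc(pmc):
    """Models the inc opcode: find the 'increment' slot and call it."""
    pmc.vtable["increment"](pmc)


plain = PMC(0, {"increment": increment_default})
custom = PMC(0, {"increment": increment_by_ten})

op_inc(plain)
op_inc(custom)
print(plain.value, custom.value)  # 1 10
```

The caller uses the same opcode in both cases; only the entry stored in the table differs.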

PMCs, as we will see in more detail in later chapters, are typically implemented using PMC Script, a layer of syntax and macros over ordinary C code. A PMC compiler program converts the PMC files into C code for compilation as part of the ordinary build process. However, VTABLE interfaces can also be written and overridden in PIR using the :vtable flag on a subroutine declaration. This technique is used most commonly when subclassing an existing PMC class in PIR code to create a new data type with custom access methods.

VTABLE interfaces are declared with the :vtable flag:

  .sub 'set_integer' :vtable
      #set the integer value of the PMC here
  .end

In this form, the subroutine must have the same name as the VTABLE interface it is intended to implement. VTABLE interfaces all have very specific names, and you can't override one with just any arbitrary name. However, if you would like to name the function something different but still use it as a VTABLE interface, you can pass the interface name as a parameter to the flag:

  .sub 'MySetInteger' :vtable('set_integer')
      #set the integer value of the PMC here
  .end

VTABLE interfaces are often given the :method flag also, so that they can be used directly in PIR code as methods, in addition to being used by Parrot as VTABLE interfaces. This means we can have the following:

  .namespace [ "MyClass" ]

  .sub 'ToString' :vtable('get_string') :method
      $S0 = "hello!"
      .return($S0)
  .end

  .namespace [ "OtherClass" ]

  .sub 'demo'                     # hypothetical caller, for illustration
      .local pmc myclass
      myclass = new "MyClass"
      say myclass                 # say converts to string internally
      $S0 = myclass               # Convert to a string, store in $S0
      $S0 = myclass.'ToString'()  # The same
  .end

Inside a VTABLE interface definition, the self local variable contains the PMC on which the VTABLE interface is invoked, just like in a method declaration.

Roles

As we've seen above and in the previous chapter, Class PMCs and NameSpace PMCs work to keep classes and methods together in a logical way. There is another factor to add to this mix: the Role PMC.

Roles are like classes, but don't stand on their own. They represent collections of methods and VTABLE interfaces that can be added into an existing class. Adding a role to a class is called composing that role, and any class that has been composed with a role is said to "do" that role.

Roles are created as PMCs and can be manipulated through opcodes and methods like other PMCs:

  $P0 = new 'Role'
  $P1 = get_global "MyRoleSub"
  $P0.'add_method'("MyRoleSub", $P1)

Once we've created a role and added methods to it, we can add that role to a class, or even to another role:

  $P1 = new 'Role'
  $P2 = new 'Class'
  $P1.'add_role'($P0)
  $P2.'add_role'($P0)
  addrole $P2, $P0     # Same, using the opcode

Now that we have added the role, we can check whether we implement it:

  $I0 = does $P2, $P0  # Yes

We can get a list of roles from our Class PMC:

  $P3 = $P2.'roles'()

Roles are very useful for ensuring that related classes all implement a common interface.

Coroutines

We've mentioned coroutines several times before, and we're finally going to explain what they are. Coroutines are similar to subroutines except that they have an internal notion of state (and a cool new name). In addition to performing a normal .return, which returns control flow to the caller and destroys the lexical environment of the subroutine, a coroutine may also perform a .yield operation. .yield returns a value to the caller like .return can, but it does not destroy the lexical state of the coroutine. The next time the coroutine is called, it continues execution from the point of the last .yield, not at the beginning of the coroutine.

In a coroutine, when we continue from a .yield, the entire lexical environment is the same as it was when .yield was called. This means that the parameter values don't change, even if we call the coroutine with different arguments later.

Defining Coroutines

Coroutines are defined like any ordinary subroutine. They do not require any special flag or any special syntax to mark them as being a coroutine. However, what sets them apart is the use of the .yield directive. .yield plays several roles:

  • Identifies coroutines

    When Parrot sees a yield, it knows to create a Coroutine PMC object instead of a Sub PMC.

  • Creates a continuation

    Continuations, as we have already seen, allow us to continue execution at the point of the continuation later. A continuation is like a snapshot of the current execution environment. .yield creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the .yield.

  • Returns a value

    .yield can return one value, many values, or no values to the caller. In this regard it is basically the same as a .return.

Here is a quick example of a simple coroutine:

  .sub MyCoro
    .yield(1)
    .yield(2)
    .yield(3)
    .return(4)
  .end

  .sub main :main
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
    $I0 = MyCoro()    # 1
    $I0 = MyCoro()    # 2
    $I0 = MyCoro()    # 3
    $I0 = MyCoro()    # 4
  .end

This is obviously a contrived example, but it demonstrates how the coroutine stores its state. The coroutine stores its state when it reaches a .yield directive, and when the coroutine is called again it picks up where it left off. Coroutines also handle parameters in a way that might not be intuitive. Here's an example of this:

  .sub StoredConstant
    .param int x
    .yield(x)
    .yield(x)
    .yield(x)
  .end

  .sub main :main
    $I0 = StoredConstant(5)       # $I0 = 5
    $I0 = StoredConstant(6)       # $I0 = 5
    $I0 = StoredConstant(7)       # $I0 = 5
    $I0 = StoredConstant(8)       # $I0 = 8
  .end

Notice how even though we are calling the StoredConstant coroutine with different arguments each time, the value of parameter x doesn't change until the coroutine's state resets after the last .yield. Remember that a continuation takes a snapshot of the current state, and the .yield directive takes a continuation. The next time we call the coroutine, it invokes the continuation internally, and returns us to the exact same place in the exact same condition as we were when we called the .yield. In order to reset the coroutine and enable it to take a new parameter, we must either execute a .return directive or reach the end of the coroutine.
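
The rule that arguments are bound only when a coroutine starts fresh can be modeled outside Parrot. The sketch below is written in Python and is only an analogy, not Parrot's implementation: the Coroutine wrapper class and the stored_constant function are hypothetical names, and a Python generator stands in for the stored continuation. The body here uses two yields followed by a return, so under this model every third call completes a cycle and the next call binds a new argument.

```python
class Coroutine:
    """Toy model of a callable that remembers where it last yielded."""

    def __init__(self, body):
        self._body = body      # a Python generator function
        self._active = None    # suspended generator, or None after a reset

    def __call__(self, *args):
        if self._active is None:
            # Fresh start: this is the only place arguments are bound.
            self._active = self._body(*args)
        try:
            # Resume from the last yield; later arguments are ignored.
            return next(self._active)
        except StopIteration as done:
            # The body returned or reached its end: reset the state.
            self._active = None
            return done.value


def stored_constant(x):
    yield x
    yield x
    return x


coro = Coroutine(stored_constant)
results = [coro(5), coro(6), coro(7), coro(8)]
print(results)  # [5, 5, 5, 8]
```

The arguments 6 and 7 are silently dropped because the suspended body already holds x = 5; only after the body returns does a call start over with a new value.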

Multiple Dispatch

Multiple dispatch allows several subroutines with the same name to exist in a single namespace. These subroutines must differ, however, in their parameter lists, or "signatures". All subs with the same name are put into a single PMC called a MultiSub, which acts like a list of subroutines. When the MultiSub is invoked, it searches through that list for the subroutine whose signature most closely matches the arguments. The best match is the sub that gets invoked.

Defining MultiSubs

MultiSubs are subroutines with the :multi flag applied to them. MultiSubs (also called "Multis") must all differ from one another in the number and/or type of arguments passed to the function. Having two multisubs with the same function signature could result in a parsing error, or the later function could overwrite the former one in the multi.

Multisubs are defined like this:

  .sub 'MyMulti' :multi
      # does whatever a MyMulti does
  .end

Multis belong to a specific namespace. Functions in different namespaces with the same name do not conflict with each other (this is one of the reasons for having namespaces in the first place). It's only when multiple functions in a single namespace need to have the same name that a multi is used.

Multisubs take a special designator called a multi signature. The multi signature tells Parrot what particular combination of input parameters the multi accepts. Each multi will have a different signature, and Parrot will be able to dispatch to each one depending on the arguments passed. The multi signature is specified in the :multi directive:

  .sub 'Add' :multi(I, I)
    .param int x
    .param int y
    $I0 = x + y          # compute first: .return takes values, not expressions
    .return($I0)
  .end

  .sub 'Add' :multi(N, N)
    .param num x
    .param num y
    $N0 = x + y
    .return($N0)
  .end

  .sub Start :main
    $I0 = Add(1, 2)      # 3
    $N0 = Add(3.14, 2.0) # 5.14
    $S0 = Add("a", "b")  # ERROR! No (S, S) variant!
  .end

Multis can take I, N, S, and P types, but they can also use _ (underscore) to denote a wildcard, and a string that can be the name of a particular PMC type:

  .sub 'Add' :multi(I, I)  # Two integers
    ...

  .sub 'Add' :multi(I, 'Float')  # An integer and Float PMC
    ...

                           # An Integer PMC and any second argument
  .sub 'Add' :multi('Integer', _)
    ...

When we call a MultiSub, Parrot tries to take the most specific best-match variant, and falls back to more general variants if a perfect match cannot be found. So if we call 'Add'(1, 2), Parrot will dispatch to the (I, I) variant. If we call 'Add'(1, "hi"), Parrot will match the ('Integer', _) variant, since the string in the second argument doesn't match I or 'Float'. Parrot can also choose to automatically promote one of the I, N, or S values to an Integer, Float, or String PMC, which is how the integer 1 can satisfy 'Integer' here.

To decide which multi variant to call, Parrot computes a Manhattan distance between the signature of the arguments and the multi signature of each variant. Every difference counts as one step. A difference can be an autobox from a primitive type to a PMC, the conversion from one primitive type to another, or the matching of an argument to a _ wildcard. After Parrot calculates the distance to each variant, it calls the one with the lowest distance. Notice that it's possible to define a variant that can never be called, because for every potential combination of arguments there is a better match. This isn't a common occurrence, but it's something to watch out for in systems with many multis and a limited number of data types in use.
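
The distance rule can be made concrete with a small model. This is a sketch in Python of the description above, not Parrot's dispatcher. Several details are assumptions made for illustration: the function names step_cost, distance, and dispatch are hypothetical, each primitive type autoboxes to exactly one PMC type, the only primitive conversion modeled is integer to number, and a tie goes to the variant declared first.

```python
# Assumed autoboxing rules: each primitive type maps to one PMC type.
AUTOBOX = {"I": "Integer", "N": "Float", "S": "String"}
# Assumed primitive conversions; only integer-to-number is modeled here.
CONVERT = {("I", "N")}


def step_cost(arg, param):
    """Return 0 for an exact match, 1 for one difference, None if no match."""
    if arg == param:
        return 0
    if param == "_":                 # wildcard accepts anything, at a cost
        return 1
    if AUTOBOX.get(arg) == param:    # primitive promoted to its PMC type
        return 1
    if (arg, param) in CONVERT:      # primitive converted to another primitive
        return 1
    return None


def distance(args, signature):
    """Sum the per-argument costs; None means the variant cannot be called."""
    if len(args) != len(signature):
        return None
    total = 0
    for arg, param in zip(args, signature):
        cost = step_cost(arg, param)
        if cost is None:
            return None
        total += cost
    return total


def dispatch(args, variants):
    """Pick the applicable variant with the lowest distance."""
    best, best_distance = None, None
    for signature in variants:
        d = distance(args, signature)
        if d is not None and (best_distance is None or d < best_distance):
            best, best_distance = signature, d
    if best is None:
        raise LookupError("no applicable variant for %r" % (args,))
    return best


variants = [("I", "I"), ("I", "Float"), ("Integer", "_")]
print(dispatch(("I", "I"), variants))   # ('I', 'I')
print(dispatch(("I", "S"), variants))   # ('Integer', '_')
```

Under these assumptions, an integer and a string reach the ('Integer', _) variant at distance 2: one step for autoboxing the integer and one for the wildcard.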
