Parrot Intermediate Representation

The Parrot intermediate representation (PIR) is an overlay on top of Parrot assembly language (PASM) that provides some simplifications and high-level constructs. It has many high-level features that ease the pain of working with PASM code, but it still isn't considered to be a high-level language by itself. PASM is discussed in more detail in Chapter 5.

Internally Parrot works a little differently with PASM and PIR source code, so each has different restrictions. Parrot's default is to run in a mixed mode that allows PASM code to combine with the higher-level syntax unique to PIR. This gives the programmer flexibility to use aspects of each that are necessary.

A file with a .pasm extension is treated as pure PASM code by Parrot, as is any file run with the -a command-line option. This mode is mainly used for running pure PASM tests. Parrot treats any extension other than .pasm as a PIR file in mixed mode. As a convention, files containing pure PIR code generally have a .pir extension.

PIR is well documented, both in traditional documentation and in instructional code examples. The documentation for the PIR compiler IMCC in imcc/docs/ and the project documentation in docs/ are good sources of information. The test suite in imcc/t shows examples of working code. These are all good starting points for digging deeper into PIR syntax and functionality.

Statements

The syntax of statements in PIR is much more flexible than in PASM. All PASM opcodes are valid PIR code, so the basic syntax is the same. The statement delimiter is a newline (\n), so each statement has to be on its own line. Statements may also start with a label, for use with jumps and branches. Comments are marked by a hash sign (#) and continue until the end of the line. PIR also allows POD blocks for multi-line documentation.
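
As a rough sketch of these rules, the fragment below puts one statement on each line, starts one statement with a label, and uses both a hash comment and a POD block. The label name DONE and the POD text are made up for illustration, and the code is wrapped in a .sub block (introduced later in this chapter) only so that it compiles. In a real file the POD directives must start in the first column:

  =head1 NAME

  A hypothetical POD block, used only for this sketch

  =cut

  .sub _main
      print "one statement per line\n"   # a comment runs to the end of the line
      branch DONE                        # a plain PASM opcode is valid PIR
      print "skipped\n"
  DONE: print "done\n"                   # a statement may start with a label
      end
  .end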

Unlike PASM, PIR has some higher-level constructs, including symbol operators:

  I1 = 5                       # set I1, 5

named variables:

  count = 5

and complex statements built from multiple keywords and symbol operators:

  if I1 <= 5 goto LABEL        # le I1, 5, LABEL

We will get into all of these in more detail as we go.

Variables and Constants

Literal constants in PIR are the same as constants in PASM. Integers and floating-point numbers are numeric literals and strings are enclosed in quotes. PIR strings use the same escape sequences as PASM.
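
A minimal sketch of the three kinds of literals follows. The values are arbitrary, and the code uses Parrot registers and a .sub block (both explained later in this chapter) only so that it compiles:

  .sub _main
      I0 = 42                    # integer literal
      N0 = 3.14159               # floating-point literal
      S0 = "Hello, Polly.\n"     # string literal; \n is a newline escape
      print I0
      print "\n"
      print S0
      end
  .end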

Parrot Registers

PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. PASM register names always start with a single character that shows whether it is an integer (I), numeric (N), string (S), or PMC (P) register, and end with the number of the register:

  S0 = "Hello, Polly.\n"
  print S0

You can have as many registers of each type as you need; Parrot will allocate new registers if you need more. Parrot registers are allocated in a linear array, and register numbers are indices into this array. Having more registers means Parrot must allocate more storage space for them, which can decrease memory efficiency and slow down register allocation and fetches. In general, it's better to keep the number of registers small, and to use registers with contiguous numbers so that the pool of allocated registers doesn't grow too large. As with any memory management situation, fewer allocations generally mean better performance.
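
As a small illustration of this advice, the sketch below stores three values in registers I0 through I2 instead of scattering them across high register numbers. The values and the choice of registers are arbitrary, and the code is wrapped in a .sub block (covered later in this chapter) only so that it compiles:

  .sub _main
      I0 = 1          # contiguous register numbers keep the register array small
      I1 = 2
      I2 = 3
      N0 = 2.5        # each register type (I, N, S, P) is numbered separately
      S0 = "Polly\n"
      print I0
      print I1
      print I2
      print "\n"
      print S0
      end
  .end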

Temporary Registers

PIR provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers--with a single character for the type of register and a number--but they start with a $ character:

  set $S42, "Hello, Polly.\n"
  print $S42

The most obvious difference between Parrot registers ("P7") and temporary register variables ("$P7") is that you have an unlimited number of temporaries. Parrot handles register allocation automatically, and performs register reuse optimizations for you when it finds such situations.

The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to a Parrot register. As long as the temporary is needed, it is stored in the same register. When it's no longer needed, the Parrot register is re-allocated to some other value. This example uses two temporary string registers:

  $S42 = "Hello, "  # allocated to S16
  print $S42
  $S43 = "Polly.\n" # allocated to S16 again
  print $S43

Since they don't overlap, Parrot can allocate both to a single register. If you change the order a little so both temporaries are needed at the same time, Parrot will allocate them to different registers instead:

  $S42 = "Hello, "  # allocated to S17
  $S43 = "Polly.\n" # allocated to S16
  print $S42
  print $S43

In this case, $S42 is allocated to S17 and $S43 is allocated to S16. These numbers are hypothetical, of course. Which registers Parrot actually uses in these situations is based on a large number of factors.

Parrot allocates temporary registers (as well as named variables, which we'll talk about next) to Parrot registers in ascending order based on their score. The score is used to determine whether a register is being actively used, and whether it can be reused for another purpose. Variables used in a loop have a higher score than variables outside a loop. Variables that span a long range have a lower score than ones that are used only briefly. Variables that have a low score (and thus are used less) are shuffled and reused for new temporaries.

If you want to peek behind the curtain and see how Parrot is allocating registers, you can run it with the -d switch to turn on IMCC debugging output.

  $ parrot -d1000 hello.pir

If hello.pir contains this code from the second example above (wrapped in a subroutine definition so it will compile):

  .sub _main
    $S42 = "Hello, "  # allocated to S17
    $S43 = "Polly.\n" # allocated to S16
    print $S42
    print $S43
    end
  .end

it produces this output:

  code_size(ops) 11  oldsize 0
  0 set_s_sc 17 1 set S17, "Hello, "
  3 set_s_sc 16 0 set S16, "Polly.\n"
  6 print_s 17    print S17
  8 print_s 16    print S16
  10 end  end
  Hello, Polly.

That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code translates:

  $ parrot -o hello.pasm hello.pir

or just

  $ parrot -o- hello.pir

to see resulting PASM on stdout.

You'll find more details on these options and many others in "Parrot Command-Line Options" in Chapter 11.

Named Variables

Named variables can be used anywhere a register or temporary register is used. They're declared with the .local statement or the equivalent .sym statement, each of which requires a variable type and a name:

  .local string hello
  set hello, "Hello, Polly.\n"
  print hello

This snippet defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.

The valid types are int, num, string, and pmc or any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. Named variables are valid from the point of their definition to the end of the compilation unit.

The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. Identifiers don't have any limit on length yet, but it's a safe bet they will before the production release. Parrot opcode names are normally not allowed as variable names, though there are some exceptions.
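
The sketch below declares a few variables whose names follow these rules. The names themselves (count, _ratio, greeting2) are invented for illustration, and the .sub wrapper, described later in this chapter, is there only so that the code compiles:

  .sub _main
      .local int count           # starts with a letter
      .local num _ratio          # a leading underscore is allowed
      .sym string greeting2      # .sym is equivalent to .local; digits may follow the first character
      count = 3
      _ratio = 0.5
      greeting2 = "Hello, Polly.\n"
      print count
      print "\n"
      print greeting2
      end
  .end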

PMC variables

PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object before you use it. The new instruction creates a new PMC. Unlike PASM, PIR doesn't use a dot in front of the class name.

  P0 = new PerlString        # same as new P0, .PerlString
  P0 = "Hello, Polly.\n"
  print P0

This example creates a PerlString object, stores it in the PMC register P0, assigns the value "Hello, Polly.\n" to it, and prints it. The syntax is exactly the same for temporary register variables:

  $P4711 = new PerlString
  $P4711 = "Hello, Polly.\n"
  print $P4711

With named variables the type passed to the .local directive is either the generic pmc or a type compatible with the type passed to new:

  .local PerlString hello    # or .local pmc hello
  hello = new PerlString
  hello = "Hello, Polly.\n"
  print hello

Named Constants

The .const directive declares a named constant. It's very similar to .local, and requires a type and a name. The value of a constant must be assigned in the declaration statement. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:

  .const string hello = "Hello, Polly.\n"
  print hello

Named constants function in all the same places as literal constants, but have to be declared beforehand:

  .const int the_answer = 42        # integer constant
  .const string mouse = "Mouse"     # string constant
  .const num pi = 3.14159           # floating point constant

Register Spilling

As we mentioned earlier, Parrot allocates all temporary register variables and named variables to Parrot registers. When Parrot runs out of registers to allocate, it has to store some of the variables elsewhere. This is known as spilling. Parrot spills the variables with the lowest score and stores them in a PerlArray object while they aren't used, then restores them to a register the next time they're needed. Consider an example that creates 33 integer variables, all containing values that are used later:

  set $I1, 1
  set $I2, 2
  ...
  set $I33, 33
  ...
  print $I1
  print $I2
  ...
  print $I33

Parrot allocates the 32 available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, Parrot generates code to store the values:

  new P31, "PerlArray"
  ...
  set I0, 1           # I0 allocated to $I1
  set P31[0], I0      # spill $I1
  set I0, 2           # I0 reallocated to $I2
  set P31[1], I0      # spill $I2

It creates a PerlArray object and stores it in register P31. (P31 is reserved for register spilling in PIR code, so generally it shouldn't be accessed directly.) The set instruction is the last time $I1 is used for a while, so immediately after that, Parrot stores its value in the spill array and frees up I0 to be reallocated.

Just before $I1 and $I2 are accessed to be printed, Parrot generates code to fetch the values from the spill array:

  ...
  set I0, P31[0]       # fetch $I1
  print I0

You cannot rely on any particular register assignment for temporary variables or named variables. The register allocator does follow a set of precedence rules for allocation, but these rules may change. Also, if two variables have the same score Parrot may assign registers based on the hashed value of the variable name. Parrot randomizes the seed to the hash function to guarantee you never get a consistent order.

Symbol Operators

You probably noticed the = assignment operator in some of the earlier examples:

  $S2000 = "Hello, Polly.\n"
  print $S2000

Standing alone, it's the same as the PASM set opcode. In fact, if you run parrot in bytecode debugging mode (as in "Assembler Options" in Chapter 11), you'll see it really is just a set opcode underneath.

PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:

  .local int sum
  sum = $I42 + 5
  print sum
  print "\n"

The statement sum = $I42 + 5 translates to something like add I16, I17, 5.

PIR also provides +=, -=, >>=, ... that map to the two-argument forms like add I16, I17.
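
A brief sketch of these assignment operators follows. The variable name total and the values are arbitrary, and the PASM forms in the comments are the two-argument opcodes that each statement is assumed to map to:

  .sub _main
      .local int total
      total = 10
      total += 5        # add total, 5
      total -= 3        # sub total, 3
      total >>= 1       # shr total, 1
      print total       # prints 6
      print "\n"
      end
  .end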

Many PASM opcodes that return a single value also have an alternate syntax in PIR with the assignment operator:

  $I0 = length str               # length $I0, str
  $I0 = isa PerlInt, "scalar"    # isa $I0, PerlInt, "scalar"
  $I0 = exists hash["key"]       # exists $I0, hash["key"]
  $N0 = sin $N1
  $N0 = atan $N1, $N2
  $S0 = repeat "x", 20
  $P0 = newclass "Foo"
  ...

A complete list of PIR operators is available in Chapter 11. We'll discuss the comparison operators in "Flow Control" later in this chapter.

Labels

Like PASM, any line can start with a label definition like LABEL:, but label definitions can also stand on their own line.

PIR code has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined. (We'll discuss compilation units in the next section.) The name has to be unique there, but it can be reused in a different compilation unit.

  branch L1   # local label
  bsr    _L2  # global label

Labels are most often used in branching instructions and in subroutine calls.

Compilation Units

Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. Though they will be explained in more detail later, we introduce them here because all code in a PIR source file must be defined in a compilation unit. The simplest syntax for a PIR compilation unit starts with the .sub directive and ends with the .end directive:

  .sub _main
      print "Hello, Polly.\n"
      end
  .end

This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see the end of the "Temporary Registers" section earlier in this chapter), you'll see that the name translates to an ordinary label:

  _main:
          print "Hello, Polly.\n"
          end

The first compilation unit in a file is normally executed first, but as in PASM you can flag any compilation unit as the first one to execute with the @MAIN marker. The convention is to name the first compilation unit _main, but the name isn't critical.

  .sub _first
      print "Polly want a cracker?\n"
      end
  .end

  .sub _main @MAIN
      print "Hello, Polly.\n"
      end
  .end

This code prints out "Hello, Polly." but not "Polly want a cracker?".

The "Subroutines" section later in this chapter goes into much more detail about compilation units and their uses.

Flow Control

As in PASM, flow control in PIR is done entirely with conditional and unconditional branches. This may seem simplistic, but remember that PIR is a thin overlay on the assembly language of a virtual processor. For the average assembly language, jumps are the fundamental unit of flow control.

Any PASM branch instruction is valid, but PIR has some high-level constructs of its own. The most basic is the unconditional branch: goto.

  .sub _main
      goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

The first print statement never runs because the goto always skips over it to the label L1.

The conditional branches combine if or unless with goto.

  .sub _main
      $I0 = 42
      if $I0 goto L1
      print "never printed"
  L1: print "after branch\n"
      end
  .end

In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is quite similar, but branches when the tested value is false. An undefined value, 0, or an empty string are all false values. The if ... goto statement translates directly to the PASM if, and unless translates to the PASM unless.
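
For comparison, here is a sketch of the same program written with unless. The only changes are the keyword and the tested value, which is now false, so the branch is still taken:

  .sub _main
      $I0 = 0
      unless $I0 goto L1       # branches because $I0 is false
      print "never printed"
  L1: print "after branch\n"
      end
  .end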

The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:

  .sub _main
      $I0 = 42
      $I1 = 43
      if $I0 < $I1 goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the PASM lt branch operation.

The rest of the comparison operators are summarized in "PIR Instructions" in Chapter 11.

PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto REDO  # end of loop

      print $I0
      print "\n"
      end
  .end

This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and then branches to the start of the loop. The loop ends when $I1 counts down to 0 so that the if doesn't branch to REDO. This is a do while-style loop with the condition test at the end, so the code always runs the first time through.

For a while-style loop with the condition test at the start, use a conditional branch together with an unconditional branch:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      if $I1 <= 0 goto LAST
      $I0 = $I0 * $I1
      dec $I1
      goto REDO
  LAST:                     # end of loop

      print $I0
      print "\n"
      end
  .end

This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.

Any high-level flow control construct can be built from conditional and unconditional branches.
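
As one more sketch of this idea, an if/else construct can be written with one conditional branch and one unconditional branch. The label names BIG and DONE and the tested value are arbitrary choices for illustration:

  .sub _main
      $I0 = 7
      if $I0 > 5 goto BIG       # test the condition
      print "small\n"           # "else" part, runs when the test fails
      goto DONE                 # skip over the "then" part
  BIG:
      print "big\n"             # "then" part, runs when the test succeeds
  DONE:
      end
  .end

With $I0 set to 7 the test succeeds, so this prints "big" and skips the "else" part.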
