The Parrot intermediate representation (PIR) is an overlay on top of Parrot assembly language, designed to make the developer's life easier. It has many high-level features that ease the pain of working with PASM code, but it still isn't a high-level language.

Internally, Parrot works a little differently with PASM and PIR source code, so each has different restrictions. The default is to run in a mixed mode that allows PASM code to combine with the higher-level syntax unique to PIR.

A file with a .pasm extension is treated as pure PASM code, as is any file run with the -a command-line option. This mode is mainly used for running pure PASM tests. Parrot treats a file with any extension other than .pasm as a PIR file. By convention, files containing PIR code generally have a .pir extension.

The documentation in imcc/docs/ or docs/ and the test suite in imcc/t are good starting points for digging deeper into the PIR syntax and functionality.

Statements

The syntax of statements in PIR is much more flexible than in PASM. All PASM opcodes are valid PIR code, so the basic syntax is the same. The statement delimiter is a newline \n, so each statement has to be on its own line. Any statement can start with a label. Comments are marked by a hash sign (#), and PIR allows POD blocks.

But unlike PASM, PIR has some higher-level constructs, including symbol operators:

  I1 = 5                       # set I1, 5

named variables:

  count = 5

and complex statements built from multiple keywords and symbol operators:

  if I1 <= 5 goto LABEL        # le I1, 5, LABEL

We'll get into these in more detail as we go.

Variables and Constants

Literal constants in PIR are the same as constants in PASM. Integers and floating-point numbers are numeric literals and strings are enclosed in quotes. PIR strings use the same escape sequences as PASM.

Parrot Registers

PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. PASM register names always start with a single character that shows whether it is an integer, numeric, string, or PMC register, and end with the number of the register (between 0 and 31):

  S0 = "Hello, Polly.\n"
  print S0

When you work directly with Parrot registers, you can only have 32 registers of any one type at a time (only 31 for PMC registers, because P31 is reserved for spilling). If you have more than that, you have to start shuffling stored values on and off the user stack. You also have to manually track when it's safe to reuse a register. This kind of low-level access to the Parrot registers is handy when you need it, but it's pretty unwieldy for large sections of code.

Temporary Registers

PIR provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers--with a single character for the type of register and a number--but they start with a $ character:

  set $S42, "Hello, Polly.\n"
  print $S42

The most obvious difference between Parrot registers and temporary register variables is that you have an unlimited number of temporaries. Parrot handles register allocation for you. It keeps track of how long a value in a Parrot register is needed and when that register can be reused.

The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to a Parrot register. As long as the temporary is needed, it is stored in the same register. When it's no longer needed, the Parrot register is re-allocated to some other value. This example uses two temporary string registers:

  $S42 = "Hello, "
  print $S42
  $S43 = "Polly.\n"
  print $S43

Since they don't overlap, Parrot allocates both to the S16 register. If you change the order a little so both temporaries are needed at the same time, they're allocated to different registers:

  $S42 = "Hello, "  # allocated to S17
  $S43 = "Polly.\n" # allocated to S16
  print $S42
  print $S43

In this case, $S42 is allocated to S17 and $S43 is allocated to S16.

Parrot allocates temporary variables (as well as named variables, which we'll talk about next) to Parrot registers in ascending order of their score. The score is based on a number of factors related to variable usage. Variables used in a loop have a higher score than variables outside a loop. Variables that span a long range have a lower score than ones that are used only briefly.

If you want to peek behind the curtain and see how Parrot is allocating registers, you can run it with the -d switch to turn on debugging output.

  $ parrot -d1000 hello.pir

If hello.pir contains this code from the second example above (wrapped in a subroutine definition so it will compile):

  .sub _main
    $S42 = "Hello, "  # allocated to S17
    $S43 = "Polly.\n" # allocated to S16
    print $S42
    print $S43
    end
  .end

it produces this output:

  code_size(ops) 11  oldsize 0
  0 set_s_sc 17 1 set S17, "Hello, "
  3 set_s_sc 16 0 set S16, "Polly.\n"
  6 print_s 17    print S17
  8 print_s 16    print S16
  10 end  end
  Hello, Polly.

That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code translates:

  $ parrot -o hello.pasm hello.pir

or just

  $ parrot -o- hello.pir

to see resulting PASM on stdout.

You'll find more details on these options and many others in the "Parrot Command-Line Options" section in Chapter 11.

Named Variables

Named variables can be used anywhere a register or temporary register is used. They're declared with the .local statement or the equivalent .sym statement, which require a variable type and a name:

  .local string hello
  set hello, "Hello, Polly.\n"
  print hello

This snippet defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.

The valid types are int, num, string, and pmc or any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. Named variables are valid from the point of their definition to the end of the compilation unit.

The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. Identifiers don't have any limit on length yet, but it's a safe bet they will before the production release. Parrot opcode names are normally not allowed as variable names, though there are some exceptions.
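
As a minimal sketch of the three non-PMC types, the following declares one named variable of each and prints it. The names total, ratio, and greeting are purely illustrative, and the snippet assumes the same Parrot release as the other examples in this chapter:

  .sub _main
      .local int total            # backed by an integer register
      .local num ratio            # backed by a numeric register
      .local string greeting      # backed by a string register

      total = 42
      ratio = 2.5
      greeting = "Hello, Polly.\n"

      print greeting
      print total
      print "\n"
      print ratio
      print "\n"
      end
  .end

Each name stays valid from its .local line to the .end of the compilation unit, so none of them is visible in any other .sub.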

PMC variables

PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object before you use it. The new instruction creates a new PMC. Unlike PASM, PIR doesn't use a dot in front of the class name.

  P0 = new PerlString        # same as new P0, .PerlString
  P0 = "Hello, Polly.\n"
  print P0

This example creates a PerlString object, stores it in the PMC register P0, assigns the value "Hello, Polly.\n" to it, and prints it. The syntax is exactly the same for temporary register variables:

  $P4711 = new PerlString
  $P4711 = "Hello, Polly.\n"
  print $P4711

With named variables the type passed to the .local directive is either the generic pmc or a type compatible with the type passed to new:

  .local PerlString hello    # or .local pmc hello
  hello = new PerlString
  hello = "Hello, Polly.\n"
  print hello

Named Constants

The .const directive declares a named constant. It's very similar to .local, and requires a type and a name. The value of a constant must be assigned in the declaration statement. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:

  .const string hello = "Hello, Polly.\n"
  print hello

Named constants function in all the same places as literal constants, but have to be declared beforehand:

  .const int the_answer = 42        # integer constant
  .const string mouse = "Mouse"     # string constant
  .const num pi = 3.14159           # floating point constant
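
To illustrate, here is a small sketch that uses two of these constants where a literal would otherwise appear. It assumes the declarations sit inside the compilation unit that uses them:

  .sub _main
      .const int the_answer = 42
      .const string mouse = "Mouse"

      $I0 = the_answer            # same as $I0 = 42
      $I0 = $I0 + 1
      print mouse                 # same as print "Mouse"
      print ": "
      print $I0
      print "\n"
      end
  .end

If the constants behave as described above, this should print "Mouse: 43".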

Register Spilling

As we mentioned earlier, Parrot allocates all temporary register variables and named variables to Parrot registers. When Parrot runs out of registers to allocate, it has to store some of the variables elsewhere. This is known as spilling. Parrot spills the variables with the lowest score and stores them in a PerlArray object while they aren't used, then restores them to a register the next time they're needed. Consider an example that creates 33 integer variables, all containing values that are used later:

  set $I1, 1
  set $I2, 2
  ...
  set $I33, 33
  ...
  print $I1
  print $I2
  ...
  print $I33

Parrot allocates the 32 available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, Parrot generates code to store the values:

  new P31, "PerlArray"
  ...
  set I0, 1           # I0 allocated to $I1
  set P31[0], I0      # spill $I1
  set I0, 2           # I0 reallocated to $I2
  set P31[1], I0      # spill $I2

It creates a PerlArray object and stores it in register P31. (P31 is reserved for register spilling in PIR code, so generally it shouldn't be accessed directly.) The set instruction is the last time $I1 is used for a while, so immediately after that, Parrot stores its value in the spill array and frees up I0 to be reallocated.

Just before $I1 and $I2 are accessed to be printed, Parrot generates code to fetch the values from the spill array:

  ...
  set I0, P31[0]       # fetch $I1
  print I0

You cannot rely on any particular register assignment for temporary variables or named variables. The register allocator does follow a set of precedence rules for allocation, but these rules may change. Also, if two variables have the same score Parrot may assign registers based on the hashed value of the variable name. Parrot randomizes the seed to the hash function to guarantee you never get a consistent order.

Symbol Operators

You probably noticed the = assignment operator in some of the earlier examples:

  $S2000 = "Hello, Polly.\n"
  print $S2000

Standing alone, it's the same as the PASM set opcode. In fact, if you run parrot in bytecode debugging mode (as in the "Assembler Options" section in Chapter 11), you'll see it really is just a set opcode underneath.

PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:

  .local int sum
  sum = $I42 + 5
  print sum
  print "\n"

The statement sum = $I42 + 5 translates to something like add I16, I17, 5.

PIR also provides combined assignment operators such as +=, -=, and >>=, which map to the two-argument forms of the opcodes, like add I16, I17.
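
A short sketch of these operators in use follows. The comments show the two-argument PASM form each line is expected to map to, based on the description above rather than on actual Parrot output:

  .sub _main
      $I0 = 10
      $I0 += 5                    # add $I0, 5
      $I0 -= 3                    # sub $I0, 3
      print $I0
      print "\n"
      end
  .end

This should print 12, since 10 + 5 - 3 = 12.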

Many PASM opcodes that return a single value also have an alternate syntax in PIR with the assignment operator:

  $I0 = length str               # length $I0, str
  $I0 = isa PerlInt, "scalar"    # isa $I0, PerlInt, "scalar"
  $I0 = exists hash["key"]       # exists $I0, hash["key"]
  $N0 = sin $N1
  $N0 = atan $N1, $N2
  $S0 = repeat "x", 20
  $P0 = newclass "Foo"
  ...

A complete list of PIR operators is available in Chapter 11. We'll discuss the comparison operators in the "Flow Control" section later in this chapter.

Labels

As in PASM, any line can start with a label definition like LABEL:, but label definitions can also stand on their own line.

PIR code has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined. (We'll discuss compilation units in the next section.) The name has to be unique there, but it can be reused in a different compilation unit.

  branch L1   # local label
  bsr    _L2  # global label

Labels are most often used in branching instructions and in subroutine calls.

Compilation Units

Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. Though they will be explained in more detail later, we introduce them here because all code in a PIR source file must be defined in a compilation unit. The simplest syntax for a PIR compilation unit starts with the .sub directive and ends with the .end directive:

  .sub _main
      print "Hello, Polly.\n"
      end
  .end

This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see the end of the "Temporary Registers" section earlier in this chapter), you'll see that the name translates to an ordinary label:

  _main:
          print "Hello, Polly.\n"
          end

The first compilation unit in a file is normally executed first, but as in PASM you can flag any compilation unit as the first one to execute with the @MAIN marker. The convention is to name the first compilation unit _main, but the name isn't critical.

  .sub _first
      print "Polly want a cracker?\n"
      end
  .end

  .sub _main @MAIN
      print "Hello, Polly.\n"
      end
  .end

This code prints out "Hello, Polly." but not "Polly want a cracker?".

The "Subroutines" section later in this chapter goes into much more detail about compilation units and their uses.

Flow Control

As in PASM, flow control in PIR is done entirely with conditional and unconditional branches. This may seem simplistic, but remember that PIR is a thin overlay on the assembly language of a virtual processor. For the average assembly language, jumps are the fundamental unit of flow control.

Any PASM branch instruction is valid, but PIR has some high-level constructs of its own. The most basic is the unconditional branch: goto.

  .sub _main
      goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

The first print statement never runs because the goto always skips over it to the label L1.

The conditional branches combine if or unless with goto.

  .sub _main
      $I0 = 42
      if $I0 goto L1
      print "never printed"
  L1: print "after branch\n"
      end
  .end

In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is quite similar, but branches when the tested value is false. An undefined value, 0, or an empty string are all false values. The if ... goto statement translates directly to the PASM if, and unless translates to the PASM unless.
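
For comparison, here is the same example sketched with unless. The only changes are the tested value, which is now false, and the keyword:

  .sub _main
      $I0 = 0
      unless $I0 goto L1          # branches because $I0 is false
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

Because 0 is a false value, the branch is taken and the first print statement is skipped.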

The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:

  .sub _main
      $I0 = 42
      $I1 = 43
      if $I0 < $I1 goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the PASM lt branch operation.

The rest of the comparison operators are summarized in the "PIR Instructions" section in Chapter 11.

PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto REDO  # end of loop

      print $I0
      print "\n"
      end
  .end

This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and then branches to the start of the loop. The loop ends when $I1 counts down to 0 so that the if doesn't branch to REDO. This is a do-while-style loop with the condition test at the end, so the code always runs the first time through.

For a while-style loop with the condition test at the start, use a conditional branch together with an unconditional branch:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      if $I1 <= 0 goto LAST
      $I0 = $I0 * $I1
      dec $I1
      goto REDO
  LAST:                     # end of loop

      print $I0
      print "\n"
      end
  .end

This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.

Any high-level flow control construct can be built from conditional and unconditional branches.
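
As one illustration, an if/else statement can be sketched with a conditional branch to the "then" code and an unconditional branch that skips over it. The label names THEN_PART and END_IF are illustrative choices, not anything Parrot requires:

  .sub _main
      $I0 = 42
      if $I0 == 42 goto THEN_PART
      print "not the answer\n"    # else part: runs when the test is false
      goto END_IF                 # skip over the then part
  THEN_PART:
      print "the answer\n"        # then part: runs when the test is true
  END_IF:
      end
  .end

Here the test is true, so only "the answer" is printed. Setting $I0 to any other value sends control through the else part instead.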
