The Parrot intermediate representation (PIR) is an overlay on top of the Parrot assembly language (PASM). PIR adds simplifications and high-level constructs that make programming easier than it typically is in an assembly language like PASM. It has many high-level features that will be familiar to programmers, but it still isn't considered a high-level language by itself. In fact, the Parrot developers specifically want to keep it that way for a number of reasons. PASM, the Parrot assembly language on which PIR is based, is discussed in more detail in Chapter 5.

As a convention, files containing pure PIR code generally have a .pir extension. Parrot treats a file with any other extension, except .pasm, as a PIR file in mixed mode. This means that the file can contain PIR and PASM code interchangeably, with a few caveats that will be discussed later.

PIR is well documented, both in traditional documentation and in instructional code examples. The documentation for the PIR compiler IMCC in imcc/docs/ and the project documentation in docs/ are good sources of information about the current syntax, semantics, and implementation. The other PIR compiler, PIRC, has its own documentation that is slowly maturing and is a useful source of information too. The test suite in imcc/t shows examples of proper working code. In fact, the test suite is the definitive PIR resource, because it shows how PIR actually works, even when the documentation may be out of date.

Statements

The syntax of statements in PIR is much more flexible than is commonly found in assembly languages, but is more rigid and "close to the machine" than higher-level languages like C. PIR has a very close relationship with the Parrot assembly language, PASM. All PASM instructions are valid PIR instructions. PIR does, however, add some extra syntactic options to improve readability and programmability. The statement delimiter for both PIR and PASM is a newline (\n). Each statement has to be on its own line, but empty whitespace lines between statements are okay. (This isn't entirely true when you consider things like macros and heredocs, but we'll tackle those issues when we come to them.) Statements may also start with a label, for use with jumps and branches. Comments are marked by a hash sign (#), and continue until the end of the line. POD blocks may be used for multi-line documentation.
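
As a small sketch of these rules, the fragment below combines a POD block, comments, and a labeled statement. The label name DONE and the POD text are arbitrary choices made for illustration. The example is indented to match the other listings here; in a real source file, the POD directives are assumed to start in the first column.

  =head1 DESCRIPTION

  POD blocks like this one can hold multi-line documentation.

  =cut

  .sub _main
      # a comment on a line of its own
      print "one statement per line\n"    # a comment after a statement

  DONE: print "a statement that starts with a label\n"
      end
  .end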

To help with readability, PIR has some high-level constructs, including symbol operators:

  $I1 = 5                       # set $I1, 5

named variables:

  count = 5

and complex statements built from multiple keywords and symbol operators:

  if $I1 <= 5 goto LABEL        # le $I1, 5, LABEL

We will get into all of these in more detail as we go. Notice that PIR does not, and will not, have high-level looping structures like while or for loops, or if/then/else branch structures. Because of these omissions, PIR can become a little messy and unwieldy for large programs. Luckily, there is a large group of high-level languages (HLLs) that can be used to program Parrot instead. PIR is used primarily to write the compilers and libraries for these languages.

Variables and Constants

Parrot Registers

PIR code has a variety of ways to store values while you work with them. Actually, the only place to store values is in a Parrot register, but there are multiple ways to work with these registers. Parrot's register names always start with a dollar sign, followed by a single character that shows whether it is an integer (I), numeric (N), string (S), or PMC (P) register, and then the number of the register:

  $S0 = "Hello, Polly.\n"
  print $S0

You can have as many registers of each type as you need; Parrot will automatically allocate new ones for you. The process is transparent, and programmers should never have to worry about it.

Parrot registers are allocated in a linear array, and register numbers are indices into this array. Having more registers means Parrot must allocate more storage space for them, which can decrease memory efficiency and register allocation/fetch performance. In general, it's better to keep the number of registers small. However, the number of the register does not necessarily correspond to the actual storage location where the register data is held. A memory allocator unit translates register names in the form "$S0" into an actual fixed memory location. This allocator can also help to optimize register usage so that existing registers are reused instead of allocating new ones in memory. The short version is that the programmer should never have to worry about register allocation, and should feel free to use as many registers as she wants. As with any system, it's a good idea to be mindful of the things that might impact performance anyway.

Constants

Parrot has four primary data types: integers, floating-point numbers, strings, and PMCs. Integers and floating-point numbers can be specified in the code with numeric constants.

  $I0 = 42       # Integers are regular numeric constants
  $I1 = -1       # They can be negative or positive
  $I2 = 0xA5     # They can also be hexadecimal
  $I3 = 046      # ...or octal

  $N0 = 3.14     # Numbers can have a decimal point
  $N1 = 4        # ...or not
  $N2 = -1.2e+4  # Numbers can also use scientific notation.

String literals are enclosed in single or double-quotes:

  $S0 = "This is a valid literal string"
  $S1 = 'This is also a valid literal string'

Strings in double-quotes accept all sorts of escape sequences using backslashes. Strings in single-quotes only allow escapes for nested quotes:

  $S0 = "This string is \n on two lines"
  $S0 = 'This is a \n one-line string with a slash in it'

Or, if you need more flexibility, you can use a heredoc:

  $S2 = << "End_Token"

  This is a multi-line string literal. Notice that
  it doesn't use quotation marks. The string continues
  until the ending token (the thing in quotes next to
  the << above) is found.

  End_Token

Named Variables

Calling a value "$S0" isn't very descriptive, and usually it's a lot nicer to be able to refer to values using a helpful name. For this reason Parrot allows registers to be given temporary variable names to use instead. These named variables can be used anywhere a register would normally be used, because they actually are registers, but with fancier names. They're declared with the .local statement, which requires a variable type and a name:

  .local string hello
  set hello, "Hello, Polly.\n"
  print hello

This snippet defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.

The valid types are int, num, string, and pmc or any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. Named variables are valid from the point of their definition to the end of the current function.

The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. There is no limit to the length of an identifier, which matters because the automatic code generators used by the various high-level languages on Parrot tend to generate very long identifier names in some situations. Of course, huge identifier names could cause all sorts of memory allocation problems or inefficiencies in parsing. Push the limits at your own risk.
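
For illustration, the following declarations all use names that satisfy these rules. The names themselves are made up for this sketch:

  .local int count
  .local num _running_total     # a leading underscore is fine
  .local string line2           # digits are allowed after the first character

A name such as 2nd_line would not be a valid identifier, because it starts with a digit.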

PMC variables

PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object before you use it. The new instruction creates a new PMC of a specified type:

  $P0 = new 'PerlString'     # This is how the Perl people do it
  $P0 = "Hello, Polly.\n"
  print $P0

This example creates a PerlString object, stores it in the PMC register $P0, assigns the value "Hello, Polly.\n" to it, and prints it. With named variables the type passed to the .local directive is either the generic pmc or a type compatible with the type passed to new:

  .local PerlString hello    # or .local pmc hello
  hello = new PerlString
  hello = "Hello, Polly.\n"
  print hello

PIR is a dynamic language, and that dynamism is readily displayed in the way PMC values are handled. Primitive values like strings, numbers, and integers undergo a special action called autoboxing when they are assigned to a PMC. Autoboxing is when a primitive scalar type is automatically converted to a PMC object type. There are PMC classes for String, Number, and Integer which can be quickly converted to and from the primitive string, num, and int types. Notice that the primitive types are in lower-case, while the PMC classes are capitalized. We will discuss PMCs and all the details of their interactions in Chapter 11.
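
A rough sketch of autoboxing, assuming the Integer PMC class mentioned above can be instantiated with new in the same way as the earlier PerlString examples:

  $P0 = new 'Integer'    # an Integer PMC (capitalized class name)
  $P0 = 5                # the primitive integer 5 is boxed into the PMC
  $I0 = $P0              # ...and converted back into a primitive int register
  print $I0
  print "\n"

The register numbers are arbitrary. The point is only that values can move between the primitive and PMC forms without an explicit conversion step.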

Named Constants

The .const directive declares a named constant. It's very similar to .local, and requires a type and a name. The value of a constant must be assigned in the declaration statement. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:

  .const string hello = "Hello, Polly.\n"
  print hello

Named constants function in all the same places as literal constants, but have to be declared beforehand:

  .const int the_answer = 42        # integer constant
  .const string mouse = "Mouse"     # string constant
  .const num pi = 3.14159           # floating point constant
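
A named constant can then stand in wherever a literal of the same type would appear. A minimal sketch, reusing the_answer from above inside a hypothetical _main unit:

  .sub _main
      .const int the_answer = 42
      $I0 = the_answer         # used where the literal 42 could appear
      print $I0
      print "\n"
      end
  .end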

Symbol Operators

PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. All PIR operators are translated into one or more PASM opcodes internally, but the details of this translation stay safely hidden from the programmer. Consider this example snippet:

  .local int sum
  sum = $I42 + 5
  print sum
  print "\n"

The statement sum = $I42 + 5 translates to something like add I16, I17, 5 in PASM. The exact translation isn't too important (unless you're hacking on IMCC or PIRC!), so we don't have to worry about it for now. We will talk more about PASM and its instruction set in Chapter 5.

PIR also provides automatic assignment operators such as +=, -=, and >>=. These operators help programmers to perform common manipulations on a data value in place, and save a few keystrokes while doing them.
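
A short sketch of these operators in use. The starting value is arbitrary, and each comment shows the longer statement that the operator is assumed to abbreviate:

  .sub _main
      $I0 = 10
      $I0 += 5       # same as $I0 = $I0 + 5
      $I0 -= 3       # same as $I0 = $I0 - 3
      $I0 >>= 1      # same as $I0 = $I0 >> 1
      print $I0
      print "\n"
      end
  .end

Starting from 10, the three steps give 15, then 12, then 6.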

A complete list of PIR operators is available in Chapter 13.

Labels

Any line in PIR can start with a label definition like LABEL:, but label definitions can also stand on their own line. Labels are like flags or markers that the program can jump to or return to at different times. Labels and jump operations (which we will discuss a little later) are among the primary ways to change control flow in PIR, so they are well worth understanding.

PIR code can contain both local and global labels. Global labels start with an underscore. The name of a global label has to be unique since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the function where it is defined. The name has to be unique within that function, but the same name can be reused in other functions without causing a collision.

  branch L1   # local label
  bsr    _L2  # global label

Labels are most often used in branching instructions, which are used to implement high level control structures by our high-level language compilers.

Compilation Units

Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. Though they will be explained in more detail later, we introduce them here because all code in a PIR source file must be defined in a compilation unit. The simplest syntax for a PIR compilation unit starts with the .sub directive and ends with the .end directive:

  .sub main
      print "Hello, Polly.\n"
  .end

This example defines a compilation unit named main that prints a string. The first compilation unit in a file is normally executed first, but you can flag any compilation unit as the first one to execute with the :main marker. The convention is to name the first compilation unit main, but the name isn't critical.

  .sub first
      print "Polly want a cracker?\n"
      end
  .end

  .sub main :main
      print "Hello, Polly.\n"
  .end

This code prints out "Hello, Polly." but not "Polly want a cracker?". This is because the function main has the :main flag, so it is executed first. The function first, which doesn't have this flag, is never executed. However, if we change this example around a little:

  .sub first :main
      print "Polly want a cracker?\n"
      end
  .end

  .sub main
      print "Hello, Polly.\n"
  .end

The output now is "Polly want a cracker?". Execution in PIR starts at the main function and continues only until the end of that function. If you want your program to do more, you will need to call other functions explicitly.

Chapter 4 goes into much more detail about compilation units and their uses.

Flow Control

Flow control in PIR is done entirely with conditional and unconditional branches. This may seem simplistic and primitive, but remember that PIR is a thin overlay on the assembly language of a virtual processor. High-level control structures are invariably linked to the language in which they are used, so any attempt by Parrot to provide these structures would work for some languages but not for others. The only way to make sure all languages and their control structures can be equally accommodated is to give them the simplest and most fundamental building blocks to work with. Language agnosticism is an important design goal in Parrot.

The most basic branching instruction is the unconditional branch: goto.

  .sub _main
      goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

The first print statement never runs because the goto always skips over it to the label L1.

The conditional branches combine if or unless with goto.

  .sub _main
      $I0 = 42
      if $I0 goto L1
      print "never printed"
  L1: print "after branch\n"
      end
  .end

In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is quite similar, but branches when the tested value is false. An undefined value, 0, or an empty string are all false values. Any other values are considered to be true values.
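
For comparison, here is a sketch of the same example rewritten with unless. The value 0 was chosen because it is one of the false values listed above:

  .sub _main
      $I0 = 0
      unless $I0 goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

Because $I0 is false, the unless takes the branch and the first print statement is skipped.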

The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:

  .sub _main
      $I0 = 42
      $I1 = 43
      if $I0 < $I1 goto L1
      print "never printed"
  L1:
      print "after branch\n"
      end
  .end

This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the PASM lt branch operation.

The rest of the comparison operators are summarized in "PIR Instructions" in Chapter 11.

PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      $I0 = $I0 * $I1
      dec $I1
      if $I1 > 0 goto REDO  # end of loop

      print $I0
      print "\n"
      end
  .end

This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and then branches to the start of the loop. The loop ends when $I1 counts down to 0 so that the if doesn't branch to REDO. This is a do while-style loop with the condition test at the end, so the code always runs the first time through.

For a while-style loop with the condition test at the start, use a conditional branch together with an unconditional branch:

  .sub _main
      $I0 = 1               # product
      $I1 = 5               # counter

  REDO:                     # start of loop
      if $I1 <= 0 goto LAST
      $I0 = $I0 * $I1
      dec $I1
      goto REDO
  LAST:                     # end of loop

      print $I0
      print "\n"
      end
  .end

This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.

Any high-level flow control construct can be built from conditional and unconditional branches.
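
As one sketch of this idea, an if/else choice can be written with one conditional branch and one unconditional branch. The label names and the threshold are arbitrary:

  .sub _main
      $I0 = 42
      if $I0 > 10 goto BIG     # condition is true: jump to the "then" block
      print "small\n"          # "else" block
      goto DONE                # skip over the "then" block
  BIG:
      print "big\n"            # "then" block
  DONE:
      end
  .end

With $I0 set to 42, the comparison succeeds and only the "then" block runs.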

Fortunately, libraries of macros have been developed that can implement more familiar syntax for many of these control structures. We will discuss these libraries in more detail in Chapter 6, "PIR Standard Library".
