
Parrot Assembler

The Parrot Assembler's job is to take .pasm (Parrot Assembly) files and assemble them into Parrot bytecode. Plenty of references for Parrot assembly syntax already exist, so we won't go into details there. The assembler does its job by reading a .pasm file, extracting numeric and string constants from it, and reassembling the bits into bytecode.

The first pass goes through and expands constants, macros, and local labels. Syntax is described later on, in the 'Macro' section. The next pass goes through and collects the numeric and string constants along with the definition points and PCs of labels.

If you would like to view the text after the macro expansion pass, use the -E flag. This flag simply tells the assembler to quit after the Macro class does its thing.

The final pass replaces label occurrences with the appropriate PC offset and accumulates the (finally completely numeric) bytecode onto the output string. The XS portion takes the constants and bytecode, generates a header, tacks the constants and bytecode on, and finally prints out the string.

Macro

The Parrot assembler's macro layer has now been more-or-less defined, with one or two additions to come. The addition of the '.' preface will hopefully make things easier to parse, inasmuch as everything within an assembler file that needs to be expanded or processed by the macro engine will have a period ('.') prepended to it.

The macro layer implements constants, macros, and local labels. Including files will be done later on, but this handles most of the basic needs we have for macros.

Macros are created with the .macro and .endm directives:

  .macro swap (A,B,TEMP) # . marks the directive
    set .TEMP,.A         # . marks the special variable.
    set .A,.B
    set .B,.TEMP
  .endm                  # And . marks the end of the macro.
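
The argument substitution that happens when such a macro is expanded can be sketched in Python. This illustrates the idea only and is not the Perl Macro class. The function name expand_macro and its signature are hypothetical. It assumes every '.NAME' token whose NAME is a parameter gets replaced, and it leaves other dotted tokens (such as constants) untouched.

```python
import re


def expand_macro(body, params, args):
    """Substitute each '.PARAM' in a macro body with the matching argument.

    body   -- list of macro body lines (strings)
    params -- parameter names from the .macro line, e.g. ["A", "B", "TEMP"]
    args   -- actual arguments from the call site, e.g. ["I0", "I1", "I2"]
    """
    if len(params) != len(args):
        raise ValueError(f"expected {len(params)} arguments, got {len(args)}")
    binding = dict(zip(params, args))

    def substitute(match):
        # Leave anything that is not a parameter (e.g. '.PerlHash') as-is.
        return binding.get(match.group(1), match.group(0))

    return [re.sub(r"\.(\w+)", substitute, line) for line in body]


swap_body = ["set .TEMP,.A", "set .A,.B", "set .B,.TEMP"]
print(expand_macro(swap_body, ["A", "B", "TEMP"], ["I0", "I1", "I2"]))
# -> ['set I2,I0', 'set I0,I1', 'set I1,I2']
```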

Macros support labels that are local to a given macro expansion, and the syntax looks something like this:

  .macro SpinForever (COUNT)
    .local $LOOP: dec .COUNT # ".local $LOOP" defines a local label.
                  branch .$LOOP # Jump to said label.
  .endm

Include this macro as many times as you like, and the branch statement should do the right thing every time. Global labels are defined and referenced just as you usually would.

Constants are new, and the syntax looks like:

  .constant PerlHash 6 # Again, . marks the directive

  new P0, .PerlHash # . marks the special variable for expansion.
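
A single-pass Python sketch of the constant mechanism follows. It removes each '.constant' line and expands later '.NAME' references. The function name preprocess_constants is hypothetical. The sketch assumes every constant is defined before it is used, and it omits the macros and predefined PMC constants that the real Macro class also handles.

```python
import re

# '.constant NAME VALUE', where VALUE is a quoted string or one bare token;
# anything after the value (such as a '#' comment) is ignored.
DEFINITION = re.compile(r"""^\s*\.constant\s+(\w+)\s+("[^"]*"|'[^']*'|\S+)""")


def preprocess_constants(lines):
    """Drop constant definitions and expand '.NAME' for known constants."""
    constants = {}
    output = []
    for line in lines:
        m = DEFINITION.match(line)
        if m:
            constants[m.group(1)] = m.group(2)
            continue
        # Only names already defined are expanded; other '.x' tokens stay.
        output.append(re.sub(r"\.(\w+)",
                             lambda mm: constants.get(mm.group(1), mm.group(0)),
                             line))
    return output
```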

Several constants are predefined in the Macro class, but they are hard-coded rather than generated dynamically as they should be, at least for now:

  .constant Array 0
  .constant PerlUndef 1
  ...

This should be generated from include/parrot/pmc.h. The plan is to add a '.include' directive so we can '.include <constants.pmc>', and let pmc2c build the .pmc file at the same time as it builds pmc.h.

When the Assembler class is separated out, tests can use the Assembler class to accept a simple array of instructions and generate bytecode directly from that. This should eliminate the intermediary .pasm file and speed things up.

Keyed access

We now support the following (tested) code:

  new P0, .PerlHash    # (See the discussion of macros above)
  set S0, "one"
  set P0[S0],1
  set I0,P0[S0]
  print I0
  print "\n"
  end

Macro class

new

Create a new Macro instance. Simply take the argument list and treat it as a list of files to concatenate and process. Files are taken in the order that they appear in the argument list.

_expand_macro

Take a macro name and argument list, and expand the macro inline. Also, if the macro has embedded labels, expand these labels to local labels, and make certain that they're unique on a per-expansion basis. We do this with the $self->{macros}{$macro_name}{gensym} value.
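
The following Python sketch shows one way to make local labels unique per expansion. A per-macro counter stands in for the gensym value. The class name, the counter, and the generated label format are hypothetical choices for illustration and are not what the Perl Macro class actually emits.

```python
import itertools
import re


class LocalLabelExpander:
    """Rename '$NAME' local labels so each macro expansion gets fresh ones."""

    def __init__(self):
        self._counters = {}  # macro name -> counter (the 'gensym' stand-in)

    def expand(self, macro_name, body):
        counter = self._counters.setdefault(macro_name, itertools.count())
        n = next(counter)

        def rename(match):
            # Hypothetical unique name: _<macro>_<label>_<expansion number>
            return f"_{macro_name}_{match.group(1)}_{n}"

        out = []
        for line in body:
            line = re.sub(r"\.local\s+\$(\w+)", rename, line)  # definition
            line = re.sub(r"\.\$(\w+)", rename, line)          # reference
            out.append(line)
        return out
```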

preprocess

Preprocesses constants, macros, include statements, and eventually conditional compilation.

  .constant name {register}
  .constant name {signed_integer}
  .constant name {signed_float}
  .constant name {"string constant"}
  .constant name {'string constant'}

    These lines are removed from the array. Given the line:

    '.constant HelloWorld "Hello, World!"'

    One can expand HelloWorld via:

    'print .HelloWorld' # Note the period to indicate a thing to expand.

    Some predefined constants exist for your convenience, namely:

      .Array
      .PerlHash
      .PerlArray
      and the other PMC types.

    This should be generated from include/parrot/pmc.h, but isn't at the moment.
    A .include should be added, but it is currently awaiting more time and sleep.

  .include "{file name}" # Not quite ready.

  .macro name ({arguments?})
  ...
  .endm

    Optional arguments are simply identifiers separated by commas. Each
    argument name foo is matched to occurrences of '.foo' inside the macro
    body. A simple example follows:

  .macro inc3 (A,BLAM)
    inc .A # Mark the argument to expand with a '.'.
    inc .A
    inc .A
    print .BLAM
  .endm

  .inc3(I0,S0) # Expands to ('inc I0\n') x 3, then 'print S0'

contents

Access the $self->{contents} internal array, where the post-processed data is stored.

Assembler class

new

Create a new Assembler instance.

  To compile a list of files:
    $compiler = Assembler->new(-files=>[qw(foo.pasm bar.pasm)]);

  To compile an array of instructions:
    $compiler = Assembler->new(-contents=>['set S0,"foo"','print S0','end']);

_annotate_contents

Process the array $self->{contents}, and make the appropriate annotations in the array. For instance, it slightly munges global and local labels to make sure the statements fall where they should. Also, annotates the array into an AoA of [$statement,$lineno]. A later pass changes $lineno to $pc, once the arguments have been appropriately analyzed.
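
The [$statement,$lineno] annotation can be sketched in Python as follows. The function name is hypothetical. The label munging described above is omitted, and this naive comment stripping would break on a '#' inside a string literal.

```python
def annotate_contents(lines):
    """Pair each non-empty statement with its 1-based source line number."""
    annotated = []
    for lineno, line in enumerate(lines, start=1):
        # Naive: drops everything after the first '#', even inside strings.
        statement = line.split("#", 1)[0].strip()
        if statement:
            annotated.append([statement, lineno])
    return annotated
```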

_init

Process files of assembly code, should they have been passed in. Also, regardless of the input to new(), take the arrays of operators and load them into a form appropriate to parsing.

_collect_labels

Collect labels, remove their definition, and save the appropriate line numbers. Local labels aren't given special treatment yet.
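
A Python sketch of this step follows, operating on [statement, lineno] pairs. The function name and the duplicate-label error are assumptions. Like the text, it gives local labels no special treatment.

```python
import re

# A label definition is 'NAME:' at the start of a statement.
LABEL_DEF = re.compile(r"^(\w+):\s*(.*)$")


def collect_labels(annotated):
    """Strip 'NAME:' definitions and record the line each label is on.

    Returns (cleaned statements, {label name: line number}).
    """
    labels = {}
    cleaned = []
    for statement, lineno in annotated:
        m = LABEL_DEF.match(statement)
        if m:
            name, rest = m.groups()
            if name in labels:
                raise ValueError(f"label {name!r} redefined at line {lineno}")
            labels[name] = lineno
            statement = rest
        cleaned.append([statement, lineno])
    return cleaned, labels
```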

_generate_bytecode

Start out by walking the $self->{contents} array. On the first pass, make sure that the operation requested exists. If it doesn't, yell on STDERR. If it does, replace the text version of the operator with its numeric index, and pack it into $self->{bytecode}.

The inner loop walks through the arguments nested within the $op arrayref, determining what type the argument is ($_->[0]), and packing in the appropriate code. Note that labels are precalculated, and constants have been packed into the appropriate areas.
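
Here is a Python sketch of the outer and inner loops. The opcode table in the test is hypothetical, the argument values are assumed to be already numeric (register numbers, constant indices, label offsets), and packing each word as a 32-bit little-endian integer is an assumption, not Parrot's guaranteed opcode_t layout.

```python
import struct
import sys


def generate_bytecode(ops, opcode_index, fmt="<i"):
    """Pack resolved ops into bytes.

    ops          -- list of [opname, (type, value), ...] with numeric values
    opcode_index -- dict of full op name (e.g. 'set_i_ic') -> op number
    fmt          -- struct format for one word (assumed 32-bit little-endian)
    """
    out = bytearray()
    for op in ops:
        name, args = op[0], op[1:]
        if name not in opcode_index:
            # Complain and skip, mirroring "yell on STDERR".
            print(f"unknown operator {name!r}", file=sys.stderr)
            continue
        out += struct.pack(fmt, opcode_index[name])
        for _arg_type, value in args:  # inner loop over the arguments
            out += struct.pack(fmt, value)
    return bytes(out)
```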

adjust_labels

This works primarily on $self->{global_labels}, computing offsets and getting things ready for the final shift. Since the values of $self->{global_labels} correspond to line numbers, we replace the line numbers with program counter indices.

The next pass walks the $self->{contents} array, replacing the label names with the difference between the current PC and the label PC. Label names are preserved in the previous pass, which makes this possible.
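
The two steps can be sketched in Python. The function name and data shapes are assumptions. Each op's size is taken as one word for the opcode plus one per argument. The offset sign (label PC minus current PC) is an assumed convention. Resolved labels are tagged 'ic', matching the 'if_i_ic' example elsewhere in this document.

```python
def adjust_labels(ops, label_lines):
    """Map label line numbers to PCs, then replace labels with offsets.

    ops         -- list of [op, lineno], op = [opname, (type, value), ...]
    label_lines -- dict of label name -> line number of the labelled op
    """
    # Step 1: assign a PC to every line; an op occupies len(op) words.
    line_to_pc = {}
    pc = 0
    for op, lineno in ops:
        line_to_pc[lineno] = pc
        pc += len(op)
    label_pc = {name: line_to_pc[line] for name, line in label_lines.items()}

    # Step 2: swap each label argument for a relative offset.
    resolved = []
    for op, lineno in ops:
        here = line_to_pc[lineno]
        new_op = [op[0]]
        for arg_type, value in op[1:]:
            if arg_type == "label":
                new_op.append(("ic", label_pc[value] - here))
            else:
                new_op.append((arg_type, value))
        resolved.append([new_op, lineno])
    return resolved
```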

_string_constant

Unescape special characters in the constant and add it to not one but two data structures. $self->{constants}{s} is for fast lookup when the time comes to substitute constants for their indices, and $self->{ordered_constants} keeps track of constants in order of occurrence, so they can be packed directly into the binary format.

_numeric_constant

Take the numeric constant and place it into both $self->{constants}{n} and $self->{ordered_constants}. The first hash lets us do fast lookup when time comes to replace a constant with its value. The second array maintains the various constants in order of first occurrence, and is ready to pack into the bytecode.
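
The two-structure bookkeeping shared by _string_constant and _numeric_constant can be sketched in Python. The class and method names are hypothetical, and the escape set handled is an assumption. The 's' and 'n' lookup tables mirror the hash keys named above, and `ordered` plays the role of $self->{ordered_constants}.

```python
import re

# Escapes this sketch understands; unknown ones are left as written.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def unescape(text):
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), text)


class ConstantPool:
    def __init__(self):
        self.lookup = {"s": {}, "n": {}}  # fast value -> index lookup
        self.ordered = []                 # (kind, value) by first occurrence

    def _intern(self, kind, value):
        table = self.lookup[kind]
        if value not in table:
            table[value] = len(self.ordered)
            self.ordered.append((kind, value))
        return table[value]

    def string_constant(self, literal):
        # Strip the surrounding quotes, then unescape the body.
        return self._intern("s", unescape(literal[1:-1]))

    def numeric_constant(self, text):
        return self._intern("n", float(text))
```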

_key_constant

Build a key constant and place it into both $self->{constants}{n} and $self->{ordered_constants}. The first hash lets us do fast lookup when time comes to replace a constant with its value. The second array maintains the various constants in order of first occurrence, and is ready to pack into the bytecode.

constant_table

constant_table returns a hash containing the length in bytes of the constant table and the packed constant table itself.

output_bytecode

Returns a string with the Packfile.

First process the constants and generate the constant table, whose length is needed to build the packfile header, then return the complete packfile.
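
The ordering constraint (build the constant table first so its length can go into the header) can be sketched in Python. The byte layout here is invented for illustration and is not Parrot's packfile format. It uses a count word, tagged constants, and a header holding the table and bytecode lengths.

```python
import struct


def constant_table(ordered):
    """Return {'length': ..., 'table': ...} for (kind, value) pairs.

    Hypothetical layout: a count word, then per constant a one-byte tag
    followed by a little-endian double ('n') or a length-prefixed UTF-8
    string ('s').
    """
    table = bytearray(struct.pack("<i", len(ordered)))
    for kind, value in ordered:
        if kind == "n":
            table += b"n" + struct.pack("<d", value)
        elif kind == "s":
            data = value.encode("utf-8")
            table += b"s" + struct.pack("<i", len(data)) + data
        else:
            raise ValueError(f"unknown constant kind {kind!r}")
    return {"length": len(table), "table": bytes(table)}


def output_bytecode(ordered, bytecode):
    """Return header + constant table + bytecode as one byte string."""
    ct = constant_table(ordered)  # built first: the header needs its length
    header = struct.pack("<ii", ct["length"], len(bytecode))
    return header + ct["table"] + bytecode
```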

to_bytecode

Take the content array ref and turn it into a ragged AoAoA of operations with attached processed arguments. This is the core of the assembler.

  The transformation looks roughly like this:

  [ [ 'if I0,BLAH', 3 ],
    [ 'set P1[S5],P0["foo"]', 5 ],
    [ 'BLAH: end', 6 ],
  ]

  into:

  [ [ [ 'if_i_ic',
        ['i','I0'],
        ['label','BLAH'], # Leave the name here so we can resolve backward refs.
      ],
      3, # Line number
    ],
    [ [ 'set_p_s_p_sc',
        ['p','P1'],
        ['s','S5'],
        ['p','P0'],
        ['sc',0],    # String constant number 0
      ],
      5,
    ],
    [ [ 'end',
      ],
      6,
    ],
  ]

The first pass collects labels, so we can resolve forward label references (that is, labels used before they're defined). References to labels aren't yet expanded.

The second pass takes the arguments in each line ($_->[0]) and breaks them into their components. It does this by passing each line through a loop of REs, one per argument type. Each RE breaks an argument down into an array ref [$type,$argument]. Constants are collected and replaced with indices, and the number of arguments is counted and added to the internal PC tracking.
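
A Python sketch of the RE loop follows. It also shows how the argument types combine into a full op name such as 'if_i_ic'. The patterns cover only registers, string, numeric and integer constants, and labels. They are assumptions, not the assembler's own REs, and keyed arguments such as P0[S0] are not handled.

```python
import re

# (pattern, kind) pairs tried in order; the first match wins.
ARG_PATTERNS = [
    (re.compile(r"^([INSP])\d+$"), None),        # register: kind from letter
    (re.compile(r'^"(?:[^"\\]|\\.)*"$'), "sc"),  # string constant
    (re.compile(r"^-?\d+\.\d+$"), "nc"),         # numeric constant
    (re.compile(r"^-?\d+$"), "ic"),              # integer constant
    (re.compile(r"^[A-Za-z_]\w*$"), "label"),    # label reference
]
# Split on commas while keeping quoted strings (which may hold commas) whole.
ARG_SPLIT = re.compile(r'"(?:[^"\\]|\\.)*"|[^,\s][^,]*')


def classify(arg):
    for pattern, kind in ARG_PATTERNS:
        m = pattern.match(arg)
        if m:
            return [kind or m.group(1).lower(), arg]
    raise ValueError(f"cannot parse argument {arg!r}")


def split_statement(statement):
    parts = statement.split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    args = [a.strip() for a in ARG_SPLIT.findall(rest)]
    return parts[0], [classify(a) for a in args]


def full_op_name(statement):
    opname, args = split_statement(statement)
    # Labels are later replaced by integer offsets, so they count as 'ic'.
    return "_".join([opname] + ["ic" if k == "label" else k for k, _ in args])
```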

The third pass takes labels and replaces them with the PC offset to the actual instruction, and generates bytecode. It returns the bytecode, and we're done.

process_args

Process the argument list and return the list of arguments and files to process. Only legal and sane arguments and files should get past this point.
