NAME

README.txt - Readme file for pirc/new compiler, a fresh implementation of the PIR language using Bison and Flex.

AUTHOR

kjs

DESCRIPTION

pirc/new is a fresh implementation of the PIR language. Maintaining the current default implementation (IMCC) is painful: its code contains many "XXX" and "TODO" markers and other kludge alerts, all of which should eventually be fixed.

PIRC is not finished yet. A lot of work is needed on the back-end before it can generate Parrot Byte Code files (PBC).

Note that pirc/new refers to the Lex/Yacc based implementation, while 'pirc' refers to the hand-written recursive-descent implementation, which is found in the pirc/src directory.

The current set-up is a three-phase compiler:

  • Heredoc pre-processor

    The heredoc pre-processor takes the input, and converts all heredoc strings into normal strings. So, the following:

     .sub main
        foo(<<'HI', <<'BYE')
     hi there!
    HI
     bye for now!
    BYE
    
     .end

    is converted into:

     .sub main
        foo(" hi there!\n", " bye for now!\n\n")
     .end

    Currently there is a small issue with the 2nd and later heredoc arguments; they seem to get one newline character too many.

    The heredoc pre-processor needs to know about POD comments, because the POD comment may contain a heredoc string, which should not be processed, as it is a comment. For that purpose, all comments (POD and line comments) are stripped in this phase.

    The Heredoc pre-processor is located in compilers/pirc/heredoc.

  • Macro pre-processor

    The macro pre-processor takes the output of the heredoc pre-processor and handles all macro definitions and expansions. The .include directive is handled here too. When .include is used, the included files are merged in, so the output of the macro pre-processor is a single long file of "pure" PIR code.

    The macro pre-processor is located in compilers/pirc/macro.

  • PIR parser

    The third pass is done by the PIR parser, which takes the "pure" PIR code from the macro pre-processor. Currently it is only a parser, but a future extension could generate PASM code from the PIR input. That would make it easy to see which ops are actually executed when running the PIR file.

    The PIR parser is located in compilers/pirc/new.
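The first phase can be illustrated with a minimal Python sketch. It is not the actual heredoc pre-processor, only an approximation under stated assumptions: only single-quoted markers of the form <<'ID' are recognised, each terminator stands alone on its line starting in column zero, and comments are not stripped. The names flatten_heredocs and escape are hypothetical.

```python
import re

# Matches a single-quoted heredoc marker such as <<'HI' (assumed syntax).
HEREDOC_MARKER = re.compile(r"<<'(\w+)'")


def escape(text):
    """Escape a heredoc body so it fits inside a double-quoted string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def flatten_heredocs(source):
    """Replace every heredoc marker with an ordinary string literal."""
    lines = source.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        literals = []
        # The bodies follow the marker line, in the order the markers appear.
        for ident in HEREDOC_MARKER.findall(line):
            body = []
            while i < len(lines) and lines[i] != ident:
                body.append(lines[i] + "\n")
                i += 1
            i += 1  # skip the terminator line itself
            literals.append('"' + escape("".join(body)) + '"')
        replacements = iter(literals)
        out.append(HEREDOC_MARKER.sub(lambda m: next(replacements), line))
    return "\n".join(out)
```

Applied to a sub that calls foo(<<'HI', <<'BYE'), this yields a single call line with two string literals. Unlike the current pre-processor, the sketch gives every argument exactly one trailing newline.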

The new implementation also has some unique features with respect to IMCC:

  • Multiple heredoc arguments

    In pirc/new (a new name is yet to be defined) it is allowed to use multiple heredocs as function arguments, like so:

       ...
       foo(<<'HI', <<'BYE')
    
       ...
     HI
    
       ...
     BYE

  • Heredoc arguments for macro expansions

    As the heredoc pre-processor handles the input before the macro pre-processor, it is now possible to expand macros specifying heredoc arguments, like so:

     .macro foo(a)
       print .a
     .end
    
     .sub main
       .foo(<<'HI')
      Hello world!
    HI
     .end

  • Reentrant

    The generated lexer and parser are intended to be fully re-entrant, although this still needs to be tested.

  • Comments!

    The code is provided with comments, so you can actually understand what it does.

  • Pre-processing option

    Although IMCC defines the option '-E', it does not work correctly. pirc has two pre-processing options: 1) running the heredoc pre-processor only ('-H'), and 2) running both the heredoc and macro pre-processors ('-E'). The output of option 2 is the code that will be given to the PIR compiler.

  • Grammar cleanup

    This is a nice opportunity to clean up the grammar of the PIR language. Hacking on IMCC's grammar is possible, but not for the faint of heart.
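Because heredocs have already been turned into ordinary strings by the time macros are expanded, a heredoc macro argument is just a string literal at that point. The following minimal Python sketch shows the idea. It is not the actual macro pre-processor: it assumes, as in the example above, that a macro definition is closed by .end, that each macro use sits on its own line, and that parameters are referenced as .name in the body. The names expand_macros and split_args are hypothetical.

```python
import re

MACRO_DEF = re.compile(r"^\s*\.macro\s+(\w+)\s*\(([^)]*)\)\s*$")
MACRO_END = re.compile(r"^\s*\.end\s*$")
MACRO_USE = re.compile(r"^\s*\.(\w+)\((.*)\)\s*$")
PARAM_REF = re.compile(r"\.(\w+)")


def split_args(text):
    """Split an argument list on commas that are outside string literals."""
    args, current = [], ""
    in_string = escaped = False
    for ch in text:
        if in_string:
            current += ch
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == ",":
            args.append(current.strip())
            current = ""
        else:
            in_string = ch == '"'
            current += ch
    if current.strip():
        args.append(current.strip())
    return args


def expand_macros(source):
    """Remove macro definitions and expand each macro use in place."""
    macros = {}
    out = []
    lines = iter(source.split("\n"))
    for line in lines:
        definition = MACRO_DEF.match(line)
        if definition:
            params = [p.strip() for p in definition.group(2).split(",") if p.strip()]
            body = []
            for body_line in lines:  # consume lines up to the closing .end
                if MACRO_END.match(body_line):
                    break
                body.append(body_line)
            macros[definition.group(1)] = (params, body)
            continue
        use = MACRO_USE.match(line)
        if use and use.group(1) in macros:
            params, body = macros[use.group(1)]
            bound = dict(zip(params, split_args(use.group(2))))
            for body_line in body:
                # Replace .param references; leave other dotted names alone.
                out.append(PARAM_REF.sub(
                    lambda m: bound.get(m.group(1), m.group(0)), body_line))
            continue
        out.append(line)
    return "\n".join(out)
```

Splitting arguments only on commas outside string literals matters here, since a flattened heredoc may itself contain commas.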

NOTES

Usage

Currently the different compilers/pre-processors are located in different directories. The pre-processors are invoked from the main driver in pirc.c, which assumes that they have been compiled as the following executables:

 heredoc pre-processor: hdocprep
 macro pre-processor:   macroparser

Running a file through the whole PIR compiler is then done as follows:

 $ ./pirc test.pir

When you want to run the heredoc pre-processor only, do this:

 $ ./pirc -H test.pir

When you want to pre-process the file only (heredoc + macro parsing), do this:

 $ ./pirc -E test.pir
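The effect of the two options can be summarised as "stop after a given phase". The Python sketch below illustrates that reading. It is not the logic of pirc.c; the phase names and the functions selected_phases and run are hypothetical, and the phases themselves are passed in as plain callables.

```python
# Phase order as described above; the names are illustrative only.
PIPELINE = ("heredoc", "macro", "parse")

# Last phase to run for each command-line option (None means no option).
STOP_AFTER = {"-H": "heredoc", "-E": "macro", None: "parse"}


def selected_phases(option=None):
    """Return the phases that run for the given option, in order."""
    last = STOP_AFTER[option]  # raises KeyError for an unknown option
    return PIPELINE[: PIPELINE.index(last) + 1]


def run(source, phases, option=None):
    """Feed the output of each selected phase into the next one."""
    text = source
    for name in selected_phases(option):
        text = phases[name](text)
    return text
```

With this shape, '-H' returns the output of the heredoc phase alone, and '-E' returns the text that would otherwise be handed to the PIR parser.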

Cygwin processable lexer spec.

The file pir.l, from which the lexer is generated, cannot be processed by Cygwin's default version of Flex. A newer version is needed to generate a reentrant lexer; it can be downloaded from the link below.

http://sourceforge.net/project/downloading.php?groupname=flex&filename=flex-2.5.33.tar.gz&use_mirror=belnet

Just do:

 $ ./configure
 $ make

Then make sure that the newly built flex binary replaces the one supplied with Cygwin.

BUGS

Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-)

  • All heredoc arguments except the first contain one newline character too many. Heredoc parsing is a bit complex, and there might be many other issues.

  • Memory management needs to be improved.

  • Braced macro argument handling needs a lot of testing.

REPLACING IMCC WITH PIRC

Eventually, IMCC needs to be either fixed rigorously or rewritten altogether. PIRC is an attempt to do the latter. The following things need to be considered when replacing IMCC with PIRC:

  • is_op

    PIRC needs a function to decide whether an identifier is an instruction. IMCC uses a function is_op that does this. For this to work, libparrot must be linked in, and I'm having trouble doing this.

  • register allocation

    IMCC has a register allocator, but I doubt whether it can be re-used by PIRC. The whole back-end of IMCC probably needs to be redesigned.

  • bytecode generation

    There must be a proper bytecode API for PIRC to use.

SEE ALSO

See also:

  • languages/PIR for a PGE based implementation.

  • compilers/pirc, a hand-written, recursive-descent PIR parser.

  • compilers/imcc, the current standard PIR implementation.

  • docs/imcc/syntax.pod for a description of PIR syntax.

  • docs/imcc/ for more documentation about the PIR language.