NAME

README.txt - Readme file for the pirc/new compiler, a fresh implementation of the PIR language using Bison and Flex.

AUTHOR

kjs

DESCRIPTION

pirc/new is a fresh implementation of the PIR language. Maintaining the current default implementation (IMCC) is a bit of a pain, and I wanted to see how far I could get with a fresh implementation. A lot of ugly things could be removed.

Of course, it is not finished yet. A lot of work is needed on the back-end before it can generate Parrot Byte Code files (PBC).

The current set-up is a three-phase compiler:

  • Heredoc pre-processor

    The heredoc pre-processor takes the input and converts all heredoc strings into normal strings. So, the following:

     .sub main
        foo(<<'HI', <<'BYE')
     hi there!
    HI
     bye for now!
    BYE
    
     .end

    is converted into:

     .sub main
        foo(" hi there!\n", " bye for now!\n\n")
     .end

    Currently there is a small issue with the 2nd and later heredoc arguments; they seem to get one newline character too many.

    The heredoc pre-processor needs to know about POD comments, because a POD comment may contain a heredoc string, which must not be processed since it is part of a comment. For that purpose, all comments (POD and line comments) are stripped in this phase.

    The heredoc pre-processor is located in compilers/pirc/heredoc.

  • Macro pre-processor

    The macro pre-processor takes the output of the heredoc pre-processor and handles all macro definitions and expansions. The .include directive is handled here too. The output of the macro pre-processor is a single file of "pure" PIR code; when the .include directive is used, the included files are merged into that one file.

    The macro pre-processor is located in compilers/pirc/macro.

  • PIR parser

    The third pass is done by the PIR parser, which takes the "pure" PIR code from the macro pre-processor. Currently, it is only a parser, but a future extension could be to generate PASM code from the PIR input. That would make it easy to see which ops are actually executed when running the PIR file.

    The PIR parser is located in compilers/pirc/new.
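
The heredoc phase described above can be sketched roughly as follows. This is a minimal Python illustration, not the Flex-based code in compilers/pirc/heredoc: the names flatten_heredocs, escape and HEREDOC_MARKER are assumptions made for the example, only single-quoted delimiters are recognised, and comment stripping is left out. It shows the intended result, without the extra newline mentioned above.

```python
import re

# Matches a heredoc marker such as <<'HI' (hypothetical pattern for this sketch).
HEREDOC_MARKER = re.compile(r"<<'(\w+)'")


def escape(text):
    """Escape a heredoc body so it fits inside a double-quoted string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def flatten_heredocs(source):
    """Replace every heredoc marker with an ordinary quoted string."""
    lines = source.split("\n")
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        bodies = []
        # A line may carry several markers; their bodies follow in the same order.
        for ident in HEREDOC_MARKER.findall(line):
            body = []
            # Collect lines until the delimiter, which must stand alone on its line.
            while i < len(lines) and lines[i] != ident:
                body.append(lines[i] + "\n")
                i += 1
            i += 1  # skip the delimiter line itself
            bodies.append("".join(body))
        # Substitute the markers left to right with the collected bodies.
        pending = iter(bodies)
        out.append(HEREDOC_MARKER.sub(
            lambda match: '"' + escape(next(pending)) + '"', line))
    return "\n".join(out)


if __name__ == "__main__":
    sample = ".sub main\n   foo(<<'HI', <<'BYE')\n hi there!\nHI\n bye for now!\nBYE\n.end\n"
    print(flatten_heredocs(sample))
```

Working line by line keeps the sketch short; the real pre-processor is generated from a lexer specification and works on the character stream instead.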

The new implementation also has some unique features with respect to IMCC:

  • Multiple heredoc arguments

    In pirc/new (a new name is yet to be defined), multiple heredocs may be used as function arguments, like so:

       ...
       foo(<<'HI', <<'BYE')
    
       ...
     HI
    
       ...
     BYE

  • Heredoc arguments for macro expansions

    As the heredoc pre-processor handles the input before the macro pre-processor, it is now possible to expand macros specifying heredoc arguments, like so:

     .macro foo(a)
       print .a
     .end
    
     .sub main
       .foo(<<'HI')
      Hello world!
    HI
     .end

  • Reentrant

    The generated lexer and parser are fully reentrant. (This still needs to be tested, though.)

  • Comments!

    The code is provided with comments, so you can actually understand what it does.

  • Pre-processing option

    Although IMCC does define the option '-E', it does not really work correctly. pirc has two pre-processing options: 1) running the heredoc pre-processor only, and 2) running both the heredoc and macro pre-processors. The output of option 2 is the code that will be given to the PIR compiler.

  • Grammar cleanup

    This is a nice opportunity to clean up the grammar of the PIR language. Hacking on IMCC's grammar is possible, but not for the faint of heart.

NOTES

Usage

Currently the compiler and the pre-processors are located in different directories. The pre-processors are invoked from the main driver in pirc.c. The driver assumes that all three processors have been compiled; the pre-processors are expected as the following executables:

 heredoc pre-processor: hdocprep
 macro pre-processor:   macroparser

Running a file through the whole PIR compiler is then done as follows:

 $ ./pirc test.pir

When you want to run the heredoc pre-processor only, do this:

 $ ./pirc -H test.pir

When you want to pre-process the file only (heredoc + macro parsing), do this:

 $ ./pirc -E test.pir
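
The mapping from command-line option to phases can be summarised in a small sketch. This is only an illustration in Python, not the logic of pirc.c: the function name stages_for is made up for the example, and "pirparser" is a hypothetical label for the third phase, whose executable name is not given here.

```python
# Executable names of the two pre-processors as listed above.
HEREDOC_STAGE = "hdocprep"
MACRO_STAGE = "macroparser"
# Hypothetical label for the PIR parser phase (an assumption of this sketch).
PARSER_STAGE = "pirparser"


def stages_for(option=None):
    """Return the phases that run, in order, for a given pirc option."""
    if option == "-H":
        # Heredoc pre-processing only.
        return [HEREDOC_STAGE]
    if option == "-E":
        # Full pre-processing: heredocs first, then macros.
        return [HEREDOC_STAGE, MACRO_STAGE]
    if option is None:
        # No option: run the whole compiler.
        return [HEREDOC_STAGE, MACRO_STAGE, PARSER_STAGE]
    raise ValueError("unknown option: " + option)
```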

Cygwin-processable lexer spec

The file pir.l, from which the lexer is generated, cannot be processed by Cygwin's default version of Flex. Generating a reentrant lexer requires a newer version, which can be downloaded from the link below.

http://sourceforge.net/project/downloading.php?groupname=flex&filename=flex-2.5.33.tar.gz&use_mirror=belnet

After unpacking the archive, just do:

 $ ./configure
 $ make

Then make sure the newly built flex binary replaces the one supplied with Cygwin.

BUGS

Having a look at this implementation would be greatly appreciated, and any resulting feedback even more :-)

  • All heredoc arguments except the first contain one newline character too many.

  • Memory management needs to be improved.

  • The three passes should be integrated into one C program. This is possible because the generated lexers and parsers can be given a prefix other than "yy". So, although there are 3 lexers and 2 parsers, all generated by Flex/Bison, they can be linked together. This is only worthwhile if it greatly improves performance compared to connecting the passes with pipes, which needs further research.

  • Braced macro arguments need to be finished.

SEE ALSO

See also:

  • languages/PIR for a PGE based implementation.

  • compilers/pirc, a hand-written, recursive-descent PIR parser.

  • compilers/imcc, the current standard PIR implementation.

  • docs/imcc/syntax.pod for a description of PIR syntax.

  • docs/imcc/ for more documentation about the PIR language.