The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Pugs Apocryphon 2 - Overview of Pugs Internals

DATE

This document could get out of date very quickly. If it seems that more than a week has passed between the time there was an update to the time you read these words, prod someone on #perl6, or read http://use.perl.org/~autrijus/journal and see if there's been any big change.

The current copy was last revised on 2005-06-03.

PRELUDE

Pugs is written in the Haskell language. Before dabbling with Pugs internals it may be wise to study a bit of Haskell.

INTRODUCTION

Pugs is a versatile project, tapping into the power of many other projects. Pugs itself fits into a star topology, optionally using these projects to gain more features.

Each will be discussed later in more detail.

PUGS RUNTIME OVERVIEW

Pugs's own core is also componentized. The separation roughly coincides with Perl 6's runtime and compile time. However, it is notable that the parts are intermixed, since Perl 6 is a very dynamic language.

The two parts are roughly:

The Parser

This part takes a string of Perl 6 and converts it into the AST, or Abstract Syntax Tree. The AST represents the program's structure, which the evaluator later executes.

This part is responsible for the compilation of Perl 6 source code.

The Evaluator

The evaluator combines the program's AST with what is known as the Env, or roughly speaking, the current state of the program's execution.

It walks the nodes of the tree, reducing them into values. Of course, the interesting part is what happens during the reduction - this is the actual execution of the code.

This part is reponsible for the runtime - the execution of compiled Perl 6 ASTs.

SOURCE TREE OVERVIEW

This section does not discuss the files in detail. Pugs is documented with Haddock, and for reference that is the place to look.

What this section does provide is an overview of the responsibilities each part has in overall structure of Pugs.

src/Pugs/AST.hs

This file contains the definitions of the AST's types.

It is more or less a description of how Perl 6 programs can look after compilation.

src/Pugs/Parser.hs

This file contains the parser for Perl 6 code. It is written using the Parsec library.

It produces Syn and Exp structures as defined in AST.hs, and puts them in the envBody of the env.

src/Pugs/Eval.hs

This file implements the evaluation logic for the AST. Its main job is reducing Exps into Vals. Most Exps require applying VCode objects, which represent closures (blocks, subroutines, operators...), looking up variables, or other basic features Perl 6 provides, and this is where most of this is coded.

src/Pugs/Prim.hs

This file contains the implementations of many of the primitive operators. For example, the addition operator, &infix:<+> is defined here. It converts the two Perl values it gets into Haskell Nums, applies Haskell's builtin addition operator to these, and then makes a Perl value out of the result.

The various operators and builtin functions are implemented using the opN function, and the definition of their Perlish behavior is defined in the table at the bottom.

The table basically says whether the builtin is infix or not, how many parameters it accepts, and so forth.

src/Pugs/Run.hs

This is the file that ties it all together, it takes a Perl 6 file, slurps the string out of it, hands it to the parser, then takes the AST out and sends its envBody into the evaluator.

A PROGRAM'S LIFE CYCLE IN DETAIL

Earlier we discussed how eventually what the parser emits is fed to the evaluator. Now we'll look at the details and special cases more closely.

As we've seen before, the runtime calls the parser on the Perl code, and it, in turn, generates an AST. Most parsed things result in trivial structures -- just a representation of the program in something a bit more manipulable than a string of source code.

This basic structure, a node of the AST, is called an Exp - an expression. It represents the combination of values and operation, and the evaluator knows to boil it down into a Val.

Matters get a little more complex when the code not only is something at compile time, but actually does something, like declarations of variables which create the variables, or BEGIN blocks which execute code at compile time.

Enter unsafe unsafeEvalExp

The parser is pure in that it does not affect the outside world when it does its thing. It constructs the AST, but not much more.

In order to execute things like BEGIN blocks there are exceptions to this.

    BEGIN { print "compile time" };

This operation has side effects - it causes the world outside the pugs interpreter to change. However, it must happen within the "pure" parser, and Haskell does not normally allow these things.

The unsafe in the name denotes that an effort was made to not care about that bit of safety, and do IO in the pure parser anyway.

But it does not strictly mean IO - what unsafeEvalExp is just a short circuit from the parser to the evaluator, allowing code to run at compile time.

BEGIN blocks are evaluated by calling unsafeEvalExp on the resulting Exp immediately after the block finished parsing, and then replacing that point in the syntax tree with a the value the block was reduced to.

Declarations of sorts create a node in the syntax tree called a Syn. Syns represents syntactic constructs of sorts, amongst which are variable declarations. When evaluated, variable declarations create a type of Exp that will modify the Env, adding a new symbol, and then roll back the change later. They are also evaluated immediately using unsafeEvalExp.

Other Syns include control flow structure, and various keywords, but they will be discussed later.

reduce :: Exp -> Eval Val

The heading of this section is the type declaration for the evaluator's reduce function.

Let's break it down.

The Exp means that the single argument reduce accepts is an expression. The Eval is the monadic fudgeting of the Val type, indicating that the reduction process of the Val from the Exp is controlled by the Eval monad.

Lets try to explain this with an example:

    reduce (Val v) = reduceVal v
    reduceVal v = retVal v

This form of reduce takes the expression that is just a value, like 3 or "foo" and encapsulates the data into the Eval monad using retVal.

    reduce (Var name) = reduceVar name
    reduceVar name = do
        v <- findVar name
        case v of
            Just var -> evalRef var
            _ | (':':rest) <- name -> return $ VType (mkType rest)
            _ -> retError "Undeclared variable" name

This reduction takes an expression like $a and reduces it into a value. Here the Eval monad's purpose comes into play a bit more clearly.

The first line finds the container named by name using the findVar function.

The Eval monad is in use because such an operation might fail - in this case, the variable does not exist. The second line throws an exception when that happens.

Lastly, if everything went OK, the container is dereferenced to return the value it contains. Here is the type signature of evalRef, for reference:

    evalRef :: VRef -> Eval Val

Many different reductions

src/Pugs/Eval.hs contains 11 different variants of reduce, which dispatch to over 60 sub-variations. Each one serves a different purpose, and most are pretty straightforward.

For example reduce (Syn "env" []) is the reduction that takes care of variable declaration using VControl, while reduce (Cxt cxt exp) forces the subexpression exp to be evaluated in the context cxt.

Let's look at some of the more interesting reduces. My personal favourite is for.

It is defined in the reduceSyn name exps declaration, which is the reduction of the various syntatic constructs. It uses Haskell's pattern matching to invoke the appropriate reduction variant for the values of name. Here's the case for, annotated:

    reduceSyn "for" [list, body] = do

This takes the two expressions to the for syntax thingy, the list part, and the body. for (@list) { print "i'm the body" }.

The body is actually a subroutine; we'll look at that in a bit. After that line are some details which we don't care about right now. Let's pretend they don't exist and jump down to

    let arity = max 1 $ length (subParams sub)

That part determines how many elements to take out of list for each iteration of body. After that comes a lexically scoped function definition, runBody. Let's analyze it.

    runBody [] _ = retVal undef

This takes care the case where there are no more elements in list. Contrast it with:

    runBody vs sub' = do
        let (these, rest) = arity `splitAt` vs
        genSymCC "&next" $ \symNext -> do
            genSymPrim "&redo" (const $ runBody vs sub') $ \symRedo -> do
                apply (updateSubPad sub' (symRedo . symNext)) Nothing $
                    map (Val . VRef . MkRef) these
        runBody rest sub'

Which matches vs that isn't an empty list (the first runBody matched that case). The splitAt takes arity elements out of vs, that is the number of parameters the body subroutine wants, and puts them in these. The rest go into rest.

The lines after that set the &redo and &next variables so that they contain functions which will control the flow of the current iteration of the loop.

The line starting with apply applies the subroutine currently in sub', and gives it these as its parameters on the line starting with map.

Lastly, after the subroutine is applied, runBody is run again on rest.

    genSymCC "&last" $ \symLast -> do
        let munge sub | subParams sub == [defaultArrayParam] =
                munge sub{ subParams = [defaultScalarParam] }
            munge sub = updateSubPad sub symLast

Outside of runBody's definition, the &last variable is also defined. It controls the whole loop, not only a single step, so it doesn't need to be in &runBody.

When all the auxiliary functions have been defined, we can run the body with the list passed into the for (munging into elms omitted):

    runBody elms $ munge sub

VCode application

Now that we've seen a nice example of how a subroutine (which might be masquerading as a simple block) is used, lets see how VCode, the value representing closures (subroutines, blocks, coroutines, etc) is called.

Subroutine application can be very simple, in the case of a Prim. At other times it involves entering a lexical scope, due to block open. Sometimes parameter binding is involved too.

But have no fear, we will soon see that like most parts of pugs, these things are actually pretty simple.