NAME

IMCC - parsing

VERSION

0.1 initial
0.2 lexicals

OVERVIEW

This document describes the basic parsing functionality of imcc.

DESCRIPTION

Imcc parses and generates code in terms of compilation units. These are self-contained blocks of code very similar to subroutines.

Code for a compilation unit is emitted when the end of the unit is reached, and not before.

General imcc syntax

        program: statements ...

where a statement is either a simple statement such as if ... or a compilation unit that itself contains statements. This allows, for example, nested subs.

Compilation units

Subroutines .sub ... .end

        .sub _name
                statements
                ...
        .end

defines a subroutine with the entry point _name. Subroutine entry points (like all global labels) must start with an underscore. The body may contain any valid PIR or PASM statements.

Assembly blocks .emit ... .eom

        .emit
        _sub1:
                pasm_statements
                ...
                ret
        ...
        .eom

defines a compilation unit containing PASM statements only. It is typically used for language initialization and builtin code.

Code outside compilation units

        stmt1
        .sub _main
           stmt2
           ret
        .end
        stmt3

This generates the following PASM equivalent:

        _main:
                stmt2
                ret

                stmt1
                stmt3

which leaves stmt1 and stmt3 as unreachable code after the ret. To actually use code outside compilation units, the first such statement should have a global label:

        _outside:
            stmt1
        .sub _main
            stmt2
            call _outside
            ret
        .end
            stmt3
            ret

This generates the following PASM equivalent:

        _main:
                stmt2
                bsr _outside
                ret
        _outside:
                stmt1
                stmt3
                ret

Nested subs

As code is produced as soon as a compilation unit is closed, the code for nested subroutines appears before the outer subroutine:

        .sub _outer
            stmt1
            .sub _inner
                stmt2
                ret
            .end
            call _inner
            ret
        .end

generates code like this:

        _inner:
            stmt2
            ret
        _outer:
            stmt1
            bsr _inner
            ret
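
The ordering described above, both for code outside compilation units and for nested subs, can be reproduced with a small model. The following Python sketch is not imcc's implementation; the function name emit_order and the simplified line-based input format are assumptions made for illustration. It only mimics the rule that a unit's code is emitted when its .end is reached, and that code outside any unit is emitted last.

```python
def emit_order(source):
    """Toy model: return PASM-like lines in the order imcc would emit them.

    Hypothetical helper, not part of imcc. Handles only .sub/.end,
    'call X' (rewritten to 'bsr X') and opaque statements.
    """
    output = []      # code that has already been emitted
    stack = [[]]     # stack[0] collects code outside any .sub
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(".sub "):
            name = line.split()[1]
            stack.append([name + ":"])    # open a unit, entry label first
        elif line == ".end":
            if len(stack) == 1:
                raise ValueError(".end without matching .sub")
            output.extend(stack.pop())    # unit closed: emit its code now
        elif line.startswith("call "):
            stack[-1].append("bsr " + line.split()[1])
        else:
            stack[-1].append(line)
    output.extend(stack[0])               # code outside units comes last
    return output


nested = """
.sub _outer
    stmt1
    .sub _inner
        stmt2
        ret
    .end
    call _inner
    ret
.end
"""
result = emit_order(nested)
print("\n".join(result))
```

Because _inner is closed first, its lines appear in result before those of _outer, matching the listing above.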

Symbols, constants and labels

Each compilation unit maintains its own symbol table containing local labels and variable symbols. This symbol table (hash) is not visible to code in other units.

Lexicals and named constants declared in an outer scope are visible and are used unless they are overridden by a .local or .const directive with the same name. See t/syn/scope.t for examples.

Global labels and constants are kept in the global symbol table ghash, which is the symbol table of the outermost compilation unit.

This allows for global constant folding beyond subroutine scope.
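
The visibility rules above amount to a chained lookup from the current unit outwards. The following Python sketch illustrates that idea only; the Unit class and its method names are hypothetical and do not correspond to imcc's C data structures.

```python
class Unit:
    """Hypothetical stand-in for a compilation unit and its symbol table."""

    def __init__(self, name, outer=None):
        self.name = name
        self.outer = outer      # enclosing unit, None for the outermost one
        self.symbols = {}       # this unit's own table

    def declare(self, name, value):
        self.symbols[name] = value

    def lookup(self, name):
        # Search this unit first, then each enclosing unit in turn.
        unit = self
        while unit is not None:
            if name in unit.symbols:
                return unit.symbols[name]
            unit = unit.outer
        raise KeyError(name)


top = Unit("top")                  # outermost unit, plays the role of ghash
top.declare("MAX", 10)             # a global constant
outer = Unit("_outer", top)
outer.declare("x", "outer x")
inner = Unit("_inner", outer)
inner.declare("x", "inner x")      # overrides the outer declaration
sibling = Unit("_other", top)      # a different unit, not nested in _outer
```

In this model inner.lookup("x") finds the inner declaration, inner.lookup("MAX") falls through to the outermost table, and sibling.lookup("x") fails because symbols of _outer are not visible from a different unit.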

Local labels with the same name are allowed in different compilation units, but running the generated PASM through assemble.pl does not work. Running this code inside imcc is fine. This will probably change so that local labels are mangled to be unique.
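
One possible mangling scheme is sketched below in Python. This is purely an assumption about how such a change could look, not what imcc does or will do: the function mangle_labels, the input format, and the choice of prefixing each local label with its unit name are all made up for illustration. Labels without a leading underscore are treated as local.

```python
def mangle_labels(units):
    """Prefix local labels with their unit name so they cannot collide.

    Hypothetical scheme. 'units' is a list of (unit_name, lines) pairs,
    where each line is a label definition ('name:') or an instruction.
    """
    result = []
    for unit_name, lines in units:
        prefix = unit_name.lstrip("_") + "_"
        # Local labels: definitions that do not start with an underscore.
        local = {ln[:-1] for ln in lines
                 if ln.endswith(":") and not ln.startswith("_")}
        for ln in lines:
            words = []
            for word in ln.split():
                bare = word.rstrip(":,")
                if bare in local:
                    # Rewrite both the definition and any reference to it.
                    word = word.replace(bare, prefix + bare, 1)
                words.append(word)
            result.append(" ".join(words))
    return result


units = [
    ("_a", ["_a:", "loop:", "branch loop", "ret"]),
    ("_b", ["_b:", "loop:", "branch loop", "ret"]),
]
mangled = mangle_labels(units)
print("\n".join(mangled))
```

After mangling, the two loop labels become a_loop and b_loop, so the combined output no longer contains duplicate label definitions.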

FILES

imcc.y, instructions.c, t/syn/sub.t, t/imcpasm/sub.t, t/syn/scope.t

AUTHOR

Leopold Toetsch <lt@toetsch.at>