The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Marpa::API - The Marpa Parse Engine's API: Overview

SYNOPSIS

    my $grammar = Marpa::Grammar->new(
        {   start   => 'Expression',
            actions => 'My_Actions',
            default_action => 'first_arg',
            rules   => [
                { lhs => 'Expression', rhs => [qw/Term/] },
                { lhs => 'Term', rhs => [qw/Factor/] },
                { lhs => 'Factor', rhs => [qw/Number/] },
                { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
                {   lhs    => 'Factor',
                    rhs    => [qw/Factor Multiply Factor/],
                    action => 'do_multiply'
                },
            ],
        }
    );

    $grammar->precompute();

    my $recce = Marpa::Recognizer->new( { grammar => $grammar } );

    my @tokens = (
        [ 'Number', 42 ],
        [ 'Multiply', ],
        [ 'Number', 1 ],
        [ 'Add', ],
        [ 'Number', 7 ],
    );

    $recce->tokens( \@tokens );

    sub My_Actions::do_add {
        my ( undef, $t1, undef, $t2 ) = @_;
        return $t1 + $t2;
    }

    sub My_Actions::do_multiply {
        my ( undef, $t1, undef, $t2 ) = @_;
        return $t1 * $t2;
    }

    sub My_Actions::first_arg { shift; return shift; }

    my $value_ref = $recce->value;
    my $value = $value_ref ? ${$value_ref} : 'No Parse';

DESCRIPTION

This document is the top-level overview of the API for the Marpa parse engine. It is intended as the starting point for reading the API documentation.

This document uses two examples which show the typical flow of calls in the Marpa API. The intent is to provide "the big picture" into which other details will fit. These fuller details are given in other reference documents.

The Three Phases

A parser needs to:

  • Accept a grammar.

  • Read input.

  • Return values from the parses, according to a semantics.

In Marpa these are three distinct phases. The API is structured around these phases.

Creating a Grammar

The package Marpa::Grammar creates grammars. It also "precomputes" the grammars. Precomputation creates a set of data structures that the recognizer will need.

Recognizing Input

The Marpa::Recognizer package creates recognizers from grammars. A recognizer reads input and produces tables based on the grammar and the input. The information that the evaluators use to find parses is in the recognizer's tables.

Evaluating Parses

Marpa's evaluators find the parses in the recognizer's tables. If the input did not match the grammar, the tables will contain no parses, and the evaluators will indicate that. If the input did match the grammar, the tables will contain one or more parses. There may be more than one parse, because Marpa allows ambiguous grammars.

Marpa has two evaluators: a Single Parse Evaluator, and a Multi-parse Evaluator. The Single Parse Evaluator returns the value of only one of the parses. The Multi-parse Evaluator allows the application to iterate parse trees.

For each of its rules, a Marpa application can specify a Perl closure to be the rule's semantic action. For each token, a Marpa application can specify a Perl scalar to be the token's value. The Marpa evaluators apply these semantics recursively to the parse trees. The values which result are returned to the user.

EXAMPLE 1: A SIMPLE CALCULATOR

The synopsis shows the code for a calculator. Very simple, the calculator handles only addition and multiplication of integers. This section explains in detail how it works.

Marpa::Grammar::new

    my $grammar = Marpa::Grammar->new(
        {   start   => 'Expression',
            actions => 'My_Actions',
            default_action => 'first_arg',
            rules   => [
                { lhs => 'Expression', rhs => [qw/Term/] },
                { lhs => 'Term', rhs => [qw/Factor/] },
                { lhs => 'Factor', rhs => [qw/Number/] },
                { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
                {   lhs    => 'Factor',
                    rhs    => [qw/Factor Multiply Factor/],
                    action => 'do_multiply'
                },
            ],
        }
    );

Marpa grammars are Marpa::Grammar objects. They are created with the Marpa::Grammar::new constructor. The arguments to Marpa::Grammar::new are references to hashes of named arguments. In the key/value pairs of these hashes, the name of the argument is the hash key, and the value of the argument is the hash value.

The start Named Argument

    start => 'Expression',

The start named argument is required. The argument value is a string containing the name of the grammar's start symbol.

Named Arguments for the Semantics

            actions => 'My_Actions',
            default_action => 'first_arg',

The actions and default_action named arguments specify semantics. Their argument values are strings, which rely for their interpretation on the evaluators.

The evaluators will be described later. Peeking ahead, actions is the name of a Perl package where Marpa looks for Perl closures to implement semantic actions. default_action is the name of the Perl closure that implements the semantics for rules which do not explicitly specify their own semantics.

The rules Named Argument

    rules => [
        { lhs => 'Expression', rhs => [qw/Term/] },
        { lhs => 'Term',       rhs => [qw/Factor/] },
        { lhs => 'Factor',     rhs => [qw/Number/] },
        { lhs => 'Term', rhs => [qw/Term Add Term/], action => 'do_add' },
        {   lhs    => 'Factor',
            rhs    => [qw/Factor Multiply Factor/],
            action => 'do_multiply'
        },
    ],

The value of the rules named argument is a reference to an array of rule descriptions. In this example, the rule descriptions all use the "long" form -- they are references to hashes of rule properties. In each key/value pair of a rule description hash, the key is the name of a rule property, and the hash value is the value of that rule property.

The lhs Property

The value of the lhs rule property must be a string containing the name of the rule's left hand side symbol. Every Marpa rule must have a left hand side symbol.

The rhs Property

The value of the rhs property is a reference to an array of strings containing names of the rule's right hand symbols, in order. This array may be zero length, in which case this is an empty rule -- a rule with no symbols on the right hand side. A rule is also empty if there is no rhs specifier in its description. There are no empty rules in this example.

The action Property

The value of the action rule property is a string. What the string means is up to the evaluator. Peeking ahead, each action property string will be interpreted as the name of a Perl closure that implements the rule's semantic action.

Marpa::Grammar::precompute

    $grammar->precompute();

Before a Marpa grammar object can be used by a Marpa recognizer, it must be precomputed. Precomputation compiles data structures that the recognizer will need.

Marpa::Recognizer::new

    my $recce = Marpa::Recognizer->new( { grammar => $grammar } );

Marpa::Recognizer::new creates a new recognizer. Its arguments are references to hashes of named arguments. In this example the only named argument is "grammar", which is required. The value of the grammar named argument must be a precomputed Marpa grammar for the recognizer to use in building its tables.

Marpa::Recognizer::tokens

    my @tokens = (
        [ 'Number', 42 ],
        [ 'Multiply', ],
        [ 'Number', 1 ],
        [ 'Add', ],
        [ 'Number', 7 ],
    );

    $recce->tokens( \@tokens );

Marpa::Recognizer::tokens reads the input to be recognized. In the example, its argument is a reference to an array of token descriptions. Each token description is an array reference.

The first element of a token description is a string containing the token name. The token name must be a symbol name in the grammar. (A note to experts: By default, Marpa allows symbols that appear on the left hand side of rules to be terminals as well.)

The second element of a token description is the token value. A token value must be a Perl scalar, but the rest of its semantics is entirely up to the application. If the token value is omitted, it's a Perl undef. In the calculator example, the values of the "Add" and "Multiply" tokens are never used, so they are allowed to default to undef.

Marpa::Recognizer::value

    my $value_ref = $recce->value;
    my $value = $value_ref ? ${$value_ref} : 'No Parse';

Marpa::Recognizer::value method returns a reference to the parse's value, if there was a parse. If there was no parse, Marpa::Recognizer::value returns undef.

The Marpa::Recognizer::value method is Marpa's Single Parse evaluator. Marpa has two evaluators: a Single Parse Evaluator and a Multi-parse Evaluator. The Single Parse Evaluator will handle all unambiguous grammars as well as many ambiguous parses. The Single Parse Evaluator is the best evaluator for most situations.

Resolving the Semantics

The first thing Marpa::Recognizer::value needs to do is resolve the semantics: turn the strings specifying semantics into code references. In this example, the actions named argument is interpreted as a Perl package name, and the actions are sought there.

    actions => 'My_Actions',
    { lhs => 'Factor', rhs => [qw/Factor Multiply Factor/], action => 'do_multiply' },

For example, the action property for the above rule is "do_multiply" and the actions named argument to the grammar was "My_Actions". So Marpa looks for a closure whose fully qualified name is My_Actions::do_multiply which, in this example, it finds.

    sub My_Actions::do_multiply {
        my ( undef, $t1, undef, $t2 ) = @_;
        return $t1 * $t2;
    }

Rules do not always have action properties. That is the case with these rules in this example:

    { lhs => 'Expression', rhs => [qw/Term/] },
    { lhs => 'Term', rhs => [qw/Factor/] },
    { lhs => 'Factor', rhs => [qw/Number/] },

For these rules, the semantics will be provided by the the grammar's default action.

    default_action => 'first_arg',

The default_action named argument is resolved in the same way as are the action properties of the rules. In this example, default_action is specified as "first_arg" and resolves to My_Actions::first_arg.

Semantic Actions

    sub My_Actions::do_add {
        my ( undef, $t1, undef, $t2 ) = @_;
        return $t1 + $t2;
    }
    sub My_Actions::first_arg { shift; return shift; }

The semantic actions are callbacks, called when each node in a parse tree is evaluated. The callbacks receive one or more arguments. The first argument to a semantic action is always a per-instance object, which the callbacks can use as a scratchpad. In these examples, the per-instance object is not used.

For a non-empty rule, the second and any subsequent arguments to the callback are the values of symbols on the right hand side of the rule for the callback node, in lexical order. If the semantic action is for an empty rule, the per-instance object will be its only argument.

Every semantic action is expected to return a value. With one exception, this value is passed up to a parent node as an argument. The exception is the value for the start rule. The return value for the start rule becomes the value of the parse.

The default_action named argument is optional. When there is no default action for a grammar, rules without an explicit semantics return a Perl undef.

THE SINGLE PARSE EVALUATOR

In most cases you can produce the parse's value with Marpa's Single Parse Evaluator. The Single Parse Evaluator is not a separate package. It is implemented as the Marpa::Recognizer::value method.

Marpa::Recognizer's Single Parse Evaluator is far more powerful than the evaluator in most parser generators, but it does have limitations.

  • The Marpa::Recognizer::value method cannot be used if your grammar is infinitely ambiguous. A grammar is infinitely ambiguous if there is some input for which you could, without duplication, list the parses forever.

  • The Marpa::Recognizer::value method cannot be used if your parse is ambigious and you want to see two or more of the possible parses.

  • The Marpa::Recognizer::value method cannot be used if your parse is ambigious and you want to be able to choose which parse you see.

In all other cases, the Marpa::Recognizer::value can be used, will be faster, and will usually be the best choice.

THE MULTI-PARSE EVALUATOR

The Multi-parse Evaluator is for those situations when you can't use the Marpa's Single Parse Evaluator, The Multi-parse Evaluator handles all inputs to all grammars. For unambiguous and finitely ambiguous grammars, the Multi-parse Evaluator will iterate all the parses. The Multi-parse Evaluator also allows the application to control the order in which those parses are produced.

For infinitely ambiguous grammars, the Multi-parse Evaluator will iterate all the non-cyclical parses. Few users will need or want to work with infinitely ambiguous grammars. Those interested in more on this topic can look at the documentation for Marpa::Evaluator.

Finitely ambiguous grammars are a different matter. These can be very useful. They haven't been much used in the past, because few parsers were capable of handling them.

The Single Parse Evaluator will evaluate parses with finitely ambiguous grammars. But the Single Parse Evaluator is limited to returning one parse, chosen arbitrarily. That is sufficient for a lot of work with ambiguous grammars. When it's not sufficient, the Multi-parse Evaluator is the right tool.

EXAMPLE 2: AN AMBIGUOUS PARSE

This is the same calculator as before, rewritten to be ambiguous. Rather than give multiplication precedence over addition, it allows any order of operations. In this example, the semantic actions (My_Actions::do_add, etc.) and the @tokens array remain the same as before.

Eliminating precedence makes the grammar shorter, but it also means there can be multiple results with different values. In this application, we decide to return all possible parses. The Single Parse Evaluator will not do this, so we use the Multi-parse Evaluator.

    my $ambiguous_grammar = Marpa::Grammar->new(
        {   start   => 'E',
            actions => 'My_Actions',
            rules   => [
                [ 'E', [qw/E Add E/],      'do_add' ],
                [ 'E', [qw/E Multiply E/], 'do_multiply' ],
                [ 'E', [qw/Number/],       ],
            ],
            default_action => 'first_arg',
        }
    );

    $ambiguous_grammar->precompute();

    my $ambiguous_recce =
        Marpa::Recognizer->new( { grammar => $ambiguous_grammar } );

    $ambiguous_recce->tokens( \@tokens );

    my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );

    my @values = ();
    if ($evaler) {
        while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
            push @values, ${$ambiguous_value_ref};
        }
    }

Short Form Rule Descriptions

    rules => [
        [ 'E', [qw/E Add E/],      'do_add' ],
        [ 'E', [qw/E Multiply E/], 'do_multiply' ],
        [ 'E', [qw/Number/], ],
    ],

The rule descriptions in the ambiguous example demonstrate the "short" or array form of rule descriptions. Array form rule descriptions are references to arrays. Here the elements are, in order, the lhs property, the rhs property, and the action property.

Marpa::Evaluator::new

    my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );

Multi-parse evaluators are objects in the Marpa::Evaluator class. They are created with the Marpa::Evaluator::new method. The arguments to this constructor are references to hashes of named arguments. In this example, the only named argument is "recce". The recce named argument is required. It must be the name of the recognizer in whose tables the evaluator will find its parses.

It is possible that Marpa::Evaluator::new will be able to see immediately that there are no parses in the recognizer's tables. In that case Marpa::Evaluator::new returns a Perl undef.

Marpa::Evaluator::value

    my @values = ();
    if ($evaler) {
        while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
            push @values, ${$ambiguous_value_ref};
        }
    }

The Marpa::Evaluator::value method iterates through the parses. It returns a reference to the value of the parse. At the end of the iteration, after all parses have been returned, Marpa::Evaluator::value returns undef. If there were no parses, Marpa::Evaluator::value returns undef the first time that it is called.

ERRORS AND EXCEPTIONS

Methods in the Marpa API do not return errors. When there are errors, Marpa API methods throw an exception.

INHERITANCE

Classes in the Marpa API are not designed to be inherited.

LICENSE AND COPYRIGHT

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.