The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

Name

Marpa::R2::BNF - BNF Interface

Synopsis

    my $grammar = Marpa::R2::Grammar->new(
        {   start          => 'Script',
            actions        => 'My_Actions',
            default_action => 'do_first_arg',
            rules          => <<'END_OF_RULES',
    Script ::= Expression+ separator => <op_comma> action => do_script
    Expression ::=
        Number
        | op_lparen Expression op_rparen action => do_parens assoc => group
       || Expression op_pow Expression action => do_pow assoc => right
       || Expression op_times Expression action => do_multiply
        | Expression op_divide Expression action => do_divide
       || Expression op_add Expression action => do_add
        | Expression op_subtract Expression action => do_subtract
    END_OF_RULES
        }
    );

This interface is beta

This interface is beta. It is new, somewhat experimental, and therefore subject to change.

Overview

This page is the reference for the Marpa's "BNF interface", also called its "Stuifzand interface". The Stuifzand interface allows strings in place of the rule descriptors as value of the rules named arguments of Marpa grammars. These strings are called BNF source strings or Stuifzand source strings.

The structure of the BNF source file

Whitespace separates tokens, but is otherwise ignored. Comments are equivalent to whitespace. A hash ("#") character starts a comment, which continues to the end of the line.

The Stuifzand format consists of production declarations. Each production declaraton consists of a left hand side symbol, followed by the declaration operator ("::=") and a right hand side. Right hand sides are described below.

Symbol names

Symbols names must consist entirely of Perl word characters (alphanumerics, plus the underscore). Optionally, symbol names can be enclosed in angle brackets.

Right hand sides

A right hand side may be empty, quantified, or prioritized. A empty right hand side is what the name suggests, no symbols, modifiers or punctuation. A production with an empry right hand side defines an nulling rule.

Quantified right hand sides

A quantified right hand side defines a sequence rule. It consists of a single symbol, followed by a "quantifier", and then followed, optionally, by modifiers. The quantifer is either a star ("*"), or a plus sign ("+") indicating, respectively, that the sequence rule has a minimum length of 0 or 1.

Prioritized right hand sides

All productions which are not empty or quantified, are considered prioritized, although the prioritization may be trivial, consisting only of a single right hand side alternative, at a single priority level. Each prioritized right hand side consists of one or more right hand alternatives ("alternatives" for brevity). An alternative is a series of one or more symbols, followed by zero or more modifiers.

The alternatives are separated by single or double "or" symbols. This means the alternatives in a prioritized right hand side proceed from tightest (highest) priority to loosest. The double "or" symbol ("||") is the "loosen" operator -- the alternatives after it have a looser (lower) priority than the alternatives before it. The single "or" symbol ("|") is the ordinary "alternative" operator -- alternatives on each side of it have the same priority. Associativity is specified using modifiers, as described below.

Modifiers

Modifiers are allowed for quantified productions, and for the right hand side alternatives in prioritized productions. Modifiers must occur after the symbols. Modifiers consist of a keyword, the modifier operator ("=>"), and the modifier's value. The keyword must be from the following list of reserved names. Values are as described for each keyword.

action

The value of the action keyword must be a Perl word (a string of alphanumerics or underscores). It will be interpreted as the action name, as described for the action named argument of rule descriptors.

assoc

The assoc modifier is only valid in a prioritized production. Its value must be one of left, right or group. Its effect will be as described below.

separator

The separator keyword is only valid for a quantified right side, and its value must be a symbol of the grammar. It will be used as the separator, as described for sequence rules.

How Marpa precedence works

Marpa's precedence is a generalization beyond the traditional ideas of precedence. Traditional precedence parsing required the the classification of operators as postfix, infix, etc. Marpa's precedence parsing is NOT based on the special treatment of operators. Marpa's precedence parsing, in fact, has no concept or definition of operator, except as something that is not an operand.

An occurrence of the left hand side in a right hand side alternative is called an operand. Alternatives are assigned an arity according to the number of operands they contain. All arities are allowed, from zero to an arbitrary number imposed by system limits.

An alternative with arity 0 is nullary. Precedence and associativity are meaningless in this case and will be ignored.

An alternative with arity 1 is unary. Precedence will have effect, but left and right associativity will not.

An alternative with arity 2 is binary. Precedence will have effect, and left and right associativity will behave in the traditional way, The traditional behavior for binary alternatives is exactly as described next for the N-ary case.

An alternative with an arity of N, where N is 2 or greater, is N-ary. Precedence will have effect. For left associativity, only the leftmost operand of an N-ary alternative associates -- operands after the first will have the next-tightest priority level. For right associativity, only the rightmost operand of an N-ary alternative associates -- all operands except the last will have the next-tightest priority level.

Marpa also allows "group" associativity. In "group" associativity, all operands associate at the loosest (lowest) priority. That is, in an alternative with group associativity, each operand may be a full expression of the kind defined by the prioritized production. "Group" associativity is used, for example, in implementing the traditional function of parentheses in Marpa. Group associativity is meaningless for nullary alternatives, and is ignored.

Caveats

Symbol names

The BNF interface may rewrite its symbol names internally. Here we call the name by which a symbol is specified in the BNF interface string its "external symbol name", and its name as rewritten, its "internal symbol name". Marpa::R2 usually makes sure the user sees only external symbol names. For example, the rule() Grammar method returns external symbol names.

However, for some purposes, such as tracing grammars, the use of internal names may sometimes be appropriate. Marpa::R2's use of external versus internal symbol name is evolving, and this is one reason the BNF interface is still beta.

Precedence: the good news

Marpa's generalization of precedence works for all grammars that can be defined by prioritized productions. It is efficient (linear) for all grammars that could be parsed by the traditional precedence parsing methods. Marpa also allows you to defined alternatives not allowed by traditional methods. Many of these are useful, and most of those can also be parsed efficiently.

Precedence and ambiguous grammars

Because of the many forms of recursion allowed, it is possible to define highly ambiguous grammars using the precedence mechanism. This can occur even by accident.

The user should especially be careful with right hand side alternatives in which all the symbols are operands. These can be useful. For example, an implicit operation can be defined using an binary alternative with no non-operands, and this could implement, for example, the standard notation for concatenation or multiplication. But to do this efficiently requires either avoiding ambiguity, or controlling its use carefully.

Marpa does catch the case where an alternative consists only of a single operand -- a "unit rule". This causes a fatal error. Unit rules are easy to define by accident in the Stuifzand interface. The author knows of no practical use for them, and their presence in a grammar is usually unintentional. Note that, in the event an application does find a use for such rules, Marpa can parse grammar which includes unit rules, via its other interfaces.

Copyright and License

  Copyright 2012 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.

  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.