The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Marpa::R2::Semantics::Order - How Marpa ranks ambiguous parses

DESCRIPTION

Marpa allows ambiguous parses. While an unambiguous parse can produce at most one parse tree and one parse result, an ambiguous parse will produce a parse series. A parse series is a sequence of parse trees, each of which will have its own parse result.

This document describes ways of controlling the order in which the recognizer's value method evaluates the parse trees of an ambiguous parse. It also describes ways to exclude selected parse trees from the parse series.

Duplicate parses are eliminated

When evaluating the parse trees in a parse series, Marpa never evaluates the same parse tree twice. Marpa considers two parse trees to be the same if they are semantic duplicates.

Two parse trees are semantic duplicates if and only if a recursive, top-down evaluation of each applies the same rules in the same order at the same earleme locations. This definition implies that, given any deterministic semantics, two parse trees which are semantic duplicates will always produce the same parse result -- hence the term "semantic duplicates". When the Marpa documentation refers to duplicate parses, it will mean semantic duplicates unless otherwise stated.

Default parse order

By calling the recognizer's value method repeatedly, Marpa can produce all the parse results in the current parse series. The default is for the parse results to be returned in an arbitrary order. This corresponds to the "none" value of the recognizer's ranking_method named argument.

Arbitrary parse order

When this document calls a behavior "arbitrary" it means that an application must not rely on that behavior being repeated. An arbitrary behavior is allowed to change from version to version of the software, from run to run of the application, and even from call to call of the methods.

"Arbitrary parse order", in particular, means that the particular order in which parse trees are evaluated may not be repeated. But an application can expect, even when the order of evaluation of the parse trees is arbitrary, that all appropriate parse trees will be included in the parse series, and that none of the parse trees in the parse series will be the semantic duplicate of any other tree.

Ranking methods

Marpa grammar objects have a ranking_method named argument, whose value can be the name of a ranking method, or "none", indicating that the default ranking method is to be used.

The rule ranking method

The rule method ranks alternative parses according to their rules. Every rule has a rule rank. A rule's rank can be specified using the the rank named argument for that rule. Rule ranks must be integers. If no rule rank is specified, the rule rank is 0. When being ranked, parse trees are traversed top-down, left-to-right.

The high_rule_only ranking method

The high_rule_only ranking method is similar to the rule ranking method, except that, at every choice point, it discards all of the choices which have a rank lower than that of the highest ranked alternative.

In carrying out the ranking, the choice points of the parse trees are visited in top-down, left-to-right order. Since the high_rule_only ranking method eliminates some parse trees, it can reduce or eliminate the ambiguity of a parse, but it does not necessarily do either. This is because, at each choice point among the parse trees, it is possible that several of the choices, or all of them, will have the same rank as the highest ranked alternative.

Null ranking

Some rules have a RHS which contains proper nullables: symbols which may be nulled, but which are not nulling symbols. (Nulling symbols are symbols which are always nulled.)

When a rule contains proper nullables, each application of that rule to a section of input creates a nulling variant -- that rule with a specific pattern of null and non-null symbols. In many cases, this creates an ambiguity -- different nulling variants could apply to the same substring of input. In ambiguous parsings of this kind, some applications may want to rank nulling variants that start with non-null symbols higher. Other applications may want to do the opposite -- to rank nulling variants that start with null symbols higher.

The null_ranking named argument for rules specifies which nulling variants are ranked high or low. Ranking of nulling variants is done left-to-right, with the null preference as indicated by the null_ranking named argument. Specifically, if the null_ranking is "low", then the closer a nulling variant places its non-null symbols to the start of the rule, the higher it ranks. If the null_ranking is "high", then the closer a nulling variant places its null symbols to the start of the rule, the higher it ranks.

A rule is null ranked if and only if its null_ranking is defined. Note that this definition means that a rule can be considered null ranked even if it does not have any nullable symbols. A rule which is not null ranked is called non-null-ranked.

Null ranking, in the current implementation, is targeted at situations where the alternatives are different applications of a single rule. In situations where the choice is between two different rules, an application which cares about the order will typically want to override the null ranking using rule ranks. A more detailed description of the rule comparison logic can be found below.

By default, null_ranking is undefined, which means the order of the null variants is arbitrary. This is fine for most applications.

Details of ranking

When ranking, the logic descends the parse trees top-down and left-to-right. Places where there is more than one alternative are called choice points. Alternatives are ranked (in the rule ranking method) or selected (in the high_rule_only ranking method), by comparing them as follows:

  • If two alternatives are for the same rule, and that the rule is null ranked, they rank as described under "Null ranking".

  • If the two alternatives have the same rule, and that rule is non-null-ranked, the alternatives have equal rank.

  • If the two alternatives have different rules, and the two rules have different rule ranks, the alternative with the higher rule rank ranks high.

  • The remaining cases deal with the comparison of alternatives which have different rules, when both rules have the same rule rank.

    • If the rule for one alternative is null ranked, while the rule for the other is non-null-ranked, the alternative with the non-null-ranked rule ranks high.

    • If both rules are non-null-ranked, the two alternatives have equal rank.

    • If both rules are null-ranked, their ranking is arbitrary -- either one may rank high, or they may have equal rank.

A general approach to sorting parses

The most general way to sort Marpa parses is for the application to take control. The application can set up the Marpa semantic actions so that the parse result of every parse tree is a <rank, true_value> duple. The duples can then be sorted by rank. Once the resuls are sorted, the rank element of the duple can be discarded. (Those familiar with the Schwartzian transform may note a resemblance. In Perl, duples can be implemented as references to arrays of 2 elements.)

The user needs to be careful. In theory, ambiguity can cause an exponential explosion in the number of results. In practice, ambiguity tends to get out of hand very easily. Producing and sorting all the parses can take a very long time.

The constant ranking method is no longer supported

The constant ranking method is no longer implemented. An initial attempt to find the ideal compromise between flexibility and efficiency, the constant ranking method proved too ambitious, and had several drawbacks. First, its overhead was high in comparison to the flexibility it offered. Second, it was not intuitive. Even when the constant ranking method was powerful enough to solve a problem, it was not always obvious how to actually use it. Finally, its implementation was extremely complex.

The new rule ranking method is far more efficient, vastly more intuitive, and far easier to implement and maintain. But the new rule ranking method is also very flexible -- it passes the test suite which was written for the constant ranking method.

COPYRIGHT AND LICENSE

  Copyright 2012 Jeffrey Kegler
  This file is part of Marpa::R2.  Marpa::R2 is free software: you can
  redistribute it and/or modify it under the terms of the GNU Lesser
  General Public License as published by the Free Software Foundation,
  either version 3 of the License, or (at your option) any later version.
  
  Marpa::R2 is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.
  
  You should have received a copy of the GNU Lesser
  General Public License along with Marpa::R2.  If not, see
  http://www.gnu.org/licenses/.