The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Marpa::Deprecated::Evaluator - a DEPRECATED Evaluator

THIS MODULE IS DEPRECATED

This module has been replaced by methods of the Marpa Recognizer. This module is no longer supported and will be removed in a future release. In particular, there will never be XS support for this module.

History

This module was the 2nd Generation Marpa Evaluator. It is now strongly deprecated in favor of the 3rd Generation Evaluator, which is part of the Marpa Recognizer.

The 3rd Generation is faster, more powerful and less complicated, both internally and externally. The 3rd Generation Evaluator benefits from the design lessons learned from the previous two generations.

Marpa is alpha software, and at this point is experimental, so hopefully no major or production applications depended on the 2nd Generation interface.

HOW TO CONVERT TO THE 3rd GENERATION EVALUATOR

Conversion of existing code should be simple. In the 3rd Generation parser, no Evaluator objects should be created. Instead of the Evaluator object's value method, the Marpa::Recognizer::value method should be used. Its interface is essentially the same.

In the 2nd Generation Evaluator, multiple evaluations based on the same Marpa::Recognizer object could be performed by creating multiple Marpa::Evaluator objects. In the 3rd Generation, this is done by using the Marpa::Recognizer::reset_evaluation method to reset the evaluation data in the Marpa Recognizer object.

Ranking Parses

In the 3rd Generation Evaluator, the ranking_method named argument corresponds to the parse_order named argument of the 2nd Generation Evaluator.

During the life of the 2nd Generation Evaluator, the default parse order varied. In the 3rd Generation Evaluator, it is "none". This means that the ranking_method named argument of the Marpa Recognizer must be set to a value other than "none", or all ranking actions will be silently ignored.

Aside from "none", the 2nd Generation Evaluator only had one documented parse order: "numeric". This name was somewhat misleading, and the ranking method which provides the same capabilities in the 3rd Generation Evaluator is called "constant".

In converting from the 2nd Generation Evaluator to the 3rd, the return values of the ranking actions will usually have to change. In the 2nd Generation, these returned a rank. In the 3rd Generation, these return a reference to a rank. This allows 3rd Generation to specify that, instead of being ranked, a choice should be ignored. If a 3rd Generation ranking closure returns a Perl undef, the token or rule it corresponds to will be ignored.

SYNOPSIS

    my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );

    my @values = ();
    if ($evaler) {
        while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
            push @values, ${$ambiguous_value_ref};
        }
    }

DESCRIPTION

Marpa::Evaluator objects implement Marpa's Multi-parse Evaluator. Marpa has another Evaluator, its Single Parse Evaluator, which can handle all unambiguous and many ambiguous parses, and is simpler and faster. If you're not sure which to use, see "THE SINGLE PARSE EVALUATOR" in Marpa.

A Multi-parse Evaluator is created with the Marpa::Evaluator::new method. The user can then iterate through the parse results, using the Marpa::Evaluator::value method.

Parse Order

The Multi-parse Evaluator will produce all the parse results for a given parse, and allows the user to control the parse order. The method used to order parse results is specified with Marpa::Evaluator's "parse_order" named argument. As of this writing, two values are supported for the "parse_order" named argument: none and numeric.

Parses results are returned in an order determined by node rank. The rules are

    Rule 1: In determining the order of parse results, the parse tree is traversed in pre-order, left to right.

    Rule 2: The same parse result is never returned twice. Two parse results are considered "the same" if they apply the same rules in the same order at the same earleme locations.

    Rule 3: At each point of ambiguity, the choice with the highest rank is made, unless that choice would violate Rule 2.

"none"

When the parse order is "none", the node rank is arbitrary and the parse results will be iterated in arbitrary order. This is the most efficient alternative, and is the default.

"numeric"

The "numeric" parse order allows user control of the order in which the value method iterates through the parse results. Numeric parse order assigns a Perl numeric value to each leaf node and rule node of the parse. This numeric value becomes the node's rank. Negative values are allowed. The highest numeric value is the highest rank, the lowest numeric value is the lowest rank, and so forth.

By default,

  • The rank of every leaf node is 0.

  • The rank of every rule node is the sum of the ranks of its child nodes.

The defaults can be changed by assigning ranking actions. The ranking action of a token leaf node is specified using the token symbol's ranking_action property. The ranking action of a nulled leaf node is specified using the null node symbol's ranking_action property. The ranking action of a rule is specified using the rule's ranking_action property.

For the rank of a node to be calculated, the ranking action must first be resolved to a ranking Perl closure. Ranking actions are resolved to ranking Perl closures in the evaluator setup phase, using the same logic that resolves semantics actions to Perl semantic closures. The logic that resolves action names to closures is described in detail in the semantics document.

The ranking closures for token leaf nodes and the ranking closures for nulled leaf nodes are called in the bocage setup phase. The ranking closures for rule nodes are called during the parse setup phase.

Infinitely Ambiguous Grammars

The fast evaluator does not allow an infinitely ambigious grammar, but the Multi-parse Evaluator does. An example of an infinitely ambiguous grammar is the following:

    S ::= A
    A ::= B
    B ::= A
    B :: 'x'

Given the input 'x', this grammar will produce these parses

    S -> A -> B -> x
    S -> A -> B -> A -> B -> x
    S -> A -> B -> A -> B -> x -> A -> B -> x
    .
    .
    .

Because of the two rules A ::= B and B ::= A, this list of parses could go on forever. The two rules A ::= B and B ::= A form what is called a cycle.

Typically, if a user has written an grammar with an infinite cycle, it was a mistake and he wants to rewrite it before proceeding. By default, an infinitely ambiguous grammar is a fatal error. This is the behavior most users will want.

If a user does want to parse with an infinitely ambiguous grammar, Marpa allows it. The details are described below.

CONSTRUCTOR

new

    my $evaler = Marpa::Evaluator->new( { recce => $ambiguous_recce } );

The new method's arguments are one or more hashes of named arguments. On success, the new method returns a new evaluator object. If the new method finds that there is no parse of the input according to the grammar, it returns undefined. For other failures, the new method throws an exception.

A unsuccessful return from the new method means that there is no parse of the input according to the grammar, but the opposite is not necessarily true. It is possible that the new method will successfully return an evaluator object, only for the value method to return undefined the first time it is called, indicating that there are no parse results. The number of parse results, including whether there are any at all, is not known until they are iterated by the value method.

The recognizer named argument is required, and its value must be a recognizer object which has finished recognizing a text. The recce named argument is a synonym for the recognizer named argument. Other named arguments are described below.

MUTATORS

set

    $evaler->set( { max_parses => 300 } );

The set allows Marpa named arguments to be specified for an evaluator object after its creation. The arguments to the set method are zero or more hashes of named arguments.

value

    my @values = ();
    if ($evaler) {
        while ( defined( my $ambiguous_value_ref = $evaler->value() ) ) {
            push @values, ${$ambiguous_value_ref};
        }
    }

Iterates the evaluator object, returning a reference to the value of the next parse result. A Perl undef is a valid parse result value, and is returned as a reference to an undef.

If there are no more parse results, returns undefined. If the input did not match the grammar, the value method will return undefined the first time it is called. Failures are thrown as exceptions.

TRACE ACCESSORS

show_bocage

    my $show_bocage_output = $evaler->show_bocage(2);

Returns a multi-line string describing the bocage for an evaluator. The first line contains the count of the parse results produced up to this point from the bocage. The bocage follows in the form of a parse bocage grammar, listed in pre-order.

Parse bocage grammars are similar to parse forest grammars. Parse bocage grammars are described at length in a separate document, using an example output from show_bocage.

The optional verbosity argument must be an integer greater than or equal to zero. For each parse bocage and-production, if verbosity is a number greater than zero, Marpa shows the LR(0) item corresponding to the and-production's and-node.

If verbosity is less than 2, Marpa shows only the and-productions. If verbosity is 2 or more, Marpa also shows the or-productions. Or-productions contain only redundant information, but the parse bocage grammar is not formally complete without them.

CONTEXT-AWARE STATIC METHODS

    sub rank_null_a {
        return ( $MyTest::MAXIMAL ? -1 : 1 )
            * 10**( 3 - Marpa::token_location() );
    }

The ranking Perl closures that run at node evaluation time have available a set of context-aware static methods. A closure can use these methods to learn about the context in which it is called. As of this writing, the context-aware static methods are available to the ranking Perl closures only -- the semantic Perl closures cannot use them. This is an arbitrary restriction, and may be lifted in the future.

Marpa::location

Returns the earleme location of the origin (or start) of the rule.

Marpa::token_location

Returns the earleme location of the token. Intended for use in callbacks associated with empty rules and nulling symbols.

NAMED ARGUMENTS

closure

The Multi-parse Evaluator's closure named argument is a reference to a hash. In the key/value pairs of this hash, the key must be an action name. The hash value must be a CODE ref.

Sources of action names include

  • The action properties of rules

  • The default_action named argument of grammars

  • The lhs properties of rules

  • The ranking_action properties of rules

  • The ranking_action properties of symbols

  • For its new method, the action_object named argument of grammars

When an action name is a key in the closure named argument, the usual action resolution mechanism of the semantics is bypassed. A common use of the closure named argument is to allow anonymous subroutines to be semantic actions. For more details, see the document on semantics.

end

The Multi-parse Evaluator's end named argument specifies the parse end location. The default is for the parse to end where the input did, so that the parse returned is of the entire input.

infinite_rewrite

Marpa handles infinitely ambiguous grammars, and allows two strategies for dealing with them. If the infinite_rewrite named argument is true, when Marpa detects an infinitely ambiguous grammar, Marpa will rewrite the parse bocage to eliminate infinite cycles. The rewritten parse bocage will be semantically equivalent to the original. This rewrite happens in the evaluator setup phase, during the call to Marpa::Evaluator::new. Rewriting to eliminate infinite cycles is the default behavior.

If the infinite_rewrite named argument is false, there will be no rewrite to eliminate infinite cycles. Instead, infinite cycles will be dealt with dynamically, as they arise in the parse setup phase during the calls to Marpa::Evaluator::value. For more detail, see the section on infinitely ambiguous grammars.

max_parses

The value must be an integer. If it is greater than zero, the evaluator will return no more than that number of parse results. If it is zero, there will be no limit on the number of parse results returned. The default is for there to be no limit.

Marpa allows extremely ambiguous grammars. The user may write a highly ambiguous grammar by mistake, or as a deliberate choice of parsing strategy for an application. When the ambiguity is a mistake, specifying max_parses is useful to limit CPU usage and output length while the issue is being debugged. When ambiguity is a deliberate strategy, it is often the case that the user only wants to see the first few parse results.

parse_order

The value must be a string: either "none" or "numeric". When the value is "none", Marpa returns the parse results in arbitrary order. When the value is "numeric", Marpa allows the user to control the order in which parse results are returned by specifying ranking actions. The default is for parse results to be returned in arbitrary order. For details, see the section on parse order.

recce

A named argument whose value is the recognizer that the Evaluator is to be created from. This named argument is required for the Marpa::Evaluator::new call. The recce named argument is not allowed for the Marpa::Evaluator::set call.

recognizer

A synonym for the "recce" named argument.

trace_actions

The Multi-parse Evaluator's trace_actions named argument is a boolean. If the boolean value is true, Marpa traces the resolution of action names to Perl closures. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.

trace_evaluation

The Multi-parse Evaluator's trace_actions named argument is a boolean. If the boolean value is true, Marpa prints messages that trace the rewriting of the parse bocage in the evaluator setup phase. A boolean value of false turns tracing off, which is the default. Traces are written to the trace file handle.

trace_file_handle

The value is a file handle. Traces and warning messages go to the trace file handle. By default the trace file handle is inherited from the recognizer used to create the evaluator.

trace_values

The Multi-parse Evaluator's trace_values named argument is a numeric trace level. If the numeric trace level is 1, Marpa traces values as they are computed in the evaluation stack. A trace level of 0 turns value tracing off, which is the default. Traces are written to the trace file handle.

INFINITELY AMBIGUOUS GRAMMARS: THE DETAILS

This section describes the details of parsing with infinitely ambiguous grammars. Most readers will want to skip this section.

Experts in parsing theory will notice that my terminology is non-standard. What I am calling a potentially infinite cycle or an infinite cycle is usually called just a cycle in the literature, because all cycles allow a potentially infinite number of parse results from a single input. I use the more verbose terms because most readers will not be familiar with parsing theory, and might confuse cycles with ordinary recursion. Cycles result in infinite ambiguity, and to date have not found use in grammars which are of practical interest. Ordinary non-cyclical recursions, on the other hand, occur frequently in grammars of practical interest, do not result in infinite ambiguity, and are handled by Marpa without the need to take special measures.

By default, Marpa treats an infinitely ambigious grammar as a fatal error. This is because infinite ambiguity is usually a mistake, and one that is easily fixed. To proceed to producing parse results from an infinitely ambiguous grammar, the user must set the grammar's infinite action named argument to a value other than "fatal". The other choices are "warn" and "quiet".

Cycle Length

Obviously, Marpa cannot evaluate an infinitely long parse, and Marpa cannot list all of an infinite number of parse results. For grammars with potentially infinite cycles, Marpa's Multi-parse Evaluator returns only those parse results where the cycle length is 1. There will always be a finite number of these parse results.

Cycle length is the number of times a parse derivation goes around a potentially infinite cycle. Above, a set of parses from an example of an infinitely ambiguous grammar was shown. Here are those parses again, this time labeled with their cycle length.

    Cycle length 1: S -> A -> B -> x
    Cycle length 2: S -> A -> B -> A -> B -> x
    Cycle length 3: S -> A -> B -> A -> B -> x -> A -> B -> x

Of the parse results in the above list, Marpa would return a value only for the first, the one whose cycle length is 1.

When iterating parse results for an infinitely ambiguous grammar, two strategies are available to the user. Which is used depends on the setting of the infinite_rewrite named argument. If the infinite_rewrite named argument is true, the rewrite strategy is used. If the infinite_rewrite named argument is false, the dynamic strategy is used. The rewrite strategy is the default.

The Rewrite Strategy

In the rewrite strategy, the parse bocage is rewritten during the evaluator setup phase, in the Marpa::Evaluator::new method. Potentially infinite cycles are eliminated in the rewrite, but all derivation paths for cycles of length one are preserved. This is the most powerful method and it works extremely well when the parse only contains a limited number of infinite cycles.

A disadvantage of the rewrite strategy is that the rewrite is done, and its full cost in CPU time incurred, before any parses are returned. Marpa typically tries to amortize the cost of ambiguity over the parse results.

Another disadvantage of the rewrite strategy is that there are grammars for which the rewrite is so CPU intensive that it is simply impractical for a parse of any size. These cases are probably only theoretical, but they are not very hard to create.

The extreme worst cases are plex grammars. In a plex grammar, every symbol directly produces every other symbol. That is, in a plex grammar, for every ordered pair of symbols X and Y, there is a rule of the form

     X ::= Y

I cannot think that anyone will be using plex grammars in a practical application. On the other hand, the same thing might be said of infinite ambiguity itself.

The Dynamic Strategy

The rewriting of infinite cycles can be turned off by setting the Multi-parse Evaluator's infinite_rewrite named argument to false. This totally eliminates the cost of the infinite cycle rewrite. The disadvantage of the dynamic strategy is that, in Marpa's current implementation, it skips some parse results which the rewrite strategy includes.

The rewrite strategy includes all the parse results that are possible when cycle length is limited to 1. Where there are multiple, connected cycles, the rewrite strategy will iterate every possible way of reaching every possible cycle.

When multiple cycles interconnect, many parse results will simply be different paths through the tangle of cycles. The dynamic strategy prunes some of these paths. If an application does not need a precise enumeration of all possible ways of reaching every cycle at every location, this pruning may be acceptable or even preferred.

LICENSE AND COPYRIGHT

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 258:

L<> starts or ends with whitespace