NAME

Marpa::Doc::Debugging - How to Debug Your Grammar

DESCRIPTION

This document describes techniques for debugging Marpa parses and grammars. It also lists those Marpa methods and Marpa options whose main use is in tracing and debugging.

Basic Debugging Techniques

If parsing failed before or at the end of input, first look at the place in the input where parsing was exhausted. That, along with inspection of the input and the grammar, is often enough to spot the problem. But, typically, you'll have already tried that before consulting this document. Next, you should make sure that Marpa's warnings option is turned on. It's on by default, so you probably have already done that, too.

You should also turn off Marpa's strip option, which is on by default. When the strip option is on, Marpa "strips" its objects of data that is not needed for subsequent processing. This saves time and memory, but data that is not needed for processing can be extremely valuable for debugging. When the strip option is left on, many of Marpa's debugging methods will return partial information or no information at all.

When a problem is not obvious, the first thing I do is turn on the trace_lex option. This tells you which tokens the lexer is looking for and which ones it thinks it found. If the problem is in lexing, trace_lex tells you the whole story. Even if the problem is in the grammar, which tokens the lexer is looking for is a clue to what the recognizer is doing. That is because Marpa uses predictive lexing and only looks for tokens that could result in a successful parse.

It sometimes helps to look carefully at the output of show_rules and show_symbols, to check if anything there is clearly not right or not what you expected.

Advanced Techniques

Next, depending on where in the process you're having problems, you might want to turn on some of the more helpful traces. trace_actions will show you the actions as they are being finalized. trace_iterations traces the initialization and iteration of the parse. trace_values traces the values of the nodes as they are pushed on, and popped off, the evaluation stack.

For a complete investigation of a parse, do the following:

  • Turn off the strip Marpa option. By default, it is on.

  • Make sure the warnings option is turned on. It is on by default.

  • Run show_symbols on the precomputed grammar.

  • Run show_rules on the precomputed grammar.

  • Run show_QDFA on the precomputed grammar.

  • Turn on trace_lex before input.

  • Run show_earley_sets on the recognizer.

  • Run show_bocage in verbose mode on the evaluator after it is created.

    on the evaluator after each call of the value method.

  • Turn on trace_journal.

  • Turn on trace_values so you can see the values as they are pushed onto the evaluation stack, popped off it, calculated and pushed back on.

Note that when the input text to the grammar is of any length, the outputs from show_earley_sets, show_bocage, trace_lex, trace_journal, and trace_values can be lengthy. You'll want to work with short inputs if at all possible. The internals document has example outputs from the show_QDFA, show_earley_sets, and show_bocage methods, and explains how to read them.
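
The grammar and recognizer steps in the checklist above share one pattern: call a show_ method and print the string it returns. The following is a minimal sketch of a helper that gathers several such reports into one string. The name collect_reports and the "===" header lines are hypothetical and not part of Marpa. The helper relies only on method names listed in this document and assumes each is called without arguments.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Marpa.
# Calls each named method on $object and concatenates the returned
# strings, each under a header line naming the method.
sub collect_reports {
    my ( $object, @method_names ) = @_;
    my $report = q{};
    for my $method_name (@method_names) {
        my $method = $object->can($method_name);
        if ( not defined $method ) {
            # Record the gap instead of dying, so one missing
            # method does not hide the other reports.
            $report .= "=== $method_name: not available ===\n";
            next;
        }
        my $output = $object->$method();
        $report .= "=== $method_name ===\n";
        $report .= defined $output ? $output : q{};
        # Keep every section terminated by a newline.
        $report .= "\n" if $report !~ /\n\z/xms;
    }
    return $report;
}

# Example use, assuming $grammar has been precomputed and $recce
# has already read its input:
#   print {*STDERR}
#       collect_reports( $grammar, qw(show_symbols show_rules show_QDFA) );
#   print {*STDERR} collect_reports( $recce, 'show_earley_sets' );
```

Remember that with the strip option left on, some of these methods may return partial information or none at all.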

OPTIONS

These are Marpa options. Unless otherwise stated, the Marpa options are valid for all methods which accept Marpa options as named arguments (Marpa::mdl, Marpa::Grammar::new, Marpa::Grammar::set, Marpa::Recognizer::new, Marpa::Evaluator::new, and Marpa::Evaluator::set). All options are useful at any point in the parse, unless otherwise stated. Trace output goes to the trace file handle.

academic

The academic option turns off all grammar rewriting. The resulting grammar is useless for recognition and parsing. The purpose of the academic argument is to allow the testing of Marpa's precomputations against examples from textbooks. This is handy for testing the internals. An exception is thrown if the user attempts to create a recognizer from a grammar marked academic. The academic option cannot be set in the recognizer or after the grammar is precomputed.

strip

The value is a Boolean. If true, Marpa "strips" its objects when they contain data that is not needed for further processing. This saves space and time. This is the default behavior.

If strip is set to false, all data in Marpa's objects, even data no longer needed for processing, is left in place for the entire life of the object. This leftover data can be very important if you're debugging.

A grammar is stripped when it is precomputed. A recognizer is stripped when the end of input is recognized. Turning strip off after the end of input has been recognized will have no effect.

trace_actions

Traces actions as they are compiled. Little or no knowledge of Marpa internals required. This option is useless once the recognizer has been created. Setting it after that point will result in a warning.

trace_evaluation
trace_iterations

Traces initialization and iteration of the parse. Knowledge of Marpa internals very useful. May usefully be set at any point in the parse.

trace_lex

A shorthand for setting both trace_lex_matches and trace_lex_tries. Very useful, and can be interpreted with limited knowledge of Marpa internals. Because Marpa uses predictive lexing, this can give you an idea not only of how lexing is working, but also of what the recognizer is looking for. May be set at any point in the parse, but will be useless if set after input is complete.

trace_lex_matches

Traces every successful match in lexing. Can be interpreted with little knowledge of Marpa internals. May be set at any point in the parse, but will be useless if set after input is complete.

trace_lex_tries

Traces every attempted match in lexing. Can be interpreted with little knowledge of Marpa internals. Usually not useful without trace_lex_matches. trace_lex turns on both. May be set at any point in the parse, but will be useless if set after input is complete.

trace_rules

Traces rules as they are added to the grammar. Useful, but you may prefer the show_rules() method. Doesn't require knowledge of Marpa internals.

A trace message warns the user if this option is set when rules have already been added. If rules are added using the source named argument and the trace_rules named argument appears in the same call, trace_rules takes effect only after the source option has been processed, which is probably not what was intended. To be effective, trace_rules must be set in a method call prior to the one with the source option.

trace_journal
trace_values

Takes as its value an integer zero or greater, which sets the debugging level. A debugging level of zero means no tracing of values. A level of one or more turns on tracing of values as they are pushed onto the evaluation stack, popped off it, and calculated. If the debugging level is 3 or more, the entire evaluation stack is dumped at every step in the evaluation.

Very helpful. Knowledge of Marpa internals is helpful, but not required. May usefully be set at any point.

METHODS

Static Method

show_location

    use English qw( -no_match_vars );    # provides $OS_ERROR

    my $recce = Marpa::Recognizer->new( { grammar => $grammar } );
    my $fail_location = $recce->text( \$text_to_parse );

    if ( $fail_location >= 0 ) {
        print {*STDERR} Marpa::show_location(
            'Parsing failed',
            \$text_to_parse,
            $fail_location,
        )
        or Marpa::exception "print to STDERR failed: $OS_ERROR";
        exit 1;
    }

A utility routine helpful for creating messages about problems parsing text. Takes three arguments, all required. The first argument must be a string containing a message. The second argument must be a reference to a string containing the text that was being parsed. The third argument must be an integer, and will be interpreted as a character offset within that string.

show_location returns a multi-line string. The first, header, line contains the message. The second line is the line from the text being parsed which contains the character offset. The third line contains an ASCII "caret" symbol pointing to the position of the offset in the second line.
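
To make the three-line layout concrete, here is a small pure-Perl sketch that builds a string of the same shape. sketch_show_location is a hypothetical stand-in written for illustration. It is not Marpa's implementation, and the exact text Marpa produces may differ (for example, in how the header line is worded or how tabs are handled).

```perl
use strict;
use warnings;

# Hypothetical illustration, NOT Marpa's implementation.
# Arguments mirror those described for show_location:
#   $message  - header text
#   $text_ref - reference to the string that was being parsed
#   $offset   - character offset within that string
sub sketch_show_location {
    my ( $message, $text_ref, $offset ) = @_;
    my $text = ${$text_ref};

    # Clamp the offset so it always falls inside the text.
    $offset = 0            if $offset < 0;
    $offset = length $text if $offset > length $text;

    # The line starts just after the last newline before the offset.
    my $line_start =
        $offset == 0 ? 0 : rindex( $text, "\n", $offset - 1 ) + 1;

    # The line ends at the next newline, or at the end of the text.
    my $line_end = index $text, "\n", $offset;
    $line_end = length $text if $line_end < 0;

    my $line = substr $text, $line_start, $line_end - $line_start;

    # Pad with spaces so the caret sits under the offending character.
    my $caret_line = ( q{ } x ( $offset - $line_start ) ) . q{^};

    return join "\n", $message, $line, $caret_line, q{};
}
```

Called with the message 'Parsing failed', a reference to a multi-line text, and an offset into its second line, the sketch returns the message, that second line, and a caret line, each terminated by a newline.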

Grammar Methods

inaccessible_symbols

    $grammar->precompute();

    for my $symbol ( @{ $grammar->inaccessible_symbols() } ) {
        say 'Inaccessible symbol: ', $symbol;
    }

Returns the plumbing names of the inaccessible symbols. Not useful before the grammar is precomputed. Used for test scripts. For debugging and tracing, the warnings option is usually the most convenient way to obtain the same information.

show_NFA

    $grammar->precompute();

    print $grammar->show_NFA
        or Marpa::exception "print failed: $OS_ERROR";

Returns a multi-line string listing the states of the NFA with the LR(0) items and transitions for each. Not useful before the grammar is precomputed. Not really helpful for debugging grammars and requires very deep knowledge of Marpa internals.

show_QDFA

    $grammar->precompute();

    print $grammar->show_QDFA
        or Marpa::exception "print failed: $OS_ERROR";

Returns a multi-line string listing the states of the QDFA with the LR(0) items, NFA states, and transitions for each. Not useful before the grammar is precomputed. Very useful in debugging, but requires knowledge of Marpa internals.

show_accessible_symbols

    $grammar->precompute();

    say 'Accessible symbols: ',
        $grammar->show_accessible_symbols;

Returns a one-line string with the plumbing names of the accessible symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.
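
Because the string is space-separated, a test script can split it and check for specific symbols. The sketch below shows one way to do that. The helper name missing_symbols is hypothetical and not part of Marpa, and the same approach should apply to the other one-line, space-separated show_ methods described in this section.

```perl
use strict;
use warnings;

# Hypothetical test helper, not part of Marpa.
#   $shown    - a space-separated string of symbol names, such as the
#               one returned by show_accessible_symbols
#   @expected - symbol names the test expects to find
# Returns, in list context, the expected names absent from $shown,
# in the order they were given.
sub missing_symbols {
    my ( $shown, @expected ) = @_;
    my %is_shown = map { ( $_ => 1 ) } split q{ }, $shown;
    return grep { not $is_shown{$_} } @expected;
}

# Example use, assuming $grammar has been precomputed and that the
# names in @wanted are plumbing names:
#   my @missing =
#       missing_symbols( $grammar->show_accessible_symbols, @wanted );
#   die "Not accessible: @missing" if @missing;
```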

show_nullable_symbols

    $grammar->precompute();

    say 'Nullable symbols: ',
        $grammar->show_nullable_symbols;

Returns a one-line string with the plumbing names of the nullable symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.

show_nulling_symbols

    $grammar->precompute();

    say 'Nulling symbols: ',
        $grammar->show_nulling_symbols;

Returns a one-line string with the plumbing names of the nulling symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.

show_problems

    $grammar->precompute();

    print $grammar->show_problems
        or Marpa::exception "print failed: $OS_ERROR";

Returns a string describing the problems a grammar had in the precomputation phase. For many precomputation problems, Marpa does not immediately throw an exception. This is because there are often several problems with a grammar. Throwing an exception on the first problem would force the user to fix them one at a time -- very tedious. If there were no problems, returns a string saying so.

This method is not useful before precomputation. An exception is thrown if the user attempts to stringify, or to create a parse from, a grammar with problems. The string returned by show_problems will be part of the exception's error message.

show_productive_symbols

    $grammar->precompute();

    say 'Productive symbols: ',
        $grammar->show_productive_symbols;

Returns a one-line string with the plumbing names of the productive symbols of the grammar, space-separated. Useful in test scripts. Not useful before the grammar is precomputed. Not very useful for debugging.

show_rules

    $grammar->precompute();

    print $grammar->show_rules
        or Marpa::exception "print failed: $OS_ERROR";

Returns a string listing the rules, each commented as to whether it was nullable, nulling, unproductive, inaccessible, empty or not useful. If a rule had a non-zero priority, that is also shown. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

show_rules shows a rule as not useful ("!useful") if it decides not to use it for any reason. Rules marked "!useful" include not just the ones called useless in standard parsing terminology (inaccessible and unproductive rules) but also any rule which is replaced by one of Marpa's grammar rewrites.

show_symbols

    $grammar->precompute();

    print $grammar->show_symbols
        or Marpa::exception "print failed: $OS_ERROR";

Returns a string listing the symbols, along with whether they were nulling, nullable, unproductive or inaccessible. Also shown is a list of rules with that symbol on the left hand side, and a list of rules which have that symbol anywhere on the right hand side. Often useful and much of the information requires no knowledge of the Marpa internals to interpret.

unproductive_symbols

    $grammar->precompute();

    for my $symbol ( @{ $grammar->unproductive_symbols } ) {
        say 'Unproductive symbol: ', $symbol;
    }

Given a precomputed grammar, returns the plumbing names of the unproductive symbols. Not useful before the grammar is precomputed. Used in test scripts. For debugging and tracing, the warnings option is usually a more convenient way to obtain the same information.

Recognizer Method

show_earley_sets

    my $recce = Marpa::Recognizer->new( { grammar => $grammar } );
    my $fail_location = $recce->text( \$text_to_parse );

    print $recce->show_earley_sets
        or Marpa::exception "print failed: $OS_ERROR";

Returns a multi-line string listing every Earley item in every Earley set.

Evaluator Method

show_bocage

    print $evaler->show_bocage($show_bocage_verbosity)
        or Marpa::exception "print failed: $OS_ERROR";

Returns a multi-line string describing the bocage for an evaluator. The first line gives the name of the Perl package in which Marpa runs the actions for that evaluator, and a count of the parses derived so far from the bocage. The bocage follows.

The bocage is given in pre-order, in the form of a grammar. Parse bocage grammars are similar to parse forest grammars. In the internals document, parse bocage grammars are described at length, using an example output from show_bocage.

The optional verbosity argument must be an integer greater than or equal to zero. For each parse bocage and-production, if verbosity is set greater than zero, the LR(0) item corresponding to the and-production's and-node is shown, along with the and-node's argument count (or rule length), and an indication of whether or not there is a Perl closure for the and-node.

In addition to and-productions, parse bocage grammars contain or-productions. The information in or-productions is redundant -- all of it is evident from the and-productions. Or-productions are shown only if verbosity is set at 2 or more.

SUPPORT

See the support section in the main module.

AUTHOR

Jeffrey Kegler

LICENSE AND COPYRIGHT

Copyright 2007 - 2009 Jeffrey Kegler

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0.