Ron Savage

NAME

MarpaX::Grammar::Parser - Converts a Marpa grammar into a tree using Tree::DAG_Node

Synopsis

        use MarpaX::Grammar::Parser;

        my(%option) =
        (               # Inputs:
                marpa_bnf_file   => 'share/metag.bnf',
                user_bnf_file    => 'share/stringparser.bnf',
                        # Outputs:
                cooked_tree_file => 'share/stringparser.cooked.tree',
                raw_tree_file    => 'share/stringparser.raw.tree',
        );

        MarpaX::Grammar::Parser -> new(%option) -> run;

For more help, run:

         scripts/bnf2tree.pl -h

See share/*.bnf for input files and share/*.tree for output files.

Installation copies all files from the share/ directory into a directory chosen by File::ShareDir. Run scripts/find.grammars.pl to display the name of that directory.

The cooked tree can be graphed with MarpaX::Grammar::GraphViz2. That module has its own demo page.

Description

MarpaX::Grammar::Parser uses Marpa::R2 to convert a user's BNF into a tree of Marpa-style attributes (see "raw_tree()"), and then post-processes that tree (see "compress_tree()") to create another tree, this time containing just the original grammar (see "cooked_tree()").

The nature of these trees is discussed in the "FAQ". The trees are managed by Tree::DAG_Node.

Lastly, the major purpose of the cooked tree is to serve as input to MarpaX::Grammar::GraphViz2.

Installation

Install MarpaX::Grammar::Parser as you would for any Perl module:

Run:

        cpanm MarpaX::Grammar::Parser

or run:

        sudo cpan MarpaX::Grammar::Parser

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = MarpaX::Grammar::Parser -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type MarpaX::Grammar::Parser.

Key-value pairs accepted in the parameter list (see also the corresponding methods [e.g. "marpa_bnf_file([$bnf_file_name])"]):

o bind_attributes => Boolean

Include (1) or exclude (0) attributes in the output tree file(s).

Default: 0.

o cooked_tree_file => aTextFileName

The name of the text file to which the grammar is written, as a cooked tree.

If '', the file is not written.

Default: ''.

Note: The bind_attributes option/method affects the output.

o logger => aLog::HandlerObject

By default, an object of type Log::Handler is created which prints to STDOUT.

See maxlevel and minlevel below.

Set logger to '' (the empty string) to stop a logger from being created.

Default: undef.

o marpa_bnf_file => aMarpaBNFFileName

Specify the name of Marpa's own BNF file. This distro ships it as share/metag.bnf.

This option is mandatory.

Default: ''.

o maxlevel => $level

This option is only used if this module creates an object of type Log::Handler.

See Log::Handler::Levels.

Nothing is printed by default.

Default: 'notice'.

o minlevel => $level

This option is only used if this module creates an object of type Log::Handler.

See the Log::Handler::Levels docs.

Default: 'error'.

No levels lower than 'error' (i.e. 'critical', 'alert' or 'emergency') are used.

o output_hashref => Boolean

Log (1) or skip (0) the hashref version of the raw tree.

Note: For this option to have any effect, maxlevel must be raised from its default value of 'notice' to 'info'.

Default: 0.

o raw_tree_file => aTextFileName

The name of the text file to which the grammar is written, as a raw tree.

If '', the file is not written.

Default: ''.

Note: The bind_attributes option/method affects the output.

o user_bnf_file => aUserBNFFileName

Specify the name of the file containing your Marpa::R2-style grammar.

See share/stringparser.bnf for a sample.

This option is mandatory.

Default: ''.
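To illustrate how the maxlevel and minlevel options above interact, here is a hypothetical sketch in plain Perl. It assumes Log::Handler's syslog-style numbering (emergency is 0, debug is 7), and that a message is logged only when its level lies between minlevel and maxlevel inclusive. The would_log() function is made up for this sketch; it is not part of this module or of Log::Handler.

```perl
use strict;
use warnings;

# Hypothetical sketch, not code from this module or from Log::Handler.
# Assumes syslog-style numbering, where lower numbers are more severe.
my(%level) =
(
	emergency => 0, alert  => 1, critical => 2, error => 3,
	warning   => 4, notice => 5, info     => 6, debug => 7,
);

# Returns true if a message at $msg_level falls between minlevel and
# maxlevel, both inclusive.
sub would_log
{
	my($msg_level, $minlevel, $maxlevel) = @_;
	my($n) = $level{$msg_level};

	return ($n >= $level{$minlevel}) && ($n <= $level{$maxlevel});
}

# With the defaults (minlevel 'error', maxlevel 'notice'), 'info' is skipped.
for my $msg_level (qw/error notice info debug/)
{
	printf "%-6s => %s\n", $msg_level, would_log($msg_level, 'error', 'notice') ? 'logged' : 'skipped';
}
```

This is why output_hashref, which logs at 'info', needs maxlevel raised to 'info'.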

Methods

bind_attributes([$Boolean])

Here, the [] indicate an optional parameter.

Get or set the option which includes (1) or excludes (0) node attributes from the output cooked_tree_file and raw_tree_file.

Note: bind_attributes is a parameter to new().

clean_name($name)

Returns a list of 2 elements: ($name, $attributes).

$name is just the name of the token.

$attributes is a hashref with these keys:

o bracketed_name => $Boolean

Indicates the token's name is (1) or is not (0) of the form '<...>'.

o quantifier => $char

Indicates whether the token is quantified. $char is one of '', '*' or '+'.

If $char is '' (the empty string), the token is not quantified.

o real_name => $string

The user-specified version of the name of the token, including leading '<' and trailing '>' if any.
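The following hypothetical sketch shows the shape of the ($name, $attributes) pair described above, starting from a plain token string. It is not this module's implementation (the real method works on Marpa's parse output), and clean_name_sketch() is a made-up name. It assumes real_name keeps the '<' and '>' but not the quantifier.

```perl
use strict;
use warnings;

# Hypothetical sketch only: mimics the ($name, $attributes) result which
# clean_name() documents. Assumption: real_name keeps '<' and '>' but not
# the quantifier.
sub clean_name_sketch
{
	my($token)      = @_;
	my($quantifier) = '';

	# Strip a trailing '*' or '+', if any.
	$quantifier = $1 if ($token =~ s/([*+])$//);

	my($name)      = $token;
	my($bracketed) = 0;

	# '<string literal>' becomes 'string literal'.
	if ($name =~ /^<(.+)>$/)
	{
		($name, $bracketed) = ($1, 1);
	}

	return ($name, {bracketed_name => $bracketed, quantifier => $quantifier, real_name => $token});
}

my($name, $attributes) = clean_name_sketch('<string literal>+');

print "$name | $$attributes{bracketed_name} | $$attributes{quantifier} | $$attributes{real_name}\n";
```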

compress_branch($index, $node)

Called by "compress_tree()".

Converts one sub-tree of the raw tree into one sub-tree of the cooked tree.

compress_tree()

Called automatically by "run()".

Converts the raw tree into the cooked tree, calling "compress_branch($index, $node)" once for each daughter of the raw tree.

Output is the tree returned by "cooked_tree()".

cooked_tree()

Returns the root node, of type Tree::DAG_Node, of the cooked tree of items in the user's grammar.

By cooked tree, I mean the tree post-processed from the raw tree so as to include just the tokens of the user's original BNF.

The cooked tree is optionally written to the file name given by "cooked_tree_file([$output_file_name])".

The nature of this tree is discussed in the "FAQ".

See also "raw_tree()".

cooked_tree_file([$output_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to which the cooked tree form of the user's grammar will be written.

If no output file is supplied, nothing is written.

See share/stringparser.cooked.tree for the output of post-processing Marpa's analysis of share/stringparser.bnf.

This latter file is the grammar used in MarpaX::Demo::StringParser.

Note: cooked_tree_file is a parameter to new().

Note: The bind_attributes option/method affects the output.

first_rule()

Returns the first G1-level rule in the user's grammar. This is used to fabricate a start rule if 'start_rule' is not found in the cooked tree. This new node is not in the raw tree, but only in the cooked tree, and hence in the hashref version of the cooked tree.

The presence of a start rule helps MarpaX::Grammar::GraphViz2 generate the grammar's image.

format_hashref($depth, $hashref)

Formats the given hashref, with $depth (starting from 0) used to indent the output.

Outputs using calls to "log($level, $s)".

When you call "report_hashref()", it calls $self -> format_hashref(0, $self -> statements).

End users would normally never call this method, nor override it. Just call "report_hashref()".
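As a rough illustration of depth-indented formatting, here is a hypothetical sketch. Unlike format_hashref(), it returns lines instead of calling log(), and it sorts keys for stable output; both are assumptions of this sketch, and format_hashref_sketch() and the sample keys are made up.

```perl
use strict;
use warnings;

# Hypothetical sketch of a depth-indented hashref formatter. It returns
# lines rather than logging them, and sorts keys for stable output.
sub format_hashref_sketch
{
	my($depth, $hashref) = @_;
	my($indent) = '    ' x $depth;
	my(@lines);

	for my $key (sort keys %$hashref)
	{
		my($value) = $$hashref{$key};

		if (ref $value eq 'HASH')
		{
			# Print the key, then recurse one level deeper.
			push @lines, "$indent$key =>";
			push @lines, format_hashref_sketch($depth + 1, $value);
		}
		else
		{
			push @lines, "$indent$key => $value";
		}
	}

	return @lines;
}

# The keys are made up for illustration.
print "$_\n" for format_hashref_sketch(0, {outer => {inner_1 => 1, inner_2 => 1}, other => 1});
```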

log($level, $s)

Calls $self -> logger -> log($level => $s) if ($self -> logger).
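The conditional delegation above can be sketched as follows. The classes Hypothetical::Parser and Hypothetical::Collector are made up for this sketch; the collector merely stands in for a Log::Handler object by recording messages.

```perl
use strict;
use warnings;

package Hypothetical::Parser;

# Minimal sketch: messages reach the logger only if one is set.
sub new
{
	my($class, %arg) = @_;

	return bless({logger => $arg{logger} }, $class);
}

sub logger { return $_[0]{logger} }

sub log
{
	my($self, $level, $s) = @_;

	$self -> logger -> log($level => $s) if ($self -> logger);
}

# A stand-in logger which just records messages.
package Hypothetical::Collector;

sub new { return bless({messages => []}, shift) }

sub log
{
	my($self, $level, $s) = @_;

	push @{$$self{messages} }, "$level: $s";
}

package main;

my($collector) = Hypothetical::Collector -> new;

Hypothetical::Parser -> new(logger => $collector) -> log(info => 'Hello');
Hypothetical::Parser -> new(logger => '')         -> log(info => 'Dropped');

print "$_\n" for @{$$collector{messages} };
```

Setting logger to '' makes the condition false, so the second message is silently dropped.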

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set logger to the empty string.

Note: logger is a parameter to new().

marpa_bnf_file([$bnf_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read Marpa's grammar from.

Note: marpa_bnf_file is a parameter to new().

maxlevel([$level])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

Note: maxlevel is a parameter to new().

minlevel([$level])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

Note: minlevel is a parameter to new().

new()

The constructor. See "Constructor and Initialization".

output_hashref([$Boolean])

Here, the [] indicate an optional parameter.

Get or set the option to log (1) or exclude (0) a hashref version of the raw tree.

This hashref can be output by calling "new()" as new(maxlevel => 'info', output_hashref => 1).

See also "statements()".

Note: output_hashref is a parameter to new().

raw_tree()

Returns the root node, of type Tree::DAG_Node, of the raw tree of items in the user's grammar.

By raw tree, I mean the tree as derived directly from Marpa.

The raw tree is optionally written to the file name given by "raw_tree_file([$output_file_name])".

The nature of this tree is discussed in the "FAQ".

See also "cooked_tree()".

raw_tree_file([$output_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to which the raw tree form of the user's grammar will be written.

If no output file is supplied, nothing is written.

See share/stringparser.raw.tree for the output of Marpa's analysis of share/stringparser.bnf.

This latter file is the grammar used in MarpaX::Demo::StringParser.

Note: raw_tree_file is a parameter to new().

Note: The bind_attributes option/method affects the output.

report_hashref()

Outputs the hashref version of the raw tree to the logger.

It does this by calling $self -> format_hashref(0, $self -> statements), which in turn uses the logger provided in the call to "new()".

See "format_hashref($depth, $hashref)".

report_hashref() returns 0 for success and 1 for failure.

run()

The method which does all the work.

See "Synopsis" and scripts/bnf2tree.pl for sample code.

run() returns 0 for success and 1 for failure.

statements()

Returns a hashref describing the grammar provided in the user_bnf_file parameter to "new()".

The "FAQ" discusses the format of this hashref.

See also "output_hashref()".

user_bnf_file([$bnf_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read the user's grammar's BNF from. The whole file is slurped in as a single string.
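Slurping a whole file into one string is a standard Perl idiom, sketched below. The helper slurp_bnf() and the two-line sample text are made up for illustration; the module's own reading code may differ.

```perl
use strict;
use warnings;

use File::Temp 'tempfile';

# Hypothetical helper: read a whole BNF file into a single string.
sub slurp_bnf
{
	my($file_name) = @_;

	open(my $fh, '<', $file_name) || die "Can't open $file_name: $!";
	local $/;            # Undefine the input record separator, so one read gets everything.
	my $bnf = <$fh>;
	close $fh;

	return $bnf;
}

# Usage: write some made-up text to a temporary file, then slurp it back.
my($fh, $temp_name) = tempfile(UNLINK => 1);

print $fh ":start ::= pair\npair ::= key '=' value\n";
close $fh;

my($bnf)   = slurp_bnf($temp_name);
my(@lines) = split(/\n/, $bnf);

print scalar(@lines), " lines\n";
```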

See share/stringparser.bnf for a sample. It is the grammar used in MarpaX::Demo::StringParser.

Note: user_bnf_file is a parameter to new().

Files Shipped with this Module

Data Files

o share/c.ast.bnf

This is part of MarpaX::Languages::C::AST, by Jean-Damien Durand. It's 1,565 lines long.

The outputs are share/c.ast.cooked.tree and share/c.ast.raw.tree.

o share/c.ast.cooked.tree

This is the output from post-processing Marpa's analysis of share/c.ast.bnf.

The command to generate this file is:

        scripts/bnf2tree.sh c.ast

o share/c.ast.raw.tree

This is the output from processing Marpa's analysis of share/c.ast.bnf. It's 56,723 lines long, which indicates the complexity of Jean-Damien's grammar for C.

The command to generate this file is:

        scripts/bnf2tree.sh c.ast

o share/json.1.bnf

It is part of MarpaX::Demo::JSONParser, written as a gist by Peter Stuifzand.

See https://gist.github.com/pstuifzand/4447349.

The command to process this file is:

        scripts/bnf2tree.sh json.1

The outputs are share/json.1.cooked.tree and share/json.1.raw.tree.

o share/json.2.bnf

It also is part of MarpaX::Demo::JSONParser, written by Jeffrey Kegler as a reply to the gist above from Peter.

The command to process this file is:

        scripts/bnf2tree.sh json.2

The outputs are share/json.2.cooked.tree and share/json.2.raw.tree.

o share/json.3.bnf

This is yet another JSON grammar written by Jeffrey Kegler.

The command to process this file is:

        scripts/bnf2tree.sh json.3

The outputs are share/json.3.cooked.tree and share/json.3.raw.tree.

o share/metag.bnf

This is a copy of Marpa::R2's BNF. That is, it's the file which Marpa uses to validate both its own metag.bnf (self-reflexively), and any user's BNF file.

See "marpa_bnf_file([$bnf_file_name])" above.

The command to process this file is:

        scripts/bnf2tree.sh metag

The outputs are share/metag.cooked.tree and share/metag.raw.tree.

o share/metag.hashref

Created by:

        scripts/bnf2tree.pl -mar share/metag.bnf -u share/metag.bnf -r share/metag.raw.tree \
                -max info > share/metag.hashref

o share/stringparser.bnf

This is a copy of MarpaX::Demo::StringParser's BNF.

See "user_bnf_file([$bnf_file_name])" above.

The command to process this file is:

        scripts/bnf2tree.sh stringparser

The outputs are share/stringparser.cooked.tree and share/stringparser.raw.tree.

o share/stringparser.hashref

Created by:

        scripts/bnf2tree.pl -mar share/stringparser.bnf -u share/stringparser.bnf \
                -r share/stringparser.raw.tree -max info > share/stringparser.hashref

o share/stringparser.treedumper

This is the output of running:

        scripts/metag.pl share/metag.bnf share/stringparser.bnf > \
                share/stringparser.treedumper

That script, metag.pl, is discussed just below, and in the "FAQ".

o share/termcap.info.bnf

It is part of MarpaX::Database::Terminfo, written by Jean-Damien Durand.

The command to process this file is:

        scripts/bnf2tree.sh termcap.info

The outputs are share/termcap.info.cooked.tree and share/termcap.info.raw.tree.

Scripts

These scripts are all in the scripts/ directory.

o bnf2tree.pl

This is a neat way of using this module. For help, run:

        scripts/bnf2tree.pl -h

Of course you are also encouraged to include the module directly in your own code.

o bnf2tree.sh

This is a quick way for me to run bnf2tree.pl.

o find.grammars.pl

This prints the path to a grammar file. After installation of the module, run it with any of these parameters:

        scripts/find.grammars.pl (Defaults to json.1.bnf)
        scripts/find.grammars.pl c.ast.bnf
        scripts/find.grammars.pl json.1.bnf
        scripts/find.grammars.pl json.2.bnf
        scripts/find.grammars.pl json.3.bnf
        scripts/find.grammars.pl stringparser.bnf
        scripts/find.grammars.pl termcap.info.bnf

It prints the path to the given grammar file.

o metag.pl

This is Jeffrey Kegler's code. See the "FAQ" for more.

o pod2html.sh

This lets me quickly proof-read edits to the docs.

FAQ

What is this BNF (SLIF-DSL) thingy?

Marpa's grammars are written in what we call a SLIF-DSL. Here, SLIF stands for Marpa's Scanless Interface, and DSL is Domain-specific Language.

Many programmers will have heard of BNF. Well, Marpa's SLIF-DSL is an extended BNF. That is, it includes special tokens which only make sense within the context of a Marpa grammar. Hence the 'Domain Specific' part of the name.

In practice, this means you express your grammar in a string, and Marpa treats that as a set of rules as to how you want Marpa to process your input stream.

Marpa's docs for its SLIF-DSL are part of the Marpa::R2 distribution.

What is the difference between the cooked tree and the raw tree?

The raw tree is generated by processing the output of Marpa's parse of the user's grammar file. It contains Marpa's view of that grammar.

The cooked tree is generated by post-processing the raw tree, to extract just the user's grammar's tokens. It contains the user's view of their grammar.

The cooked tree can be graphed with MarpaX::Grammar::GraphViz2. That module has its own demo page.

The following items explain this in more detail.

What are the details of the nodes in the cooked tree?

Under the root, there is a set of nodes:

o N nodes, 1 per statement in the grammar

The nodes' names are the left-hand sides of the statements in the grammar.

Each node is the root of a subtree describing the statement.

Under each of those nodes is a set of nodes:

o 1 node for the separator between the left and right sides of the statement

The node's name is one of '=', '::=' or '~'.

o 1 node per token from the right-hand side of each statement

The node's name is the token itself.

The attributes of each node are a hashref, with these keys:

o bracketed_name => $Boolean

Indicates the token's name is (1) or is not (0) of the form '<...>'.

o quantifier => $char

Indicates whether the token is quantified. $char is one of '', '*' or '+'.

If $char is '' (the empty string), the token is not quantified.

o real_name => $string

The user-specified version of the name of the token, including leading '<' and trailing '>' if any.

See share/stringparser.cooked.tree.
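To picture one branch of the cooked tree, here is a hypothetical sketch which builds the same shape (a node named after the left-hand side, holding the separator and then the right-hand-side tokens) from a single simple statement, using a regexp. The real module derives its tree from Marpa's parse, not from a regexp; statement_to_branch() is made up, and it only handles trivial one-line statements (for instance, quoted literals containing spaces would be split incorrectly).

```perl
use strict;
use warnings;

# Hypothetical sketch: mimic the shape of one cooked-tree branch from a
# trivial 'lhs op rhs...' line. Not how the module actually works.
sub statement_to_branch
{
	my($statement) = @_;

	my($lhs, $op, $rhs) = $statement =~ /^\s*(\S+)\s*(::=|=|~)\s*(.*?)\s*$/
		or return;

	# One node for the separator, then one node per right-hand-side token.
	return {name => $lhs, daughters => [$op, split(/\s+/, $rhs)]};
}

my($branch) = statement_to_branch("pair ::= key '=' value");

print "$$branch{name}: @{$$branch{daughters} }\n";
```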

What are the details of the nodes in the raw tree?

Under the root, there is a set of nodes:

o One node for the offset of the start of each grammar statement within the input stream.

The node's name is the integer start offset.

o One node for the offset of the end of each grammar statement within the input stream.

The node's name is the integer end offset.

o N nodes, 1 per statement in the grammar

The nodes' names are either an item from the user's grammar (when the attribute 'type' is 'Grammar') or a Marpa-assigned token (when the attribute 'type' is 'Marpa').

Each node is the root of a subtree describing the statement.

See share/stringparser.raw.attributes.tree for the tree with attributes displayed (bind_attributes => 1), and share/stringparser.raw.tree for the same tree without attributes (bind_attributes => 0).

The attributes of each node are a hashref, with these keys:

o type

This indicates what type of node it is. Values:

o 'Grammar' => The node's name is an item from the user-specified grammar.
o 'Marpa' => Marpa has assigned a class to the node (or to one of its parents).

The class name is of the form: $class_name::$node_name.

$class_name is a constant provided by this module, and is 'MarpaX::Grammar::Parser::Dummy'.

The technique used to generate this file is discussed above, under "Data Files".

Note: The file share/stringparser.treedumper shows some class names, but they are currently not stored in the tree returned by the method "raw_tree()".

See share/stringparser.raw.tree.
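Recovering $node_name from a class name of the form $class_name::$node_name can be sketched as below. Only the constant 'MarpaX::Grammar::Parser::Dummy' comes from this document; node_name_from_class() and the sample node name are made up.

```perl
use strict;
use warnings;

# The constant documented above.
my($class_name) = 'MarpaX::Grammar::Parser::Dummy';

# Hypothetical helper: return the node name, or undef if the prefix is absent.
sub node_name_from_class
{
	my($full_name) = @_;

	# \Q...\E quotes the '::' separators in the constant.
	return $full_name =~ /^\Q$class_name\E::(.+)$/ ? $1 : undef;
}

print node_name_from_class("${class_name}::statements"), "\n";
```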

Why are attributes used to identify bracketed names?

Because dot (from Graphviz) assigns a special meaning to labels which begin with '<' and '<<'.

What is the format of the hashref of the cooked tree?

The keys in the hashref are the types of statements found in the grammar, and the values for those keys are either '1' to indicate the key exists, or a hashref.

The latter hashref's keys are all the sub-types of statements found in the grammar, for the given statement.

The pattern of keys pointing to either '1' or a hashref, is repeated to whatever depth is required to represent the tree.

See share/*.hashref for sample output. Instructions for producing this output are detailed under "Data Files".
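Since each value is either 1 or another hashref, a short recursive walk can list every path through such a hashref. This is a hypothetical sketch: hashref_paths() and the sample keys are made up, and the keys are sorted only to give stable output.

```perl
use strict;
use warnings;

# Hypothetical sketch: return every path from the top down, joined with '/'.
sub hashref_paths
{
	my($hashref, @prefix) = @_;
	my(@paths);

	for my $key (sort keys %$hashref)
	{
		my($value) = $$hashref{$key};

		if (ref $value eq 'HASH')
		{
			# A sub-type: descend, extending the path.
			push @paths, hashref_paths($value, @prefix, $key);
		}
		else
		{
			# A value of 1: the key exists, so the path ends here.
			push @paths, join('/', @prefix, $key);
		}
	}

	return @paths;
}

# Made-up keys; see share/*.hashref for real output.
my($sample) = {statement => {lhs => 1, rhs => {token => 1} }, start => 1};

print "$_\n" for hashref_paths($sample);
```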

Why did you write your own formatter for the output hashref?

I tried some fine modules (Data::Dumper, Data::Dumper::Concise and Data::Dump::Streamer), but even though they may have every option you want, they don't have the options I want.

How do I sort the daughters of a node?

Here's one way, using the node names as sort keys.

As an example, choose $root as either $self -> cooked_tree or $self -> raw_tree, and then:

        # Sort the daughters by node name, then store them back in that order.
        my(@daughters) = sort{$a -> name cmp $b -> name} $root -> daughters;

        $root -> set_daughters(@daughters);

Note: Since the original order of the daughters, in both the cooked and raw trees, is significant, sorting is contra-indicated.

Where did the basic code come from?

Jeffrey Kegler wrote it, and posted it on the Google Group dedicated to Marpa, on 2013-07-22, in the thread 'Low-hanging fruit'. I modified it slightly for a module context.

The original code is shipped as scripts/metag.pl.

Why did you use Data::TreeDump?

It offered the output which was most easily parsed of the modules I tested. The others were Data::Dumper, Data::TreeDraw, Data::TreeDumper and Data::Printer.

Where is Marpa's Homepage?

http://jeffreykegler.github.io/Ocean-of-Awareness-blog/.

Are there any articles discussing Marpa?

Yes, many by its author, and several others. See Marpa's homepage, just above, and:

The Marpa Guide, (in progress, by Peter Stuifzand and Ron Savage).

Parsing a here doc, by Peter Stuifzand.

An update of parsing here docs, by Peter Stuifzand.

Conditional preservation of whitespace, by Ron Savage.

See Also

MarpaX::Demo::JSONParser.

MarpaX::Demo::StringParser.

MarpaX::Grammar::GraphViz2.

MarpaX::Languages::C::AST.

Data::TreeDumper.

Log::Handler.

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Repository

https://github.com/ronsavage/MarpaX-Grammar-Parser

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=MarpaX::Grammar::Parser.

Author

MarpaX::Grammar::Parser was written by Ron Savage <ron@savage.net.au> in 2013.

Marpa's homepage: http://savage.net.au/Marpa.html.

Homepage: http://savage.net.au/.

Copyright

Australian copyright (c) 2013, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License 2.0, a copy of which is available at:
        http://www.opensource.org/licenses/index.html