NAME

MarpaX::Languages::Dash - A Marpa-based parser for the DASH language

Synopsis

Typical usage:

        perl -Ilib scripts/parse.pl -de '[node]{color:blue; label: "Node name"}' -max info
        perl -Ilib scripts/parse.pl -i data/node.04.dash -max info

You can use scripts/parse.sh to simplify this process, but it assumes your input file is in data/:

        scripts/parse.sh node.04 -max info

See the demo page for sample input and output.

Also, see the article based on this module.

Description

This module implements a parser for "DASH" (below), a wrapper language around Graphviz's DOT. That is, the module is a pre-processor for the DOT language.

Specifically, this module demonstrates how to use Marpa::R2's capabilities to make Marpa repeatedly pass control back to code in your own module during the parse, so that you can handle those cases where you don't want Marpa's default processing to occur.

This lets the code deal with the classic case where you wish to preserve whitespace in some contexts, but also want Marpa to discard whitespace in all other contexts.

DASH is easier to use than DOT, so users can specify graphs very simply, without having to learn DOT.

The DASH language is actually a cut-down version of the language used by Graph::Easy. For a full explanation of the Graph::Easy language, see http://bloodgate.com/perl/graph/manual/.

The wrapper is parsed into a tree of tokens managed by Tree::DAG_Node.

If requested by the user, the tree is passed to the default renderer, MarpaX::Languages::Dash::Renderer. Various options allow the user to choose the output image format (SVG, PNG, ...) and to save the DOT version of the graph.

In the past, the code in this module was part of Graph::Easy::Marpa, but that module has been deleted from CPAN, and all its new code and features, together with bug fixes, are in the current module.

Note that this module's use of Marpa's event and pause adverbs should be regarded as an intermediate/advanced technique. For people just beginning to use Marpa, the action adverb is the recommended technique.

The article mentioned above discusses important issues regarding the timing sequence of pauses and actions.

All this assumes a relatively recent version of Marpa, one in which its Scanless interface (SLIF) is implemented. I'm currently (2014-10-10) using Marpa::R2 V 2.096000.

Lastly, the parser and renderer will be incorporated into the next major release (V 2.00) of GraphViz2::Marpa, which parses DOT files.

Installation

Install MarpaX::Languages::Dash as you would for any Perl module:

Run:

        cpanm MarpaX::Languages::Dash

or run:

        sudo cpan MarpaX::Languages::Dash

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Scripts Shipped with this Module

All scripts are shipped in the scripts/ directory.

o copy.config.pl

This is for use by the author. It just copies the config file out of the distro, so the script generate.index.pl (which uses HTML template stuff) can find it.

o find.config.pl

This cross-checks the output of copy.config.pl.

o dash2svg.pl

Converts all data/*.dash files into the corresponding html/*.svg files.

Used by generate.demo.sh.

o generate.demo.sh

This generates all the SVG files for the data/*.dash files, and then generates html/index.html.

And then it copies the demo output to my dev web server's doc root, where I can cross-check it.

o generate.index.pl

This constructs a web page containing all the html/*.svg files.

o parse.pl

This runs a parse on a single input file. Run 'parse.pl -h' for details.

o parse.sh

This simplifies running parse.pl.

o pod2html.sh

This converts all lib/*.pm files into their corresponding *.html versions, for proof-reading and uploading to my real web site.

o render.pl

This runs a parse on a single input file, and converts the output into an SVG file. Run 'render.pl -h' for details.

o render.sh

This simplifies running render.pl.

Constructor and Initialization

new() is called as my($parser) = MarpaX::Languages::Dash -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type MarpaX::Languages::Dash.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. description($graph)]):

o description => '[node.1]->[node.2]'

Specify a string for the graph definition.

You are strongly encouraged to surround this string with '...' to protect it from your shell if using this module directly from the command line.

See also the input_file key which reads the graph from a file.

The description key takes precedence over the input_file key.

Default: ''.

o input_file => $graph_file_name

Read the graph definition from this file.

See also the description key to read the graph from the command line.

The whole file is slurped in as a single graph.

The first lines of the file can start with /^\s*#/, and will be discarded as comments.

The description key takes precedence over the input_file key.

Default: ''.

o logger => $logger_object

Specify a logger object.

To disable logging, just set logger to the empty string.

Default: An object of type Log::Handler.

o maxlevel => $level

This option is only used if this module creates an object of type Log::Handler.

See Log::Handler::Levels.

Default: 'notice'. A typical choice is 'info' or 'debug'.

o minlevel => $level

This option is only used if this module creates an object of type Log::Handler.

See Log::Handler::Levels.

Default: 'error'.

No lower levels are used.

Methods

clean_before($s)

Cleans the input string before the next step in the parse process.

Typically only ever called once.

Returns the cleaned string.

clean_after($s)

Cleans the input string after each step in the parse process.

Typically called many times, once on each output token.

Returns the cleaned string.

description([$graph])

Here, the [] indicate an optional parameter.

Gets or sets the graph string to be parsed.

See also the "input_file([$graph_file_name])" method.

The value supplied to the description() method takes precedence over the value read from the input file.

Also, description is an option to new().

graph_text([$graph])

Here, the [] indicate an optional parameter.

Returns the value of the graph definition string, from either the command line or a file.

input_file([$graph_file_name])

Here, the [] indicate an optional parameter.

Gets or sets the name of the file to read the graph definition from.

See also the "description([$graph])" method.

The whole file is slurped in as a single graph.

The first few lines of the file can start with /^\s*#/, and will be discarded as comments.

The value supplied to the description() method takes precedence over the value read from the input file.

Also, input_file is an option to new().
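The comment rule above can be illustrated with a small standalone sketch. The function strip_leading_comments is hypothetical and is not part of this module's API; the module's own code may handle comments differently (for instance, the DASH Syntax section also mentions '//' comments, which this sketch ignores).

```perl
use strict;
use warnings;

# Hypothetical helper (not this module's API): drop the leading lines that
# match /^\s*#/, then return the rest of the text as a single graph string.
sub strip_leading_comments {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;    # -1 keeps trailing empty fields
    shift @lines while @lines && $lines[0] =~ /^\s*#/;
    return join "\n", @lines;
}

my $slurped = "# A comment\n  # Another comment\n[node.1] -> [node.2]\n";
print strip_leading_comments($slurped);   # prints "[node.1] -> [node.2]"
```

Only a run of comment lines at the very start is removed; a '#' later in the text is left alone, matching the "first lines of the file" wording.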

log($level, $s)

Calls $self -> logger -> log($level => $s) if ($self -> logger).

run()

This is the only method the caller needs to call. All parameters are supplied to new().

Returns 0 for success and 1 for failure.

recce()

Returns an object of type Marpa::R2::Scanless::R.

tree()

Returns an object of type Tree::DAG_Node.

DASH Syntax

See the demo page for sample input and output.

The examples in the following sections are almost all taken from data/*.dash, in the distro.

Graphs in DASH

        1: A graph definition may continue over multiple lines.
        2: Lines beginning with either '#' or '//' are discarded as comments.
        3: A node name or an edge name must never be split over multiple lines.
        4: Attributes may be split over lines, but do not split either the name or value of the
                attribute over multiple lines.
                Note: Attribute values can contain various escaped characters, e.g. \n.
        5: A graph may start or end with an edge, and even have contiguous edges.
                See data/edge.06.dash (or the demo page). Graphviz does not allow any of these
                possibilities, so the default renderer fabricates anonymous nodes and inserts them where
                they will satisfy the requirements of Graphviz.

Examples:

        1: A graph split over 10 lines:
                [node.1] {label: "n 1"}
                -> {label: 'e 1'}
                -> {label: e 2}
                [] {label: n 2}
                -> {label  :  e 3}
                [node.3] {label: "n 3"}
                -> {label: 'e 4'},
                -> {label: e 5}
                [] {label: n 2}
                -> {label  :  e 6}
        2: A graph split over 14 lines:
                ->
                ->

                [node]
                [node] ->
                -> {label: Start} -> {color: red} [node.1] {color: green} -> [node.2]
                [node.1] [node.2] [node.3]

                []
                [node.1]
                [node 1]
                ['node.2']
                ["node.3"]
                [     From here     ] -> [     To there     ]
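Rule 5 above can be sketched as follows, assuming a hypothetical flat token list in which each token is [type, value] and type is either 'node' or 'edge' (attributes are omitted). This is not the renderer's code; it only shows where anonymous nodes must be inserted so that every edge has a node on each side. How the real renderer names those nodes is not shown.

```perl
use strict;
use warnings;

# Hypothetical token format: [type, value], type being 'node' or 'edge'.
# Returns a new list in which every edge has a node on both sides, by
# inserting anonymous nodes (empty names) where they are missing.
sub fill_anonymous_nodes {
    my (@tokens) = @_;
    my @out;
    for my $token (@tokens) {
        # An edge at the start, or right after another edge, needs a node before it.
        if ($token->[0] eq 'edge' && (!@out || $out[-1][0] eq 'edge')) {
            push @out, ['node', ''];
        }
        push @out, $token;
    }
    # A trailing edge needs a node after it.
    push @out, ['node', ''] if @out && $out[-1][0] eq 'edge';
    return @out;
}

my @filled = fill_anonymous_nodes(['edge', '->'], ['edge', '->'], ['node', 'node.1']);
print join(' ', map { $_->[0] eq 'node' ? "[$_->[1]]" : $_->[1] } @filled), "\n";
# prints "[] -> [] -> [node.1]"
```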

Nodes in DASH

Node names:

        1: Are delimited by '[' and ']'.
        2: May be quoted with " or '.
        3: Allow escaped characters, using '\'.
        4: Allow internal spaces, even if not quoted.
        5: May be separated with nothing (juxtaposed), with whitespace, or with ','.
                This is called 'Daisy-chaining'.

See Daisy chains for the origin of this term.

Examples:

        1: The anonymous node: []
        2: The anonymous node, with attributes (explained below): []{color:red}
        3: A named node: [Marpa]
        4: Juxtaposed nodes: [Perl][Marpa] or [Perl]  [Marpa] or [Perl], [Marpa]
        5: A named node with an internal space: [Perl 6]
        6: A named node with attributes: [node.1]{label: A and B}
        7: A named node with spaces: [    node.1    ]
                These spaces are discarded.
        8: A named node with attributes, with spaces: [  node.1  ] { label : '  A  Z  '  }
                The spaces around 'node.1' are discarded.
                The spaces around '  A  Z  ' are discarded.
                The spaces inside '  A  Z  ' are preserved (because of the quotes).
                Double-quotes act in the same way.
        9: A named node with attributes, with spaces:
                [ node.1 ] {  label  :  Flight Path from Melbourne to London  }
                Space preservation is as above.
        10: A named node with escaped characters: [\[node\]]
                The '[' and ']' chars are preserved.
        11: A named node with [] in name: [[ \]]
                However, since '[' and ']' delimit node names, you are strongly advised to escape such
                characters.
        12: A named node with quotes, spaces, and escaped chars: [" a \' b \" c"]
        13: A complete graph:
                [node.1]
                -> {arrowhead: odot; arrowtail: ediamond; color: green; dir: both; label: A 1; penwidth: 1}
                -> {color: blue; label: B 2; penwidth: 3}
                -> {arrowhead: box; arrowtail: invdot; color: maroon; dir: both; label: C 3; penwidth: 5}
                [] {label: 'Some node'}
                -> [node.2]
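A rough sketch of the node-name rules, using a regex rather than Marpa. The function node_names is hypothetical. It only handles fragments containing nodes (no attributes or edges), keeps escaped characters such as \] verbatim, and does not let quotes protect an unescaped ']'.

```perl
use strict;
use warnings;

# Hypothetical helper: extract node names from a DASH fragment containing
# only nodes. Escapes like \] are kept as-is; surrounding whitespace is
# trimmed, and one pair of matching outer quotes is removed so that spaces
# inside the quotes survive.
sub node_names {
    my ($dash) = @_;
    my @names;
    # '[' then any mix of escaped chars or chars other than '\' and ']', then ']'.
    while ($dash =~ /\[((?:\\.|[^\\\]])*)\]/g) {
        my $name = $1;
        $name =~ s/^\s+|\s+$//g;                  # discard surrounding spaces
        $name = $2 if $name =~ /^(["'])(.*)\1$/s; # strip matching outer quotes
        push @names, $name;
    }
    return @names;
}

print join('|', node_names(q{[Perl][Marpa], [  Perl 6  ] ['  A  Z  '] []})), "\n";
# prints "Perl|Marpa|Perl 6|  A  Z  |"
```

Juxtaposed, whitespace-separated and comma-separated nodes all come out the same way, since the regex simply skips whatever lies between the brackets.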

Edges in DASH

Edge names:

        1: Are '->'
                This is part of a directed graph.
        2: Or '--'
                This is part of an undirected graph.
        3: May be separated with nothing (juxtaposed), with whitespace, or with ','.
                This is called 'Daisy-chaining'.

See Daisy chains for the origin of this term.

It makes no sense to combine '->' and '--' in a single graph, because Graphviz rejects such input: in DOT, '->' is only valid in a digraph and '--' only in an undirected graph. In other words, a graph is either directed or undirected, never both.

So, if any edge in your graph is undirected (uses '--'), then every edge must use '--'; likewise, if any edge uses '->', every edge must use '->'.
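A naive pre-flight check for mixed edge types might look like this. The function edge_style is hypothetical and scans the raw text, so '->' or '--' inside a label or node name is also counted; a real check would inspect the parsed tokens instead.

```perl
use strict;
use warnings;

# Hypothetical check: classify a DASH description as using only '->',
# only '--', both (which Graphviz would reject), or neither.
sub edge_style {
    my ($dash) = @_;
    my $directed   = () = $dash =~ /->/g;   # count '->' occurrences
    my $undirected = () = $dash =~ /--/g;   # count '--' occurrences
    return 'mixed'      if $directed && $undirected;
    return 'directed'   if $directed;
    return 'undirected' if $undirected;
    return 'none';
}

print edge_style('[a] -> [b] -> [c]'), "\n";  # prints "directed"
print edge_style('[a] -- [b] -> [c]'), "\n";  # prints "mixed"
```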

Examples:

        1: An edge with attributes: -> {color:cornflowerblue; label: This edge's color is blueish ;}
        2: Juxtaposed edges without any spacing and without attributes: ------
        3: Juxtaposed edges (without comma) with attributes:
                -- {color: cornflowerblue; label: Top row\nBottom row}
                -- {color:red; label: Edges use cornflowerblue and red}
        4: An edge with attributes, with some escaped characters:
                -> {color:cornflowerblue; label: Use various escaped chars (\' \" \< \>) in label}

Attributes in DASH

Attributes:

        1: Are delimited by '{' and '}'.
        2: Consist of a 'name' and a 'value', separated by ':'.
        3: Are separated by ';'.
        4: The DOT language defines a set of escape characters acceptable in such a 'value'.
        5: Allow quotes and whitespace as per node names.
                This must be true because the same non-Marpa parsers are used for both.
        6: Attribute values can be HTML-like. See the Graphviz docs for why we say 'HTML-like' and
                not HTML. See data/table.*.ge for examples.

See HTML-like labels for details.

Examples:

        1: -- {color: cornflowerblue; label: Top row\nBottom row}
                Note the use of '\n' in the value of the label.
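The attribute rules can be approximated with plain string splitting, as in the sketch below. parse_attributes is a hypothetical helper, not this module's parser. In particular it does not cope with ';' inside quoted values, leaves escapes such as \n untouched, and does not handle HTML-like values.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the '{' and '}', split on ';', split each item
# on its first ':', discard whitespace around names and values, and remove
# one pair of matching outer quotes so spaces inside quotes are preserved.
sub parse_attributes {
    my ($text) = @_;
    $text =~ s/^\s*\{|\}\s*$//g;              # drop the delimiters
    my @pairs;
    for my $item (split /;/, $text) {
        next unless $item =~ /\S/;            # skip empty items, e.g. after a final ';'
        my ($name, $value) = split /:/, $item, 2;
        for ($name, $value) {                 # $_ aliases $name, then $value
            $_ = '' unless defined;
            s/^\s+|\s+$//g;
            $_ = $2 if /^(["'])(.*)\1$/s;
        }
        push @pairs, [$name, $value];
    }
    return @pairs;
}

for my $pair (parse_attributes(q{{ label : '  A  Z  ' ; color: red }})) {
    print "$pair->[0]=<$pair->[1]>\n";
}
# prints:
# label=<  A  Z  >
# color=<red>
```

Splitting each item on its first ':' only means a value such as 'Flight Path: Melbourne' keeps its own colon.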

FAQ

What is the grammar parsed by this module?

See "DASH Syntax" above.

How is the parsed graph stored in RAM?

Items are stored in a tree managed by Tree::DAG_Node.

The sample code in the "Synopsis" will display a tree:

        perl -Ilib scripts/parse.pl -i data/node.04.dash -max info

Output:

        root. Attributes: {uid => "0"}
           |---prolog. Attributes: {uid => "1"}
           |---graph. Attributes: {uid => "2"}
               |---node_id. Attributes: {uid => "3", value => "node.1"}
               |   |---literal. Attributes: {uid => "4", value => "{"}
               |   |---label. Attributes: {uid => "5", value => "A and B"}
               |   |---literal. Attributes: {uid => "6", value => "}"}
               |---node_id. Attributes: {uid => "7", value => "node.2"}
                   |---literal. Attributes: {uid => "8", value => "{"}
                   |---label. Attributes: {uid => "9", value => "A or B"}
                   |---literal. Attributes: {uid => "10", value => "}"}
        Parse result: 0 (0 is success)

See also the next question.

What is the structure of the tree of parsed tokens?

From the previous answer, you can see the root has 2 daughters. The 'prolog' daughter is not currently used by this module; it is used by GraphViz2::Marpa.

The 'graph' daughter (sub-tree) is what's processed by the default rendering engine MarpaX::Languages::Dash::Renderer to convert the tree (i.e. the input file) into a DOT file and into an image.

Does this module handle utf8?

Yes. See the last sample on the demo page.

Why doesn't the parser handle my HTML-style labels?

Traps for young players:

o The <br /> component must include the '/'

Why do I get error messages like the following?

        Error: <stdin>:1: syntax error near line 1
        context: digraph >>>  Graph <<<  {

Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph. So, don't do this:

        strict graph graph{...}
        strict graph Graph{...}
        strict graph strict{...}
        etc...

Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:

        strict graph "graph"{...}

Even better, use a more meaningful name for your graph...

The keywords are: node, edge, graph, digraph, subgraph and strict. They are case-independent, which is why 'Graph' triggers the error above. Compass points are not keywords.

See keywords in the discussion of the syntax of DOT for details.
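If you generate DOT yourself, a small guard like the following can quote reserved words before they are emitted. safe_graph_id is hypothetical, and it only handles the keyword case; other IDs that DOT requires to be quoted (e.g. ones containing spaces) and embedded double-quotes are not dealt with.

```perl
use strict;
use warnings;

# DOT keywords. They are case-independent, so 'Graph' is reserved too.
my %dot_keyword = map { $_ => 1 } qw(node edge graph digraph subgraph strict);

# Hypothetical helper: return the ID unchanged, or double-quoted if it is a keyword.
sub safe_graph_id {
    my ($id) = @_;
    return $dot_keyword{lc $id} ? qq("$id") : $id;
}

print 'strict graph ', safe_graph_id('Graph'), "{...}\n";        # prints: strict graph "Graph"{...}
print 'strict graph ', safe_graph_id('family_tree'), "{...}\n";  # prints: strict graph family_tree{...}
```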

What is the homepage of Marpa?

http://savage.net.au/Marpa.html.

How do I reconcile Marpa's approach with classic lexing and parsing?

A recent article of mine includes a section called Constructing a Mental Picture of Lexing and Parsing, which aims to help us think about this issue.

How did you generate the html/*.svg files?

With a private script which uses Graph::Easy::Marpa::Renderer::GraphViz2 V 2.00. This script is not shipped in order to avoid a dependency on that module. Also, another private script which validates Build.PL and Makefile.PL would complain about the missing dependency.

See the demo page for details.

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Repository

https://github.com/ronsavage/MarpaX-Languages-Dash

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=MarpaX::Languages::Dash.

Author

MarpaX::Languages::Dash was written by Ron Savage <ron@savage.net.au> in 2013.

Marpa's homepage: <http://savage.net.au/Marpa.html>.

My homepage: http://savage.net.au/.

Copyright

Australian copyright (c) 2013, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html