Ron Savage

NAME

Graph::Easy::Marpa::Parser - A Marpa-based parser for Graph::Easy::Marpa files

Synopsis

See "Synopsis" in Graph::Easy::Marpa.

Description

Graph::Easy::Marpa::Parser provides a Marpa-based parser for Graph::Easy::Marpa-style graph definitions.

Installation

Install Graph::Easy::Marpa as you would for any Perl module:

Run:

        cpanm Graph::Easy::Marpa

or run:

        sudo cpan Graph::Easy::Marpa

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = Graph::Easy::Marpa::Parser -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type Graph::Easy::Marpa::Parser.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. input_file()]):

o description => '[node.1]<->[node.2]'

Specify a string for the graph definition.

You are strongly encouraged to surround this string with '...' to protect it from your shell.

See also the 'input_file' key to read the graph from a file.

The 'description' key takes precedence over the 'input_file' key.

o input_file => $graph_file_name

Read the graph definition from this file.

See also the 'description' key to supply the graph as a string, e.g. from the command line.

The whole file is slurped in as 1 graph.

The first lines of the input file can start with /^\s*#/, and will be discarded as comments.

The 'description' key takes precedence over the 'input_file' key.

o logger => $logger_object

Specify a logger object.

To disable logging, just set logger to the empty string.

The default value is an object of type Log::Handler.

o maxlevel => $level

This option is only used if an object of type Log::Handler is created. See logger above.

See also Log::Handler::Levels.

Default: 'info'. A typical value is 'debug'.

o minlevel => $level

This option is only used if an object of type Log::Handler is created. See logger above.

See also Log::Handler::Levels.

Default: 'error'.

No lower levels are used.

o report_items => $Boolean

Calls "report()" to report, via the log, the items recognized by the state machine.

See "Data and Script Interaction" in Graph::Easy::Marpa.

Methods

file([$file_name])

The [] indicate an optional parameter.

Get or set the name of the file the graph will be read from.

See "get_graph_from_file()".

generate_token_file($file_name)

Returns nothing.

Writes a CSV file of tokens output by the parse if new() was called with the token_file option.

get_graph_from_command_line()

If the caller has requested a graph be parsed from the command line, with the description option to new(), get it now.

Called as appropriate by run().

get_graph_from_file()

If the caller has requested a graph be parsed from a file, with the input_file option to new(), get it now.

Called as appropriate by run().

grammar()

Returns an object of type Marpa::R2::Scanless::G.

input_file([$graph_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to read the graph definition from.

See also the description() method.

The whole file is slurped in as 1 graph.

The first lines of the input file can start with /^\s*#/, and will be discarded as comments.

The value supplied to the description() method takes precedence over the value read from the input file.
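
The comment rule above can be sketched in plain Perl. This is only an illustration of the documented behaviour, not the module's own code: strip_leading_comments() is a hypothetical helper, and it assumes that only the leading run of lines matching /^\s*#/ is discarded.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the leading run of comment lines, keep the rest.
sub strip_leading_comments {
    my ($text) = @_;

    # split /^/ breaks the text into lines and keeps each newline.
    my @lines = split /^/, $text;

    shift @lines while @lines && $lines[0] =~ /^\s*#/;

    return join '', @lines;
}

my $graph = strip_leading_comments("# A demo graph\n  # Written by hand\n[node.1]\n-> [node.2]\n");
print $graph;
```

A line starting with '#' after the first graph line is left untouched by this sketch.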

items()

Returns an object of type Set::Array, which holds the items output by the state machine.

See the "FAQ" for details.

log($level, $s)

Calls $self -> logger -> $level($s).

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set logger to the empty string.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

recce()

Returns an object of type Marpa::R2::Scanless::R.

renumber_items()

Ensures each item in the stack has a sequential number 1 .. N.
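
As a rough sketch of what sequential numbering means for the item hashrefs described in the "FAQ", assuming each item carries its number under the 'count' key. renumber() is a hypothetical stand-in, not the module's method.

```perl
use strict;
use warnings;

# Hypothetical stand-in: number the items 1 .. N in their current order.
sub renumber {
    my ($items) = @_;
    my $count   = 0;

    $_->{count} = ++$count for @$items;

    return $items;
}

my $items = renumber([
    { count => 7, name => 'node.1', type => 'node', value => '' },
    { count => 3, name => '->',     type => 'edge', value => '' },
]);
print "$_->{count}: $_->{type} '$_->{name}'\n" for @$items;
```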

report()

Report, via the log, the list of items recognized by the state machine.

report_items([0 or 1])

The [] indicate an optional parameter.

Get or set the value which determines whether or not to report the items recognised by the state machine.

run()

This is the only method the caller needs to call. All parameters are supplied to new().

Returns 0 for success and 1 for failure.
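
A minimal calling sketch, using only the new() keys and the run() return values documented above. parse_description() is a hypothetical wrapper, and the require is guarded so the snippet does nothing harmful where the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical wrapper around new() and run().
sub parse_description {
    my ($description) = @_;

    # Return nothing when Graph::Easy::Marpa::Parser is not installed.
    return unless eval { require Graph::Easy::Marpa::Parser; 1 };

    my $parser = Graph::Easy::Marpa::Parser->new(
        description  => $description,
        report_items => 1,
    );

    # Documented above: 0 for success, 1 for failure.
    return $parser->run;
}

my $result = parse_description('[node.1] -> {label: Start} [node.2]');
print defined $result ? "run() returned $result\n" : "Parser not installed\n";
```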

token_file([$csv_file_name])

The [] indicate an optional parameter.

Get or set the name of the file to write containing the tokens (items) output from the parser.

tokens()

Returns an arrayref of tokens. Each element of this arrayref is an arrayref of 2 elements:

o The type of the token
o The value of the token

If you look at the source code for the run() method in Graph::Easy::Marpa, you'll see this arrayref can be passed directly as the value of the items key in the call to Graph::Easy::Marpa::Renderer::GraphViz2's run() method.

FAQ

What is the Graph::Easy::Marpa language?

Basically, it is derived from, and very similar to, the Graph::Easy language, with a few irregularities cleaned up. It exists to serve as a wrapper around the DOT language.

The re-write took place because, instead of Graph::Easy's home-grown parser, Graph::Easy::Marpa::Parser uses Marpa::R2, which requires a formally-spelled-out grammar for the language being parsed.

That grammar is in the source code of Graph::Easy::Marpa::Parser, in sub BUILD(), and is explained next.

Firstly, a summary:

        Element        Syntax
        ---------------------
        Edge names     Either '->' or '--'
        ---------------------
        Node names     1: Delimited by '[' and ']'.
                       2: May be quoted with " or '.
                       3: Escaped characters, using '\', are allowed.
                       4: Internal spaces in node names are preserved even if not quoted.
        ---------------------
        Attributes     1: Delimited by '{' and '}'.
                       2: Within that, any number of "key : value" pairs separated by ';'.
                       3: Values may be quoted with " or ' or '<...>' or '<<table>...</table>>'.
                       4: Escaped characters, using '\', are allowed.
                       5: Internal spaces in attribute values are preserved even if not quoted.
        ---------------------

Note: Both edges and nodes can have attributes.

Note: HTML-like labels trigger special-case processing in Graphviz. See "Why doesn't the parser handle my HTML-style labels?" below.

Demo pages:

        Graph::Easy::Marpa:         http://savage.net.au/Perl-modules/html/graph.easy.marpa/
        MarpaX::Demo::StringParser: http://savage.net.au/Perl-modules/html/marpax.demo.stringparser/

The latter page utilizes a cut-down version of the Graph::Easy::Marpa language, as documented in "What is the grammar you parse?" in MarpaX::Demo::StringParser.

And now the details:

o Attributes

Both nodes and edges can have any number of attributes.

Attributes are delimited by '{' and '}'.

These attributes are listed immediately after their owning node or edge.

Each attribute consists of a key:value pair, where ':' must appear literally.

These key:value pairs must be separated by the ';' character. A trailing ';' is optional.

The values for 'key' are reserved words used by Graphviz's attributes. These keys match the regexp /^[a-zA-Z_]+$/.

For the 'value', any printable character can be used.

Some escape sequences have a special meaning within Graphviz.

E.g. if you use [node name] {label: \N}, then if that graph is input to Graphviz's dot, \N will be replaced by the name of the node.

Some literals - ';', '}', '<', '>', '"', "'" - can be used in the attribute's value, but they must satisfy one of these conditions. They must be:

o Escaped using '\'.

Eg: \;, \}, etc.

o Placed inside " ... "
o Placed inside ' ... '
o Placed inside <...>

This does not mean you can use <<Some text>>. See the next point.

o Placed inside <<table> ... </table>>

Using this construct allows you to use HTML entities such as &amp;, &lt;, &gt; and &quot;.

Internal spaces are preserved within an attribute's value, but leading and trailing spaces are not (unless quoted).

Samples:

        [node.1] {color: red; label: Green node}
        -> {penwidth: 5; label: From Here to There}
        [node.2]
        -> {label: "A literal semicolon '\;' in a label"}

Note: That '\;' does not actually need those single-quote characters, since it is within a set of double-quotes.

Note: Attribute values quoted with a balanced pair of single- or double-quotes will have those quotes stripped.
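
The attribute rules above can be illustrated with a small regexp-based splitter. This is a sketch only, not the module's Marpa grammar: split_attributes() is a hypothetical helper, it handles '\' escapes and values quoted with " or ', and it deliberately ignores the <...> and <<table> ... </table>> forms.

```perl
use strict;
use warnings;

# Hypothetical helper: split '{key: value; ...}' into [key, value] pairs.
sub split_attributes {
    my ($block) = @_;
    my ($body)  = $block =~ /^\s*\{(.*)\}\s*$/s
        or die "Attributes must be delimited by '{' and '}'\n";

    # A value is a run of escaped characters, quoted strings, or any
    # character which is not one of ; \ " '
    my $value_re = qr/(?: \\. | "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' | [^;\\"'] )*?/xs;
    my @pairs;

    while ($body =~ /\G \s* ([a-zA-Z_]+) \s* : \s* ($value_re) \s* (?: ; | \z )/gcxs) {
        my ($key, $value) = ($1, $2);

        # Strip a balanced pair of single- or double-quotes.
        $value =~ s/^(["'])(.*)\1$/$2/s;

        push @pairs, [$key, $value];
    }

    # Anything left over, apart from whitespace, could not be parsed.
    $body =~ /\G \s* \z/gcxs
        or die "Cannot parse attributes near offset " . (pos($body) // 0) . "\n";

    return \@pairs;
}

for my $pair (@{ split_attributes('{color: red; label: Green node;}') }) {
    print "$pair->[0] => $pair->[1]\n";
}
```

Note that an escape such as '\;' is passed through unchanged, backslash included, since Graphviz assigns its own meaning to some escape sequences.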

o Classes

Class and subclass names must match /^(edge|global|graph|group|node)(\.[a-z]+)?$/.

The name before the '.' is the class name.

'global' is used to specify whether you want a directed or undirected graph. The default is directed.

        global {directed: 1} [node.1] -> [node.2]

'graph' is used to specify the direction of the graph as a whole, via the rankdir attribute, whose value must be one of: LR, RL, TB or BT. The default is TB.

        graph {rankdir: LR} [node.1] -> [node.2]

The name after the '.' is the subclass name. And if '.' is present, the subclass name must be present. This means things like 'edge.' etc are syntax errors.

        node {shape: rect} node.forest {color: green}
        [node.1] -> [node.2] {class: forest} -> [node.3] {shape: circle; color: blue}

Here, node.1 gets the default shape, rect, and node.2 gets both shape rect and color green. node.3 gets shape circle and color blue.

As always, specific attributes override class attributes.

You use the subclass name in the attributes of an edge, a group or a node, whereas 'global' and 'graph' appear only once, at the start of the input stream. That is, it does not make sense for a class of global or graph to have any subclasses.
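
The class name rule can be checked with the regexp quoted above. In this sketch the pattern is regrouped slightly so the subclass is captured without its leading '.'; split_class_name() is a hypothetical helper, not part of the distro.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (class, subclass) or an empty list.
# The subclass is undef when the name has no '.' part.
sub split_class_name {
    my ($name) = @_;

    return unless $name =~ /^(edge|global|graph|group|node)(?:\.([a-z]+))?$/;

    return ($1, $2);
}

for my $name ('node', 'node.forest', 'edge.') {
    my @parts = split_class_name($name);

    if (@parts) {
        print "$name: class '$parts[0]', subclass '", $parts[1] // '', "'\n";
    }
    else {
        print "$name: syntax error\n";
    }
}
```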

o Comments

The first few lines of the input file can start with /^\s*#/, and will be discarded as comments.

o Daisy-chains

See Wikipedia for the origin of this term.

o Edges

Edges can be daisy-chained by juxtaposition, or by using a comma (','), newline, space, or attributes ('{...}') to separate them.

Hence both of these are valid: '->,->{color:green}' and '->{color:red}->{color:green}'.

See data/edge.03.ge and data/edge.09.ge.

o Groups

Groups can be daisy chained by juxtaposition, or by using a newline or space to separate them.

o Nodes

Nodes can be daisy-chained by juxtaposition, or by using a comma (','), newline, space, or attributes ('{...}') to separate them.

Hence all of these are valid: '[node.1][node.2]' and '[node.1],[node.2]' and '[node.1]{color:red}[node.2]'.

o Edges

Edge names are either '->' or '--'.

No other edge names are accepted.

Note: The syntax for edges is just a visual clue for the user. The directed 'v' undirected nature of the graph depends on the value of the 'directed' attribute present (explicitly or implicitly) in the input stream. Nevertheless, usage of '->' or '--' must match the nature of the graph, or Graphviz will issue a syntax error.

The default is {directed: 1}. See data/class.global.01.ge for a case where we use {directed: 0} attached to class 'global'.

Edges can have attributes such as arrowhead, arrowtail, etc. See the Graphviz documentation.

Samples:

        ->
        -- {penwidth: 9}

o Graphs

Graphs are sequences of nodes and edges, in any order.

The sample given just above for attributes is in fact a single graph.

A sample:

        [node]
        [node] ->
        -> {label: Start} -> {color: red} [node.1] {color: green} -> [node.2]
        [node.1] [node.2] [node.3]

For more samples, see the data/*.ge files shipped with the distro.

o Line-breaks

These are converted into a single space.

o Nodes

Nodes are delimited by '[' and ']'.

Within those, any printable character can be used for a node's name.

Some literals - ']', '"', "'" - can be used in the node's name, but they must satisfy one of these conditions. They must be:

o Escaped using '\'

Eg: \].

o Placed inside " ... "
o Placed inside ' ... '

Internal spaces are preserved within a node's name, but leading and trailing spaces are not (unless quoted).

Lastly, the node's name can be empty. I.e.: You use '[]' in the input stream to create an anonymous node.

Samples:

        []
        [node.1]
        [node 1]
        [[node\]]
        ["[node]"]
        [     From here     ] -> [     To there     ]

Note: Node names quoted with a balanced pair of single- or double-quotes will have those quotes stripped.

o Subgraphs aka Groups

Subgraph names must match /^[a-zA-Z_.][a-zA-Z_0-9. ]*$/.

Subgraph names beginning with 'cluster' trigger special-case processing within Graphviz.

See 'Subgraphs and Clusters' on this page.

Samples:

        Here, the subgraph name is 'cluster.1':
        (cluster.1: [node.1] -> [node.2])
        group {bgcolor: red} (cluster.1: [node.1] -> [node.2]) {class: group}

Does this module handle utf8?

Yes. See the last sample on the demo page.

How is the parsed graph stored in RAM?

Items are stored in an arrayref. This arrayref is available via the "items()" method.

Each element in the array is a hashref, listed here in alphabetical order by type.

Note: Items are numbered from 1 up.

o Attributes

An attribute can belong to a graph, node or an edge. An attribute definition of '{color: red;}' would produce a hashref of:

        {
        count => $n,
        name  => 'color',
        type  => 'attribute',
        value => 'red',
        }

An attribute definition of '{color: red; shape: circle;}' will produce 2 hashrefs, i.e. 2 sequential elements in the arrayref:

        {
        count => $n,
        name  => 'color',
        type  => 'attribute',
        value => 'red',
        }

        {
        count => $n + 1,
        name  => 'shape',
        type  => 'attribute',
        value => 'circle',
        }

Attribute hashrefs appear in the arrayref immediately after the item (edge, group, node) to which they belong. For subgraphs, this means they appear straight after the hashref whose type is 'pop_subgraph'.
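
Because attributes directly follow their owner, a single pass over the items is enough to regroup them. The sketch below works on a hand-built list of hashrefs in the format shown above; group_attributes() is a hypothetical helper, and it treats every non-attribute item (including a 'pop_subgraph' item) as a possible owner.

```perl
use strict;
use warnings;

# Hypothetical helper: attach each attribute to the item preceding it.
sub group_attributes {
    my ($items) = @_;
    my @owners;

    for my $item (@$items) {
        if ($item->{type} eq 'attribute') {
            die "Attribute '$item->{name}' has no owner\n" unless @owners;

            push @{ $owners[-1]{attributes} }, [ $item->{name}, $item->{value} ];
        }
        else {
            # Copy the item so the caller's hashref is left unchanged.
            push @owners, { %$item, attributes => [] };
        }
    }

    return \@owners;
}

# Hand-built items for: [node.1] {color: red; label: Green node} -> {penwidth: 5} [node.2]
my @items = (
    { count => 1, name => 'node.1',   type => 'node',      value => ''           },
    { count => 2, name => 'color',    type => 'attribute', value => 'red'        },
    { count => 3, name => 'label',    type => 'attribute', value => 'Green node' },
    { count => 4, name => '->',       type => 'edge',      value => ''           },
    { count => 5, name => 'penwidth', type => 'attribute', value => '5'          },
    { count => 6, name => 'node.2',   type => 'node',      value => ''           },
);

for my $owner (@{ group_attributes(\@items) }) {
    my $list = join '; ', map { "$_->[0]: $_->[1]" } @{ $owner->{attributes} };
    print "$owner->{type} '$owner->{name}' {$list}\n";
}
```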

The following has been extracted manually from the Graphviz documentation, and is listed here in case I need it. Classes are written as [x] rather than [x]+, etc, so it uses various abbreviations.

        Attribute       Regexp+                 Interpretation
        ---------       -------                 --------------
        addDouble       +?[0-9.]                A double preceded by an optional '+'
        arrowType       [a-z]                   A word
        aspectType      [0-9.,]                 A double or a double + ',' + an integer
        bool            [a-zA-Z0-9]             Case-insensitive 'true', 'false', 0 or N (true)
        color           [#0-9a-f]               '#' followed by 3 or 4 hex numbers
                        [0-9. ]                 3 numbers 0 .. 1 separated by '.' or \s
                        [/a-z]                  A word or /word or /word1/word2
        clusterMode     [a-z]                   A word
        colorList       color(;[0-9.])?         N tokens separated by ':'
        dirType         [a-z]                   A word
        doubleList      [0-9.:]                 Various doubles separated by ':'
        escString       \[NGETHLnlr]            A list of escaped letters
        HTML label      <<[.]>>                 A quoted list of stuff
        layerRange
        lblString       escString or HTML label
        outputMode      [a-z]                   A word
        pagedir         [A-Z]                   A word of 2 caps (TB etc)
        point           [0-9.,]!?               2 doubles followed by an optional '!'
        pointList       [0-9., ]!?              A list of points separated by spaces
        quadType        [a-z]                   A word
        rankdir         [A-Z]                   A word of 2 caps (TB etc)
        rankType        [a-z]                   A word
        rect            [0-9.,]                 Four doubles separated by ','s
        shape           [a-z]                   A word, or
                        [<>{}]                  Bracketed strings, or
                        ?                       User-defined
        smoothType      [a-z]                   A word
        splineType      [0-9.,;es]              Various doubles, with ',' and ';', and optional 'e', 's'
        startType       [a-z][0-9]              A word optionally followed by a number
        style           [a-z(),]                A list of words separated by ',' each with optional '(...)'
        viewPort        [0-9.,]                 A list of 5 doubles or
                        [0-9.,]                 A list of 4 doubles followed by a node name

o Classes and class attributes

These notes apply to all classes and subclasses.

A class definition of 'edge {color: white}' would produce 2 hashrefs:

        {
        count => $n,
        name  => 'edge',
        type  => 'class_name',
        value => '',
        }

        {
        count => $n + 1,
        name  => 'color',
        type  => 'attribute',
        value => 'white',
        }

A class definition of 'node.green {color: green; shape: rect}' would produce 3 hashrefs:

        {
        count => $n,
        name  => 'node.green',
        type  => 'class_name',
        value => '',
        }

        {
        count => $n + 1,
        name  => 'color',
        type  => 'attribute',
        value => 'green',
        }

        {
        count => $n + 2,
        name  => 'shape',
        type  => 'attribute',
        value => 'rect',
        }

Class and class attribute hashrefs always appear at the start of the arrayref of items.

o Edges

An edge definition of '->' would produce a hashref of:

        {
        count => $n,
        name  => '->',
        type  => 'edge',
        value => '',
        }

o Nodes

A node definition of '[Name]' would produce a hashref of:

        {
        count => $n,
        name  => 'Name',
        type  => 'node',
        value => '',
        }

A node can have a definition of '[]', which means it has no name. Such nodes are called anonymous (or invisible) because while they take up space in the output stream, they have no printable or visible characters in the output stream.

Each anonymous node produces at least these 2 hashrefs:

        {
                count => $n,
                name  => '',
                type  => 'node',
                value => '',
        }

        {
                count => $n + 1,
                name  => 'color',
                type  => 'attribute',
                value => 'invis',
        }

You can of course give your anonymous nodes any attributes, but they will be forced to have these attributes.

E.g. If you give it a color, that would become element $n + 2 in the arrayref, and hence that color would override the default color 'invis'. See the output for data/node.04.ge on the demo page.
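
The override can be pictured as "the last value for a name wins". The sketch below folds the attribute hashrefs of one anonymous node into a hash. effective_attributes() is a hypothetical helper, and the item list is hand-built on the assumption that '[] {color: blue}' yields the forced 'invis' color followed by the user's color.

```perl
use strict;
use warnings;

# Hypothetical helper: later attributes overwrite earlier ones of the same name.
sub effective_attributes {
    my ($items) = @_;
    my %effective;

    for my $item (@$items) {
        next unless $item->{type} eq 'attribute';

        $effective{ $item->{name} } = $item->{value};
    }

    return \%effective;
}

my @anonymous_node = (
    { count => 1, name => '',      type => 'node',      value => ''      },
    { count => 2, name => 'color', type => 'attribute', value => 'invis' },
    { count => 3, name => 'color', type => 'attribute', value => 'blue'  },
);

print "color: ", effective_attributes(\@anonymous_node)->{color}, "\n";
```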

Node names are case-sensitive in dot.

o Subgraphs

Subgraph names must match /^[a-zA-Z_.][a-zA-Z_0-9. ]*$/.

A subgraph produces 2 hashrefs, one at the start of the subgraph, and one at the end.

A group definition of '(Solar system: [Mercury] -> [Neptune])' would produce a hashref like this at the start, i.e. when the '(' - just before 'Solar' - is detected in the input stream:

        {
        count => $n,
        name  => 'Solar system',
        type  => 'push_subgraph',
        value => '',
        }

and a hashref like this at the end, i.e. when the ')' - just after '[Neptune]' - is detected:

        {
        count => $n + N,
        name  => 'Solar system',
        type  => 'pop_subgraph',
        value => '',
        }
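
Since every 'push_subgraph' is paired with a 'pop_subgraph', a consumer of the items can track nesting with a simple stack. The following is a sketch over a hand-built item list, with subgraph_depths() being a hypothetical helper rather than anything shipped in the distro.

```perl
use strict;
use warnings;

# Hypothetical helper: return [depth, type, name] for each item,
# and die if the push/pop pairs do not balance.
sub subgraph_depths {
    my ($items) = @_;
    my @open;    # Names of the subgraphs currently open.
    my @rows;

    for my $item (@$items) {
        if ($item->{type} eq 'pop_subgraph') {
            my $name = pop @open;

            die "Unexpected pop_subgraph '$item->{name}'\n"
                unless defined $name && $name eq $item->{name};
        }

        push @rows, [ scalar @open, $item->{type}, $item->{name} ];

        push @open, $item->{name} if $item->{type} eq 'push_subgraph';
    }

    die "Unclosed subgraph '$open[-1]'\n" if @open;

    return \@rows;
}

# Hand-built items for: (Solar system: [Mercury] -> [Neptune])
my @solar = (
    { count => 1, name => 'Solar system', type => 'push_subgraph', value => '' },
    { count => 2, name => 'Mercury',      type => 'node',          value => '' },
    { count => 3, name => '->',           type => 'edge',          value => '' },
    { count => 4, name => 'Neptune',      type => 'node',          value => '' },
    { count => 5, name => 'Solar system', type => 'pop_subgraph',  value => '' },
);

for my $row (@{ subgraph_depths(\@solar) }) {
    my ($depth, $type, $name) = @$row;
    print '  ' x $depth, "$type '$name'\n";
}
```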

Why doesn't the parser handle my HTML-style labels?

Traps for young players:

o The <br /> component must include the '/'
o If any tag's attributes use double-quotes, they will be doubled in the CSV output file

That is, just like double-quotes everywhere else.

See http://www.graphviz.org/content/dot-language for details of Graphviz's HTML-like syntax.

See data/node.16.ge and data/node.17.ge for a couple of examples.

Why do I get error messages like the following?

        Error: <stdin>:1: syntax error near line 1
        context: digraph >>>  Graph <<<  {

Graphviz reserves some words as keywords, meaning they can't be used as an ID, e.g. for the name of the graph. So, don't do this:

        strict graph graph{...}
        strict graph Graph{...}
        strict graph strict{...}
        etc...

Likewise for non-strict graphs, and digraphs. You can however add double-quotes around such reserved words:

        strict graph "graph"{...}

Even better, use a more meaningful name for your graph...

The keywords are: node, edge, graph, digraph, subgraph and strict. Compass points are not keywords.

See keywords in the discussion of the syntax of DOT for details.
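
If graph names are generated by a program, the quoting advice above is easy to automate. This sketch only covers the keyword case, matching case-insensitively as the 'Graph' example above implies; dot_graph_id() is a hypothetical helper, and it does not try to quote names which need quotes for other reasons.

```perl
use strict;
use warnings;

# The keywords listed above.
my %is_keyword = map { $_ => 1 } qw(node edge graph digraph subgraph strict);

# Hypothetical helper: double-quote a graph name if it is a DOT keyword.
sub dot_graph_id {
    my ($name) = @_;

    return $is_keyword{ lc $name } ? qq("$name") : $name;
}

print 'strict graph ', dot_graph_id($_), " {...}\n" for qw(Graph strict Solar);
```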

Where are the action subs named in the grammar?

In Graph::Easy::Marpa::Actions.

Has any graph syntax changed moving from V 1.* to V 2.*?

Yes. Under V 1.*, to specify an empty label, this was possible:

        [node] { label: ;}

Any attribute, here label, without a value, is unacceptable under V 2.*. Just use:

        [node] { label: ''; }

Of course, the same applies to attributes for edges.

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=Graph::Easy::Marpa.

Author

Graph::Easy::Marpa was written by Ron Savage <ron@savage.net.au> in 2011.

Home page: http://savage.net.au/.

Copyright

Australian copyright (c) 2011, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html
