NAME

GraphViz2::Marpa::Parser - A Perl parser for Graphviz dot files. Input comes from GraphViz2::Marpa::Lexer.

Synopsis

o Display help
        perl scripts/lex.pl   -h
        perl scripts/parse.pl -h
        perl scripts/g2m.pl   -h
o Run the lexer
        perl scripts/lex.pl -input_file x.gv -lexed_file x.lex

        x.gv is a Graphviz dot file. x.lex will be a CSV file of lexed tokens.
o Run the parser without running the lexer or the default renderer
        perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse

        x.parse will be a CSV file of parsed tokens.
o Run the parser and the default renderer
        perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse -output_file x.rend

        x.rend will be a Graphviz dot file.
o Run the lexer, parser and default renderer
        perl scripts/g2m.pl -input_file x.gv -lexed_file x.lex -parsed_file x.parse -output_file x.rend

Description

GraphViz2::Marpa::Parser provides a Marpa::XS-based parser for http://www.graphviz.org/ dot files.

The input is expected to come from GraphViz2::Marpa::Lexer, either via RAM or via a CSV file.

Demo lexer/parser output: http://savage.net.au/Perl-modules/html/graphviz2.marpa/index.html.

State Transition Table: http://savage.net.au/Perl-modules/html/graphviz2.marpa/default.stt.html.

Command line options and object attributes: http://savage.net.au/Perl-modules/html/graphviz2.marpa/code.attributes.html.

My article on this set of modules: http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html.

The Marpa grammar as an image: http://savage.net.au/Ron/html/graphviz2.marpa/Marpa.Grammar.svg. This image was created with Graphviz via GraphViz2.

Installation

Install GraphViz2::Marpa as you would for any Perl module:

Run:

        cpanm GraphViz2::Marpa

or run:

        sudo cpan GraphViz2::Marpa

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = GraphViz2::Marpa::Parser -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type GraphViz2::Marpa::Parser.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. maxlevel()]):

o lexed_file => $aLexedOutputFileName

Specify the name of a CSV file of lexed tokens to read. This file can be output from the lexer.

Default: ''.

The default means the file is not read.

The value supplied by the 'tokens' option takes precedence over the 'lexed_file' option.

See the distro for data/*.lex.

o logger => $aLoggerObject

Specify a logger compatible with Log::Handler, for the parser and renderer to use.

Default: A logger of type Log::Handler which writes to the screen.

To disable logging, just set 'logger' to the empty string (not undef).

o maxlevel => $logOption1

This option affects Log::Handler.

You can get more output by calling new(maxlevel => 'info') and even more with new(maxlevel => 'debug').

See the Log::Handler::Levels docs.

Default: 'notice'.

o minlevel => $logOption2

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'error'.

No lower levels are used.

o output_file => $aRenderedOutputFileName

Specify the name of a file to be passed to the renderer.

Default: ''.

The default means the renderer is not called.

o parsed_file => $aParsedOutputFileName

Specify the name of a CSV file of parsed tokens to write. This file can be input to the default renderer.

Default: ''.

The default means the file is not written.

o renderer => $aRendererObject

Specify a renderer for the parser to use.

Default: An object of type GraphViz2::Marpa::Renderer::GraphViz2.

o report_forest => $Boolean

Log the forest of paths recognised by the parser.

Default: 0.

o report_items => $Boolean

Log the items recognised by the parser.

Default: 0.

o tokens => $anArrayrefOfLexedTokens

Specify an arrayref of tokens output by the lexer.

The value supplied by the 'tokens' option takes precedence over the 'lexed_file' option.

Methods

edges()

Returns an object of type Tree, where the root element is not used, but the children of this root are each the first node in a path. Here, path means each separately specified path in the input file.

Consider part of data/55.gv:

        A -> B
        ...
        B -> C [color = orange penwidth = 5]
        ...
        C -> D [arrowtail = obox arrowhead = crow dir = both minlen = 2]
        D -> E [arrowtail = odot arrowhead = dot dir = both minlen = 2 penwidth = 5]

Even though Graphviz will link A -> B -> C -> D when drawing the image, edges() returns 4 separate paths. If you call new() as new(report_forest => 1) on data/55.gv, the output will include:

        Edges:
        root. Edge attrs: {}
           |---A. Edge attrs: {color => "purple"}
           |   |---B. Edge attrs: {}
           |---B. Edge attrs: {color => "orange", penwidth => "5"}
           |   |---C. Edge attrs: {}
           |---C. Edge attrs: {arrowhead => "crow", arrowtail => "obox", color => "purple", dir => "both", minlen => "2"}
           |   |---D. Edge attrs: {}
           |---D. Edge attrs: {arrowhead => "dot", arrowtail => "odot", color => "purple", dir => "both", minlen => "2", penwidth => "5"}
           |   |---E. Edge attrs: {}
        ...

This says:

o Each path starts from a child of the root
o The attributes of an edge are stored in the parent of the 2 nodes making up each edge's segment

If the last path was:

        D -> E -> F [arrowtail = odot arrowhead = dot dir = both minlen = 2 penwidth = 5]

Then the output would be:

           |---D. Edge attrs: {arrowhead => "dot", arrowtail => "odot", color => "purple", dir => "both", minlen => "2", penwidth => "5"}
           |   |---E. Edge attrs: {}
           |       |---F. Edge attrs: {}
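As a rough illustration of the layout described above, here is a minimal sketch which models the forest with plain hashrefs rather than with the real tree object returned by edges(). The keys name, attrs and children, and the helpers forest_paths() and _chains(), are assumptions made for this sketch; they are not part of GraphViz2::Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper: return every chain of node names from $node down to a leaf.
sub _chains
{
    my($node)     = @_;
    my(@children) = @{$node->{children} || []};

    return ([$node->{name}]) if (! @children);

    return map{ [$node->{name}, @$_] } map{ _chains($_) } @children;
}

# Hypothetical helper: one entry per path. Each path starts at a child of
# the root, and the attributes are taken from that first node.
sub forest_paths
{
    my($root) = @_;
    my(@paths);

    for my $start (@{$root->{children} || []})
    {
        push @paths, {attrs => $start->{attrs}, nodes => $_} for _chains($start);
    }

    return \@paths;
}

# A plain-hashref stand-in for two of the paths shown in the report above.
my($root) =
{
    name     => 'root',
    attrs    => {},
    children =>
    [
        {name => 'A', attrs => {color => 'purple'}, children =>
            [{name => 'B', attrs => {}, children => []}]},
        {name => 'D', attrs => {penwidth => '5'}, children =>
            [{name => 'E', attrs => {}, children =>
                [{name => 'F', attrs => {}, children => []}]}]},
    ],
};

print join(' -> ', @{$_->{nodes}}), "\n" for @{forest_paths($root)};
```

With this stand-in data the script prints 'A -> B' and then 'D -> E -> F', one path per child of the root.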

This structure is used by "find_clusters()" in GraphViz2::Marpa::PathUtils.

Warning: The forest of paths is faulty for graphs such as:

        digraph graph_47
        {
                big ->
                {
                        small
                        smaller
                        smallest
                }
        }

The result will be:

        Edges:
        root. Edge attrs: {}
           |---big. Edge attrs: {}

See also "nodes()", "style()" and "type()".

generate_parsed_file($file_name)

Returns nothing.

Outputs the CSV file of parsed items, if new() was called as new(parsed_file => $string).

grammar()

Returns the Marpa::R2::Recognizer object.

Called by "run()".

hashref2string($h)

Returns a string representation of the hashref.

increment_item_count()

Returns the next value of the internal item counter.

items()

Returns an arrayref of parsed tokens. Each element of this arrayref is a hashref. See "How is the parsed graph stored in RAM?" for details.

These parsed tokens do not bear a one-to-one relationship to the lexed tokens returned by the "items()" method in GraphViz2::Marpa::Lexer. However, they are (necessarily) very similar.

If you provide an output file by using the 'parsed_file' option to "new()", or the "parsed_file()" method, the file will have 2 columns, type and value.

E.g.: If the arrayref looks like:

        ...
        {count => 10, name => '', type => 'start_attribute', value => '['},
        {count => 11, name => '', type => 'attribute_id'   , value => 'color'},
        {count => 13, name => '', type => 'attribute_value', value => 'red'},
        {count => 14, name => '', type => 'end_attribute'  , value => ']'},
        ...

then the output file will look like:

        "type","value"
        ...
        start_attribute , "["
        attribute_id    , "color"
        attribute_value , "red"
        end_attribute   , "]"
        ...
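The mapping from the arrayref to the 2-column file can be sketched in a few lines of plain Perl. This is not the module's own generate_parsed_file(); items_to_csv_lines() is a hypothetical helper, and the column padding is only an assumption chosen to mimic the sample shown above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GraphViz2::Marpa: turn an arrayref of
# item hashrefs into the lines of a 2-column (type, value) file.
sub items_to_csv_lines
{
    my($items) = @_;
    my(@lines) = ('"type","value"');

    for my $item (@$items)
    {
        # Double any embedded double quotes, as CSV requires.
        (my $value = $item->{value}) =~ s/"/""/g;

        # Pad the type to 16 characters, then quote the value.
        push @lines, sprintf('%-16s, "%s"', $item->{type}, $value);
    }

    return \@lines;
}

my($items) =
[
    {count => 10, name => '', type => 'start_attribute', value => '['},
    {count => 11, name => '', type => 'attribute_id'   , value => 'color'},
    {count => 13, name => '', type => 'attribute_value', value => 'red'},
    {count => 14, name => '', type => 'end_attribute'  , value => ']'},
];

print map{"$_\n"} @{items_to_csv_lines($items)};
```

Note that only the type and value keys are used; count and name do not appear in the file.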

Usage:

        my($parser) = GraphViz2::Marpa::Parser -> new(...);

        # $parser -> items actually returns an object of type Set::Array.

        if ($parser -> run == 0)
        {
                my(@items) = @{$parser -> items};
        }

lexed_file([$lex_file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the CSV file of lexed tokens to read. This file can be output by the lexer.

The value supplied by the 'tokens' option takes precedence over the 'lexed_file' option.

'lexed_file' is a parameter to "new()". See "Constructor and Initialization" for details.

log($level, $s)

Logs the given string $s at the given log level $level.

For levels, see Log::Handler::Levels.

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set 'logger' to the empty string, in the call to "new()".

This logger is passed to the default renderer.

'logger' is a parameter to "new()". See "Constructor and Initialization" for details.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.

'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

new()

Returns an object of type GraphViz2::Marpa::Parser.

See "Constructor and Initialization" for details on the parameters accepted by "new()".

new_item($type, $value)

Adds a new item to the internal list of parsed items.

At the end of the run, call "items()" to retrieve this list.

nodes()

Returns a hashref of all nodes, keyed by node name, with the value of each entry being a hashref of node-specific data. The keys to this hashref are:

o attributes

These attributes include those specified at the class level, with (from data/55.gv):

        node [shape = house]

And those specified for nodes with explicitly defined attributes:

        A [color = blue]

But be warned: Graphviz does not apply class-level attributes to nodes with explicitly declared attributes. It applies them only to nodes defined with no attributes, or declared implicitly by appearing in the declaration of an edge:

        C
        ...
        H -> I

See 'fixed' just below.

The graph of data/55.gv, then, is expected to have just these 3 nodes in the shape of houses.

So, if you call new() as new(report_forest => 1) on data/55.gv, the output will include:

        Nodes:
        A. Attr: {}
        B. Attr: {fillcolor => "goldenrod", shape => "square", style => "filled"}
        C. Attr: {shape => "house"}
        D. Attr: {fillcolor => "turquoise4", shape => "circle", style => "filled"}
        E. Attr: {fillcolor => "turquoise4", shape => "circle", style => "filled"}
        F. Attr: {fillcolor => "yellow", shape => "hexagon", style => "filled"}
        G. Attr: {fillcolor => "darkorchid", shape => "pentagon", style => "filled"}
        H. Attr: {fillcolor => "lightblue", fontsize => "20", shape => "house", style => "filled"}
        I. Attr: {fillcolor => "lightblue", fontsize => "20", shape => "house", style => "filled"}
        J. Attr: {fillcolor => "magenta", fontsize => "26", shape => "square", style => "filled"}
        K. Attr: {fillcolor => "magenta", fontsize => "26", shape => "triangle", style => "filled"}
o fixed

This is a Boolean which records whether or not Graphviz will apply class-level attributes to nodes.
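To show how the hashref might be consumed, here is a small sketch which picks out nodes by attribute. It works on a hand-built hashref shaped like the documented return value, so it runs without the module. The helper nodes_with_attribute() is hypothetical, and the 'fixed' key is left out of the sample data because its values are not shown in the report above.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted names of nodes whose attribute $key equals $value.
sub nodes_with_attribute
{
    my($nodes, $key, $value) = @_;
    my(@names) = grep
    {
        my $attr = $nodes->{$_}{attributes} || {};

        defined($attr->{$key}) && ($attr->{$key} eq $value);
    } keys %$nodes;

    return [sort @names];
}

# A subset of the nodes reported for data/55.gv.
my($nodes) =
{
    A => {attributes => {}},
    B => {attributes => {fillcolor => 'goldenrod', shape => 'square', style => 'filled'}},
    C => {attributes => {shape => 'house'}},
    H => {attributes => {fillcolor => 'lightblue', fontsize => '20', shape => 'house', style => 'filled'}},
    I => {attributes => {fillcolor => 'lightblue', fontsize => '20', shape => 'house', style => 'filled'}},
};

print join(', ', @{nodes_with_attribute($nodes, 'shape', 'house')}), "\n";
```

In real code the hashref would come from $parser -> nodes after a successful run; here the script prints 'C, H, I', the 3 house-shaped nodes.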

See also "edges()", "style()" and "type()".

output_file([$file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file to be passed to the renderer.

'output_file' is a parameter to "new()". See "Constructor and Initialization" for details.

parsed_file([$file_name])

Here, the [] indicate an optional parameter.

Get or set the name of the file of parsed tokens for the parser to write. This file can be input to the renderer.

'parsed_file' is a parameter to "new()". See "Constructor and Initialization" for details.

paths()

Reserved.

See also "edges()", "nodes()", "style()" and "type()".

Calls "tree2string([$edges])" for $self -> edges.

Called by "run()" at the end of the run, if new() was called as new(report_forest => 1).

Logs all details stored in the getters "edges()", "nodes()", "style()" and "type()".

renderer([$renderer_object])

Here, the [] indicate an optional parameter.

Get or set the renderer object.

This renderer renders the tokens output by the parser.

'renderer' is a parameter to "new()". See "Constructor and Initialization" for details.

report()

Called by "run()".

Logs the list of parsed items if new() was called as new(report_items => 1).

report_forest([$Boolean])

The [] indicate an optional parameter.

Get or set the value which determines whether or not to log the forest of paths recognised by the parser.

'report_forest' is a parameter to "new()". See "Constructor and Initialization" for details.

report_items([$Boolean])

The [] indicate an optional parameter.

Get or set the value which determines whether or not to log the items recognised by the parser.

'report_items' is a parameter to "new()". See "Constructor and Initialization" for details.

run()

Returns 0 for success and 1 for failure.

This is the only method the caller needs to call. All parameters are supplied to "new()" (or other methods).

At the end of the run, you can call any or all of these:

"edges()", "items()", "nodes()", "style()" and "type()".

If you called new() without setting any report options, you could also call:

"print_structure()" and "report()".

style()

Returns a hashref of attributes used to style the rendered graph.

So, if you call new() as new(report_forest => 1) on data/55.gv, the output will include:

        Style:
        {label => "Complex Syntax Test", rankdir => "TB"}

See also "edges()", "nodes()" and "type()".

tokens([$arrayrefOfLexedTokens])

Here, the [] indicate an optional parameter.

Get or set the arrayref of lexed tokens to process.

The value supplied by the 'tokens' option takes precedence over the 'lexed_file' option.

'tokens' is a parameter to "new()". See "Constructor and Initialization" for details.

tree2string([$edges])

Here, the [] indicate an optional parameter.

If $edges is not supplied, it defaults to $self -> edges.

Returns an arrayref which can be printed with:

        print map{"$_\n"} @{$self -> tree2string};

Calls "Tree::DAG_Node/tree2string([$options], [$some_tree])".

Only override this in a sub-class if you wish to log the forest in a different format.

type()

Returns a hashref of attributes describing what type of graph it is.

So, if you call new() as new(report_forest => 1) on data/55.gv, the output will include:

        Type:
        {digraph => "1", graph_id => "graph_55", strict => "1"}

This hashref always has the same 3 keys.
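Since the 3 keys are fixed, the hashref is enough to rebuild the opening words of the dot file. The sketch below assumes that 'digraph' and 'strict' hold values which Perl treats as true or false (such as the "1" shown above), and graph_header() is a hypothetical helper, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the words which open a dot file from a
# hashref with the keys digraph, graph_id and strict.
sub graph_header
{
    my($type) = @_;
    my(@words);

    push @words, 'strict' if ($type->{strict});
    push @words, $type->{digraph} ? 'digraph' : 'graph';

    # The graph's id may be the empty string, in which case it is omitted.
    push @words, $type->{graph_id} if (defined($type->{graph_id}) && length($type->{graph_id}) );

    return join(' ', @words);
}

print graph_header({digraph => "1", graph_id => "graph_55", strict => "1"}), "\n";
```

For the data/55.gv values this prints 'strict digraph graph_55'.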

See also "edges()", "nodes()" and "style()".

utils([$aUtilsObject])

Here, the [] indicate an optional parameter.

Get or set the utils object.

Default: An object of type GraphViz2::Marpa::Utils.

FAQ

Are there certain cases I should watch out for?

Yes. Consider these 3 situations and their corresponding lexed or parsed output:

o digraph g {...}
        digraph     , "yes"
        graph_id    , "g"
        start_scope , "1"
o The start_scope count must be 1 because it's at the very start of the graph
o subgraph s {...}
        start_subgraph , "1"
        graph_id       , "s"
        start_scope    , "2"
o The start_scope count must be 2 or more
o When start_scope is preceded by graph_id, it's a subgraph
o Given 'subgraph {...}', the graph_id will be ""
o {...}
        start_scope , "2"
o The start_scope count must be 2 or more
o When start_scope is not preceded by graph_id, it's a stand-alone {...}
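The 3 situations above can be told apart mechanically. Here is a minimal sketch which walks an arrayref of items in the documented hashref format and labels each start_scope. The helper classify_scopes() and its labels are assumptions made for illustration, and the sample tokens are hand-written rather than real lexer or parser output.

```perl
use strict;
use warnings;

# Hypothetical helper: label each start_scope as 'graph', 'subgraph' or 'stand-alone'.
sub classify_scopes
{
    my($items) = @_;
    my(@kinds);

    for my $i (0 .. $#$items)
    {
        my($item) = $items->[$i];

        next if ($item->{type} ne 'start_scope');

        my($previous) = $i > 0 ? $items->[$i - 1]{type} : '';

        # A count of 1 means the very start of the graph. Otherwise, a
        # preceding graph_id signals a subgraph.
        push @kinds, $item->{value} == 1 ? 'graph'
            : $previous eq 'graph_id'    ? 'subgraph'
            :                              'stand-alone';
    }

    return \@kinds;
}

# Hand-written tokens for: digraph g {subgraph s {} {} }
my($count) = 0;
my($items) =
[
    map{ {count => ++$count, name => '', type => $$_[0], value => $$_[1]} }
    ['digraph', 'yes'], ['graph_id', 'g'], ['start_scope', '1'],
    ['start_subgraph', '1'], ['graph_id', 's'], ['start_scope', '2'],
    ['end_scope', '2'], ['end_subgraph', '1'],
    ['start_scope', '2'], ['end_scope', '2'],
    ['end_scope', '1'],
];

print join(', ', @{classify_scopes($items)}), "\n";
```

With these tokens the script prints 'graph, subgraph, stand-alone'.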

Do the getters edges(), nodes(), style() and type() duplicate all of the input file's data?

No. In particular, subgraph info is still missing.

Why doesn't the lexer/parser handle my HTML-style labels?

Traps for young players:

o The <br /> component must include the '/'. <br align='center'> is not accepted by Graphviz
o The <br />'s attributes must use single quotes because output files use CSV with double quotes

See data/38.* for good examples.

How can I switch from Marpa::XS to Marpa::PP?

Install Marpa::PP manually. It is not mentioned in Build.PL or Makefile.PL.

Patch GraphViz2::Marpa::Parser (line 15) from Marpa::XS to Marpa::PP.

Then, run the tests which ship with this module. I've tried this, and the tests all worked. You don't need to install the code to test it. Just use:

        shell> cd GraphViz2-Marpa-1.00/
        shell> prove -Ilib -v t

Where are the scripts documented?

In "Scripts" in GraphViz2::Marpa.

How is the parsed graph stored in RAM?

Items are stored in an arrayref. This arrayref is available via the "items()" method.

These items have the same format as the arrayref of items returned by the items() method in GraphViz2::Marpa::Lexer, and the same as in GraphViz2::Marpa::Lexer::DFA.

However, the precise values in the 'type' field of the following hashref vary between the lexer and the parser.

Each element in the array is a hashref:

        {
                count => $integer, # 1 .. N.
                name  => '',       # Unused.
                type  => $string,  # The type of the token.
                value => $value,   # The value from the input stream.
        }

$type => $value pairs used by the parser are listed here in alphabetical order by $type:

o attribute_id => $id
o attribute_value => $value
o class_id => /^(?:edge|graph|node)$/

This represents 3 special tokens where the author of the dot file used one or more of the 3 words edge, graph, or node, to specify attributes which apply to all such cases. So:

        node [shape = Msquare]

means all nodes after this point in the input stream default to having the Msquare shape. Of course this can be overridden by another such line, or by any specific node having a shape as part of its list of attributes.

See data/51.* for sample code.

o colon => ':'

This separates nodes from ports and ports from compass points.

o compass_point => $id
o digraph => $yes_no

'yes' => digraph and 'no' => graph.

o edge_id => $id

$id is either '->' for a digraph or '--' for a graph.

o end_attribute => ']'

This indicates the end of a set of attributes.

o end_scope => $brace_count

This indicates the end of a graph or subgraph or any stand-alone {}, and - for subgraphs - precedes the subgraph's 'end_subgraph'.

$brace_count increments by 1 each time 'start_scope' is detected in the input string, and decrements each time a matching 'end_scope' is detected.

o end_subgraph => $subgraph_count

This indicates the end of a subgraph, and follows the subgraph's 'end_scope'.

$subgraph_count increments by 1 each time 'start_subgraph' is detected in the input string, and decrements each time a matching 'end_subgraph' is detected.

o graph_id => $id

This indicates both the graph's $id and each subgraph's $id.

For graphs and subgraphs, the $id may be '' (the empty string).

o node_id => $id
o port_id => $id
o start_attribute => '['

This indicates the start of a set of attributes.

o start_scope => $brace_count

This indicates the start of the graph, a subgraph, or any stand-alone {}.

$brace_count increments by 1 each time 'start_scope' is detected in the input string, and decrements each time a matching 'end_scope' is detected.

o start_subgraph => $subgraph_count

This indicates the start of a subgraph, and precedes the subgraph's 'graph_id'.

$subgraph_count increments by 1 each time 'start_subgraph' is detected in the input string, and decrements each time a matching 'end_subgraph' is detected.

o strict => $yes_no

'yes' => strict and 'no' => not strict.

Consult data/*.lex and the corresponding data/*.parse for many examples.

How does the parser handle comments?

Comments are not expected in the input stream.

How does the parser interact with Marpa?

See http://savage.net.au/Perl-modules/html/graphviz2.marpa/Lexing.and.Parsing.with.Marpa.html.

This module uses Hash::FieldHash, which has an XS component!

Correct. My policy is that stand-alone modules should use a light-weight object manager (my choice is Hash::FieldHash), whereas apps can - and probably should - use Moose.

Machine-Readable Change Log

The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa.

Author

GraphViz2::Marpa was written by Ron Savage <ron@savage.net.au> in 2012.

Home page: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2012, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html