GraphViz2::Marpa::Lexer - A Perl lexer for Graphviz dot files. Output goes to GraphViz2::Marpa::Parser.
perl scripts/lex.pl -h
perl scripts/parse.pl -h
perl scripts/g2m.pl -h

perl scripts/lex.pl -input_file x.gv -lexed_file x.lex
x.gv is a Graphviz dot file. x.lex will be a CSV file of lexed tokens.

perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse
x.parse will be a CSV file of parsed tokens.

perl scripts/parse.pl -lexed_file x.lex -parsed_file x.parse -output_file x.rend
x.rend will be a Graphviz dot file.

perl scripts/g2m.pl -input_file x.gv -lexed_file x.lex -parsed_file x.parse -output_file x.rend
GraphViz2::Marpa::Lexer provides a Set::FA::Element-based lexer for http://www.graphviz.org/ dot files.
The output is intended to be input into GraphViz2::Marpa::Parser.
Demo lexer/parser output: http://savage.net.au/Perl-modules/html/graphviz2.marpa/index.html.
State Transition Table: http://savage.net.au/Perl-modules/html/graphviz2.marpa/default.stt.html.
Command line options and object attributes: http://savage.net.au/Perl-modules/html/graphviz2.marpa/code.attributes.html.
My article on this set of modules: http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html.
The Marpa grammar as an image: http://savage.net.au/Ron/html/graphviz2.marpa/Marpa.Grammar.svg. This image was created with Graphviz via GraphViz2.
Install GraphViz2::Marpa as you would for any Perl module:
Run:
cpanm GraphViz2::Marpa
or run:
sudo cpan GraphViz2::Marpa
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
new() is called as my($lexer) = GraphViz2::Marpa::Lexer -> new(k1 => v1, k2 => v2, ...).
It returns a new object of type GraphViz2::Marpa::Lexer.
Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. description([$graph])]):
Read the Graphviz (dot) graph definition from the command line.
You are strongly encouraged to surround this string with '...' to protect it from your shell.
See also the 'input_file' option to read the description from a file.
The 'description' option takes precedence over the 'input_file' option.
Default: ''.
Read the Graphviz (dot) graph definition from a file.
See also the 'description' option to read the graph definition from the command line.
See the distro for data/*.gv.
Specify the name of a CSV file of lexed tokens to write. This file can be input to the parser.
The default means the file is not written.
See the distro for data/*.lex.
Specify a logger compatible with Log::Handler, for the lexer to use.
Default: A logger of type Log::Handler which writes to the screen.
To disable logging, just set 'logger' to the empty string (not undef).
This option affects Log::Handler.
See the Log::Handler::Levels docs.
Default: 'notice'.
Default: 'error'.
No lower levels are used.
Log the items recognised by the lexer.
Default: 0.
Log the State Transition Table.
Calls "report()" in Set::FA::Element. Set min and max log levels to 'info' for this.
Specify which file contains the State Transition Table.
The default value means the STT is read from the source code of GraphViz2::Marpa::Lexer.
Candidate files are '' and 'data/default.stt.csv'.
The type of this file must be specified by the 'type' option.
If the file name matches /csv$/, the value of the 'type' option is set to 'csv'.
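The file-name rule above can be sketched as follows. This is only an illustration under assumptions: the helper name infer_stt_type() is hypothetical and is not part of GraphViz2::Marpa::Lexer.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): derive the 'type' option
# from the 'stt_file' name, as described in the text above.
sub infer_stt_type
{
	my($stt_file, $type) = @_;

	$stt_file = '' if (! defined $stt_file);
	$type     = '' if (! defined $type);

	# A file name ending in 'csv' forces the type to 'csv'.
	$type = 'csv' if ($stt_file =~ /csv$/);

	return $type;
}

print "Type: '", infer_stt_type('data/default.stt.csv', ''), "'\n";
print "Type: '", infer_stt_type('', ''), "'\n";
```

An empty type here stands for the internal STT, matching the documented default.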
Run the DFA for at most this many seconds.
Default: 10.
Specify the type of the stt_file: '' for internal STT and 'csv' for CSV.
This option must be used with the 'stt_file' option.
Warning: The 'ods' option is disabled, because I can find no way in LibreOffice to make it operate in ASCII. What happens is that when you type " (i.e. the double-quote character on the keyboard), LibreOffice inserts a different double-quote character, which, when exported as CSV in Unicode format, produces these 3 bytes: 0xe2, 0x80, 0x9c. This means that if you edit the STT, you absolutely must export to a CSV file in ASCII format. It also means that dot identifiers in (normal) double-quotes will never match the double-quotes in the *.ods file.
The [] indicate an optional parameter.
Get or set the Graphviz (dot) graph definition.
The value supplied by the 'description' option takes precedence over the value read from the 'input_file'.
See also "input_file()".
'description' is a parameter to "new()". See "Constructor and Initialization" for details.
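To make the precedence rule concrete, here is a minimal sketch. The helper choose_graph_source() is hypothetical and only mimics the documented behaviour; it does not read any file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): decide where the graph
# definition comes from. 'description' wins over 'input_file'.
sub choose_graph_source
{
	my(%option)      = @_;
	my($description) = defined($option{description}) ? $option{description} : '';
	my($input_file)  = defined($option{input_file})  ? $option{input_file}  : '';

	return ('description', $description) if (length $description);
	return ('input_file',  $input_file)  if (length $input_file);
	return ('none', '');
}

my($source, $value) = choose_graph_source
(
	description => 'digraph g {A -> B}',
	input_file  => 'x.gv',
);

print "Using $source: $value\n";
```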
Write the lexed tokens to the named file.
Called as needed by run().
If the caller has requested a graph be parsed from the command line, with the 'description' option to "new()", get it now.
Called as appropriate by "run()".
If the caller has requested a graph be parsed from a file, with the 'input_file' option to "new()", get it now.
Get or set the value of the Graphviz (dot) graph definition string.
Called by "get_graph_from_command_line()" and "get_graph_from_file()".
Here, the [] indicate an optional parameter.
Get or set the name of the file to read the Graphviz (dot) graph definition from.
See also the "description()" method.
'input_file' is a parameter to "new()". See "Constructor and Initialization" for details.
Returns an arrayref of lexed tokens. Each element of this arrayref is a hashref.
These lexed tokens do not bear a one-to-one relationship to the parsed tokens returned by the "items()" method in GraphViz2::Marpa::Parser. However, they are (necessarily) very similar.
If you provide an output file by using the 'lexed_file' option to "new()", or the "lexed_file()" method, the file will have 2 columns, type and value.
E.g.: If the arrayref looks like:
...
{count => 10, name => '', type => 'open_bracket'   , value => '['},
{count => 11, name => '', type => 'attribute_id'   , value => 'color'},
{count => 12, name => '', type => 'equals'         , value => '='},
{count => 13, name => '', type => 'attribute_value', value => 'red'},
{count => 14, name => '', type => 'close_bracket'  , value => ']'},
...
then the output file will look like:
"type","value"
...
open_bracket    , "["
attribute_id    , "color"
equals          , "="
attribute_value , "red"
close_bracket   , "]"
...
If you look at the source code for the run() method in GraphViz2::Marpa, you'll see this arrayref can be passed directly as the value of the 'tokens' key in the call to GraphViz2::Marpa::Parser's new().
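The mapping from the arrayref of hashrefs to the 2-column file can be sketched like this. The helper items_to_rows() is hypothetical, the exact spacing of the real output file is an assumption, and embedded double-quotes in values are not escaped here.

```perl
use strict;
use warnings;

# Sample items in the documented hashref format.
my(@items) =
(
	{count => 10, name => '', type => 'open_bracket',    value => '['},
	{count => 11, name => '', type => 'attribute_id',    value => 'color'},
	{count => 12, name => '', type => 'equals',          value => '='},
	{count => 13, name => '', type => 'attribute_value', value => 'red'},
	{count => 14, name => '', type => 'close_bracket',   value => ']'},
);

# Hypothetical helper: render each item as one row of type and value.
# Only the 'type' and 'value' keys are used; 'count' and 'name' are dropped.
sub items_to_rows
{
	my($items) = @_;
	my(@rows)  = ('"type","value"');

	for my $item (@$items)
	{
		push @rows, qq|$$item{type} , "$$item{value}"|;
	}

	return \@rows;
}

print "$_\n" for @{items_to_rows(\@items)};
```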
Usage:
my($lexer) = GraphViz2::Marpa::Lexer -> new(...);

# $lexer -> items actually returns an object of type Set::Array.

if ($lexer -> run == 0)
{
	my(@items) = @{$lexer -> items};
}
See also "How is the lexed graph stored in RAM?" in the "FAQ" below. And see any data/*.lex file for sample data.
And now for a real graph:
Input: data/15.gv:
digraph graph_15
{
    node
    [
        shape = "record"
    ]
    edge
    [
        color = "red"
        penwidth = 5
    ]
    node_15_1
    [
        label = "<f0> left|<f1> middle|<f2> right"
    ]
    node_15_2
    [
        label = "<f0> one|<f1> two"
    ]
    node_15_1:f0 -> node_15_2:f1
    [
        arrowhead = "obox"
    ]
}
Output: data/15.lex:
"type","value"
strict          , "no"
digraph         , "yes"
graph_id        , "graph_15"
start_scope     , "1"
class_id        , "node"
open_bracket    , "["
attribute_id    , "shape"
equals          , "="
attribute_value , "record"
close_bracket   , "]"
class_id        , "edge"
open_bracket    , "["
attribute_id    , "color"
equals          , "="
attribute_value , "red"
attribute_id    , "penwidth"
equals          , "="
attribute_value , "5"
close_bracket   , "]"
node_id         , "node_15_1"
open_bracket    , "["
attribute_id    , "label"
equals          , "="
attribute_value , "<f0> left|<f1> middle|<f2> right"
close_bracket   , "]"
node_id         , "node_15_2"
open_bracket    , "["
attribute_id    , "label"
equals          , "="
attribute_value , "<f0> one|<f1> two"
close_bracket   , "]"
node_id         , "node_15_1"
open_bracket    , "["
attribute_id    , "port_id"
equals          , "="
attribute_value , "f0"
close_bracket   , "]"
edge_id         , "->"
node_id         , "node_15_2"
open_bracket    , "["
attribute_id    , "port_id"
equals          , "="
attribute_value , "f1"
attribute_id    , "arrowhead"
equals          , "="
attribute_value , "obox"
close_bracket   , "]"
end_scope       , "1"
Note the pair:
open_bracket  , "["
...
close_bracket , "]"
They start and end each set of attributes, which are of 3 types:
Node attributes can be specified either at the class level (applying to all subsequent nodes) or for a specific node.
Class:
node
[
    shape = "record" # Attribute.
]
Node:
node_15_1
[
    label = "<f0> left|<f1> middle|<f2> right" # Attribute.
]
Edge:
node_15_1:f0 -> node_15_2:f1 # Attributes.
[
    arrowhead = "obox"
]
Edge attributes can be specified both at the class level and after the second of 2 nodes on an edge.
edge
[
    color = "red" # Attribute.
    penwidth = 5  # Attribute.
]
and
node_15_1:f0 -> node_15_2:f1
[
    arrowhead = "obox" # Attribute.
]
These only ever occur for one or both of the 2 nodes on an edge, i.e. not at the class or node level:
Get or set the name of the CSV file of lexed tokens to write. This file can be input to the parser.
'lexed_file' is a parameter to "new()". See "Constructor and Initialization" for details.
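Going the other way, the rows of such a file can be turned back into token hashrefs. The sketch below is an assumption-laden illustration: rows_to_items() is a hypothetical helper, and the regexp simply expects a bare type, a comma, and a double-quoted value on each line.

```perl
use strict;
use warnings;

# Hypothetical helper: convert lines of a *.lex file into hashrefs in the
# documented {count, name, type, value} format.
sub rows_to_items
{
	my(@lines) = @_;
	my(@items);

	for my $line (@lines)
	{
		# The heading row starts with a double-quote, so it does not
		# match and is skipped, as is any other malformed line.
		if ($line =~ /^\s*(\w+)\s*,\s*"(.*)"\s*$/)
		{
			push @items, {count => scalar(@items) + 1, name => '', type => $1, value => $2};
		}
	}

	return \@items;
}

my($items) = rows_to_items
(
	'"type","value"',
	'strict , "no"',
	'digraph , "yes"',
	'attribute_value , "<f0> left|<f1> middle|<f2> right"',
);

print "$$_{count}: $$_{type} => $$_{value}\n" for @$items;
```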
Calls $self -> logger -> $level($s) if ($self -> logger).
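The dispatch described above, and why an empty-string logger disables logging, can be sketched with two stand-in classes. Both My::StubLogger and My::Lexer are hypothetical names used only for this illustration; the real logger is normally a Log::Handler object.

```perl
use strict;
use warnings;

package My::StubLogger; # Hypothetical stand-in for a Log::Handler object.

sub new  { my($class) = @_; return bless {messages => []}, $class }
sub info { my($self, $s) = @_; push @{$$self{messages} }, "info: $s"; return }

package My::Lexer; # Hypothetical class; only the logging part is sketched.

sub new
{
	my($class, %arg) = @_;

	return bless {logger => $arg{logger} }, $class;
}

sub logger { my($self) = @_; return $$self{logger} }

sub log
{
	my($self, $level, $s) = @_;
	$level = 'notice' if (! defined $level);
	$s     = ''       if (! defined $s);

	# An empty-string logger is false, so nothing is called.
	$self -> logger -> $level($s) if ($self -> logger);

	return;
}

package main;

my($logger) = My::StubLogger -> new;

My::Lexer -> new(logger => $logger) -> log(info => 'Lexing started');
My::Lexer -> new(logger => '') -> log(info => 'Never recorded');

print "$_\n" for @{$$logger{messages} };
```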
Get or set the logger object.
To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".
This logger is passed to GraphViz2::Marpa::Lexer::DFA.
'logger' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the value used by the logger object.
This option is only used if GraphViz2::Marpa::Lexer or GraphViz2::Marpa::Parser use or create an object of type Log::Handler. See Log::Handler::Levels.
'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.
See "Constructor and Initialization" for details on the parameters accepted by "new()".
Log the list of items recognized by the DFA.
Get or set the value which determines whether or not to log the items recognised by the lexer.
'report_items' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the value which determines whether or not to log the parsed state transition table (STT).
'report_stt' is a parameter to "new()". See "Constructor and Initialization" for details.
This is the only method the caller needs to call. All parameters are supplied to "new()" (or other methods).
Returns 0 for success and 1 for failure.
Get or set the name of the file containing the State Transition Table.
This option is used in conjunction with the 'type' option to "new()".
'stt_file' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the timeout for how long to run the DFA.
'timeout' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the value which determines what type of 'stt_file' is read.
'type' is a parameter to "new()". See "Constructor and Initialization" for details.
Get or set the utils object.
Default: An object of type GraphViz2::Marpa::Utils.
Yes. Consider these 3 situations and their corresponding lexed output:
digraph     , "yes"
graph_id    , "g"
start_scope , "1"
start_subgraph , "1"
graph_id       , "s"
start_scope    , "2"
start_scope , "2"
Traps for young players:
See data/38.* for good examples.
In "Scripts" in GraphViz2::Marpa.
I use data/default.stt.ods via LibreOffice, when editing the STT.
Then, I export it to data/default.stt.csv. This file is incorporated into the source code of Lexer.pm, after the __DATA__ token.
Lastly, I run scripts/stt2html.pl, and output the result to html/default.stt.html.
So I ship 3 representations of the STT in the distro.
When the lexer runs, the 'stt_file' and 'type' options to "new()" default to reading the STT - using Data::Section::Simple's function get_data_section() - directly from __DATA__.
In GraphViz2::Marpa::Lexer::DFA.
Items are stored in an arrayref. This arrayref is available via the "items()" method, which also has a long explanation of this subject.
These items have the same format as the arrayref of items returned by the items() method in GraphViz2::Marpa::Parser, and the same as in GraphViz2::Marpa::Lexer::DFA.
However, the precise values in the 'type' field of the following hashref vary between the lexer and the parser.
Each element in the array is a hashref:
{
	count => $integer, # 1 .. N.
	name  => '',       # Unused.
	type  => $string,  # The type of the token.
	value => $value,   # The value from the input stream.
}
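As an illustration of working with this structure, the following sketch walks a list of such hashrefs and pairs each attribute_id with the attribute_value that follows it. The helper collect_attributes() is hypothetical; the sample values are taken from data/15.lex.

```perl
use strict;
use warnings;

# Sample items in the documented hashref format.
my(@items) =
(
	{count => 1, name => '', type => 'class_id',        value => 'edge'},
	{count => 2, name => '', type => 'open_bracket',    value => '['},
	{count => 3, name => '', type => 'attribute_id',    value => 'color'},
	{count => 4, name => '', type => 'equals',          value => '='},
	{count => 5, name => '', type => 'attribute_value', value => 'red'},
	{count => 6, name => '', type => 'attribute_id',    value => 'penwidth'},
	{count => 7, name => '', type => 'equals',          value => '='},
	{count => 8, name => '', type => 'attribute_value', value => '5'},
	{count => 9, name => '', type => 'close_bracket',   value => ']'},
);

# Hypothetical helper: collect attribute_id => attribute_value pairs.
sub collect_attributes
{
	my($items) = @_;
	my(%attribute);
	my($id);

	for my $item (@$items)
	{
		if ($$item{type} eq 'attribute_id')
		{
			$id = $$item{value};
		}
		elsif ( ($$item{type} eq 'attribute_value') && defined $id)
		{
			$attribute{$id} = $$item{value};

			undef $id;
		}
	}

	return \%attribute;
}

my($attribute) = collect_attributes(\@items);

print "$_ = $$attribute{$_}\n" for sort keys %$attribute;
```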
$type => $value pairs used by the lexer are listed here in alphabetical order by $type:
This represents 3 special tokens where the author of the dot file used one or more of the 3 words edge, graph, or node, to specify attributes which apply to all such cases. So:
node [shape = Msquare]
means all nodes after this point in the input stream default to having an Msquare shape. Of course this can be overridden by another such line, or by any specific node having a shape as part of its list of attributes.
See data/51.* for sample code.
This indicates the end of a set of attributes.
'yes' => digraph and 'no' => graph.
$id is either '->' for a digraph or '--' for a graph.
This indicates the end of the graph, the end of a subgraph, or the end of a stand-alone {...}.
$brace_count increments by 1 each time '{' is detected in the input string, and decrements each time '}' is detected.
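The brace counting can be sketched as below. This is a simplified illustration and not the lexer's DFA: scope_tokens() is a hypothetical helper, it ignores braces inside quoted strings and comments, and the choice to emit 'end_scope' before decrementing is an assumption made so that each 'end_scope' value matches its 'start_scope'.

```perl
use strict;
use warnings;

# Hypothetical sketch: emit start_scope/end_scope tokens by counting braces.
sub scope_tokens
{
	my($text)        = @_;
	my($brace_count) = 0;
	my(@tokens);

	for my $char (split(//, $text) )
	{
		if ($char eq '{')
		{
			$brace_count++;

			push @tokens, {type => 'start_scope', value => $brace_count};
		}
		elsif ($char eq '}')
		{
			push @tokens, {type => 'end_scope', value => $brace_count};

			$brace_count--;
		}
	}

	return \@tokens;
}

my($tokens) = scope_tokens('digraph g { subgraph s { A -> B } }');

print qq|$$_{type} , "$$_{value}"\n| for @$tokens;
```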
This indicates the end of a subgraph, and follows the subgraph's 'end_scope'.
$subgraph_count increments by 1 each time 'subgraph' is detected in the input string, and decrements each time a matching '}' is detected.
This separates 'attribute_id' from 'attribute_value'.
The parser does not output this token.
This indicates both the graph's $id and each subgraph's $id.
For graphs and subgraphs, the $id may be '' (the empty string), and in a case such as:
{ rank = same A B }
The $id will definitely be ''.
See data/18.gv, data/19.gv, data/53.gv and data/55.gv.
This indicates the start of the graph, the start of a subgraph, or the start of a stand-alone {...}.
This indicates the start of a set of attributes.
This indicates the start of a subgraph, and precedes the subgraph's 'graph_id'.
'yes' => strict and 'no' => not strict.
Consult data/*.gv and the corresponding data/*.lex for many examples.
See the next point.
That is, Bash (Perl) and C++-style line-oriented comments are recognized, and the whole line is discarded.
This happens when the line is read in from a file, and so does not apply to the 'description' parameter to new().
That is, C-style comments are recognized, and the comment is discarded.
This happens via the STT, and so applies to any source of input.
But, no attempt is made to ensure the '/*' and '*/' are not embedded in otherwise non-comment strings, so don't do that.
Simply that Bash and C++-style comments appearing on the ends of lines containing dot commands are not handled. So, don't do that either.
This means that no output file, e.g. *.lex, *.parse or *.rend, will ever retain comments from the input *.gv file.
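The line-oriented comment handling described above might look roughly like this. It is a sketch, not the module's code: strip_comment_lines() is a hypothetical helper, and allowing leading whitespace before the comment marker is an assumption. Note that trailing comments are left in place, in line with the limitation mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: discard lines which consist solely of a Bash-style
# (#) or C++-style (//) comment. Trailing comments are not touched.
sub strip_comment_lines
{
	my(@lines) = @_;

	return grep { $_ !~ m{^\s*(?:\#|//)} } @lines;
}

my(@lines) =
(
	"# A Bash-style comment\n",
	"// A C++-style comment\n",
	"digraph g {\n",
	"A -> B // A trailing comment, which is kept\n",
	"}\n",
);

print strip_comment_lines(@lines);
```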
Perhaps. Perfection is an extra-cost option... The cost is unknown, but huge donations are welcome.
Actually, according to DOT's HTML-like label definition, http://www.graphviz.org/content/node-shapes#html, you can use <...> instead of "..." to delimit text labels. The lexer as of V 1.02 does not handle this case. That is, the code only recognizes HTML-like labels which are delimited with '<<' and '>>'.
The file CHANGES was converted into Changelog.ini by Module::Metadata::Changes.
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=GraphViz2::Marpa.
GraphViz2::Marpa was written by Ron Savage <ron@savage.net.au> in 2012.
Home page: http://savage.net.au/index.html.
Australian copyright (c) 2012, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html