Dot::Parser
This module reads/parses DOT files.
    use Dot::Parser qw(parse_dot);
    my $graph = parse_dot("dotfile.dot");
Dot::Parser exports only one function: parse_dot(). This function takes a string with the name of a file written in DOT format and returns a hash reference that contains an adjacency list of the graph.
    %hash => {node1} => {node2} => undef
                        {node3} => undef
                        {node4} => undef
             {node2} => {node5} => undef
             ...
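As an illustration of how such a structure can be walked, here is a minimal sketch. It uses a hand-built hash in place of a real parse_dot() result, and the helper name successors() is hypothetical, not part of Dot::Parser. Whether nodes without outgoing edges also appear as top-level keys is not stated here, so the helper tolerates missing keys.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash reference described above.
my $graph = {
    node1 => { node2 => undef, node3 => undef, node4 => undef },
    node2 => { node5 => undef },
};

# Hypothetical helper: sorted successors of a node, or an empty list
# when the node has no entry in the adjacency list.
sub successors {
    my ($g, $node) = @_;
    return sort keys %{ $g->{$node} // {} };
}

# Print every edge, one per line.
for my $from (sort keys %$graph) {
    print "$from -> $_\n" for successors($graph, $from);
}
```

Only the keys carry information; the values are always undef, so exists() is the right test for an edge.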
Dot::Parser uses two types of functions: STATES and REGULAR functions.
This function slurps the dotfile and makes some ugly modifications to it to help the parser: it adds spaces around edgeops and removes spaces around '=' signs. Node IDs that contain these symbols will (obviously) be altered, but I think it's not a big deal to add or remove one or two spaces. Maybe in the future I could improve this.
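A rough sketch of this kind of normalisation, not the module's actual code: the helper name normalize_dot() and the exact regular expressions are assumptions. It works on a string rather than a file to stay self-contained, and it only touches spaces and tabs so that newlines survive.

```perl
use strict;
use warnings;

# Hypothetical helper: pad edge operators and tighten '=' signs.
sub normalize_dot {
    my ($text) = @_;
    $text =~ s/[ \t]*(->|--)[ \t]*/ $1 /g;   # exactly one space on each side of an edgeop
    $text =~ s/[ \t]*=[ \t]*/=/g;            # no spaces around '='
    return $text;
}

print normalize_dot("a->b; rank = same;"), "\n";   # prints "a -> b; rank=same;"
```

As noted above, a quoted node ID that contains one of these symbols would be altered by this step too.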
This function simply adds the node stored in the buffer to the node stack and to the graph adjacency list. Then it empties the buffer.
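The steps just described could look roughly like this. The context hash, its field names (buffer, stack, graph) and the name save_node() are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser context holding the buffer, node stack and graph.
my %ctx = (buffer => 'node1', stack => [], graph => {});

# Push the buffered node ID onto the stack, register it in the
# adjacency list, then clear the buffer.
sub save_node {
    my ($c) = @_;
    my $id = $c->{buffer};
    return if $id eq '';          # nothing buffered, nothing to save
    push @{ $c->{stack} }, $id;
    $c->{graph}{$id} //= {};      # keep existing edges if the node was seen before
    $c->{buffer} = '';
    return $id;
}

save_node(\%ctx);
```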
This is the initial state. It lasts until the parser reads a graph declaration. If your file is not a DOT file, the parser may never leave this state.
This defines the state in which the parser has read a keyword. It could be either a graph declaration (digraph or subgraph) or a graph attribute declaration.
This is the "normal" state of the parser. Here, everything that looks normal (that is, matches [A-Za-z0-9_]+) will be considered a node ID if it is not a keyword. Thus, it is important to define the cases in which the parser has to "get out" of this state (for example, in the case of rank=id or an [attribute]).
Right now it works but it is very messy. It certainly needs a refactor.
The parser has read an edgeop (->) and falls into this state. Here, it will look for another node ID, and then it will return to the "normal/inside" state. The "ending" of a node ID could be a whitespace, a semicolon, an attribute statement, a comment or a multicomment. If the second node is quoted, this state is not enough, so the parser will fall into the state quoted_edge, which can deal with special characters and symbols.
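To make the interplay between the "inside" and "edge" states concrete, here is a deliberately simplified sketch. It is token-based rather than character-based, it expects only the body of a graph (no keywords, quotes, attributes or comments), and the function name parse_edges() is hypothetical. It is not how Dot::Parser itself is written.

```perl
use strict;
use warnings;

# Two-state sketch: "inside" reads node IDs, "edge" expects the second
# node of an edge statement and then returns to "inside".
sub parse_edges {
    my ($text) = @_;
    my (%graph, $last);
    my $state = 'inside';
    for my $tok (split ' ', $text) {
        $tok =~ s/;$//;                          # a semicolon ends a node ID
        if ($tok eq '->' || $tok eq '--') {
            $state = 'edge' if defined $last;    # an edgeop needs a first node
        }
        elsif ($tok =~ /^[A-Za-z0-9_]+$/) {
            $graph{$tok} //= {};                 # register the node
            $graph{$last}{$tok} = undef if $state eq 'edge';
            ($last, $state) = ($tok, 'inside');  # back to the normal state
        }
    }
    return \%graph;
}

my $g = parse_edges('a -> b; a -> c; b -> c;');
```

In this sketch $g ends up with the keys a, b and c, where a points to b and c, and b points to c.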
This is a quoted node state, which is straightforward. If the parser (in state inside) reads an opening double quote, it falls into this state. The parser will add everything to the buffer, and once it gets to a closing double quote it will save the buffer to the nodes stack.
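A character-by-character sketch of that behaviour, under simplifying assumptions: the function name read_quoted() is hypothetical, everything outside quotes is ignored, and escaped quotes inside a node ID are not handled.

```perl
use strict;
use warnings;

# Collect the contents of every double-quoted run in the input.
sub read_quoted {
    my ($text) = @_;
    my ($state, $buffer, @nodes) = ('inside', '');
    for my $ch (split //, $text) {
        if ($state eq 'inside') {
            $state = 'quoted' if $ch eq '"';      # opening quote
        }
        elsif ($ch eq '"') {                      # closing quote
            push @nodes, $buffer;                 # save buffer to the node stack
            ($state, $buffer) = ('inside', '');
        }
        else {
            $buffer .= $ch;                       # anything goes inside quotes
        }
    }
    return @nodes;
}

my @nodes = read_quoted('"node one" x "a->b";');
```

Here @nodes holds two entries, "node one" and "a->b", which shows that spaces and edgeop-like symbols are kept verbatim inside quotes.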
This is the state that allows special characters and symbols in the second node in an edge statement. It's similar to the state edge, but it will only end reading the node ID when it gets to another double quote.
This is an attribute statement. Everything will be discarded until the parser reads a closing square bracket.
This funny name represents the state in which the parser reads something like rank=id. It's not a very well-defined state and it may need further improvements.
This is the state of a normal comment (//comment), which ends with a newline character.
This special comment (/* comment */) ignores newline characters. It will only end when it gets to a closing comment symbol (*/).
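The two comment states can be sketched as a small state machine that drops comment text and keeps everything else. The function name strip_comments() and the state labels used in the code are assumptions for this example, and comment markers inside quoted node IDs are not treated specially here.

```perl
use strict;
use warnings;

# Remove // line comments and /* ... */ block comments from a string.
sub strip_comments {
    my ($text) = @_;
    my ($state, $out) = ('inside', '');
    my @ch = split //, $text;
    for (my $i = 0; $i < @ch; $i++) {
        my $pair = $ch[$i] . ($ch[$i + 1] // '');   # current char plus lookahead
        if ($state eq 'inside') {
            if    ($pair eq '//') { $state = 'comment';      $i++ }
            elsif ($pair eq '/*') { $state = 'multicomment'; $i++ }
            else                  { $out .= $ch[$i] }
        }
        elsif ($state eq 'comment') {
            # A line comment ends at the newline, which is kept.
            if ($ch[$i] eq "\n") { $state = 'inside'; $out .= "\n" }
        }
        elsif ($pair eq '*/') {
            # A block comment ends only at the closing symbol.
            $state = 'inside';
            $i++;
        }
    }
    return $out;
}

my $clean = strip_comments("a -> b; // note\nc /* x\ny */ -> d;");
```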
To install Dot::Parser, copy and paste the appropriate command into your terminal.
cpanm
cpanm Dot::Parser
CPAN shell
perl -MCPAN -e shell
install Dot::Parser
For more information on module installation, please visit the detailed CPAN module installation guide.