
NAME

Dot::Parser

DESCRIPTION

This module parses files written in the DOT graph description language.

SYNOPSIS

    use Dot::Parser qw(parse_dot); 
    my $graph = parse_dot("dotfile.dot");

METHODS

parse_dot

Dot::Parser exports only one function, parse_dot(). It takes a string with the name of a file written in DOT format and returns a hash reference that contains an adjacency list of the graph: each key is a node ID, and its value is a hash whose keys are the nodes it points to.

    $graph = {
        node1 => {
            node2 => undef,
            node3 => undef,
            node4 => undef,
        },
        node2 => {
            node5 => undef,
        },
        # ...
    };
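As a minimal sketch of how the returned structure could be traversed: the hash literal below is a hand-built stand-in for what parse_dot() returns (the node names are illustrative), and the helper successors() is hypothetical, not part of Dot::Parser.

```perl
use strict;
use warnings;

# Hand-built stand-in for the hash reference returned by parse_dot();
# the node names are made up for illustration.
my $graph = {
    node1 => { node2 => undef, node3 => undef, node4 => undef },
    node2 => { node5 => undef },
};

# Hypothetical helper: sorted list of the nodes that $node points to.
# Call it in list context. A node without an entry yields an empty list.
sub successors {
    my ($g, $node) = @_;
    return sort keys %{ $g->{$node} // {} };
}

# Print every edge as "source -> target".
for my $node (sort keys %$graph) {
    print "$node -> $_\n" for successors($graph, $node);
}
```

Because the inner values are undef, only the keys carry information, so exists() is the natural test for an edge.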

INTERNAL FUNCTIONS

Dot::Parser uses two types of functions: STATES and REGULAR functions.

REGULAR FUNCTIONS

_slurp

This function slurps the DOT file. It makes some ugly modifications to the text to help the parser: it adds spaces around edgeops and removes spaces around '=' signs. Node IDs containing these symbols will therefore be altered, but I think it's not a big deal to add or remove one or two spaces. Maybe in the future I could improve this.
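The normalisation described above might look roughly like the following sketch. The function name normalize_dot() and both patterns are assumptions made for illustration; the real _slurp may do this differently. Only the -> edgeop is handled here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text clean-up done while slurping.
sub normalize_dot {
    my ($text) = @_;
    $text =~ s/\s*->\s*/ -> /g;   # exactly one space on each side of the edgeop
    $text =~ s/\s*=\s*/=/g;       # no whitespace around '='
    return $text;
}

print normalize_dot("a->b [color = red];"), "\n";   # prints "a -> b [color=red];"
```

The substitutions do not know about quoting, which is why a quoted node ID such as "x->y" would come out as "x -> y".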

_add_node

This function simply adds the node stored in the buffer to the node stack and to the graph adjacency list. Then it clears the buffer.
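A minimal sketch of that bookkeeping, assuming the parser keeps its buffer, node stack and graph in one hash. The field names (buffer, stack, graph) and the function name add_node() are hypothetical; the module's internals are not documented here.

```perl
use strict;
use warnings;

# Hypothetical parser state with one node ID waiting in the buffer.
my %parser = (buffer => 'node1', stack => [], graph => {});

sub add_node {
    my ($p) = @_;
    my $id = $p->{buffer};
    push @{ $p->{stack} }, $id;   # remember the node for later edge handling
    $p->{graph}{$id} //= {};      # give it an adjacency entry if it has none
    $p->{buffer} = '';            # clear the buffer
    return $id;
}

add_node(\%parser);
```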

STATES

_state_none

This is the initial state. It lasts until the parser reads a graph declaration. If your file is not a DOT file, the parser may never leave this state.

_state_init

This is the state in which the parser has read a keyword. It could be either a graph declaration (graph, digraph or subgraph) or a graph attribute declaration.

_state_inside

This is the "normal" state of the parser. Here, any token that matches [A-Za-z0-9_]+ is considered a node ID unless it is a keyword. Thus, it is important to define the cases in which the parser has to leave this state (for example, rank=id or an [attribute]).

Right now it works but it is very messy. It certainly needs a refactor.
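The node ID rule for the inside state can be sketched as a small predicate. The keyword list below is the set of DOT language keywords, which are case-insensitive; whether Dot::Parser checks exactly this set is an assumption, and is_node_id() is a hypothetical name.

```perl
use strict;
use warnings;

# DOT language keywords; the module's own list may differ.
my %KEYWORD = map { $_ => 1 } qw(graph digraph subgraph node edge strict);

# True if $token looks like a plain node ID and is not a keyword.
sub is_node_id {
    my ($token) = @_;
    return 0 unless $token =~ /\A[A-Za-z0-9_]+\z/;
    return $KEYWORD{ lc $token } ? 0 : 1;
}

for my $token (qw(node1 digraph a-b)) {
    printf "%-8s %s\n", $token, is_node_id($token) ? 'node ID' : 'not a node ID';
}
```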

_state_edge

When the parser reads an edgeop (->), it falls into this state. Here, it looks for another node ID and then returns to the "normal/inside" state. The end of a node ID could be whitespace, a semicolon, an attribute statement, a comment or a multicomment. If the second node is quoted, this state is not enough, so the parser falls into the state quoted_edge, which can deal with special characters and symbols.

_state_quoted_node

This is the quoted node state, which is straightforward. If the parser (in state inside) reads an opening double quote, it falls into this state. The parser adds everything to the buffer, and once it reaches a closing double quote it saves the buffer to the node stack.
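The buffering described above can be sketched as a function that scans from the opening quote to the closing one. read_quoted() is a hypothetical name, and this simplified version does not handle escaped quotes inside the ID.

```perl
use strict;
use warnings;

# Read a quoted ID from $text, where $pos is the index of the opening quote.
# Returns the ID and the index just past the closing quote,
# or an empty list if the quote is never closed.
sub read_quoted {
    my ($text, $pos) = @_;
    my $buffer = '';
    for (my $i = $pos + 1; $i < length $text; $i++) {
        my $ch = substr($text, $i, 1);
        return ($buffer, $i + 1) if $ch eq '"';   # closing quote: ID complete
        $buffer .= $ch;                           # anything else is part of the ID
    }
    return;
}

my ($id, $next) = read_quoted('"my node" -> b', 0);
print "$id\n";   # prints "my node"
```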

_state_quoted_edge

This is the state that allows special characters and symbols in the second node of an edge statement. It's similar to the state edge, but it stops reading the node ID only when it reaches the closing double quote.

_state_attribute

This is an attribute statement. Everything will be discarded until the parser reads a closing square bracket.

_state_ass_attribute

This funny name represents the state in which the parser reads something like rank=id. It's not a very well defined state and it may need further improvements.

_state_comment

This is the state of a normal comment (//comment), which ends with a newline character.

_state_multicomment

This special comment (/* comment */) ignores newline characters. It ends only when the parser reaches the closing comment symbol */.
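To illustrate how the two comment states interact with the inside state, here is a small stand-alone scanner that drops both kinds of comment. It is only a sketch: strip_comments() and the state names used as strings are assumptions, and unlike a full parser it does not protect comment markers that appear inside quoted IDs.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove // and /* */ comments with a three-state scanner.
sub strip_comments {
    my ($text) = @_;
    my ($state, $out, $i) = ('inside', '', 0);
    my $len = length $text;
    while ($i < $len) {
        my $two = substr($text, $i, 2);
        if ($state eq 'inside') {
            if    ($two eq '//') { $state = 'comment';      $i += 2; }
            elsif ($two eq '/*') { $state = 'multicomment'; $i += 2; }
            else                 { $out .= substr($text, $i, 1); $i++; }
        }
        elsif ($state eq 'comment') {
            # A line comment ends at the newline; the newline itself is kept
            # because it is re-read in the inside state.
            if (substr($text, $i, 1) eq "\n") { $state = 'inside'; }
            else                              { $i++; }
        }
        else {
            # A multicomment ignores newlines and ends only at "*/".
            if ($two eq '*/') { $state = 'inside'; $i += 2; }
            else              { $i++; }
        }
    }
    return $out;
}

print strip_comments("a -> b; // edge\nc /* x\ny */ -> d;"), "\n";
```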