Making Diagrams with graphviz

graphviz is a great tool for creating quick diagrams. While it does not have a polished WYSIWYG editor like OmniGraffle's, it can automatically create diagrams from its simple markup language, DOT. After reading in the DOT markup, it uses various layout algorithms to arrange the diagram automatically. The DOT language is pretty flexible in its formatting and quite easy to generate from other programs.

The main installation includes a couple of standalone programs, such as dot and neato. I’ve mostly used dot, which creates directed graphs. (neato renders undirected graphs.) While installing it is beyond the scope of this post, installation is straightforward on OSX, Windows, and most Linux and BSD distributions.

Creating a Basic Graph in graphviz

To create a basic graph, all you really need is this:

digraph {
    a -> b
}

That generates the graph shown at right. The nodes ‘a’ and ‘b’ are created automatically, and a directed arc is created between them. To create a PNG diagram, use a command such as dot -Tpng -o OUT.png IN.dot, where IN.dot is the file containing the DOT code. (Graphviz can create many other formats as well.)

Many attributes can be added to the overall graph, individual nodes, and the arcs between them. Here are some example properties:

digraph {
    graph [rankdir=LR] // left-right layout, not top-down
    a [shape=square, fontcolor=white, style=filled, fillcolor=blue]
    b [shape=triangle, fontcolor=white, style=filled,
       fillcolor=red, peripheries=3]
    c [style=invis]
    a -> b [style=bold, color=red, label="to b"]
    b -> a [style=dashed, color=blue, label="to a"]
    a -> c [style=dotted, label="?"]
}

styled diagram

While this can become a bit verbose, the formatting is very consistent, and DOT code can easily be generated. There are language-specific wrappers for creating objects that render to DOT code, but I find the DOT language to be easy enough to work with directly.
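As a rough illustration of how little is needed to generate DOT directly, here is a minimal Python sketch. The helper names (quote, format_attrs, to_dot) are my own for this example and not part of graphviz, and it assumes node names are already valid DOT identifiers.

```python
def quote(value):
    """Wrap a value in double quotes, escaping any embedded double quotes."""
    return '"' + str(value).replace('"', '\\"') + '"'


def format_attrs(attrs):
    """Render a dict as a DOT attribute list, or an empty string if there are none."""
    if not attrs:
        return ""
    pairs = ", ".join(f"{key}={quote(value)}" for key, value in attrs.items())
    return f" [{pairs}]"


def to_dot(edges):
    """Build a digraph from (source, target, attribute dict) triples."""
    lines = ["digraph {"]
    for src, dst, attrs in edges:
        lines.append(f"    {src} -> {dst}{format_attrs(attrs)}")
    lines.append("}")
    return "\n".join(lines)


edges = [
    ("a", "b", {"style": "bold", "color": "red", "label": "to b"}),
    ("b", "a", {"style": "dashed", "color": "blue", "label": "to a"}),
]
print(to_dot(edges))
```

Quoting every attribute value is more than DOT requires, but it keeps the generator simple and avoids having to decide which values need quotes.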

Using graphviz for Development

I find graphviz particularly useful for visualizing interdependencies between tasks, diagramming state machines, and debugging data structures.

Here’s DOT code for a task diagram about making a peanut butter and jelly sandwich:

digraph {
    peanut_butter_jelly_time -> make_sandwich
    make_sandwich -> jelly
    make_sandwich -> bread
    make_sandwich -> plate
    peanut_butter -> open_pb_jar
    peanut_butter -> spread_pb
    spread_pb -> clean_knife
    jelly -> open_jelly_jar
    jelly -> spread_jelly
    bread -> grocery_shopping
    grocery_shopping -> fix_bike_tire
    spread_jelly -> clean_knife
    open_jelly_jar -> unstick_jelly_jar
    unstick_jelly_jar -> use_rubber_band_to_grip
    make_sandwich -> peanut_butter
    clean_knife -> unload_dishwasher
    plate -> unload_dishwasher
}

tasks to make PB&J

While the order in which the nodes and arcs are specified can affect the overall arrangement, graphviz’s dot layout places each node below the nodes that depend on it, so tasks without dependencies of their own sink to the bottom of their branches, making it clear what can be acted on immediately.
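The same "what can I do right now?" question can also be answered from the edge list itself, without rendering anything. This is a small Python sketch, not something graphviz provides; the function name actionable_tasks is made up for the example, and the edges are copied from the DOT code above.

```python
def actionable_tasks(edges):
    """Return tasks with no dependencies: nodes that never appear as an edge source."""
    sources = {src for src, _ in edges}
    targets = {dst for _, dst in edges}
    return sorted(targets - sources)


edges = [
    ("peanut_butter_jelly_time", "make_sandwich"),
    ("make_sandwich", "jelly"),
    ("make_sandwich", "bread"),
    ("make_sandwich", "plate"),
    ("peanut_butter", "open_pb_jar"),
    ("peanut_butter", "spread_pb"),
    ("spread_pb", "clean_knife"),
    ("jelly", "open_jelly_jar"),
    ("jelly", "spread_jelly"),
    ("bread", "grocery_shopping"),
    ("grocery_shopping", "fix_bike_tire"),
    ("spread_jelly", "clean_knife"),
    ("open_jelly_jar", "unstick_jelly_jar"),
    ("unstick_jelly_jar", "use_rubber_band_to_grip"),
    ("make_sandwich", "peanut_butter"),
    ("clean_knife", "unload_dishwasher"),
    ("plate", "unload_dishwasher"),
]
print(actionable_tasks(edges))
# ['fix_bike_tire', 'open_pb_jar', 'unload_dishwasher', 'use_rubber_band_to_grip']
```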

Likewise, here’s DOT code for a state machine diagram:

digraph {
    online [peripheries=2]
    START -> offline
    offline -> scanning [label="scan"]
    offline -> offline [label="failed scan"]
    scanning -> joining [label="detected"]
    scanning -> offline [label="failed join"]
    joining -> online [label="handshake"]
    joining -> offline [label="failed handshake"]
    online -> offline [label="timeout"]
}

simple state machine

This could be easily generated from a list of states and transitions between them (with labels). ragel, a tool for generating code from state machine descriptions, also uses graphviz for visualizing its output.
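Here is one way that generation might look in Python. It is only a sketch: the function name state_machine_to_dot and the (source, target, label) tuple layout are choices made for this example, and it assumes state names are valid DOT identifiers and labels contain no double quotes.

```python
def state_machine_to_dot(transitions, final_states=()):
    """Render (source, target, label) transitions as a DOT digraph.

    A label of None produces an unlabeled edge. States listed in
    final_states are drawn with a double outline (peripheries=2).
    """
    lines = ["digraph {"]
    for state in final_states:
        lines.append(f"    {state} [peripheries=2]")
    for src, dst, label in transitions:
        if label is None:
            lines.append(f"    {src} -> {dst}")
        else:
            lines.append(f'    {src} -> {dst} [label="{label}"]')
    lines.append("}")
    return "\n".join(lines)


transitions = [
    ("START", "offline", None),
    ("offline", "scanning", "scan"),
    ("scanning", "joining", "detected"),
    ("joining", "online", "handshake"),
    ("online", "offline", "timeout"),
]
print(state_machine_to_dot(transitions, final_states=["online"]))
```

The output can be saved to a file or piped straight into dot.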

Finally, graphviz is helpful for debugging data structures. I’ve found pointer errors in C data structures by writing code to dump out the structs’ addresses and pointers at runtime, making bad references stand out. Structs are best represented by the ‘record’ shape, which allows named sub-fields. The label is divided into fields by ‘|’ characters, and each field can be tagged with a name in angle brackets, such as

n16 [label="<f0>|<f1>|<f2>|<f3>|<f4>16"];

The sub-fields can be referenced independently: HEAD:f3 -> n16:f3;
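To give a feel for the dumping code, here is a minimal Python sketch of the same idea (my original dumps were written in C). The table layout, the function name skiplist_to_dot, and the port names f0, f1, ... and val are all assumptions made for this example, and each node is assumed to have at least one forward pointer.

```python
def skiplist_to_dot(nodes):
    """Render a skiplist-like table as record-shaped DOT nodes.

    nodes maps a node name to (value, forward), where forward[i] is the
    name of the next node at level i, or None for a null pointer.
    """
    lines = ["digraph {", "    rankdir=LR", "    node [shape=record]"]
    for name, (value, forward) in nodes.items():
        # One empty, named field per level, then a field holding the value.
        ports = "|".join(f"<f{i}>" for i in range(len(forward)))
        lines.append(f'    {name} [label="{ports}|<val>{value}"]')
    for name, (_, forward) in nodes.items():
        for level, target in enumerate(forward):
            if target is not None:
                # Connect the matching sub-fields, as in HEAD:f3 -> n16:f3.
                lines.append(f"    {name}:f{level} -> {target}:f{level}")
    lines.append("}")
    return "\n".join(lines)


nodes = {
    "HEAD": ("HEAD", ["n3", "n16", "n16"]),
    "n3": (3, ["n16"]),
    "n16": (16, [None, None, None]),
}
print(skiplist_to_dot(nodes))
```

A pointer to a name that is missing from the table still produces an edge, so a bad reference should show up in the rendered diagram as a node without the expected fields.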

For a full example, here is a skiplist diagram I made for my Strange Loop talk.

skip list

Since DOT is a declarative format, if you want to steer it towards a particular layout, it’s best to work within its model. If you want a series of nodes to appear in a line (as the bottom of the skiplist’s nodes do), you can increase the weight attribute for the edges between them. This causes the layout algorithm to place higher priority on keeping those nodes as close as the overall layout will allow. Invisible edges and nodes (with style=invis) can also nudge the layout in the right direction.
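When the DOT code is generated, these layout hints are easy to add in one place. The following Python sketch emits a chain of heavily weighted edges, optionally invisible; the function name chain_edges and the default weight of 10 are arbitrary choices for illustration, not recommended values.

```python
def chain_edges(names, weight=10, visible=True):
    """Return DOT edge lines linking names in order with a raised weight.

    With visible=False the edges still influence the layout but are not drawn.
    """
    attrs = f"weight={weight}"
    if not visible:
        attrs += ", style=invis"
    return [f"    {a} -> {b} [{attrs}]" for a, b in zip(names, names[1:])]


lines = ["digraph {"]
lines += chain_edges(["HEAD", "n3", "n16"])
lines += chain_edges(["n3", "spacer", "n16"], visible=False)
lines.append("}")
print("\n".join(lines))
```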