Getting Over My Fear of Custom Tools

We are lucky to have great development tools for our most common programming tasks. But every project has its unique challenges: what do you do when your project could benefit from automating some piece of work, but there is no good tool available? Most teams I’ve worked with use scripting in their build processes and write functions to add copyrights or update timestamps. But some teams don’t go much further than that. Perhaps custom tooling simply isn’t considered when approaching a problem, or perhaps writing it is perceived as too time-consuming to be a common practice.

A Solid Base

I had the chance to write my first code generator recently. I have to say, it was an embarrassing experience — not because it was challenging, but because it was so easy. I’d been avoiding something that was fun, surprisingly simple, and certainly valuable.

My particular problem was this: I wanted to generate a number of tests in SQL that were going to share common behavior. To keep these tests very close to the code they were exercising, I wanted to include them in comments alongside the SQL queries that they would verify. Think of it somewhat like defining tests via annotation. Here’s a quick example of what these tests might look like:

/*** @test
  <test>
    <name>A name for this test</name>
    <var name="result" type="NUMBER"/>
    <statement>select case when count(*) > 0 then 1 else 0 end into result
      from some_table;</statement>
  </test>
  <test>
    <name>Second test in the same comment block</name>
    <var name="someOtherVar" type="NUMBER"></var>
    <statement>select count(*) > 0 into someOtherVar from some_table;</statement>
  </test>
***/

I had a great head start on making this happen. My team had already scripted general-purpose tooling that walked our source tree, with nice hooks to visit each file in a directory and to run callbacks before and after processing a directory’s children. This framework was being used for a number of different tasks during the build, including validating the source files, reporting on included or excluded content, and creating master scripts that included selected code from across the source tree. You can imagine any number of other uses: the copyright and timestamp cases mentioned above, or scanning for unused resources, undesirable comments, or leaked debugging statements.
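If you don’t have a framework like ours yet, the core idea is small. Below is a minimal sketch in Ruby. It is not our team’s actual code. The class name TreeWalker, the DebugMarkerFinder example, the "-- DEBUG" marker, and the exact hook signatures are all assumptions. They are loosely modeled on the hooks described above, and two of the hook names, visit_file and post_walk, match the snippets later in this post.

```ruby
require "pathname"

# Hypothetical minimal tree walker: subclasses override whichever hooks they need.
class TreeWalker
  # Walk `dir` depth-first, calling the hooks around each directory.
  def walk(dir)
    dir = Pathname.new(dir)
    children = dir.children.sort
    pre_walk(dir, children)
    children.each do |child|
      if child.directory?
        walk(child) # subdirectories finish before this directory's post_walk runs
      else
        visit_file(child)
      end
    end
    post_walk(dir, children)
  end

  # Default hooks do nothing.
  def pre_walk(dir, children); end
  def visit_file(path); end
  def post_walk(dir, children); end
end

# Example use (hypothetical): collect files that still contain a "-- DEBUG" marker comment.
class DebugMarkerFinder < TreeWalker
  attr_reader :hits

  def initialize
    @hits = []
  end

  def visit_file(path)
    @hits << path.to_s if path.read.include?("-- DEBUG")
  end
end
```

For example, finder = DebugMarkerFinder.new, then finder.walk("src"), then finder.hits lists the offending files. Here "src" stands in for your own source root. Because subdirectories are walked before post_walk runs, a post_walk hook can safely roll up results that its children have already produced.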

A Simple Code Generator

For my use case, I needed to:

  • Locate the comment blocks that included my test code.
  • Parse the test information out of those blocks.
  • Generate something useful from that information, and gather the tests into a single file.

Locating the test blocks was easy enough. I defined a custom opening and closing format for the comment blocks (/*** @test to open, ***/ to close) and wrote a regex that would find these blocks and return the enclosed content.

  # Captures everything between "/*** @test" and the next "***/"; /m lets "." match newlines.
  COMMENT_PATTERN = /\/\*{3} @test(.*?)\*{3}\//m

  def visit_file path
    path.read.scan(COMMENT_PATTERN) { |(body)|
      @xml_blocks << "<tests>" + body + "</tests>"
    }
  end

My test content was written as XML, so it was trivial to retrieve the information I needed. I ran the XML blocks through a parser (Ruby’s REXML) and grabbed the nodes I was interested in from the tree. Then it was just a matter of generating test content from those values for the current directory’s files, and bubbling those statements up to the top of the source tree. Each directory’s _tests_all.sql combines its own generated tests with the _tests_all.sql files of its subdirectories.

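  require "rexml/document"

  # local, root, dirs and join_sql are provided by our tree-walking framework (not shown).
  def post_walk children
    local.open("w") { |io|
      @xml_blocks.each { |xml|
        doc = REXML::Document.new(xml)
        doc.elements.each("tests/test") { |test|
          test_name = test.get_elements("name").first.text
          io.puts "Some generated code of some sort for the test " + test_name
          test.get_elements("var").each { |var|
            io.puts "Some generated code for each node of a given type."
          }
        }
      }
    }
    # Roll this directory's tests and its subdirectories' roll-ups into one file.
    (root + "_tests_all.sql").open("w") { |io|
      subsql = dirs.map { |d| d + "_tests_all.sql" }.select(&:exist?)
      io.puts join_sql(local, *subsql)
    }
  end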
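The join_sql helper called in post_walk isn’t shown. Here is a hypothetical sketch of it. It assumes join_sql simply concatenates the directory’s own generated file with the roll-up files from its subdirectories, in order, and skips any file that doesn’t exist. The "-- from" marker comments are also an assumption.

```ruby
require "pathname"

# Hypothetical join_sql: concatenates the given SQL files in order, skipping any
# that don't exist, with a comment marking where each file's content starts.
def join_sql(*paths)
  paths.map { |p| Pathname.new(p) }
       .select(&:exist?)
       .map { |p| "-- from #{p}\n#{p.read}" }
       .join("\n")
end
```

Emitting SQL*Plus @ include lines instead of copying file contents would be an equally plausible design. The snippet above doesn’t tell us which approach join_sql takes.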

So, after a few hours, we now have a nice way to write tests with common error handling and reporting. They also share the logic to insert results into our test table.

Lessons Learned

The biggest lesson I learned, of course, was that custom tooling is not to be feared. Parsing source files for interesting information is relatively simple, and either reporting on that information or using it to generate or modify artifacts can open up a lot of interesting use cases.

I also learned that even when writing custom tooling, it pays to pick your tools carefully! My experience prior to this project mostly involved scripting with ANT and Java. While ANT certainly has its place, I found Rake and Ruby to be much faster and more enjoyable to work with. Custom tools need to pay for themselves over the course of your project, so it’s key to lower barriers when creating them.

Finally, I’m late in jumping on the bandwagon of using custom code generation to keep things DRY. In some cases, code generation is a viable alternative to class composition for avoiding duplication. In other cases, it may be your only option.

My particular use case may not be one that applies to your project, but I hope it shows how simple it can be to run tasks across your source tree and gives you ideas for tools of your own.