Getting Over My Fear of Custom Tools

We are lucky to have great development tools for our most common programming tasks. But every project has its unique challenges — what do you do when your project could benefit from automating some piece of work but there is no good tool available? Most teams I’ve worked with use scripting in their build processes and write functions to add copyrights or update timestamps. But some teams don’t go far beyond that — perhaps because custom tooling isn’t considered when approaching a problem or because of a perception that writing custom tooling is too time-consuming to be a common practice.

A Solid Base

I had the chance to write my first code generator recently. I have to say, it was an embarrassing experience — not because it was challenging, but because it was so easy. I’d been avoiding something that was fun, surprisingly simple, and certainly valuable.

My particular problem was this: I wanted to generate a number of tests in SQL that were going to share common behavior. To keep these tests very close to the code they were exercising, I wanted to include them in comments alongside the SQL queries that they would verify. Think of it somewhat like defining tests via annotation. Here’s a quick example of what these tests might look like:

/*** @test
<test>
  <name>A name for this test</name>
  <var name="result" type="NUMBER"/>
  <statement>select case when count(*) > 0 then 1 else 0 end into result 
    from some_table;
  </statement>
</test>
<test>
  <name>Second test in the same comment block</name>
  <var name="someOtherVar" type="NUMBER"></var>
  <statement>select count(*) > 0 into someOtherVar from some_table;</statement>
</test>
***/

I had a great head start on making this happen. My team had already scripted general purpose tooling that walked our source tree with nice hooks to visit each file in a directory and run callbacks before and after processing a directory’s children. This framework was being used for a number of different tasks during the build, including validating the source files, reporting on included or excluded content, and creating master scripts that included selected code from across the source tree. You can imagine any number of uses — the copyright and timestamp cases mentioned above, scanning for unused resources, undesirable comments, or leaked debugging statements perhaps.
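
As a rough illustration of that kind of hook-based walker, here is a minimal sketch applied to the "leaked debugging statements" case. It is not our team's actual framework: the `TreeWalker` and `DebugMarkerCounter` classes, the `-- DEBUG` marker, and the sample files are all hypothetical and exist only for this example.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical, condensed walker: one instance per directory,
# with hooks that subclasses can override.
class TreeWalker
  attr_reader :root

  def initialize(root)
    @root = Pathname(root)
  end

  # Hook: called once for every file directly inside this directory.
  def visit_file(file); end

  # Hook: called after all subdirectories have been walked.
  def post_walk(children); end

  def execute
    children = root.children.select(&:directory?).sort.map { |d| self.class.new(d) }
    root.children.select(&:file?).sort.each { |f| visit_file(f) }
    children.each(&:execute)
    post_walk(children)
    self
  end
end

# Counts lines containing an (assumed) debugging marker in each directory,
# then adds the totals of the subdirectories once they have been walked.
class DebugMarkerCounter < TreeWalker
  MARKER = "-- DEBUG"

  attr_reader :count

  def initialize(root)
    super
    @count = 0
  end

  def visit_file(file)
    @count += file.readlines.count { |line| line.include?(MARKER) }
  end

  def post_walk(children)
    @count += children.sum(&:count)
  end
end

# Build a throwaway source tree so the sketch can run anywhere.
total = Dir.mktmpdir do |dir|
  base = Pathname(dir)
  (base + "sub").mkpath
  (base + "a.sql").write("select 1 from some_table;\n-- DEBUG remove me\n")
  (base + "sub" + "b.sql").write("-- DEBUG one\n-- DEBUG two\n")
  DebugMarkerCounter.new(base).execute.count
end

puts total
```

Because each directory gets its own instance, results have to be passed upward in `post_walk`, which is the same "bubble up" idea the generator below relies on.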

A Simple Code Generator

For my use case, I needed to:

  • Locate the comment blocks that included my test code.
  • Parse the test information out of those blocks.
  • Generate something useful from that information, and gather the tests into a single file.

Locating the test blocks was easy enough. I defined a custom opening and closing format for the comment blocks and wrote a regex that would find these blocks and return the enclosed content.

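  # Matches a block that opens with "/*** @test" and closes with "***/".
  # The first capture group holds the enclosed XML.
  COMMENT_PATTERN = /\/\*{3} @test(([^*]|[\r\n]|(\*+([^*\/]|[\r\n])))*)\*{3}\//

  def visit_file path
    # With capture groups, scan yields an array of groups for each match.
    path.read.scan(COMMENT_PATTERN) { |groups|
      @xml_blocks << "<tests>" + groups[0] + "</tests>"
    }
  end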

My test content was written as XML, so it was trivial to retrieve the information I needed. I ran the XML blocks through a parser and pulled the nodes I was interested in out of the tree. From there, it was just a matter of generating test content from those values for the files in the current directory, then bubbling the generated statements up to the top of the source tree.

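  require "rexml/document"

  # local, dirs, and join_sql come from the surrounding framework and are not shown here.
  def post_walk children
    # Write the generated tests for this directory's files.
    local.open("w") { |io|
      @xml_blocks.each { |xml|
        doc = REXML::Document.new(xml)
        doc.root.each_element { |test|
          test_name = test.get_elements("name").first.text
          io.puts "Some generated code of some sort for the test " + test_name
          test.get_elements("var").each { |var|
            io.puts "Some generated code for each node of a given type."
          }
        }
      }
    }
    # Combine this directory's tests with those generated for its subdirectories.
    (root + "_tests_all.sql").open("w") { |io|
      subsql = dirs.map { |d| d + "_tests_all.sql" }.select(&:exist?)
      io.puts join_sql(local, *subsql)
    }
  end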
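
The parsing step can also be tried on its own. The sketch below assumes the `rexml` library is available (it ships as a bundled gem with recent Ruby versions). The `XML_BLOCK` input, the `generate_tests` helper, and the comment-style output lines are placeholders invented for this example, not the output of our real generator.

```ruby
require "rexml/document"
require "stringio"

# Hypothetical input: one wrapped block, in the shape visit_file collects.
XML_BLOCK = <<~XML
  <tests>
    <test>
      <name>Table is not empty</name>
      <var name="result" type="NUMBER"/>
      <statement>select count(*) into result from some_table;</statement>
    </test>
  </tests>
XML

# Writes one placeholder line per test and one per declared variable.
def generate_tests(xml, io)
  doc = REXML::Document.new(xml)
  doc.root.each_element("test") do |test|
    name = test.get_elements("name").first.text
    io.puts "-- test: #{name}"
    test.get_elements("var").each do |var|
      io.puts "--   var #{var.attributes['name']} #{var.attributes['type']}"
    end
  end
end

out = StringIO.new
generate_tests(XML_BLOCK, out)
puts out.string
```

Writing to any object that responds to `puts` keeps the generator easy to exercise with a `StringIO` before pointing it at real files.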

So, after a few hours, we now have a nice way to write tests with common error handling and reporting. They also share the logic to insert results into our test table.

Lessons Learned

The biggest lesson I learned, of course, was that custom tooling is not to be feared. Parsing source files for interesting information is relatively simple, and either reporting on that information or using it to generate or modify artifacts can open up a lot of interesting use cases.

I also learned that even when writing custom tooling, it pays to pick your tools carefully! My experience prior to this project mostly involved scripting with Ant and Java. While Ant certainly has its place, I found Rake and Ruby to be much faster and more enjoyable to work with. Custom tools need to pay for themselves over the course of your project, so it’s key to lower the barriers to creating them.

Finally, I’m late to the bandwagon in looking at custom code generation as a way to keep things DRY. In some cases, code generation is a viable alternative to class composition for avoiding duplication. In other cases, it may be your only option.

My particular use case may not be one that applies to your project, but I hope it shows how simple it can be to run tasks across your source tree and gives you ideas for tools of your own.

Jesse Hill (18 Posts)

I write web, mobile, desktop and embedded software for Atomic Object.


2 Comments

  1. Adam
    Posted October 17, 2012 at 1:18 pm

    Cool! I’m interested in the tool for source tree walking. Are there any open source projects that you can recommend?

    Also, I’d suggest using JSON for defining the tests, since it would make parsing waaay easier. Also, throw those into a separate file, like Cucumber does with features. Then you can just “File.open” to read the file, and it makes future maintenance easier once you leave your company. That regex is nasty!

    • Jesse Hill
      Posted October 20, 2012 at 12:36 pm

      Hi Adam,

      If you simply need to iterate over a file hierarchy, there are plenty of options. In Ruby, for example, you can do something like:

      require "fileutils"
      require "pathname"
       
      ROOT = File.expand_path File.dirname(__FILE__)
      ROOTP = Pathname ROOT
      ROOTP.find do | f |
        puts f
      end

      In Java, you can use the Commons IO method org.apache.commons.io.FileUtils.iterateFiles, or, as of Java 7, the Files.walkFileTree() method built into the base libraries.

      For more interesting functionality, it’s nice to have an object hierarchy with lifecycle methods you can override. So, something kind of like:

      require "fileutils"
      require "pathname"
       
      ROOT = File.expand_path File.dirname(__FILE__)
      ROOTP = Pathname ROOT
       
      class Recursal
       
        def initialize root, top=true
          @root = Pathname root
          @top = top
        end
       
        attr_reader :root
       
        def top?
          @top
        end
       
        def start children
          puts "Starting traversal"
        end
       
        def pre_walk children
          puts "Prior to walking directory: " + root.basename.to_s
        end
       
        def visit_file file
          puts "Processing file: " + file.basename.to_s
        end
       
        def post_walk children
          puts "After walking directory"
        end
       
        def stop children
          puts "Ending traversal"
        end
       
        def execute
          klass = self.class
          children = dirs.map{|d| klass.new d, false}
       
          start children if top?
          root.
            children.
            select(&:file?).
            sort.each{|f| visit_file f}
          pre_walk children
          children.each(&:execute)
          post_walk children
          stop children if top?
        end
       
        private
        def dirs(p=root)
          p.children.select {|c| c.directory? && c.basename.to_s !~ /^\./}.sort_by(&:to_s)
        end
      end
       
      Recursal.new(ROOTP).execute

      That example is also in Ruby. I don’t know of any libraries geared toward exactly this type of behavior, but the code can be pretty simple, so you may be better off just writing the bits you need.

      I agree that putting the tests in their own files would simplify the parsing. We wanted to experiment with having the tests live in comments next to the code under test – that part may or may not prove useful.