Understanding Macros and Code as Data

The other day, while having a conversation in the office about Clojure macros, I was reminded that not everyone fully understands the phrase “code as data” or its useful repercussions. This isn’t surprising, as very few mainstream languages offer this feature, and most of the ones that do are Lisp dialects. I think that’s unfortunate, as it’s a conceptually simple tool that gives you a lot of leverage.

Metaprogramming & Homoiconicity

Metaprogramming is at the heart of this idea. Many languages support metaprogramming to varying degrees; some common examples include generic type systems and runtime reflection. These features are nice because they allow you to design some level of run-time or compile-time flexibility into your code. There are limits, however, because you are not free to change the behavior of, or rewrite, every aspect of what your code is doing.

A special thing happens when you introduce homoiconicity to a language. Essentially, homoiconicity means that your language’s structure (and possibly its syntax) is expressed in the language’s own data types. For more information on the details, you might be interested in reading Understanding Homoiconicity in Clojure. When your code is defined in terms of the data structures inherent to your language, it becomes exceedingly easy to rewrite, transform, or even generate code. And, as you are essentially manipulating source code itself, no language feature is unavailable to you.
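
To make this concrete, here is a minimal sketch of what “code as data” looks like at a Clojure REPL. The names form and rewritten are mine, chosen only for illustration.

```clojure
;; A quoted form is an ordinary list whose first element is a symbol.
(def form '(+ 2 3 4))

(list? form)   ; => true
(first form)   ; => +   (a symbol, not the function)
(eval form)    ; => 9

;; Because it is just a sequence, ordinary sequence functions can rewrite it.
(def rewritten (cons '* (rest form)))

rewritten         ; => (* 2 3 4)
(eval rewritten)  ; => 24
```

Nothing here is macro-specific yet; it only shows that a piece of source code can be taken apart and reassembled with the same functions you would use on any other list.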

Benefits of Code as Data

The two main areas where I see this pay off in real-world practice are reducing repetition and enhancing the expressiveness of the language.

Reducing repetition

Every once in a while, you run into a task that is not inherently hard but is difficult to express without significant repetition. Take this example, which simply takes values out of a vector, processes them, and builds a friendlier map of data.

(defn parse-system-state-vec [state-vec]
  (let [[standby-status
         fan-speed
         temperature
         hardware-version
         software-version] state-vec]
    {:standby-status (parse-zero-indexed-enum standby-status [:awake :standby-countdown :shutdown-countdown :standby])
     :fan-speed (parse-zero-indexed-enum fan-speed [:off :low :high])
     :temperature (parse-num temperature)
     :hardware-version (parse-num hardware-version)
     :software-version (parse-num software-version)}))

Most of the work here is simply bookkeeping. Instead, we could write a macro that takes a description of the work to be done and generates this ugly code underneath. Let’s assume we did this, and the macro is named defparser. It would accept a set of keys for the resulting map, each paired with an expression showing how to compute the associated value. While we’re at it, we’ll even have the macro insert the raw value as the first argument of each expression, so we don’t have to type it.

(defparser parse-system-state
  :standby-status (parse-zero-indexed-enum [:awake :standby-countdown :shutdown-countdown :standby])
  :fan-speed (parse-zero-indexed-enum [:off :low :high])
  :temperature (parse-num)
  :hardware-version (parse-num)
  :software-version (parse-num))

Underneath, defparser could be generating the same code. It would behave identically at runtime, but the latter example has less repetition and is less prone to breakage or simple errors. Also note that there are many ways that you could define the shape of data you accept while still retaining the ability to generate code as in the first example, which gives you a lot of freedom in how you want to think about your ideas.
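
For the curious, here is one way defparser might be written. Treat it as a sketch under assumptions rather than the definitive version: the article never shows parse-num or parse-zero-indexed-enum, so the two helpers below are stand-ins I made up, and I am assuming the raw values arrive as strings.

```clojure
;; Stand-in helpers, assumed for this sketch only (raw values are strings).
(defn parse-num [s] (Long/parseLong s))
(defn parse-zero-indexed-enum [s options] (nth options (Long/parseLong s)))

(defmacro defparser
  "Hypothetical sketch: defines fn-name as a function that takes a vector of
  raw values and returns a map. Each key is paired with a list form, and the
  raw value at the matching position becomes that form's first argument."
  [fn-name & key-form-pairs]
  (let [pairs    (partition 2 key-form-pairs)
        ks       (map first pairs)
        forms    (map second pairs)
        ;; One generated local per field, so names cannot clash with caller code.
        locals   (map #(gensym (str (name %) "-")) ks)
        ;; (f a b) paired with local x is rewritten to (f x a b).
        with-raw (map (fn [local [f & args]] `(~f ~local ~@args)) locals forms)]
    ;; The generated defn destructures the vector and builds the result map.
    `(defn ~fn-name [[~@locals]]
       ~(zipmap ks with-raw))))

(defparser parse-system-state
  :standby-status (parse-zero-indexed-enum [:awake :standby-countdown :shutdown-countdown :standby])
  :fan-speed (parse-zero-indexed-enum [:off :low :high])
  :temperature (parse-num)
  :hardware-version (parse-num)
  :software-version (parse-num))

(parse-system-state ["3" "1" "42" "2" "7"])
;; => {:standby-status :standby, :fan-speed :low, :temperature 42,
;;     :hardware-version 2, :software-version 7}
```

The macro body is ordinary sequence manipulation (partition, map, zipmap) applied to the forms you passed in. The only macro-specific piece is the syntax-quoted template at the end.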

Enhancing expressiveness

After you become comfortable with rewriting your code, you start to think about doing it in more ambitious ways than reducing simple repetition. The key thing to remember here is that your macros are actually rewriting your source code, and since your source is represented by your language’s own data structures, you can return anything that looks like valid source code and use any language features you want.
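
A small sketch of that idea: the macro below is a plain function from code to code, and the list it returns uses the if special form directly. The name unless is my own choice for illustration; it is not part of clojure.core.

```clojure
(defmacro unless
  "Hypothetical macro: evaluates body only when test is falsey."
  [test & body]
  ;; The return value is a list, and that list is the code the compiler sees.
  (list 'if test nil (cons 'do body)))

;; macroexpand-1 shows the rewritten source without running it.
(macroexpand-1 '(unless (empty? xs) (first xs)))
;; => (if (empty? xs) nil (do (first xs)))

(unless false :ran)  ; => :ran
(unless true :ran)   ; => nil
```

Calling macroexpand-1 on your own macros is a handy habit, since it lets you read the generated code as data before trusting it.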

So if you can express your ideas in a structured way, regardless of how high-level they are, you have the power to rewrite those ideas into valid code. Let’s look at this example using sqlkorma:

(select users
  (with account)
  (fields :name :email :account.status)
  (where {:id 42}))

Korma has implemented a DSL for describing SQL queries right within Clojure. Internally, when Clojure compiles this code, it passes the form to Korma, which builds a simple data structure describing the constraints provided (a table to join, which fields to select, and so on) and outputs code that, at runtime, calls a function to execute the query with that generated data.
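
To give a feel for the general shape of such a DSL, here is a toy sketch. It is not Korma’s implementation or API: select, fields, where, and run-query below are hypothetical stand-ins defined on the spot, the join clause is left out, and the “executor” simply returns the query map instead of talking to a database.

```clojure
;; Toy stand-ins, not Korma's API: each clause is a plain function on a query map.
(defn fields [query & fs] (update query :fields into fs))
(defn where [query conditions] (update query :where merge conditions))

;; Stub executor: a real one would render SQL and send it to a database.
(defn run-query [query] query)

(defmacro select
  "Toy macro: threads a starting query map through each clause form, then
  hands the finished map to run-query at runtime."
  [table & clauses]
  `(run-query (-> {:table ~(keyword table) :fields [] :where {}}
                  ~@clauses)))

(select users
  (fields :name :email)
  (where {:id 42}))
;; => {:table :users, :fields [:name :email], :where {:id 42}}
```

The macro is what lets users appear as a bare symbol and lets each clause be written without an explicit query argument; the clauses themselves stay ordinary functions that are easy to test on their own.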

This is commonly referred to as building up the language to meet your problem domain.

Conclusion

When you get the chance to use a language that lets you treat code as data, you should take it. It’s not as scary a concept as it seems, and once you wrap your mind around it, you’re left with the ability to create some truly powerful abstractions that let you write more succinct and expressive code.