Understanding Homoiconicity in Clojure

For my recent LambdaJam workshop on learning Clojure macros from first principles, I created a set of materials exploring the basic concepts. To really understand macros, you first need to have a good understanding of what makes them so powerful — homoiconicity. In this post, we’ll explore that property of the language.

Code Is Data; Data Is Code

Clojure is a Lisp, one of a family of languages known for their prefix syntax and copious parentheses. Those parentheses are there for good reason. Lisps are homoiconic, meaning code written in the language is encoded as data structures that the language has tools to manipulate.

Consider the following expression:

(let [x 1]
  (inc x))

This code, entered directly into the REPL, returns 2 because the REPL compiles and executes any code entered into it. But [x 1] is also a literal vector data structure when it appears in a different context.

All Clojure code can be interpreted as data in this way. In fact, Clojure’s syntax is a superset of EDN (Extensible Data Notation), a data transfer format similar to JSON. EDN supports numbers, strings, lists ((1 2 3)), vectors ([1 2 3]), maps ({"key" "value"}), and more. If this sounds and looks a lot like Clojure syntax, it’s because it is. The relationship between Clojure and EDN is similar to that of JavaScript and JSON, but much more powerful.
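To make the EDN connection concrete, here is a minimal sketch using the clojure.edn namespace that ships with Clojure. The names a-vector, a-map, and let-data are just illustrative. Note that reading the text of our let expression does not run it; we simply get a list back.

```clojure
(require '[clojure.edn :as edn])

;; Plain EDN values: reading text yields ordinary data structures.
(def a-vector (edn/read-string "[1 2 3]"))
(def a-map    (edn/read-string "{\"key\" \"value\"}"))

;; The let expression is also valid EDN. Reading it does not execute it.
(def let-data (edn/read-string "(let [x 1] (inc x))"))

(vector? a-vector)  ; true
(get a-map "key")   ; "value"
(list? let-data)    ; true
(count let-data)    ; 3: the symbol let, the vector [x 1], the list (inc x)
```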

In Clojure, unlike JavaScript, all code is written in this data format. We can look at our let expression not as Clojure code, but as an EDN data structure. Let’s take a closer look:

(let [x 1]
  (inc x))

In this data structure, there are four different types of data.

  • 1 is a literal integer.
  • let, x, and inc are symbols. A symbol is an object representing a name – think a string, but as an atomic object and not a sequence of characters.
  • [x 1] is a vector containing two elements: a symbol, x, and an integer, 1. Square brackets always signify vectors when talking about EDN data structures.
  • (inc x) is a list (a linked list data structure) containing two symbols, inc and x.
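The breakdown above can be checked at the REPL. This is a small sketch that reads the expression as data with clojure.core’s read-string and then tests each piece with the standard type predicates. The names let-form, head, bindings, and body are only for illustration.

```clojure
;; Read the expression as data (nothing is executed).
(def let-form (read-string "(let [x 1] (inc x))"))

;; Pull the three elements of the outer list apart by position.
(def head     (first let-form))   ; the symbol let
(def bindings (second let-form))  ; the vector [x 1]
(def body     (nth let-form 2))   ; the list (inc x)

(symbol? head)                ; true
(vector? bindings)            ; true
(symbol? (first bindings))    ; true
(integer? (second bindings))  ; true
(list? body)                  ; true
(every? symbol? body)         ; true
```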

When thinking about a piece of Clojure code as a data structure, we say we are talking about the form. Clojure programmers don’t normally talk about EDN. Instead, there are just two ways to think about any bit of Clojure: 1) as code that will execute or 2) as a form, a data structure composed of numbers, symbols, keywords, strings, vectors, lists, maps, etc.

Symbols are particularly important. They are first-class names. In Clojure, we distinguish between a variable and the name of that variable. When our code is executing, x refers to the variable established by our let binding. But when we deal with that code as a form, x is just a piece of data: a name, which in Clojure is called a symbol.
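Here is a short sketch of that distinction. It leans on the apostrophe shorthand for quote, which is explained in the next section, to get hold of the symbol x itself. The name value-and-name is just for illustration.

```clojure
;; Inside the let, the bare x evaluates to the bound value, while the
;; quoted 'x is the symbol itself.
(def value-and-name
  (let [x 1]
    [x 'x]))

(first value-and-name)             ; 1, the value x refers to
(symbol? (second value-and-name))  ; true, the name x as data
(name (second value-and-name))     ; "x", the symbol's name as a string
(= 'x (symbol "x"))                ; true: symbols with the same name are equal
```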

This is why Clojure is homoiconic. Code forms are data structures, and data structures can be thought of as forms and executed as code. This transformation is quite literal, and two core operations, quote and eval, are the key ingredients in this potion.

Quote (Treating Code as Data)

quote is a special form, a primitive built into the compiler, which suspends evaluation. quote means “give me the form of this thing, not its value”. A quoted function call is returned as a list instead of being invoked, and any names are returned as symbols instead of being resolved to run-time values.

quote leaves some things alone and changes how others are treated. Forms that simply evaluate to themselves in Clojure code (strings, numbers, keywords, etc.) come through unchanged. But forms that would normally do something, like symbols (which refer to variables) and parenthesized lists (which invoke functions), are returned as data instead of being evaluated.
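A quick sketch of both cases, using the apostrophe form of quote and only core predicates:

```clojure
;; Self-evaluating forms are the same with or without quote.
(= "hello" '"hello")  ; true
(= :color ':color)    ; true
(= 42 '42)            ; true

;; Symbols and lists come back as data rather than being evaluated.
(fn? inc)             ; true: unquoted, inc evaluates to a function
(symbol? 'inc)        ; true: quoted, it is just a name
(= 2 (inc 1))         ; true: unquoted, the list is a function call
(list? '(inc 1))      ; true: quoted, it is a two-element list
```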

quote normally isn’t written out in Clojure code. An apostrophe before a form is shorthand for quote. Thus, the following are all true:

(= (quote +) '+)
(= (quote (+ 1 2)) '(+ 1 2))
(= ''+
   '(quote +)
   (quote '+)
   (quote (quote +)))

Eval (Executing Data as Code)

eval is the opposite of quote. quote is a special form that stops the execution of code and instead treats it like a data structure. eval takes a data structure and executes it as code.

> (quote (+ 1 1))
(+ 1 1)

With eval, we can undo our quotation:

> (eval
    (quote 
      (+ 1 1)))
2

Using eval to immediately undo quotation is not very useful. Where it starts to get interesting is in the space between the eval and the quote. It’s here that we can redefine the rules of the language. By treating our quoted code as data and manipulating it, we can translate it into a different data structure. Something that was previously meaningless to the Clojure compiler can become executable code. We can introduce new language constructs or play with scoping or even change the foundations of the language.
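As a rough illustration of working in that space, the sketch below starts with a quoted infix expression, which Clojure could not evaluate as written because 1 is not a function, rearranges it with ordinary list operations, and only then hands it to eval. The helper infix->prefix is a hypothetical name invented for this example, not part of Clojure.

```clojure
;; A quoted infix expression: just a list of a number, a symbol, and a number.
(def infix-form '(1 + 2))

;; Hypothetical helper: rearrange (a op b) into (op a b).
;; It is a plain function operating on plain data.
(defn infix->prefix [[a op b]]
  (list op a b))

(def prefix-form (infix->prefix infix-form))  ; the list (+ 1 2)

(eval prefix-form)  ; 3
```

Nothing here is special to code. infix->prefix would work just as well on a list of any three values, which is exactly the point.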

Macros expose this power to the user by effectively executing in this space. They run at compile time, when the code still exists as data structures, and return new code that replaces their invocation. Because Clojure is homoiconic, macros just take regular data structures and return regular data structures. It just so happens that the data structures they manipulate represent Clojure code and are eventually evaluated. By letting users bring all the power of Clojure to bear on compile-time code transformations, macros in homoiconic languages let you grow the language into whatever it needs to be to best solve your problems.
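For a taste of what that looks like, here is a minimal sketch of a macro that adds a small new construct. The name unless is made up for this example (Clojure’s built-in equivalent is when-not). The macro receives its arguments as unevaluated forms and builds the replacement code with ordinary list functions.

```clojure
;; Hypothetical macro: run body only when test is falsey.
;; test and body arrive as data, and the returned list is the code
;; that replaces the macro call.
(defmacro unless [test & body]
  (list 'if test nil (cons 'do body)))

(unless false :ran)  ; :ran
(unless true :ran)   ; nil

;; macroexpand-1 shows the code the macro produced, as data.
(macroexpand-1 '(unless false :ran))  ; (if false nil (do :ran))
```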

 

Drew Colthorp (28 Posts)
