(Clojure) Spec-ing Data from JSON

I’ve returned to Clojure after some time using other tools, so I’m late to the party regarding learning and experimenting with Spec. The first order of business was to build some specs for data I’m receiving from an external system in JSON format. Given that it was my first real experience with Spec, and I had no control over the shape of the data I was spec-ing, I was uncertain how to proceed.

Background

Prismatic Schema, which I’m more experienced with using, takes the approach of using collections to model collections. In Schema’s approach, if you want to create a schema for a map of data, it might look like this:

{:name schema/Str :id schema/Int}

That is, the map above serves as a schema for other maps that have the keys :name, whose value should be a string, and :id, whose value should be an integer.

Spec takes a different approach, and collections are validated using regular predicates, which are essentially just functions. There are some interesting tradeoffs in both approaches.

Interlude: The Registry

Spec has a concept that Schema does not: a registry of specs. Schema is designed around the use of data structures to describe other data structures, and usually you’ll build up a schema of arbitrary complexity (whatever feels right at the time) and just store it in a Var in a namespace for later reference.

This is not so for Spec. Spec maintains a registry: a global (across your entire codebase and its dependencies) map of names to specs. Furthermore, Spec enforces the use of the registry when it comes to describing maps: s/keys accepts only the names of registered specs, not inline predicates. The above example, in Spec, looks like:

(s/def :my.ns/name string?)
(s/def :my.ns/id int?)
(s/def :my.ns/thing (s/keys :req [:my.ns/name :my.ns/id]))

It’s important to note that, although namespaced keywords are required, Spec’s version of def doesn’t create any vars; the spec is simply stored in the registry under the provided name. Additionally, in this example, Spec would expect that the keys to the map would exactly match the name of the referenced specs; i.e., a valid map would look like {:my.ns/name "something" :my.ns/id 42}. The keys predicate also supports the :req-un option, in which case the namespace would be removed from the keys, and a valid map would appear as: {:name "something" :id 42}.
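To make the difference between :req and :req-un concrete, here is a minimal sketch that can be pasted into a REPL (Clojure 1.9 or later). The spec name :my.ns/thing-un is hypothetical and exists only for this illustration.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :my.ns/name string?)
(s/def :my.ns/id int?)

;; :req expects the map keys to be the fully qualified spec names.
(s/def :my.ns/thing (s/keys :req [:my.ns/name :my.ns/id]))

;; :req-un expects the same names with the namespace stripped.
;; (:my.ns/thing-un is a made-up name for this sketch.)
(s/def :my.ns/thing-un (s/keys :req-un [:my.ns/name :my.ns/id]))

(s/valid? :my.ns/thing    {:my.ns/name "something" :my.ns/id 42}) ; => true
(s/valid? :my.ns/thing    {:name "something" :id 42})             ; => false
(s/valid? :my.ns/thing-un {:name "something" :id 42})             ; => true
(s/valid? :my.ns/thing-un {:name "something" :id "42"})           ; => false
```

Note that with :req-un the unqualified key :id is still checked against the spec registered as :my.ns/id, which is why the last call fails.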

Although it’s a bit more verbose and unwieldy, Spec’s approach does have an upside in that you are forced to decomplect your specs from the start. I believe this was part of the motivation for creating Spec. The tradeoff is that your codebase (via the contents of Spec’s registry) is influenced by the shape of the data you are dealing with.

The Problem

For my project, I started off by creating a namespace containing specs for the JSON I was receiving from the external system, along with some simple fns to interpret information out of that data. Right away, I was dealing with JSON keys such as “x” and “y.” It began to feel as though I were polluting my namespace (even if only within the Spec registry), because I was attempting to define all these specs in a single namespace: my-app.server.messages.

Eventually, I reached a point where I could not continue to model the data this way due to reuse of a key name in a nested structure. In a sub-map, the key had a different meaning (and required a different predicate to validate) than in the parent map. Because of Spec’s requirement that all key names be derived from the name they are given in the registry, I had two options: create multiple namespaces in my source, or begin using namespaced keywords that don’t have any correlation to my actual source code.
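The collision can be reproduced in a few lines. This is only a sketch: the :msg/... names are hypothetical stand-ins for specs registered under a single namespace, and the payload is invented.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical single-namespace specs. The parent's id is an integer.
(s/def :msg/id int?)
(s/def :msg/child (s/keys :req-un [:msg/id]))
(s/def :msg/items (s/coll-of :msg/child))
(s/def :msg/parent (s/keys :req-un [:msg/id :msg/items]))

;; The child's id is really a string, but the unqualified key :id is tied
;; to the one spec named :msg/id, so the nested map is rejected.
(s/valid? :msg/parent {:id 1 :items [{:id "a"}]}) ; => false

;; Re-registering :msg/id for the child's meaning just moves the failure:
;; now the parent's integer id no longer validates.
(s/def :msg/id string?)
(s/valid? :msg/parent {:id 1 :items [{:id "a"}]}) ; => false
```

Either way, one registry name cannot carry both meanings of the key, so the two ids need spec names that differ in their namespace part.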

Both options seemed problematic in their own way. It made sense, to me, to keep this all in one source file. There would be a lot of repetitiveness in my source if I used a long namespace key without the assistance of the ::ns-alias/key syntax. Additionally, since the data was coming from JSON, I had to make use of :req-un, leaving me with some fear of missing out on a potential lesson to learn from Spec’s encouragement of namespaced keys in my data.

I began to suspect that this just was not a good use of Spec. It was also a bit disheartening because I could have hammered out a Schema for the data in about five minutes.

A Solution

Since this is just a hobby project, I set it aside for a few days to let my mind process it in the background. When I came back, I decided to give it another try, but using a very short and nonsensical “namespace.” Since this is not a library to be reused by others, I’m not worried about collisions with other code, so I settled on using “json” for the name. What I came up with looked something like:

(ns my-app.server.messages
  (:require [clojure.spec.alpha :as s]))

(s/def :json.parent/id int?)
(s/def :json.parent/items (s/* :json/child))
(s/def :json/parent (s/keys :req-un [:json.parent/id :json.parent/items]))

(s/def :json.child/id string?)
(s/def :json.child/name string?)
(s/def :json/child (s/keys :req-un [:json.child/id :json.child/name]))
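As a quick check of these specs, the sketch below repeats the definitions so it runs on its own and validates a made-up payload. The sample map is hypothetical and assumes the JSON has already been parsed with keywordized keys by whatever JSON library is in use.

```clojure
(require '[clojure.spec.alpha :as s])

;; Specs repeated from the listing above so this sketch is self-contained.
(s/def :json.parent/id int?)
(s/def :json.parent/items (s/* :json/child))
(s/def :json/parent (s/keys :req-un [:json.parent/id :json.parent/items]))

(s/def :json.child/id string?)
(s/def :json.child/name string?)
(s/def :json/child (s/keys :req-un [:json.child/id :json.child/name]))

;; Hypothetical payload: the key :id is an int in the parent
;; and a string in each child.
(def sample
  {:id 7
   :items [{:id "a" :name "first"}
           {:id "b" :name "second"}]})

(s/valid? :json/parent sample) ; => true

;; Give the first child an integer id, which :json.child/id rejects.
(def bad-sample (assoc-in sample [:items 0 :id] 1))

(s/valid? :json/parent bad-sample) ; => false

;; explain-str returns a human-readable report of what failed.
(println (s/explain-str :json/parent bad-sample))
```

Because the parent and child ids live under different spec names (:json.parent/id and :json.child/id), the same unqualified key can have a different predicate at each level.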

The further I went along using this approach, the more comfortable and confident I became in it. However, I still have some open questions; e.g., should I have named the parent spec :my-app.server.messages/parent instead of :json/parent? That might make more sense where these specs will be referenced elsewhere. It does seem borderline unacceptable to require a namespace for its specs when those specs are registered under an entirely different namespace.

For now, it works well enough.

Wrapping Up

This is an interesting case where, in my opinion, Prismatic Schema would have been more elegant. Despite this, I’m carrying on for the sake of learning and experimentation. Even if this case is a bit messy, the mess should be easily contained. I’d be wary of internalizing the shape of the JSON data in my own application state, for example, regardless of whether I was validating it in the first place.
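One way to contain the mess, sketched under assumptions: validate the JSON shape at the boundary and immediately translate it into keys the application chooses for itself. The function child->item and the keys :item/id and :item/label are hypothetical names invented for this example, not something from my project.

```clojure
(require '[clojure.spec.alpha :as s])

;; Child specs repeated so the sketch runs on its own.
(s/def :json.child/id string?)
(s/def :json.child/name string?)
(s/def :json/child (s/keys :req-un [:json.child/id :json.child/name]))

(defn child->item
  "Validates a parsed JSON child map and returns it in the application's
  own shape. Throws ex-info carrying spec's explain-data when invalid.
  (:item/id and :item/label are made-up domain keys.)"
  [child]
  (if (s/valid? :json/child child)
    {:item/id    (:id child)
     :item/label (:name child)}
    (throw (ex-info "Invalid child payload"
                    (s/explain-data :json/child child)))))

(child->item {:id "a" :name "first"})
;; => {:item/id "a", :item/label "first"}
```

With something like this, only the boundary function knows about the JSON key names, and the rest of the application works with its own vocabulary.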

I’m still looking forward to the experience of spec-ing my application’s domain.

Any thoughts or questions? I’d love to hear them in the comments.