
Handling JSON from the Command Line with Jq


In the battle of data formats, the two heavyweights are XML and JSON. Of late, JSON seems to be winning, in large part because most languages natively support JSON’s chosen data structures, but there’s one arena where JSON hasn’t made much of a showing: the command line. The command line provides a lot of programs to handle plain text (grep, cut, awk, and sed, to name just a few), and even XML can be handled through xpath, but dealing with JSON has been the domain of standalone scripts written in Python, Ruby, or whatever your favorite language happens to be. There have been few good ways to manage JSON directly from the shell. That’s why Stephen Dolan created Jq.

Dolan bills Jq as sed or awk for JSON. It defines a concise notation for manipulating JSON data structures, allowing you to avoid separate, multiline scripts in most cases and carry out operations directly from the command line.

The simplest use of Jq is to reformat JSON. Often web APIs compress their JSON, removing all whitespace and making it hard for humans to read. If you have such a block of JSON, pipe it to the command jq '.' to break it apart and indent it nicely. You may have used a similar facility built into Python: python -m json.tool. One notable difference between the two is that Jq provides syntax highlighting, coloring object keys blue and strings green.
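For comparison, here is a minimal Python sketch of what pretty printing amounts to: parse the compact text, then serialize it again with indentation. The sample payload and the pretty helper name are made up for illustration, and the two-space indent is chosen to mirror Jq's default output.

```python
import json


def pretty(compact: str, indent: int = 2) -> str:
    """Parse compact JSON text and re-serialize it with indentation."""
    return json.dumps(json.loads(compact), indent=indent)


# Hypothetical compressed payload, as a web API might return it.
compact = '{"name":"jq","tags":["json","cli"],"stars":1}'
print(pretty(compact))
```

Only the whitespace changes; the data parses back to exactly the same structure.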

Jq pretty printing and syntax highlighting

Pretty printing is only the tip of the iceberg. Python can pretty print with ease, but any other JSON processing takes you off the command line and into a script. For instance, several months ago I worked on a project that had dozens of branches and corresponding pull requests. I often wanted to open the pull request for the branch I was on, so I wrote the following script to filter JSON from the GitHub API and print the URL of the pull request made from a particular branch:


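import json
import sys

# Read the list of pull requests from stdin and print the URL of
# each one whose head branch matches the name given as argv[1].
for p in json.load(sys.stdin):
    if p['head']['ref'] == sys.argv[1]:
        print(p['html_url'])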

It might not look like a lot of code, but I’m traversing the JSON structure manually: iterating through the list until I find an entry whose fields match the passed-in value, then printing a field from that entry. With Jq, the same process is much more succinct:

jq -r --arg branch "$1" '.[] | select(.head.ref == $branch) | .html_url'

(The -r option removes quotes from the string, printing only the raw output.)
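To make the three stages of that filter concrete, here is a rough Python sketch that mirrors them one at a time: iterate over the array, keep the matching entries, then project a single field. The function name and the sample data are hypothetical; the sample only imitates the two fields used above and is not real GitHub API output.

```python
import json


def pull_request_urls(pulls, branch):
    """Mimic `.[] | select(.head.ref == BRANCH) | .html_url` stage by stage."""
    entries = iter(pulls)                                         # .[]
    matches = (p for p in entries if p["head"]["ref"] == branch)  # select(...)
    return [p["html_url"] for p in matches]                       # .html_url


# Hypothetical sample shaped like the fields used above.
sample = json.loads("""
[
  {"head": {"ref": "fix-typo"},    "html_url": "https://example.com/pull/1"},
  {"head": {"ref": "add-feature"}, "html_url": "https://example.com/pull/2"}
]
""")

for url in pull_request_urls(sample, "add-feature"):
    print(url)              # raw string, comparable to jq -r
    print(json.dumps(url))  # JSON-quoted string, comparable to jq without -r
```

Each pipe in the Jq filter corresponds to one line of the function body, which is where the succinctness comes from: Jq supplies the iteration and the output for you.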

The Python version doesn’t seem prohibitively complex, and doesn’t require learning a new syntax, but it has some drawbacks:

  1. Boilerplate. The imports and json.load(sys.stdin) are necessary in every script, and they don’t offer any insight into why the script is needed.
  2. Multiline. The fact that the Python script must span multiple lines makes it difficult to write ad hoc queries on the command line. A language like Ruby, where whitespace isn’t significant, might be able to compact the command into a single line, but readability will suffer.

Jq is great for quick queries of JSON data on the command line, easily earning its claim of being JSON’s version of sed or awk, programs I use almost exclusively on the command line. If you work with JSON, see if Jq makes handling it easier. If it does, I’d love to see how you use it.