I learned test-driven development from Kent Beck’s book Test-Driven Development By Example. It’s an excellent introduction that whets the appetite for one of my other favorites, Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce.
Both of these books have a blind spot, though: they are completely silent about how modern static type systems might augment or even replace tests. After reading these books, it’s easy to think that “typing” mainly has something to do with keyboards.
Kent wrote a brief Facebook post in 2012 titled Functional TDD: A Clash of Cultures. I hoped that this would start a discussion on modern TDD influenced by types and functional programming, but it seems to have fizzled out. In this post, I’d like to continue the discussion. There are tons of people more qualified to talk about TDD, types, and functional programming, but I don’t want to wait any longer!
Two Goals for Better Test-Driven Designs
I really enjoy TDD. It’s a comfortable, intentional form of iterative prototyping that mixes well with GTD time management. TDD isn’t a magic bullet for producing good designs though — it’s easy to end up with heavily-tested poor designs. The tests can even become a large drag on improving the design.
I’ve found adding two high-level goals helps reduce the chance of getting stuck with a poor design:
# Maximize correctness; minimize the amount of code.
# Make bad code impossible to write.
The first goal is not controversial. It goes by lots of other names: YAGNI (You Ain’t Gonna Need It), KISS (Keep It Simple, Stupid), Occam’s Razor, and Knuth’s famous “premature optimization is the root of all evil.” I like my phrasing because it reminds me this is a maxi-min problem, not a simple optimization.
The second goal is less well known and might be surprising. I first heard it from Paul Snively and Amanda Laucher, who learned it from Jane Street’s Yaron Minsky, who blogged, “Make illegal states unrepresentable.” This is an important idea because it opposes my tendency to build loose, expressive, object-oriented systems. If a class of errors is impossible to make because the system won’t let you express them, then it’s not necessary to handle those errors. A good example: a language without null pointers cannot have null pointer errors.
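As a minimal sketch of what “unrepresentable” looks like in Java (the Response/Success/Failure names are my own invention, not from any of the sources above): model each legal state as its own type, so the illegal combinations simply cannot be constructed.

```java
public class IllegalStatesDemo {
    // Two legal states, each its own type. A "success with an error code"
    // or a "failure with a body" cannot even be constructed.
    interface Response {}

    static final class Success implements Response {
        final String body;
        Success(String body) { this.body = body; }
    }

    static final class Failure implements Response {
        final int errorCode;
        Failure(int errorCode) { this.errorCode = errorCode; }
    }

    static String describe(Response r) {
        // Every Response is exactly one of the two legal states,
        // so there are no null/empty combinations to defend against.
        if (r instanceof Success) return "ok: " + ((Success) r).body;
        return "error " + ((Failure) r).errorCode;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Success("hello")));
        System.out.println(describe(new Failure(404)));
    }
}
```

Because no constructor accepts both a body and an error code, there is no test to write for that mixed state: it doesn’t exist.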
My TDD tests are no exception to these goals. In fact, I’m even more critical of test code: it’s probably poorly conceived on the first try because I’m inventing things and exploring the solution space while at a point of maximal ignorance. Here are the two goals re-written to apply specifically to tests:
# Maximize coverage; minimize the amount of test code.
# Replace tests for errors with designs that make the errors impossible.
It sounds weird to hear someone who likes TDD try to minimize test code. Is it possible to follow TDD and end up having a single test? That seems highly improbable, but not impossible or undesirable. Such a test oracle could easily be used with git bisect to find code that caused the test to fail, so the number of tests doesn’t limit our ability to find bad code. In TDD, I am constantly adding tests, so I never worry about the number of tests. Instead, I worry about poor coverage, poor quality, and poor design.
Using Types with Test Driven Development
Types are one of the most useful tools programmers have developed for achieving my two goals. They provide an excellent combination of consistency guarantees and concise expression. Similar to test-first development, type-first development forces you to think about requirements, interfaces and invariants. Unlike tests, which are usually specific constraints, types form general constraints that help you write future code consistent with your past design decisions.
Types lift tests up from being a “green bar” test phase to being an integral part of the code. Type-checking by a compiler can give you the same confidence as passing tests, but unlike tests, that confidence extends to all users of the code.
Some languages suffer from clumsy, verbose type syntax, but even those languages can have useful types. I’m not going to demand much from a type system in this post, but the following language features are really important:
* Static type-checking — proof your program is consistent
* Immutability — guarantee against side-effects
* Genericity — code reuse without conflating types and inheritance
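A minimal sketch combining all three features (the Pair name and its fields are my own example, not from the books): a generic, immutable container that the compiler checks at every use site.

```java
public class Pair<A, B> {
    // Immutability: final fields mean the compiler rejects any
    // mutation after construction.
    public final A first;
    public final B second;

    public Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    public static void main(String[] args) {
        // Genericity: one class, reused for any pair of types,
        // statically checked without any inheritance relationship.
        Pair<String, Integer> p = new Pair<>("x", 1);
        // p.first = "y";  // would not compile: the field is final
        System.out.println(p.first + "=" + p.second);
    }
}
```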
To help prove the point that less powerful type systems can still be valuable, I’ve chosen to show examples in Java. It has a few advantages here: it’s the language used in both books I mentioned, it’s easy to find terrible tests written in Java, and most importantly, Java has just enough of a type system to be useful. If something is possible in Java, it will probably be easier in another language.
I find many things difficult to prove with types, so I still write a lot of tests. This isn’t a “test vs type” issue for me, but a question of how I can best prove my code works. I usually start with a test and let that drive the types, then loop back and add more tests. I’m interested to hear from people who always start type-first or who have found a way to replace acceptance-style tests with more formal proofs.
First You Need a REPL
Whenever I start work, I set up an environment where I can quickly experiment with code. The best solution is a REPL, or read-evaluate-print loop, where I can enter code and have it immediately run. Java doesn’t normally have a REPL, but I can create a stub test, and then IntelliJ lets me run the test with Command-R. Here’s the skeleton REPL:
package com.atomicobject;

import org.junit.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.*;

public class Experiments {
    @Test
    public void REPL() {
    }
}
When developing Java, I often start with static nested classes inside the test class because I don’t like Java’s requirement of one top-level class per file. It’s trivial to refactor the nested classes out when I’m confident of the code. The examples in this post all use the nested class form, but it’s not important. You can easily mix top-level classes with the REPL technique.
Low Hanging Fruit: Replace These Tests With Types
I’m going to demo building a 2D point feature. The code and tests in these examples are extremely simple, but I’ve seen all of these problems in production code. The fruit is hanging so low here that some of you may think it’s fallen on the ground. However, once people start looking for places where tests can be replaced with types, more advanced use of types comes naturally.
Replace Class Existence Tests With Types
The first thing I decide I need to do is construct a point, so I write a test:
Point p = new Point();
assertThat(p, is(not(nullValue())));
Making this test pass is easy, but once it passes, how would I make it fail? What is it testing? I can’t misspell the class name or the code won’t compile. I can’t call the constructor wrong or the code won’t compile. I can’t get a null from a Java constructor because the run-time guarantees it.
The test can’t fail, so I delete the test.
After a couple iterations, my REPL looks like this:
package com.atomicobject;

import org.junit.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.*;

public class Experiments {
    static public class Point {
        Integer x, y;

        public Point(Integer x, Integer y) {
            this.x = x;
            this.y = y;
        }
    }

    @Test
    public void REPL() {
    }
}
Replace Interface Existence Tests With Types
Now I need access to the X value of the point. So I write a terrible test:
Point p = new Point(0, 0);
assertThat(p, hasProperty("x"));
or maybe this equally bad test:
Point p = new Point(0, 0);
assertThat(p.getX(), is(instanceOf(Integer.class)));
After I write the methods, the tests pass, but is it useful to test for the existence of a method? If I misspell the method, my code won’t compile. If I rename or delete the method, my code won’t compile.
Types catch these errors better than any test I can write, so I delete the tests.
Replace Data Flow Tests With Immutable Types
In Java, the code for getters and setters can dominate a class. We often see lots of code to move data around and change the state of an object. Is it justifiable to skip writing tests for the majority of a class simply because the code is ugly?
static public class Point {
    private Integer x, y;

    public Point(Integer x, Integer y) {
        this.x = x;
        this.y = y;
    }

    public Integer getX() {
        return x;
    }

    public void setX(Integer x) {
        this.x = x;
    }

    public Integer getY() {
        return y;
    }

    public void setY(Integer y) {
        this.y = y;
    }
}
Even if we agree to skip writing tests for getters and setters, other kinds of data flow tests are still common. I write this test to check if a getter properly returns a constructor argument:
Point p = new Point(0, 0);
assertThat(p.getX(), is(0));
The test passes, but do I need so much ugly code? Can I remove the test by simplifying a poor design?
Types work better than tests for catching problems with mutable state and data flow. I remove the indirection of the getters and setters and then make the point immutable:
public class Point {
    public final Integer x, y;

    public Point(Integer x, Integer y) {
        this.x = x;
        this.y = y;
    }
}
Now I can delete the data flow test.
This does not break design encapsulation: getX() provides no encapsulation because it’s just another way to spell x. If you are required to release backward-compatible modules, especially binary backward-compatible modules, you may want a level of indirection, but let’s stop pretending a design is future-proof just because it uses getter methods.
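If callers genuinely need a “changed” point, immutability doesn’t forbid that; it just shifts the API from mutation to construction. A sketch, with a hypothetical withX method of my own (not part of the original design):

```java
public class Point {
    public final Integer x, y;

    public Point(Integer x, Integer y) {
        this.x = x;
        this.y = y;
    }

    // Instead of setX mutating in place, return a new Point.
    // The original is untouched, so it can be shared freely
    // without any risk of another caller changing it.
    public Point withX(Integer newX) {
        return new Point(newX, this.y);
    }

    public static void main(String[] args) {
        Point p = new Point(0, 0);
        Point q = p.withX(5);
        System.out.println(p.x + " -> " + q.x); // p is unchanged
    }
}
```

Both p and q remain valid, independent values, so there is no mutable-state data flow to test.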
Tests For Errors Can Indicate Poor Design
Java code frequently has null value bugs, and I write a test to verify that nulls don’t break the code:
Point p = new Point(null, 0);
assertThat(p.x, is(instanceOf(Integer.class)));
Of course the test fails. Is that constructor call correct though? Should I be allowed to pass a null value? No. So I add @Nonnull annotations to prevent someone from using null:
public class Point {
    public final @Nonnull Integer x, y;

    public Point(@Nonnull Integer x, @Nonnull Integer y) {
        this.x = x;
        this.y = y;
    }
}
And now the test is no longer valid and can be deleted. But is there a more concise, more reliable way to eliminate the null value bugs? Yes, make the null impossible to represent, not just annotated as illegal, by using primitive types:
public class Point {
    public final int x, y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
Even though I was pursuing type safety to make user error impossible, the code landed on Java primitives, which are widely considered best practice anyway.
That code is shorter, simpler, and has less bug potential than any of the other code I’ve written. I arrived at it not because I chose good tests, but because I questioned every test and tried to use the type system as much as possible.
100% Code Coverage Is Not Enough: Cover the Types
Measuring how much of your code is tested is an excellent practice. I hear people say 100% coverage is not a reasonable goal, but it’s surprising how many bugs can be in code that has 100% coverage. Types can guide your tests to discover where bugs might lie. Let’s look at Fibonacci sequence code:
package com.atomicobject;

import org.junit.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.*;

public class FibonacciSeqTest {
    static class FibonacciSeq {
        int nth(int n) {
            if (n == 0) return 0;
            else if (n == 1) return 1;
            else return nth(n - 1) + nth(n - 2);
        }
    }

    @Test
    public void fibonacci_numbers_are_computed() {
        FibonacciSeq fib = new FibonacciSeq();
        assertThat(fib.nth(0), is(0));
        assertThat(fib.nth(1), is(1));
        assertThat(fib.nth(2), is(1));
        assertThat(fib.nth(3), is(2));
        assertThat(fib.nth(4), is(3));
        assertThat(fib.nth(5), is(5));
        assertThat(fib.nth(20), is(6765));
    }
}
This code is adequate, but the tests are horrible. The first 3 tests probably drove the design. The last 4 don’t give us any more information, but they may give someone false confidence that the system is well tested. The code is 100% covered with so many tests; it must be bug-free! Not even close. The program is correct on only about 0.000001% of possible input values.
So which tests should we keep? What new tests should we add?
Java integers are 32-bit machine integers that can overflow. Fibonacci numbers grow quickly and overflow. When does overflow happen in this code? Here’s a failing test:
assertThat((long) fib.nth(47), is(2971215073L));
If we can define a type that makes it impossible to be greater than 46, we could replace the integer with that type. Java doesn’t have a feature like that, so I add an assertion to the code and document the limit.
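Another option I sometimes reach for (my own variation, not what this post’s design settles on) is to detect the overflow at the point of addition: Math.addExact throws ArithmeticException on int overflow instead of silently wrapping to a negative number. Here is an iterative sketch:

```java
public class SafeFibonacci {
    // Iterative variation of the Fibonacci code. Math.addExact throws
    // ArithmeticException on int overflow rather than wrapping silently.
    static int nth(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        if (n == 0) return 0;
        int a = 0, b = 1;
        for (int i = 2; i <= n; i++) {
            int next = Math.addExact(a, b); // throws if fib(i) overflows
            a = b;
            b = next;
        }
        return b;
    }

    public static void main(String[] args) {
        // fib(46) = 1836311903 is the largest Fibonacci number that fits in an int.
        System.out.println(nth(46));
        try {
            nth(47);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The overflow still happens at run time, but it fails loudly at the exact operation that went wrong instead of producing a plausible-looking negative number.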
Java integers can be negative. What happens with negative inputs? That doesn’t even make sense for a Fibonacci sequence — what does a negative nth number mean? Here’s the failing test:
assertThat(fib.nth(-1), is(0));
Java doesn’t have non-negative integers, so I use the same solution to assert the input is non-negative and update the documentation. Here’s the final code:
package com.atomicobject;

import org.junit.Test;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.*;

public class FibonacciSeqTest {
    static class FibonacciSeq {
        int nth(int n) {
            assert n >= 0 && n <= 46;
            if (n == 0) return 0;
            else if (n == 1) return 1;
            else return nth(n - 1) + nth(n - 2);
        }
    }

    FibonacciSeq fib = new FibonacciSeq();

    @Test
    public void fibonacci_numbers_are_computed() {
        assertThat(fib.nth(2), is(1));
    }

    @Test(expected = AssertionError.class)
    public void fibonacci_sequence_starts_with_nth_0() {
        fib.nth(-1);
    }

    @Test(expected = AssertionError.class)
    public void fibonacci_numbers_wont_silently_overflow() {
        fib.nth(47);
    }
}
I couldn't eliminate the possibility of those errors by defining types, but looking carefully at the types still helped drive more correct code.
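For completeness, the overflow error itself can be designed away by choosing a type with no upper bound. BigInteger trades the n <= 46 limit for arbitrary precision; this variation is my own extension of the example, not part of the post’s final code:

```java
import java.math.BigInteger;

public class BigFibonacci {
    // BigInteger has no maximum value, so the overflow error is
    // unrepresentable rather than merely guarded by an assertion.
    static BigInteger nth(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        BigInteger a = BigInteger.ZERO, b = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            BigInteger next = a.add(b); // cannot overflow
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(nth(47)); // 2971215073, beyond int range
    }
}
```

The negative-input error remains a run-time check, though, which is exactly the kind of residue the paragraph above is describing.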
Ready to try it? Read part 2, where I show how to create more sophisticated types in Java and then put everything together by working through Kent Beck's money example using better typed code.