
Property-Based Testing for Serialized Data Structures

When I first heard about property-based testing, my instincts told me it was too academic to be of practical use. But, as is often the case in the art of software, my gut reaction failed to appreciate the value of something new.

I originally felt the same way about functional programming, so I guess I can’t trust my gut very much when it comes to new concepts. To quote Nick Hornby, “Between you and me, I have come to the conclusion that my guts have s— for brains.” I’ve recently stumbled into some great ways to get real-world value out of property-based testing.

What is Property-Based Testing?

Before I dive into how I’m using property-based testing, let’s review what this type of testing is. Scott Vokes has covered the topic pretty thoroughly elsewhere. In my own words, I would say that property-based testing is about asserting important invariants in the way your code works: properties that do not change, regardless of the input. For example, the property might be “the function should never throw an exception,” or “the output string should always be valid JSON.” This contrasts with standard unit tests, where you generally assert that a specific set of inputs should produce a specific output.

Property-based testing asserts that, given arbitrary inputs, a function always behaves a certain way. It turns the computer loose to generate random inputs for your function, in search of a set of inputs which violate your assertion. If it finds one, it will “shrink” that set of inputs down to the “simplest” version of the inputs which will break the assertion.

For example, you may have code that takes an array and fails if the array has more than three elements. The initial random failing case might be an array with five elements, but the shrinker will reduce that to an array with four elements, because it is “simpler” than a five-element array and still triggers the failure.
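To make the array example concrete, here is a minimal hand-rolled sketch of generate-then-shrink. It does not use any property-testing library; the names `ShrinkDemo`, `Property`, `Shrink`, `Minimize`, and `FindCounterexample` are made up for illustration, and a real library's shrinker will be more sophisticated than this greedy one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShrinkDemo
{
    // The property under test: stands in for code that fails on arrays longer than three.
    public static bool Property(int[] xs) => xs.Length <= 3;

    // Candidate simplifications: drop one element, or halve one element toward zero.
    public static IEnumerable<int[]> Shrink(int[] xs)
    {
        for (int i = 0; i < xs.Length; i++)
            yield return xs.Where((_, j) => j != i).ToArray();

        for (int i = 0; i < xs.Length; i++)
        {
            if (xs[i] == 0) continue;
            var copy = (int[])xs.Clone();
            copy[i] = xs[i] / 2; // integer division truncates toward zero
            yield return copy;
        }
    }

    // Greedily follow the first candidate that still fails, until none does.
    public static int[] Minimize(int[] failing)
    {
        while (true)
        {
            var smaller = Shrink(failing).FirstOrDefault(c => !Property(c));
            if (smaller == null) return failing;
            failing = smaller;
        }
    }

    // Try random arrays; return a minimized counterexample, or null if none was found.
    public static int[] FindCounterexample(int seed, int attempts = 100)
    {
        var rng = new Random(seed);
        for (int n = 0; n < attempts; n++)
        {
            var xs = Enumerable.Range(0, rng.Next(0, 10))
                               .Select(_ => rng.Next(-100, 101))
                               .ToArray();
            if (!Property(xs)) return Minimize(xs);
        }
        return null;
    }
}
```

With this sketch, `ShrinkDemo.Minimize(new[] { 7, -3, 42, 0, 9 })` first drops an element to reach length four, then halves the remaining values until it ends at four zeros: the smallest input that still breaks the property.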

When You Can Use Property Tests

You can test a function with property-based tests if the following conditions hold:

  1. You can generate and shrink arbitrary values for all inputs.
  2. You can make assertions about the output of your function, given arbitrary inputs. Because your inputs are arbitrary, your outputs will be somewhat arbitrary as well. So you need to have a means of asserting valid outputs.

Verifying Serialization

My current project involves sending many messages between microservices using message queues. We need to support partial or rolling deployments. Consequently, we need to update the payload of new messages while continuing to support the old versions of the payload. Say, for example, we have a data transfer object like so:


[DataContract]
public class SerializeMe
{
    [DataMember]
    public int Foo { get; set; }
    [DataMember]
    public string Bar { get; set; }
}

which serializes to JSON on the wire. Now, say we want to update SerializeMe with a new field:


    [DataMember]
    public bool Baz { get; set; }

It’s possible that when we update the code, some messages in the old format will still be in transit (or serialized on disk or in a database). Will the new version of SerializeMe successfully deserialize objects that were serialized in the old version? Property-based testing makes proving this much easier.

We can use property testing because:

  1. We can generate—and shrink—random values for the old version of SerializeMe.
  2. We can serialize the random old SerializeMe, and deserialize it as the updated SerializeMe. If this process always succeeds, we can say with confidence that the two versions are compatible.
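As a rough sketch of that round trip, the code below serializes an old-shape object and reads it back as the new shape. Several things here are assumptions rather than details from my project: the class names `SerializeMeV1` and `SerializeMeV2` (in practice both versions are just `SerializeMe` in different builds), the use of `DataContractJsonSerializer`, and the simple `Random`-based input generation standing in for a property-testing library.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical names for the old and new shapes of the same message.
[DataContract]
public class SerializeMeV1
{
    [DataMember] public int Foo { get; set; }
    [DataMember] public string Bar { get; set; }
}

[DataContract]
public class SerializeMeV2
{
    [DataMember] public int Foo { get; set; }
    [DataMember] public string Bar { get; set; }
    [DataMember] public bool Baz { get; set; }
}

public static class CompatibilityCheck
{
    public static byte[] Serialize<T>(T value)
    {
        using (var stream = new MemoryStream())
        {
            new DataContractJsonSerializer(typeof(T)).WriteObject(stream, value);
            return stream.ToArray();
        }
    }

    public static T Deserialize<T>(byte[] json)
    {
        using (var stream = new MemoryStream(json))
        {
            return (T)new DataContractJsonSerializer(typeof(T)).ReadObject(stream);
        }
    }

    // The property: an old message reads as the new type, with shared fields intact.
    public static bool OldReadsAsNew(SerializeMeV1 old)
    {
        try
        {
            var updated = Deserialize<SerializeMeV2>(Serialize(old));
            return updated.Foo == old.Foo && updated.Bar == old.Bar;
        }
        catch (SerializationException)
        {
            return false; // failing to deserialize counts as a violation
        }
    }

    // Simplified stand-in for a library generator: printable ASCII strings, sometimes null.
    static SerializeMeV1 RandomOld(Random rng) => new SerializeMeV1
    {
        Foo = rng.Next(int.MinValue, int.MaxValue),
        Bar = rng.Next(4) == 0
            ? null
            : new string(Enumerable.Range(0, rng.Next(0, 20))
                                   .Select(_ => (char)rng.Next(32, 127))
                                   .ToArray())
    };

    public static bool HoldsForRandomInputs(int seed, int attempts = 100)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, attempts).All(_ => OldReadsAsNew(RandomOld(rng)));
    }
}
```

Comparing `Foo` and `Bar` after the round trip is what turns "it deserialized" into "it deserialized correctly"; a stricter check for your own types might also assert what default the new field should take.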

We can also add checks to make sure values are transferred appropriately (e.g. if a field that was previously an int becomes a string). Admittedly, you can try to eyeball the two different versions of the structure, think about the properties of your serializer, and try to determine whether or not the change is safe. But why not throw some computational resources at the problem as well?

Generating and Shrinking DataContract

Using property-based testing in this way gives me much more confidence about rolling updates and message version compatibility. This is a huge win for safe iterative development in a microservice architecture, or really any message-based architecture.

I used the FsCheck library to do property testing in my project. Even though the library is written in F#, it has a really good C# API. I wrote a generator and a shrinker for classes tagged with [DataContract], using the following method:

For the generator:

  1. Use reflection to iterate over the class properties, looking for properties annotated with [DataMember].
  2. Get the random generator for each property based on the property type. This allowed me to leverage the built-in primitive and collection generators. If the property is an object type, generate it recursively using this same process.
  3. These per-property generators form an IEnumerable<Generator<object>>. The Sequence function lets me transform this into a Generator<IEnumerable<object>>. Then I can use Select to turn it into a Generator of the original class by assigning those values to its properties with reflection.
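The generator steps above can be sketched without FsCheck by treating a "generator" as a plain `Func<Random, object>`. This is a simplified stand-in, not the code from my project: `ContractGen`, `For`, and `Sample` are hypothetical names, only `int`, `bool`, `string`, and nested `[DataContract]` types are handled, and a contract that contains itself would recurse forever.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Same shape as the DTO shown earlier, repeated so the sketch compiles on its own.
[DataContract]
public class SerializeMe
{
    [DataMember] public int Foo { get; set; }
    [DataMember] public string Bar { get; set; }
}

public static class ContractGen
{
    // Pick a generator based on the requested type.
    public static Func<Random, object> For(Type type)
    {
        if (type == typeof(int)) return rng => rng.Next(-1000, 1001);
        if (type == typeof(bool)) return rng => rng.Next(2) == 0;
        if (type == typeof(string))
            return rng => new string(Enumerable.Range(0, rng.Next(0, 10))
                                               .Select(_ => (char)rng.Next('a', 'z' + 1))
                                               .ToArray());
        if (type.GetCustomAttribute<DataContractAttribute>() != null)
            return ForContract(type); // object-typed property: recurse
        throw new NotSupportedException("No generator for " + type);
    }

    static Func<Random, object> ForContract(Type type)
    {
        // Step 1: find the properties annotated with [DataMember].
        var members = type.GetProperties()
                          .Where(p => p.GetCustomAttribute<DataMemberAttribute>() != null)
                          .ToList();

        // Step 2: one generator per property, chosen by the property's type.
        var generators = members.Select(p => For(p.PropertyType)).ToList();

        // Step 3: run every generator (the "sequence" step), then assign the
        // results to a fresh instance with reflection (the "select" step).
        return rng =>
        {
            var instance = Activator.CreateInstance(type);
            for (int i = 0; i < members.Count; i++)
                members[i].SetValue(instance, generators[i](rng));
            return instance;
        };
    }

    public static T Sample<T>(Random rng) => (T)For(typeof(T))(rng);
}
```

Calling `ContractGen.Sample<SerializeMe>(new Random(42))` yields a populated instance. In a real setup you would hand each property type to the library's built-in generators instead of the hard-coded cases here.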

For the shrinker:

  1. Iterate over the class properties one at a time, generating shrunken versions of the value for that property.
  2. Pair each shrunken value for each property with the original values for the other properties.
  3. Use those different sets of shrunken values to create shrunken objects.

This process is similar to what the FsCheck library does for F# records. It’s worth noting that I also added some logic to randomly generate null values, and to consider null a possible shrunken value for an object.
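Here is a comparable sketch of the shrinker, again hand-rolled rather than built on FsCheck. `ContractShrink` and its primitive shrinking rules are assumptions made for illustration; the part that mirrors the steps above is the loop that shrinks one property at a time while copying the others unchanged, plus null as a candidate for the object itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Same shape as the DTO shown earlier, repeated so the sketch compiles on its own.
[DataContract]
public class SerializeMe
{
    [DataMember] public int Foo { get; set; }
    [DataMember] public string Bar { get; set; }
}

public static class ContractShrink
{
    // Yields values that are "simpler" than the input; yields nothing when fully shrunk.
    public static IEnumerable<object> Shrink(object value)
    {
        switch (value)
        {
            case null:
                yield break;
            case int i:
                if (i != 0)
                {
                    yield return 0;
                    if (i / 2 != 0) yield return i / 2;
                }
                yield break;
            case bool b:
                if (b) yield return false;
                yield break;
            case string s:
                if (s.Length > 0)
                {
                    yield return "";
                    if (s.Length > 1) yield return s.Substring(0, s.Length / 2);
                }
                yield break;
        }

        var type = value.GetType();
        if (type.GetCustomAttribute<DataContractAttribute>() == null) yield break;

        // Null is treated as a possible shrunken value for any contract object.
        yield return null;

        var members = type.GetProperties()
                          .Where(p => p.GetCustomAttribute<DataMemberAttribute>() != null)
                          .ToList();

        // Shrink one property at a time, keeping the original values for the rest.
        foreach (var member in members)
        {
            foreach (var smaller in Shrink(member.GetValue(value)))
            {
                var copy = Activator.CreateInstance(type);
                foreach (var other in members)
                    other.SetValue(copy, other.GetValue(value));
                member.SetValue(copy, smaller);
                yield return copy;
            }
        }
    }
}
```

For `new SerializeMe { Foo = 8, Bar = "ab" }` this produces five candidates: null, two objects with a smaller `Foo`, and two with a shorter `Bar`. Each candidate differs from the original in only one place, which is what lets a failing case be narrowed down property by property.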

Considerations

I’m also interested in using property-based testing in conjunction with tools like Chaos Monkey. Property testing is a great fit whenever your code has meaningful invariants but its edge cases are hard to enumerate by hand. I can think of few situations that align better with those needs than microservices with many moving parts. By no means is property-based testing the hammer to solve all problems, but it is the perfect solution for certain types of problems. What other “one-off” techniques have you found invaluable in the right situation?