Statically Typed Data Validation with JSON Schema and TypeScript

One super common problem on TypeScript projects is figuring out how to validate data from external sources and tie that validated data to TypeScript types. In these situations, you generally have a few options:

  1. Define types and validations separately, and tie them together with type guards.
  2. Use an internal DSL such as io-ts to build both a validator and the TypeScript type at the same time.
  3. Use an external DSL such as JSON Schema to define the validator, and derive the TypeScript type from there.

The first option can be error-prone and a bit redundant, as most validation of external data simply proves that it is in a shape compatible with your desired TypeScript type.
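To make the redundancy concrete, here is a sketch of a hand-written guard for the `Down` case from the status example later in this post. It is illustrative, not code from our project. Each check restates something the interface already says, and nothing keeps the two in sync.

```typescript
// The type we want to trust (the Down case from the status example).
interface Down {
  status: "down";
  downSince: string;
}

// Hand-written guard: every check re-encodes the interface by hand.
// If someone later adds a required field to Down and forgets to update
// this function, the guard will silently accept incomplete data.
function isDown(candidate: unknown): candidate is Down {
  if (typeof candidate !== "object" || candidate === null) {
    return false;
  }
  const record = candidate as { [key: string]: unknown };
  return record.status === "down" && typeof record.downSince === "string";
}

console.log(isDown({ status: "down", downSince: "2019-05-01T13:45Z" })); // true
console.log(isDown({ status: "down" })); // false
```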

Option 2 has its own drawbacks. An internal DSL lets you build up the TypeScript type of a validator while defining the validator itself, and io-ts is a good example of this approach. The downside is that the type is assembled structurally, so for complicated validators it can become a humongous structure with no aliasing. There’s also no place in the code where you can easily go and inspect this structure. Together, these two issues make diagnosing type errors a challenge.

So we’ve generally opted for Option 3. We use apollo-codegen to generate types from our GraphQL queries, and we’ve increasingly been using JSON schema with json-schema-to-typescript to generate TypeScript types from JSON schemas.

Let’s take a look at how this works.

JSON Schema

JSON schema is a standard for representing shapes of JSON data in a JSON document. It’s quite powerful, enabling basic type validation, modeling union types and intersection types, providing declarative mechanisms for bounding integers, testing values for inclusion in a set of valid options, and even validating strings against a regex.

In short, JSON schema actually provides a superset of the basic features of the TypeScript type system, enforced at runtime. As a result, it offers a natural way of expressing both a TypeScript type and a complete test for validity of that type, including additional constraints unrepresentable in a TypeScript type.

Let’s look at an example. Say you’re calling into an API to get the status of some service. That API provides data in one of three shapes, depending on whether the service is up, down, or in maintenance. The data types might look something like:

export type SystemStatus = Up | Maintenance | Down;
export type IsoTime = string;

export interface Up {
  status: "up";
  version: string;
  nextMaintenanceWindowStart?: IsoTime;
}
export interface Maintenance {
  status: "maintenance";
  maintenanceStart: IsoTime;
  expectedMaintenanceEnd?: IsoTime;
}
export interface Down {
  status: "down";
  downSince: IsoTime;
}

If this service is driving automated decisions, you don’t want rogue processes running amok if some basic assumption about the API changes down the road. You may want to validate the data against your assumptions about its format before doing anything dangerous. JSON schema can solve this problem nicely. We might validate this data with a schema such as:

{
  "title": "SystemStatus",
  "anyOf": [
    { "$ref": "#/definitions/Up" },
    { "$ref": "#/definitions/Maintenance" },
    { "$ref": "#/definitions/Down" }
  ],

  "definitions": {
    "Up": {
      "type": "object",
      "properties": {
        "status": { "enum": ["up"] },
        "version": { "type": "string" },
        "nextMaintenanceWindowStart": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "version"],
      "additionalProperties": false
    },

    "Maintenance": {
      "type": "object",
      "properties": {
        "status": { "enum": ["maintenance"] },
        "maintenanceStart": { "$ref": "#/definitions/ISOTime" },
        "expectedMaintenanceEnd": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "maintenanceStart"],
      "additionalProperties": false
    },

    "Down": {
      "type": "object",
      "properties": {
        "status": { "enum": ["down"] },
        "downSince": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "downSince"],
      "additionalProperties": false
    },

    "ISOTime": {
      "type": "string",
      "pattern": "^\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\dZ$"
    }
  }
}

This schema will only admit data that is structurally compatible with our data types. But it does even more than that: each timestamp is also validated against a regex for ISO 8601 UTC timestamps. No data which passes this schema can violate our assumptions about the data type.
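A JSON Schema `pattern` is an ECMA-262 regular expression given as a plain string, with no `/` delimiters, and it is not implicitly anchored, so the `^` and `$` must be part of the string. Trying the timestamp pattern as a TypeScript regex literal makes its scope easy to check. This is a quick sketch (the names `ISO_TIME` and `samples` are ours). It suggests the pattern accepts only UTC timestamps at minute precision, so you would need to widen it if your API sends seconds or offsets.

```typescript
// The ISOTime pattern as a regex literal. In the JSON file each backslash is
// doubled ("\\d") because of JSON string escaping; the regex is the same.
const ISO_TIME = /^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\dZ$/;

const samples: string[] = [
  "2019-05-01T13:45Z",      // UTC, minute precision: matches
  "2019-05-01T13:45:00Z",   // has seconds: does not match this pattern
  "2019-05-01T13:45+02:00", // numeric offset instead of Z: does not match
  "2019-19-39T29:59Z",      // matches: the pattern checks shape, not calendar validity
];

for (const sample of samples) {
  console.log(`${sample} -> ${ISO_TIME.test(sample)}`);
}
```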

Notice the structural similarity between our TypeScript types and the JSON schema. The JSON schema is a bit less compact, but there is a striking correspondence:

  • Our union type SystemStatus can be directly expressed with anyOf.
  • Defining named types to represent our different cases corresponds directly to definitions in the JSON schema.
  • TypeScript’s support for optional properties corresponds to the required keyword.
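To see what each keyword in that correspondence actually checks at runtime, here is a deliberately tiny interpreter for just the keywords used in the `SystemStatus` schema (`type`, `enum`, `properties`, `required`, `additionalProperties: false`, `anyOf`, local `$ref`, and `pattern`). It is a hypothetical teaching sketch, not how AJV works (AJV compiles schemas into JavaScript functions and supports far more of the specification). The names `Schema`, `hasOwn`, `check`, and `downOnly` are our own.

```typescript
// Only the keywords used in the SystemStatus schema are modelled.
interface Schema {
  type?: "object" | "string";
  enum?: unknown[];
  properties?: { [key: string]: Schema };
  required?: string[];
  additionalProperties?: boolean;
  anyOf?: Schema[];
  $ref?: string;
  pattern?: string;
  definitions?: { [name: string]: Schema };
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns true if `data` satisfies `schema`; `root` is used to resolve $ref.
function check(schema: Schema, data: unknown, root: Schema = schema): boolean {
  if (schema.$ref !== undefined) {
    // Only local refs of the form "#/definitions/Name" are supported.
    const name = schema.$ref.replace("#/definitions/", "");
    const target =
      root.definitions !== undefined && hasOwn(root.definitions, name) ? root.definitions[name] : undefined;
    if (target === undefined) throw new Error(`Unresolved $ref: ${schema.$ref}`);
    return check(target, data, root);
  }
  // anyOf ~ union type: at least one branch must accept the data.
  if (schema.anyOf !== undefined && !schema.anyOf.some((branch) => check(branch, data, root))) {
    return false;
  }
  // A one-element enum ~ a literal type such as "up".
  if (schema.enum !== undefined && schema.enum.indexOf(data) === -1) {
    return false;
  }
  if (schema.type === "string") {
    if (typeof data !== "string") return false;
    if (schema.pattern !== undefined && !new RegExp(schema.pattern).test(data)) return false;
  }
  if (schema.type === "object") {
    if (typeof data !== "object" || data === null || Array.isArray(data)) return false;
    const record = data as { [key: string]: unknown };
    const props = schema.properties ?? {};
    // Keys in `required` ~ non-optional properties; all others are optional (?:).
    for (const key of schema.required ?? []) {
      if (!hasOwn(record, key)) return false;
    }
    for (const key of Object.keys(record)) {
      if (hasOwn(props, key)) {
        if (!check(props[key], record[key], root)) return false;
      } else if (schema.additionalProperties === false) {
        return false; // additionalProperties: false rejects unknown keys
      }
    }
  }
  return true;
}

// Usage: the Down branch of the SystemStatus schema, reached via anyOf and $ref.
const downOnly: Schema = {
  anyOf: [{ $ref: "#/definitions/Down" }],
  definitions: {
    Down: {
      type: "object",
      properties: { status: { enum: ["down"] }, downSince: { $ref: "#/definitions/ISOTime" } },
      required: ["status", "downSince"],
      additionalProperties: false,
    },
    ISOTime: { type: "string", pattern: "^\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\dZ$" },
  },
};

console.log(check(downOnly, { status: "down", downSince: "2019-05-01T13:45Z" })); // true
console.log(check(downOnly, { status: "down" })); // false: "downSince" is required
console.log(check(downOnly, { status: "up", version: "1.0" })); // false: no anyOf branch matches
```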

This similarity is no coincidence. Fundamentally, JSON Schema and TypeScript are aiming to solve the same problem: describing the shape of valid data. The fact that they use similar concepts to represent those constraints is unsurprising.

We’ve been using an excellent NPM package called json-schema-to-typescript to avoid the repetitive re-encoding of invariants as both TypeScript types and JSON schemas. Given a JSON Schema, this package will generate the equivalent TypeScript type. In fact, the type definitions at the start of this section were simply output from json-schema-to-typescript.

Tying Schemas to Types

We’ve been using AJV to validate our data. While AJV does come with good TypeScript bindings, there’s currently no built-in mechanism to help us leverage the generated TypeScript types. (I’ve opened an issue to suggest possible improvements.)

Here are a few suggestions for accomplishing that task:

Type guards

The simplest approach is to handwrite a type guard. In our case, we might define:

import Ajv from "ajv";
// Import the type from the generated TypeScript file
import {SystemStatus} from "./status.schema";

// Compile the JSON schema into a validation function
const ajv = new Ajv();
const statusSchema = ajv.compile(require("./status.schema.json"));

// Type guard: if AJV accepts the data, treat it as a SystemStatus
function isValidStatus(candidate: any): candidate is SystemStatus {
  return statusSchema(candidate) === true;
}
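Once `isValidStatus` returns `true`, TypeScript narrows the candidate to `SystemStatus`, and switching on the `status` tag narrows it further to a single case. The sketch below shows that payoff end to end. To keep it runnable without AJV or the generated file, it inlines the generated types and replaces the compiled schema with a hypothetical `stubValidator` that checks only the `status` tag; in real code that role belongs to the AJV-compiled validator.

```typescript
// Types as generated from the schema (IsoTime is just a string alias).
type IsoTime = string;
interface Up {
  status: "up";
  version: string;
  nextMaintenanceWindowStart?: IsoTime;
}
interface Maintenance {
  status: "maintenance";
  maintenanceStart: IsoTime;
  expectedMaintenanceEnd?: IsoTime;
}
interface Down {
  status: "down";
  downSince: IsoTime;
}
type SystemStatus = Up | Maintenance | Down;

// Hypothetical stand-in for the AJV-compiled validator. It only checks the
// tag, which is much weaker than the real schema.
function stubValidator(data: unknown): boolean {
  if (typeof data !== "object" || data === null) return false;
  const status = (data as { status?: unknown }).status;
  return status === "up" || status === "maintenance" || status === "down";
}

function isValidStatus(candidate: unknown): candidate is SystemStatus {
  return stubValidator(candidate) === true;
}

// After the guard, switching on the discriminant narrows to Up, Maintenance, or Down.
function describeStatus(payload: unknown): string {
  if (!isValidStatus(payload)) {
    return "unrecognized payload";
  }
  switch (payload.status) {
    case "up":
      return `up (version ${payload.version})`;
    case "maintenance":
      return `in maintenance since ${payload.maintenanceStart}`;
    case "down":
      return `down since ${payload.downSince}`;
  }
}

console.log(describeStatus({ status: "up", version: "1.4.2" })); // up (version 1.4.2)
console.log(describeStatus({ status: "sideways" })); // unrecognized payload
```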

Result types

For us, it was important to report specific validation failures to the user, so we defined a Result/Either type that holds either a valid value or a description of what was wrong:

// See link above for info about the Result type
function validateStatus(candidate: any): Result.Type<SystemStatus> {
  if (statusSchema(candidate) === true) {
    return candidate as SystemStatus;
  } else {
    // AJV records the failures on the validator; errorsText formats them
    return new Error(ajv.errorsText(statusSchema.errors));
  }
}
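Assuming `Result.Type<T>` is simply `T | Error` (which is what the two branches above return), a caller can tell the outcomes apart with `instanceof Error`. The sketch below is self-contained, so the `Result` namespace, its `isError` helper, and `validateDown` are hypothetical stand-ins; in the real code the validation would come from AJV as above.

```typescript
// Hypothetical minimal Result module: a value is either a T or an Error.
namespace Result {
  export type Type<T> = T | Error;
  export function isError<T>(result: Type<T>): result is Error {
    return result instanceof Error;
  }
}

interface Down {
  status: "down";
  downSince: string;
}

// Hypothetical stand-in for an AJV-backed validator, reduced to the Down case.
function validateDown(candidate: unknown): Result.Type<Down> {
  const record = candidate as { status?: unknown; downSince?: unknown } | null;
  if (record !== null && typeof record === "object" && record.status === "down" && typeof record.downSince === "string") {
    return candidate as Down;
  }
  return new Error('expected { status: "down", downSince: string }');
}

// The caller branches once; in the else branch `result` is narrowed to Down.
const result = validateDown(JSON.parse('{"status":"down"}'));
if (Result.isError(result)) {
  console.log(`Rejected: ${result.message}`);
} else {
  console.log(`Down since ${result.downSince}`);
}
```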

Type-aware validators

Since we needed to validate a lot of types with a lot of schemas, we used an approach whereby we could tie a TypeScript type to a validation function once, and let our various uses of that validation function infer the result type automatically. In principle, we wanted to be able to write code like this:

const validateFoo = makeValidator<Foo>(schemaForFoo);
const validateBar = makeValidator<Bar>(schemaForBar);

if (isValid(validateFoo, aThing)) {
  // TypeScript knows `aThing` is a Foo.
  doSomethingWithFoo(aThing);
} else if (isValid(validateBar, aThing)) {
  // TypeScript knows `aThing` is a Bar.
  doSomethingWithBar(aThing);
}

Unless or until AJV or your schema validator of choice provides a mechanism for this, we can pretty easily solve this problem for ourselves. Here’s how we did so with AJV.

First, we defined a subtype of the provided AJV validation function type which “knows” which type it validates:

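export interface ValidateFunction<T> extends Ajv.ValidateFunction {
  // Phantom field, never set at runtime: it makes T part of the structure,
  // so ValidateFunction<Foo> and ValidateFunction<Bar> stay distinct
  // (and it avoids unused type parameter lint warnings)
  _t?: T;
}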

We could then define makeValidator:

export function makeValidator<T>(schema: object): ValidateFunction<T> {
  return ajv.compile(schema);
}

Note that ValidateFunction<T> is just a trivial extension of the built-in type, and that makeValidator uses the built-in compile but returns our new typed ValidateFunction<T>. All this type and this function do is lift the AJV validator into a type that is associated with a specific TypeScript type. In our sample above, validateFoo would have type ValidateFunction<Foo>, for example.

Given this definition, we can define isValid as follows:

function isValid<T>(
  validator: ValidateFunction<T>,
  candidate: any
): candidate is T {
  return validator(candidate) === true;
}
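Here is the whole pattern in one runnable piece, including the `Foo`/`Bar` usage from earlier. To avoid depending on AJV, `makeValidator` here wraps a plain predicate as a hypothetical stand-in for `ajv.compile(schema)`, and `Foo`, `Bar`, and `handle` are placeholders for illustration. The optional `_t` field is what makes `T` part of the type's structure; without it, `ValidateFunction<Foo>` and `ValidateFunction<Bar>` would be structurally identical and interchangeable.

```typescript
// Placeholder domain types for the Foo/Bar example.
interface Foo {
  kind: "foo";
  size: number;
}
interface Bar {
  kind: "bar";
  label: string;
}

// A validator that "knows" the type it validates. The phantom field _t is
// never set at runtime; it only carries T in the type.
interface ValidateFunction<T> {
  (data: unknown): boolean;
  _t?: T;
}

// Hypothetical stand-in for ajv.compile(schema): wraps a plain predicate.
function makeValidator<T>(predicate: (data: unknown) => boolean): ValidateFunction<T> {
  return predicate;
}

function isValid<T>(validator: ValidateFunction<T>, candidate: unknown): candidate is T {
  return validator(candidate) === true;
}

const validateFoo = makeValidator<Foo>(
  (d) => typeof d === "object" && d !== null && (d as Foo).kind === "foo" && typeof (d as Foo).size === "number"
);
const validateBar = makeValidator<Bar>(
  (d) => typeof d === "object" && d !== null && (d as Bar).kind === "bar" && typeof (d as Bar).label === "string"
);

function handle(aThing: unknown): string {
  if (isValid(validateFoo, aThing)) {
    return `Foo of size ${aThing.size}`; // aThing is narrowed to Foo
  } else if (isValid(validateBar, aThing)) {
    return `Bar labeled ${aThing.label}`; // aThing is narrowed to Bar
  }
  return "neither";
}

console.log(handle({ kind: "foo", size: 3 }));
console.log(handle({ kind: "bar", label: "x" }));
console.log(handle(42));
```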

This approach is quite flexible. We could also define functions that return Result.Type<T> or whatever else we want using this general approach.
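For example, a Result-returning counterpart to `isValid` might look like the following. This sketch assumes `Result.Type<T>` is `T | Error` (written here as a local `ResultType<T>` alias), and it models only the parts of AJV's validate-function shape it needs: a call signature and an `errors` array whose entries may carry a `message`. The names `TypedValidator`, `validate`, `Point`, and `validatePoint` are ours, not AJV's.

```typescript
type ResultType<T> = T | Error; // assumed shape of Result.Type<T>

// Minimal model of a typed validator: callable, with AJV-style errors.
interface TypedValidator<T> {
  (data: unknown): boolean;
  errors?: { message?: string }[] | null;
  _t?: T; // phantom field carrying T
}

// Returns the candidate as a T if valid, otherwise an Error built from the
// validator's recorded errors.
function validate<T>(validator: TypedValidator<T>, candidate: unknown): ResultType<T> {
  if (validator(candidate) === true) {
    return candidate as T;
  }
  const details = (validator.errors ?? []).map((e) => e.message ?? "invalid").join("; ");
  return new Error(details || "validation failed");
}

// Hypothetical hand-written validator standing in for ajv.compile(schema).
interface Point {
  x: number;
  y: number;
}
const validatePoint: TypedValidator<Point> = (data: unknown) => {
  const p = data as Partial<Point> | null;
  const ok = p !== null && typeof p === "object" && typeof p.x === "number" && typeof p.y === "number";
  validatePoint.errors = ok ? null : [{ message: "must have numeric x and y" }];
  return ok;
};

const r = validate(validatePoint, { x: 1 });
console.log(r instanceof Error ? `invalid: ${r.message}` : `point at ${r.x},${r.y}`);
```

Because `T` is inferred from the validator, callers get a `Point | Error` back without writing the type argument themselves.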

With these techniques, you can simplify data validation in general, along with the interface between TypeScript types and risky externally sourced data. Good luck!