Statically Typed Data Validation with JSON Schema and TypeScript

One super common problem on TypeScript projects is figuring out how to validate data from external sources and tie that validated data to TypeScript types. In these situations, you generally have a few options:

  1. Define types and validations separately, and tie them together with Type Guards.
  2. Use an internal DSL such as io-ts to build both a validator and the TypeScript type at the same time.
  3. Use an external DSL such as JSON Schema to define the validator, and derive the TypeScript type from there.

The first option can be error-prone and a bit redundant, as most validation of external data simply proves that it is in a shape compatible with your desired TypeScript type.
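
To make the redundancy concrete, here is a minimal sketch of the first option. The `Point` type and the `isPoint` guard are hypothetical names made up for illustration. Every property of the interface has to be checked again by hand in the guard, and nothing forces the two to stay in sync:

```typescript
// Hypothetical type, declared by hand.
interface Point {
  x: number;
  y: number;
  label?: string;
}

// Hand-written type guard that repeats the shape of `Point` at runtime.
function isPoint(candidate: unknown): candidate is Point {
  if (typeof candidate !== "object" || candidate === null) {
    return false;
  }
  const record = candidate as Record<string, unknown>;
  return (
    typeof record["x"] === "number" &&
    typeof record["y"] === "number" &&
    (record["label"] === undefined || typeof record["label"] === "string")
  );
}

const input: unknown = JSON.parse('{"x": 1, "y": 2}');
if (isPoint(input)) {
  // `input` is narrowed to `Point` inside this block.
  console.log(input.x + input.y);
}
```

If a property is later added to `Point` and the guard is not updated, the code still compiles, which is exactly the kind of drift that makes this option error-prone.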

Option 2 has its own drawbacks. It’s possible to compose a type for a validator while defining it in a DSL. io-ts is a good example of this approach. There are a few downsides with this, however, in that the types get built up in a way where the type can be a humongous structure for complicated types, with no aliasing. There’s also no place in the code where you can easily go and inspect this structure. These two issues make diagnosing type issues a challenge.

So we’ve generally opted for Option 3. We use apollo-codegen to generate types from our GraphQL queries, and we’ve increasingly been using JSON schema with json-schema-to-typescript to generate TypeScript types from JSON schemas.

Let’s take a look at how this works.

JSON Schema

JSON schema is a standard for representing shapes of JSON data in a JSON document. It’s quite powerful, enabling basic type validation, modeling union types and intersection types, providing declarative mechanisms for bounding integers, testing values for inclusion in a set of valid options, and even validating strings against a regex.

In short, JSON schema actually provides a superset of the basic features of the TypeScript type system for runtime validation. As a result, it provides a natural way of expressing both a TypeScript type and a complete test for validity of that type, including additional constraints unrepresentable in a TypeScript type.

Let’s look at an example. Say you’re calling into an API to get the status of some service. That API provides data in one of three shapes, depending on whether the service is up, down, or in maintenance. The data types might look something like:

export type SystemStatus = Up | Maintenance | Down;
export type IsoTime = string;

export interface Up {
  status: "up";
  version: string;
  nextMaintenanceWindowStart?: IsoTime;
}
export interface Maintenance {
  status: "maintenance";
  maintenanceStart: IsoTime;
  expectedMaintenanceEnd?: IsoTime;
}
export interface Down {
  status: "down";
  downSince: IsoTime;
}

If this service is driving automated decisions, you don’t want rogue processes to start running amok if some basic assumption about the API changes down the road. You may want to validate the data against your assumptions about the format of that data before doing anything dangerous. JSON schema can solve this problem nicely. We might validate this data with a schema such as:

{
  "title": "SystemStatus",
  "anyOf": [
    { "$ref": "#/definitions/Up" },
    { "$ref": "#/definitions/Maintenance" },
    { "$ref": "#/definitions/Down" }
  ],

  "definitions": {
    "Up": {
      "type": "object",
      "properties": {
        "status": { "enum": ["up"] },
        "version": { "type": "string" },
        "nextMaintenanceWindowStart": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "version"],
      "additionalProperties": false
    },

    "Maintenance": {
      "type": "object",
      "properties": {
        "status": { "enum": ["maintenance"] },
        "maintenanceStart": { "$ref": "#/definitions/ISOTime" },

        "expectedMaintenanceEnd": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "maintenanceStart"],
      "additionalProperties": false
    },

    "Down": {
      "type": "object",
      "properties": {
        "status": { "enum": ["down"] },
        "downSince": { "$ref": "#/definitions/ISOTime" }
      },
      "required": ["status", "downSince"],
      "additionalProperties": false
    },

    "ISOTime": {
      "type": "string",
      "pattern": "^\\d{4}-[01]\\d-[0-3]\\dT[0-2]\\d:[0-5]\\dZ$"
    }
  }
}

 

This schema will only admit data that is structurally compatible with our data types. But it does even more than that–each timestamp is also validated against a regex for ISO 8601 strings. No data which passes this schema can violate our assumptions about the data type.
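
The effect of the timestamp check can be sketched without a schema validator at all. The snippet below writes the intended ISOTime pattern as a plain JavaScript RegExp (in the JSON document the backslashes are doubled and there are no surrounding slashes). The helper name `looksLikeIsoTime` is made up for illustration:

```typescript
// The ISOTime pattern from the schema, written as a JavaScript RegExp.
const isoTimePattern = /^\d{4}-[01]\d-[0-3]\dT[0-2]\d:[0-5]\dZ$/;

// Hypothetical helper: true when the string has the YYYY-MM-DDTHH:MMZ layout.
function looksLikeIsoTime(value: string): boolean {
  return isoTimePattern.test(value);
}

console.log(looksLikeIsoTime("2018-03-01T12:30Z"));    // true
console.log(looksLikeIsoTime("2018-03-01T12:30:00Z")); // false: seconds are not part of the pattern
console.log(looksLikeIsoTime("2018-19-01T12:30Z"));    // true: only the layout is checked, month 19 passes
console.log(looksLikeIsoTime("yesterday"));            // false
```

As the examples show, this particular pattern only checks the layout of the string. It accepts minute-precision UTC timestamps and nothing else, so it would need to be widened for APIs that send seconds or offsets.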

Notice the structural similarity between our TypeScript types and the JSON schema. The JSON schema is a bit less compact, but there is a striking correspondence:

  • Our union type SystemStatus can be directly expressed with anyOf.
  • Defining named types to represent our different cases corresponds directly to definitions in the JSON schema.
  • TypeScript’s support for optional properties corresponds to leaving a property out of the list given to the required keyword.

This similarity is no coincidence. Fundamentally, JSON Schema and TypeScript are aiming to solve the same problem: describing valid TypeScript data. The fact that they use similar concepts to represent those constraints is unsurprising.
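
One practical payoff of this correspondence is that the single-value enum on status becomes a string literal type, so the union can be narrowed by the compiler. Here is a small sketch; the types are repeated so the snippet stands alone, and `describeStatus` is a hypothetical consumer, not part of any library:

```typescript
// Types repeated from the example above so the sketch is self-contained.
type IsoTime = string;
interface Up {
  status: "up";
  version: string;
  nextMaintenanceWindowStart?: IsoTime;
}
interface Maintenance {
  status: "maintenance";
  maintenanceStart: IsoTime;
  expectedMaintenanceEnd?: IsoTime;
}
interface Down {
  status: "down";
  downSince: IsoTime;
}
type SystemStatus = Up | Maintenance | Down;

// Hypothetical consumer: the `status` literal selects the branch, and the
// compiler narrows `s` to the matching interface inside each case.
function describeStatus(s: SystemStatus): string {
  switch (s.status) {
    case "up":
      return `up, running version ${s.version}`;
    case "maintenance":
      return `in maintenance since ${s.maintenanceStart}`;
    case "down":
      return `down since ${s.downSince}`;
  }
}

console.log(describeStatus({ status: "up", version: "1.2.3" }));
```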

We’ve been using an excellent NPM package called json-schema-to-typescript to avoid the repetitive re-encoding of invariants as both TypeScript types and JSON schemas. Given a JSON Schema, this package will generate the equivalent TypeScript type. In fact, the type definitions at the start of this section were simply output from json-schema-to-typescript.

Tying Schemas to Types

We’ve been using AJV to validate our data. While AJV does come with good TypeScript bindings, there’s currently no built-in mechanism to help us leverage the generated TypeScript types. (I’ve opened an issue to suggest possible improvements.)

Here are a few suggestions for accomplishing that task:

Type guards

The simplest approach is to handwrite a type guard. In our case, we might define:

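// Assumption: a default AJV instance; adjust the import style to your AJV version
import Ajv from "ajv";
// Import the type from the generated TypeScript file
import { SystemStatus } from "./status.schema";

const ajv = new Ajv();

// Compile the JSON schema into a validation function
const statusSchema = ajv.compile(require("./status.schema.json"));

// Type guard: narrows `candidate` to SystemStatus when the schema accepts it
function isValidStatus(candidate: any): candidate is SystemStatus {
  return statusSchema(candidate) === true;
}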

 

Result types

For us, it was important to report specific validation failures to the user, so we defined a Result/Either type that could return a valid thing or a description of what was wrong:

// See link above for info about result type
function validateStatus(candidate:any): Result.Type<SystemStatus> {
  if (statusSchema(candidate) === true) {
    return candidate as SystemStatus;
  } else {
    return new Error(/* e.g. build message from statusSchema.errors */);
  }
}

 

Type-aware validators

Since we needed to validate a lot of types with a lot of schemas, we used an approach whereby we could tie a TypeScript type to a validation function once, and let our various uses of that validation function infer the result type automatically. In principle, we wanted to be able to write code like this:

const validateFoo = makeValidator<Foo>(schemaForFoo);
const validateBar = makeValidator<Bar>(schemaForBar);

if (isValid(validateFoo, aThing)) {
  // TypeScript knows `aThing` is a Foo.
  doSomethingWithFoo(aThing);
} else if (isValid(validateBar, aThing)) {
  // TypeScript knows `aThing` is a Bar.
  doSomethingWithBar(aThing);
}

 

Unless or until AJV or your schema validator of choice provides a mechanism for this, we can pretty easily solve this problem for ourselves. Here’s how we did so with AJV.

First, we defined a subtype of the provided AJV validation function type which “knows” which type it validates:

export interface ValidateFunction<T> extends Ajv.ValidateFunction {
  _t?: T; // avoid unused parameter lint warnings
}

 

We could then define makeValidator:

export function makeValidator<T>(schema: object): ValidateFunction<T> {
  return ajv.compile(schema);
}

 

Note that ValidateFunction<T> is basically just a trivial wrapper around the built-in type, and that makeValidator uses the built-in compile but returns our new typed ValidateFunction<T>. All this type and this function are doing is hoisting the AJV validation function up into a type that is associated with some specific TypeScript type. In our sample above, validateFoo would have type ValidateFunction<Foo>, for example.

Given this definition, we can define isValid as follows:

function isValid<T>(
  validator: ValidateFunction<T>,
  candidate: any
): candidate is T {
  return validator(candidate) === true;
}

 

This approach is quite flexible. We could also define functions that return Result.Type<T> or whatever else we want using this general approach.
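
As a rough sketch of that idea, here is a generic function that returns either the validated value or an Error. To keep the snippet runnable without AJV installed, `TypedValidator<T>` is a hand-written stand-in for the shape of an AJV validation function (callable, with an `errors` property); `Result<T>`, `validate`, `Version`, and `validateVersion` are likewise hypothetical names, and the `T | Error` shape of the result is an assumption rather than our exact Result type:

```typescript
// Stand-in for the shape of an AJV validation function (an assumption made
// so the sketch runs on its own).
interface ErrorLike {
  message?: string;
}
interface TypedValidator<T> {
  (candidate: unknown): boolean;
  errors?: ErrorLike[] | null;
  _t?: T; // phantom property: only exists to carry the type parameter
}

// Assumed result shape: the validated value, or an Error describing the failure.
type Result<T> = T | Error;

function validate<T>(validator: TypedValidator<T>, candidate: unknown): Result<T> {
  if (validator(candidate) === true) {
    return candidate as T;
  }
  const details = (validator.errors ?? [])
    .map((e) => e.message ?? "invalid")
    .join("; ");
  return new Error(details || "validation failed");
}

// Hypothetical hand-rolled validator standing in for a compiled schema.
interface Version {
  version: string;
}
const validateVersion: TypedValidator<Version> = (candidate: unknown): boolean => {
  const ok =
    typeof candidate === "object" &&
    candidate !== null &&
    typeof (candidate as { version?: unknown }).version === "string";
  validateVersion.errors = ok ? null : [{ message: "version must be a string" }];
  return ok;
};

const result = validate(validateVersion, JSON.parse('{"version": "1.0"}'));
if (result instanceof Error) {
  console.error(result.message);
} else {
  // `result` is inferred as Version here, with no explicit type argument.
  console.log(result.version);
}
```

The type parameter is inferred from the validator alone, so call sites never have to repeat the type they expect back.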

With these techniques, you can simplify data validation in general and simplify the interface between TypeScript types and risky externally-sourced data. Good luck!

 
Conversation
  • Fzr says:

    Does that mean this article:
    https://spin.atomicobject.com/2017/04/24/typescript-modular-typesafe-metadata/
    is deprecated?
    BTW, based on “Modular, Type-safe Metadata with TypeScript” I was unable to create a recursive lookup into nested classes/interfaces and bind everything together because of limited Reflection access

    • Drew Colthorp says:

      Hi Fzr,

      These two techniques are orthogonal. This post is purely concerned with using JSON schema from TypeScript with good type support, whereas the other post is really more about building APIs to build out metadata for your types.

      You can combine these techniques if you want. For example, you could use the metadata technique to decorate types generated from JSON schema, with TypeScript validating that the two don’t fall out of sync (to make up an arbitrary combination – I haven’t done this).

      That said, one area where the metadata post differs from my current usual style is that we’re using lots of little types at different layers of the system instead of one big type used everywhere. So to a certain extent, some of the patterns in the examples from that post would be less likely to occur in code-bases where we’re using e.g. this JSON schema technique.

      Drew

  • John Hammond says:

    You can try out a framework called ts.validator. Generic object validation. Fluent rules.

    https://github.com/VeritasSoftware/ts.validator

    Here is an example of how your TypeScript models can be validated using the framework:

    /*TypeScript model*/
    class Person {
      Name: string;
    }

    /* Validation rules */
    var validatePersonRules = (validator: IValidator) : ValidationResult => {
      return validator
        .NotEmpty(m => m.Name, "Name cannot be empty")
        .Exec();
    };

    /* Populate model */
    var person = new Person();
    person.Name = "John Hammond";

    /* Validate model */
    /* Sync */
    var validationResult = new Validator(person).Validate(validatePersonRules);
    /* Async */
    var validationResult = await new Validator(person).ValidateAsync(validatePersonRules);

  • Carlos Galarza says:

    Great article! Drew.

  • Charlie O says:

    Great article! I’m new to DSL and am learning more about how to extend my productivity by automatically generating things from it. This is just what I needed since I use JSON schema and am now learning about TS.

  • Josejulio Martínez says:

    > There are a few downsides with this, however, in that the types get built up in a way where the type can be a humongous structure for complicated types, with no aliasing. There’s also no place in the code where you can easily go and inspect this structure. These two issues make diagnosing type issues a challenge.

    Can you explain a bit more of the challenges that you faced with this approach? I’m trying to decide and I have seen that with current versions of io-ts you build your structure as if you were writing TypeScript (sort of).
    Was it different before? Or maybe I’m missing something.

    Thanks!

    • Drew Colthorp says:

      The main issue is that types that are built programmatically have historically led to e.g. enormous tooltips in VSCode, since you can end up with an enormous nested structure when you build complex types using e.g. io-ts. Since io-ts generates types programmatically from API use, that large complex tooltip was not only unwieldy, but it was also the only way to see a representation of the actual TypeScript type anywhere. As I recall, the type errors were difficult to read for the same reason.

      If you’re using it for validating little structures, no big deal. But it didn’t scale well for us to complex tree structures, due to the usability/introspection issues. The types were technically correct and the validation worked, it just could be difficult to debug and introspect. Generating the types gives you a file on disk you can inspect, and was generally easier to deal with.

      However, TypeScript has made some great strides in presenting more usable tooltips and error messages for complex types since this post was written, so the problem may be somewhat alleviated. You still won’t have a place you can go see the actual type, but usable tooltips and error messages may make that less necessary. Also, depending on your use-case it might be a non-issue to begin with. The library was definitely well-done, it just was hard to wrangle in very complex cases with a lot of union types, nesting, etc with Typescript as it was in early 2018.

      In any case, I continue to be happy with JSON schema – it’s a cross-platform standard with many high quality implementations in different languages. I’d certainly be willing to give io-ts another shot, if I didn’t already have streamlined tooling for JSON schema.
