RSchema-Hamster: Schemas for Shaping Immutable Data

RSchema-Hamster is a Ruby gem that lets you use Hamster’s persistent data structures to define and validate the shape of your data via RSchema.

What for?

In order to clarify my designs, reduce mistakes, and leave code easier to change than when I found it, I:

  • Strive to program with immutable data
  • Learn to apply ideas from type-driven development

Using RSchema-Hamster, I can define a schema such as:

require 'rschema_hamster'

Name = String
PageNumber = Integer

Appearances = Hamster.vector(
  Hamster.hash(character: Name, page: PageNumber)

Build a data structure like this:

guide = Hamster.vector(
  Hamster.hash(character: "Arthur", page: 1), 
  Hamster.hash(character: "Zaphod", page: 98)

And assert the data conforms to the schema like this:

RSchema.validate!(Appearances, guide)

Immutable Data Structures via Hamster

I prefer to code using primarily immutable values because I’ve come to understand just how careful I need to be when mutating program state: plan for it, isolate it, and generally avoid changing internal object state unnecessarily.

Two up-front challenges have been to change my design habits to favor stateless, functional code (difficult, took lots of practice) and to find a powerful persistent data structures tool (not difficult; for Ruby, I picked Hamster).

The Hamster Ruby gem provides a set of persistent collections to supplement or replace Ruby’s mutable collections, such as Hash, Vector, lazy Lists, and so forth. It looks like this:

ford = Hamster.hash(name: "Ford Prefect", gender: :male)
# => Hamster::Hash[:name => "Ford Prefect", :gender => :male]

arthur = ford.put(:name, "Arthur Dent")
# => Hamster::Hash[:name => "Arthur Dent", :gender => :male]
# (ford remains unchanged)

Getting Lost

As I start moving back to using generic collections like Hashes, Vectors, and Sets to build my data structures, I’m getting a little nervous about inconsistency and accidental misuse. Immutable or not, these collections don’t care what you store in them and will give you no hint as to their intended use. There’s a lot of room for mistakes; this is why I had a hard time letting go of my descriptively-named classes and their attr_accessor declarations.

(A couple years back, Patrick Bacon and I created Hamsterdam as a first step toward alleviating this concern, using immutable record-like objects as a means of defining and documenting attributes.)

Schema-based Definition and Validation via RSchema

Whereas some functional languages like Haskell and F# have very expressive type systems for describing data, dynamic languages like Clojure and Ruby require me to take matters into my own hands. I discovered Tom Dalling’s RSchema (a Ruby implementation of Prismatic’s Schema for Clojure), and started tinkering with ad-hoc schema structures to help clarify and test some of my existing code as I refactored it. I now felt like I could start taking some queues from the Type-Driven crowd.

In RSchema, a schema is itself data, a value that describes the shape of other values:

OrderRow = {
  order_id: Integer,
  order_totals: Totals

row = { 
  order_id: 5, 
  order_totals: a_valid_totals_structure 

RSchema.validate! OrderRow, row

Read: An OrderRow is a Hash whose order_id key refers to an integer value, and whose order_totals key refers to a Totals object, presumably a Ruby class or perhaps another schema object.

It may not be compile-time type-checking, but it sure is expressive, and it gives me the means to programmatically connect variables to their intended types. I can easily discover and understand the data’s shape by reading the code, and at runtime (especially during automated unit and integration testing) assert that the data conforms. Executable documentation, y’see.

So if I want to insist that OrderRow is an immutable Hash, I want to be able to write:

OrderRow = Hamster.hash(
  order_id: Integer,
  order_totals: Totals

However, the above schema isn’t valid. In order for an object to be treated as a descriptive instances (in this case a Hash containing sub-schemas), the object must implement a method called “schema_walk”, and Hamster structures don’t.

Extending Hamster to Support RSchema

RSchema defines schemas in terms of classes, Hashes, and Arrays, plus a few configurable schemas available via its DSL, such as “enum,” “maybe,” and an optional Hash-key decorator. Ruby’s Hash and Array classes implement their respective “schema_walk” such that their instances may be used to describe internal structure.

In order to use Hamster instances to describe structure, I provided implementations of “schema_walk” to Hamster::Hash, Hamster::Vector and so on, plus a few generic schemas to give parity to RSchema’s “hash_of” and “set_of” DSL methods (“hamster_hash_of” and “hamster_set_of” respectively).

The big bonus: schemas are designed to be nestable in RSchema. Because of this, schemas I define using Hamster’s types may still contain (or be contained by) Ruby’s Hash and Array structures, as well as utilize (or be used by) RSchema’s “enum,” “maybe,” and “_?” (optional hash key) schema helpers, without any extra implementation.

Here’s an abridged real-world example of how I was able to untangle a document-like structure that was already on its way to generating headaches in a Rails app I’m working on:

require 'rschema_hamster'

module OrderReport::Schema
  Name    = String
  Id      = Integer
  Dollars = BigDecimal

  Totals = Hamster.hash(
    gross: Dollars,
    tax:   Dollars,
    fee:   Dollars,
    net:   Dollars,

  OrderRow = Hamster.hash(
    order_id:     Id,
    order_number: Name,
    order_totals: Totals,

  MarketBlock = Hamster.hash(
    market_id:        Id,
    market_name:      Name,
    account_dropdown: RSchemaHamster.schema {
      hamster_hash_of(Name => Id)
    order_rows:       Hamster.vector(OrderRow),
    market_totals:    Totals

def self.dollars(str);; end

market_block = Hamster.from(
    market_id: 42,
    market_name: "The Restaurant at the End of the Universe",
    account_dropdown: {
      "Hotblack Desiato" => 1, 
      "Zaphod Beeblebrox" => 3
    order_rows: [
      { order_id: 101, order_number: "MILLIWAYS-00101", order_totals: { gross: dollars("120"), tax: dollars("14.4"), fee: dollars("20"), net: dollars("85.6") } },
      { order_id: 102, order_number: "MILLIWAYS-00102", order_totals: { gross: dollars("3030"), tax: dollars("363.6"), fee: dollars("505.10"), net: dollars("2161.3") } },
    market_totals: { gross: dollars("3150"), tax: dollars("378"), fee: dollars("525.10"), net: dollars("2246.9") }

RSchema.validate!(OrderReport::HamsterSchema::MarketBlock, market_block)