In this series, I’ll be answering the question: “What’s a clean way to represent data that changes over time?”
Let’s start with an example. Say you’re in the business of selling pens. Right now, you sell your pens for $1.00 each. Next week, you want to increase the price to $1.50. And a month later, you want to up the price to $1.75. How would you store this data?
Storing Variable Attributes in a Version Table
First, you want to figure out which attributes of the pen might change over time. Perhaps the selling price and ink color of the pen might change, but the description of the pen will not. The stable characteristics of the pen (unique identifier, description) could live in a base table, and the variable aspects of the pen (selling price, ink type) could live in a versioned table.
The versioned table would have an effective date range column specifying which dates the versioned attributes are effective for. Each version record would also be associated with a record on the base table (i.e., the version record would have a foreign key to the base table).
It would look like this:
The price of “Very Old Pen” is $1.00 until end of 2020, $1.50 for the month of January, and $1.75 from February 1, 2021, onward. “Very Old Pen” is always blue.
This setup lets us retrieve the version information and the base information for a specific pen identifier on a specific date. If I ask, “What is the info for the pen with identifier 1 on January 15, 2021?” the color would be blue, the name would be “Very Old Pen,” and the price would be $1.50. But if I ask, “What is the info for the pen with identifier 1 on February 15, 2021?” the name would still be “Very Old Pen,” and the color would still be blue, but the price would be $1.75.
The “Fancy Scribbler” is always $5.00. It’s blue before January 1, 2021, and black afterward. So the answer to, “What is the color and price of the pen with identifier 2 on January 15, 2020?” is blue and $5.00. But the answer to, “What is the color and price of the pen with identifier 2 on January 15, 2021?” is black and $5.00.
Looking Up Versioned Records: Some Invariants to Maintain
Storing variable attributes in a versioned table with a date range makes it easy to find which version is applicable for a given date. Whenever you select a pen record, you want both the stable and variable attributes associated with that pen. To get all those attributes, you combine the attributes of the record on the base table with the attributes of the applicable record from the versioned table.
In order to easily look up all the versioned attributes from the versioned table for a given date, we maintain a few invariants.
No Gaps in Versioned Table Date Range
To ensure that we will be able to look up all pen attributes for any given date, the versioned attributes records should form a continuous range. Choose a starting date before which you will not have any records. After that start date, there should be no gaps in the versioned records’ effective date range.
In this example, I didn’t care about pricing data before the start of the epoch (1970). I chose the epoch as my effective date range start date, and I made sure there were no gaps in versioned records between the epoch and infinity.
This means that I couldn’t insert:
- A versioned record with a range from January 1, 1970, to January 1, 1990.
- A versioned record with a range from January 1, 1995, to infinity.
There would be a gap between January 1, 1990, and January 1, 1995, meaning I wouldn’t be able to find an effective versioned record for any dates in that time period.
The Last Versioned Record Is Valid Until “Infinity”
In order to support lookup on a record for any date in the future (be it tomorrow or 2050), maintain the invariant that the last record in any versioned table would be valid until infinity.
For instance, I wouldn’t want to insert a version record effective from January 1, 1970, to January 1, 2000, then just insert a version record effective from January 1, 2000, to January 1, 2001.
I would not have pricing or color information for any dates after January 1, 2001.
To ensure that I can query out the variable attributes for any given date, I would need to either update my last record to be effective until infinity (represented by NULL) in our system or create another record effective from January 2, 2001, until infinity.
Ranges for Versioned Records Do Not Overlap
Versioned records can never have an overlapping range. Say I had a version record effective from January 1, 1970, to January 1, 2000, and a versioned record effective from January 1, 1985, to infinity.
If I were to try to look up a pen for January 1, 1990, I would not know if it should use the versioned record effective until 2000 or the versioned record effective starting 1999.
Maintaining the above invariants ensures that there will always be one versioned record for any given date greater than the start date (in the above example, the epoch). As long as these invariants are maintained, we will always be able to look up a full record for any valid effective date by combining the base attributes with the versioned attributes effective on that date.
In upcoming posts, I’ll go over more specifics on how to enforce these invariants and then use these invariants when combining versioned attributes with the base attributes and inserting new versioned records.
This post is part of a series on versioned records:
- Storing Variable Attributes in a Version Table
- Looking up a Record with Variable Attributes
- Preventing Overlapping Ranges with Database-Level Exclusion Constraints