2 Methods for Implementing Multi-Year Data

Recently, my software team and I were faced with a need to add historical data to our database. Additionally, we also needed to be efficient with our time when it came to implementation. As a result our team came up with two options for implementing multi-year data – one that could get the job done and another that could be built off in the future.

Method 1 – Add a Year Column

To add some colour to the situation, we specifically needed to seed historical data by year. This means exact timestamps weren’t necessary to store. As a result of this requirement, we decided to first store the year in a column on each table.

There are a few advantages here – first, it’s extremely quick to implement. For small to medium datasets, which fit our use case, this was the best option at the time. If we needed to create views or add an index on this specific column, too, it could remain efficient for some time.

An obvious disadvantage that would show up with time, however, is that this is really redundant and not normalized. Once the dataset grows to be large, a change will be needed.

Method 2 – Add a Year Table

The second solution we came up with was to later down the line add an independent table to store different year values in each row. Then, for tables required by the app, have this column included as a foreign key reference to this year table.

The key advantage here is that it is just a lot cleaner. This structure follows standard data normalization and data integrity rules that more developers are familiar with. It also allows us to later down the line enrich each row in the year table with additional columns/metadata.

It would also allow us to, if needed, move on to timestamps in the future to store more specific frames of time / historical data. While it is nice to store just an integer for the year column, it doesn’t save much in terms of speed because an integer is very simple. To truly take advantage of this approach, either having a large dataset or additional metadata requirements would be nice.

Originally when investigating approaches to solve our problem I was expecting a much more complex implementation strategy. I was excited to see that the two approaches found were much simpler than originally planned. I hope what I’ve found is helpful to someone down the road. And, if you have any other strategies, I encourage you to share them.

Conversation

Join the conversation

Your email address will not be published. Required fields are marked *