Our team recently found itself in a situation where we needed to write a database query to:
- Union several tables together
- Filter out some rows
- Sort the unioned set different ways
This was going to be a very expensive and slow query. To solve this problem, we ended up using a materialized view (we are using a PostgreSQL database). This is working really well for us.
In this post, I’ll review what materialized views are, when you should consider using them, and what we learned while bringing them into our project.
What Is a Materialized View?
Materialized views are disc-stored views that can be refreshed. Like views, they are defined by a database query. Unlike views, their underlying query is not executed every time you access them.
Creating a materialized view is pretty simple:
CREATE MATERIALIZED VIEW my_view AS your query here
Executing the above query will create a materialized view populated with the data from the query. You can then write other queries against
my_view just as you would with a traditional view or database table.
You can’t insert data into a materialized view as you can with a table. To update the contents of a materialized view, you can execute a query to refresh it. This will re-execute the query used to create it.
REFRESH MATERIALIZED VIEW my_view
Executing this refresh query will lock the materialized view so it can’t be accessed while refreshing. This can be a problem if your application can’t tolerate downtime while the refresh is happening.
PostgreSQL 9.4 allows you to refresh your view in a way that enables queries during the refresh:
REFRESH MATERIALIZED VIEW CONCURRENTLY my_view
To better optimize your materialized view queries, you can add indexes to the materialized view columns just as you would with a database table.
CREATE INDEX my_index_1 ON my_view (some_column); CREATE INDEX my_index_2 ON my_view (some_other_column); ...
When Should You Use Them?
Traditional database views can be really helpful. They can help hide complexity and provide database users with a nicer API. They can also be used to secure your database. So, when should you use a traditional view vs. a materialized view?
Using a traditional view
I’d recommend using this type of view when:
- The view query is fast, or you can tolerate the slowness. If the query is fast, it doesn’t make sense to accept the overhead of handling refreshes that comes with materialized views. However, slow queries don’t automatically mean you should reach for the materialized view hammer. For example, if you are accessing the view infrequently in the background to generate reports, you probably don’t care much about the query being fast.
- You can’t tolerate stale data. Every time you query a view, it reproduces itself. Therefore, the view data is never old. You have to manage the data refreshing for materialized views. It essentially comes down to a data freshness vs. performance tradeoff.
Using a materialized view
I’d opt for a materialized view instead when:
- The view query is slow, and you can’t tolerate the slowness. The performance characteristics for accessing materialized views are very fast, especially if you add the appropriate indexes. In our scenario, we were using a materialized view to source data for a paginated API. The queries against it had to be very fast. Otherwise, users would experience unacceptable slowness in their browser.
- You have some tolerance for stale data. This one is kind of tricky. Developers are often faced with trading some data freshness for performance (e.g., when doing any type of caching). This is the exact tradeoff made when using materialized views. Maybe you can’t tolerate severe data staleness, but perhaps a few minutes or hours is acceptable in between materialized view refreshes.
Thoughts and Tips
Below are some additional thoughts on materialized views and how they might fit into your system architecture.
Refreshing materialized views
As we discussed, materialized views are updated via a refresh query. Depending on your situation, you could run the refresh query on a schedule, or you could set up database triggers to run the refresh. One common approach is to define triggers that listen for inserts/updates on the tables that feed the materialized view and let those events execute the refresh.
Testing with materialized views
We have a large catalog of unit tests in our codebase. Writing tests around materialized views wasn’t too difficult. Since you can’t directly populate a materialized view, we had to populate the tables that provide their data and then execute a refresh query. This actually worked out nicely, as it tested the refresh functionality as well as the code that accessed the materialized view.
In some sense, a materialized view is just a tool that you can use to cache data. There are often several ways to implement caching. Should you handle the caching at the database level or in the application layer (e.g., materialized view or Redis cache)?
Viewing these caching approaches as equivalent substitutes can be a little dangerous. Addressing this in your database schema is great. It hides a lot of complexity.
However, you can run into scaling issues if you solve all of your caching problems in your database. You will likely hit your database connection limit way earlier than you would exhaust resources for something like Redis. It’s also easier to horizontal-scale in memory caching–not so much for your database.
What’s your experience with materialized views? Please share your thoughts in the comments.