If you’ve worked with a piece of software that talks to a database, chances are you’ve come across a long SQL query. Slowly, your eyes glaze over as you attempt to reason through the many joins and subqueries, trying to figure out if this query is the source of the bug.
You debate between the different options. Should you hunt for the original creator of the query, or re-engineer the query yourself so you have a chance of understanding its hidden complexities?
To be fair, there’s a lot of non-SQL code out there that can cause a similar reaction. However, I’ve come to realize that embedding SQL queries in code can limit an ever-evolving codebase.
The Problem with Complex SQL Queries
SQL queries are great for asking questions about a set of data. They’re performant, familiar across projects, and overall, really powerful. Honestly, one of my favorite parts of software development is writing complex queries.
However, I’ve noticed a trend on my projects: SQL queries dangerously couple software to the persistence layer. Obviously, we need SQL queries to fetch data from our data sources, but it’s easy for developers to overuse them.
I’ve found myself embedding key business rules in queries, on top of their normal responsibility of data fetching. In large queries, business logic can be obfuscated by the many moving parts of a query. Queries need to define relationships between different pieces of data, join these tables, group the result rows, order the result set, and so on.
While a developer could write a super-efficient query to fetch data and apply business rules, this approach is unsustainable for an evolving piece of software. As understanding of the business domain grows, the data model evolves to accommodate new findings. However, any change you make to the database schema usually means re-working–or completely re-engineering–a query. Sometimes, I’ve even chosen not to change the data model in fear of re-creating a large number of SQL queries in a delicate system.
Alternatives to Large SQL Queries
To avoid the pain of large SQL queries embedded in production code, I’ve found two alternatives:
- Use lots of small queries.
- Use query builders.
The first alternative is simple. Instead of aiming to hit the database with one query that can grab everything at once, consider breaking the process into smaller bits. Write a few queries to grab sections of data, doing any processing in memory instead of aggregate functions. I’ve found this approach useful to fully understand a problem without having to jump to the most performant solution.
It’s also worth considering the tradeoffs in efficiency when using small queries. Small queries tend to have less responsibility, so they can be applied in more situations. Similar to the Single Responsibility Principle of functions, queries with a small set of responsibilities are more likely to be really good at their job, correct, and testable.
I prefer the second alternative, which involves using query builders, and I’ve written about this approach before. I’ve found great success in breaking up large, complex queries into small functions that describe each aspect of a query. This gives you a set of multiple queries with shared concepts. If you’re working in Node.js, I’d highly recommend looking in Knex.js for query building.
Final Thoughts
Complex queries have their place in software development—there are some problems that need to be solved with the highest-performing tools. However, I find it worthwhile to break up complex queries into smaller bits. Smaller queries allow for more flexibility in evolving pieces of software, which is often as desirable as performance (and sometimes more desirable). Let me know what your experience has been in the comments.
Recently I had to deal with a SQL query 1230lines long. You can imagine making a change on a, I repeat, 1230lines long query adding also the fact that is a very delicate point uses by hundreds of customers (I work for a very well know cloud software company). I don’t have time, the willingness and the authority to refactor such horror queries considering that we have an ad-hoc DBA Team which should be responsible for that.
End of the story as developers we fix issues and fingers crossed that query will keep working as expected even after my change….