When working with large databases, SQL structure effects query performance dramatically. I recently refactored a query that looked like this:
The query went from around 45 seconds down to 3. Digging into the MySQL docs a little more I found this:
The optimizer rewrites the statement to a correlated subquery:
SELECT … FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(MxN), rather than O(M+N) as it would be for an uncorrelated subquery.
An implication is that an IN subquery can be much slower than a query written using an IN(value_list)operator that lists the same values that the subquery would return.