14 Comments

MySQL "IN" – Query Performance

When working with large databases, SQL structure effects query performance dramatically. I recently refactored a query that looked like this:


Refactored into:


The query went from around 45 seconds down to 3. Digging into the MySQL docs a little more I found this:

The optimizer rewrites the statement to a correlated subquery:
SELECTFROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(MxN), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list)operator that lists the same values that the subquery would return.

MySql doc on subquery restrictions