(mysql.info) subquery-restrictions
Info Catalog
(mysql.info) cursor-restrictions
(mysql.info) restrictions
(mysql.info) view-restrictions
I.3 Restrictions on Subqueries
==============================
* Known bug to be fixed later: If you compare a `NULL' value to a
subquery using `ALL', `ANY', or `SOME', and the subquery returns
an empty result, the comparison might evaluate to the non-standard
result of `NULL' rather than to `TRUE' or `FALSE'.
* A subquery's outer statement can be any one of: `SELECT', `INSERT',
`UPDATE', `DELETE', `SET', or `DO'.
* Subquery optimization for `IN' is not as effective as for the `='
operator or for `IN(VALUE_LIST)' constructs.
A typical case for poor `IN' subquery performance is when the
subquery returns a small number of rows but the outer query
returns a large number of rows to be compared to the subquery
result.
The problem is that, for a statement that uses an `IN' subquery,
the optimizer rewrites it as a correlated subquery. Consider the
following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively,
the execution time becomes on the order of `O(M×N)', rather than
`O(M+N)' as it would be for an uncorrelated subquery.
An implication is that an `IN' subquery can be much slower than a
query written using an `IN(VALUE_LIST)' construct that lists the
same values that the subquery would return.
* In general, you cannot modify a table and select from the same
table in a subquery. For example, this limitation applies to
statements of the following forms:
DELETE FROM t WHERE ... (SELECT ... FROM t ...);
UPDATE t ... WHERE col = (SELECT ... FROM t ...);
{INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
Exception: The preceding prohibition does not apply if you are
using a subquery for the modified table in the `FROM' clause.
Example:
UPDATE t ... WHERE col = (SELECT (SELECT ... FROM t...) AS _t ...);
Here the prohibition does not apply because a subquery in the
`FROM' clause is materialized as a temporary table, so the
relevant rows in `t' have already been selected by the time the
update to `t' takes place.
* Row comparison operations are only partially supported:
* For `EXPR IN (SUBQUERY)', EXPR can be an N-tuple (specified
via row constructor syntax) and the subquery can return rows
of N-tuples.
* For `EXPR OP {ALL|ANY|SOME} (SUBQUERY)', EXPR must be a
scalar value and the subquery must be a column subquery; it
cannot return multiple-column rows.
In other words, for a subquery that returns rows of N-tuples, this
is supported:
(VAL_1, ..., VAL_N) IN (SUBQUERY)
But this is not supported:
(VAL_1, ..., VAL_N) OP {ALL|ANY|SOME} (SUBQUERY)
The reason for supporting row comparisons for `IN' but not for the
others is that `IN' is implemented by rewriting it as a sequence
of `=' comparisons and `AND' operations. This approach cannot be
used for `ALL', `ANY', or `SOME'.
* Row constructors are not well optimized. The following two
expressions are equivalent, but only the second can be optimized:
(col1, col2, ...) = (val1, val2, ...)
col1 = val1 AND col2 = val2 AND ...
* Subqueries in the `FROM' clause cannot be correlated subqueries.
They are materialized (executed to produce a result set) before
evaluating the outer query, so they cannot be evaluated per row of
the outer query.
* The optimizer is more mature for joins than for subqueries, so in
many cases a statement that uses a subquery can be executed more
efficiently if you rewrite it as a join.
An exception occurs for the case where an `IN' subquery can be
rewritten as a `SELECT DISTINCT' join. Example:
SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE CONDITION);
That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND CONDITION;
But in this case, the join requires an extra `DISTINCT' operation
and is not more efficient than the subquery.
* Possible future optimization: MySQL does not rewrite the join
order for subquery evaluation. In some cases, a subquery could be
executed more efficiently if MySQL rewrote it as a join. This
would give the optimizer a chance to choose between more execution
plans. For example, it could decide whether to read one table or
the other first.
Example:
SELECT a FROM outer_table AS ot
WHERE a IN (SELECT a FROM inner_table AS it WHERE ot.b = it.b);
For that query, MySQL always scans `outer_table' first and then
executes the subquery on `inner_table' for each row. If
`outer_table' has a lot of rows and `inner_table' has few rows,
the query probably will not be as fast as it could be.
The preceding query could be rewritten like this:
SELECT a FROM outer_table AS ot, inner_table AS it
WHERE ot.a = it.a AND ot.b = it.b;
In this case, we can scan the small table (`inner_table') and look
up rows in `outer_table', which will be fast if there is an index
on `(ot.a,ot.b)'.
* Possible future optimization: A correlated subquery is evaluated
for each row of the outer query. A better approach is that if the
outer row values do not change from the previous row, do not
evaluate the subquery again. Instead, use its previous result.
* Possible future optimization: A subquery in the `FROM' clause is
evaluated by materializing the result into a temporary table, and
this table does not use indexes. This does not allow the use of
indexes in comparison with other tables in the query, although
that might be useful.
* Possible future optimization: If a subquery in the `FROM' clause
resembles a view to which the merge algorithm can be applied,
rewrite the query and apply the merge algorithm so that indexes
can be used. The following statement contains such a subquery:
SELECT * FROM (SELECT * FROM t1 WHERE t1.t1_col) AS _t1, t2 WHERE t2.t2_col;
The statement can be rewritten as a join like this:
SELECT * FROM t1, t2 WHERE t1.t1_col AND t2.t2_col;
This type of rewriting would provide two benefits:
* It avoids the use of a temporary table for which no indexes
can be used. In the rewritten query, the optimizer can use
indexes on `t1'.
* It gives the optimizer more freedom to choose between
different execution plans. For example, rewriting the query
as a join allows the optimizer to use `t1' or `t2' first.
* Possible future optimization: For `IN', `= ANY', `<> ANY', `=
ALL', and `<> ALL' with non-correlated subqueries, use an
in-memory hash for a result result or a temporary table with an
index for larger results. Example:
SELECT a FROM big_table AS bt
WHERE non_key_field IN (SELECT non_key_field FROM TABLE WHERE CONDITION)
In this case, we could create a temporary table:
CREATE TABLE t (key (non_key_field))
(SELECT non_key_field FROM TABLE WHERE CONDITION)
Then, for each row in `big_table', do a key lookup in `t' based on
`bt.non_key_field'.
Info Catalog
(mysql.info) cursor-restrictions
(mysql.info) restrictions
(mysql.info) view-restrictions
automatically generated byinfo2html