共計 6025 個字符,預計需要花費 16 分鐘才能閱讀完成。
本篇內容介紹了“PostgreSQL 查詢優化中對消除外連接的處理過程是什么”的有關知識,在實際案例的操作過程中,不少人都會遇到這樣的困境,接下來就讓丸趣 TV 小編帶領大家學習一下如何處理這些情況吧!希望大家仔細閱讀,能夠學有所成!
使用的測試腳本:
drop table if exists t_null1;
create table t_null1(c1 int);
insert into t_null1 values(1);
insert into t_null1 values(2);
insert into t_null1 values(null);
drop table if exists t_null2;
create table t_null2(c1 int);
insert into t_null2 values(1);
insert into t_null2 values(null);
一、基本概念
消除外連接的代碼注釋說明如下:
/*
* reduce_outer_joins
* Attempt to reduce outer joins to plain inner joins.
*
* The idea here is that given a query like
* SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
* we can reduce the LEFT JOIN to a plain JOIN if the = operator in WHERE
* is strict. The strict operator will always return NULL, causing the outer
* WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b s
* columns. Therefore, there s no need for the join to produce null-extended
* rows in the first place --- which makes it a plain join not an outer join.
* (This scenario may not be very likely in a query written out by hand, but
* it s reasonably likely when pushing quals down into complex views.)
*
* More generally, an outer join can be reduced in strength if there is a
* strict qual above it in the qual tree that constrains a Var from the
* nullable side of the join to be non-null. (For FULL joins this applies
* to each side separately.)
*
* Another transformation we apply here is to recognize cases like
* SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
* If the join clause is strict for b.y, then only null-extended rows could
* pass the upper WHERE, and we can conclude that what the query is really
* specifying is an anti-semijoin. We change the join type from JOIN_LEFT
* to JOIN_ANTI. The IS NULL clause then becomes redundant, and must be
* removed to prevent bogus selectivity calculations, but we leave it to
* distribute_qual_to_rels to get rid of such clauses.
*
* Also, we get rid of JOIN_RIGHT cases by flipping them around to become
* JOIN_LEFT. This saves some code here and in some later planner routines,
* but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
* join type.
*
* To ease recognition of strict qual clauses, we require this routine to be
* run after expression preprocessing (i.e., qual canonicalization and JOIN
* alias-var expansion).
*/
有兩種類型的外連接可以被消除, 第一種是形如以下形式的語句:
SELECT … FROM a LEFT JOIN b ON (…) WHERE b.y = 42;
這種語句如滿足條件可變換為內連接 (INNER_JOIN).
之所以可以變換為內連接, 那是因為這樣的語句與內連接處理的結果是一樣的, 原因是在 Nullable-Side 端 (需要填充 NULL 值的一端), 存在過濾條件保證這一端不可能是 NULL 值, 比如 IS NOT NULL/y = 42 這類強(strict) 過濾條件.
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1;
QUERY PLAN
--------------------------------------------------------------------------------
Merge Left Join (cost=359.57..860.00 rows=32512 width=8) -- 外連接
Output: a.c1, b.c1
Merge Cond: (a.c1 = b.c1)
- Sort (cost=179.78..186.16 rows=2550 width=4)
Output: a.c1
Sort Key: a.c1
- Seq Scan on public.t_null1 a (cost=0.00..35.50 rows=2550 width=4)
Output: a.c1
- Sort (cost=179.78..186.16 rows=2550 width=4)
Output: b.c1
Sort Key: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..35.50 rows=2550 width=4)
Output: b.c1
(13 rows)
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1 where b.c1 = 1;
QUERY PLAN
------------------------------------------------------------------------------
Nested Loop (cost=0.00..85.89 rows=169 width=8) -- 外連接 (Left 關鍵字) 已被消除
Output: a.c1, b.c1
- Seq Scan on public.t_null1 a (cost=0.00..41.88 rows=13 width=4)
Output: a.c1
Filter: (a.c1 = 1)
- Materialize (cost=0.00..41.94 rows=13 width=4)
Output: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..41.88 rows=13 width=4)
Output: b.c1
Filter: (b.c1 = 1)
(10 rows)
第二種形如:
SELECT … FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
這種語句如滿足條件可以變換為反半連接 (ANTI-SEMIJOIN).
過濾條件已明確要求 Nullable-Side 端 y IS NULL, 如果連接條件是 a.x = b.y 這類嚴格 (strict) 的條件, 那么這樣的外連接與反半連接的結果是一樣的.
testdb=# explain verbose select * from t_null1 a left join t_null2 b on a.c1 = b.c1 where b.c1 is null;
QUERY PLAN
--------------------------------------------------------------------------------
Hash Anti Join (cost=67.38..152.44 rows=1275 width=8) -- 變換為反連接
Output: a.c1, b.c1
Hash Cond: (a.c1 = b.c1)
- Seq Scan on public.t_null1 a (cost=0.00..35.50 rows=2550 width=4)
Output: a.c1
- Hash (cost=35.50..35.50 rows=2550 width=4)
Output: b.c1
- Seq Scan on public.t_null2 b (cost=0.00..35.50 rows=2550 width=4)
Output: b.c1
(9 rows)
值得一提的是, 在 PG 中, 形如 SELECT … FROM a LEFT JOIN b ON (…) WHERE b.y = 42; 這樣的 SQL 語句,WHERE b.y = 42 這類條件可以視為連接的上層過濾條件, 在查詢樹中,Jointree- fromlist(元素類型為 JoinExpr)與 Jointree- quals 處于同一層次, 由于 JoinExpr 中的 quals 為同層條件, 因此其上層即為 Jointree- quals. 有興趣的可以查看日志輸出查看 Query 查詢樹結構.
二、源碼解讀
消除外連接的代碼在主函數 subquery_planner 中, 通過調用 reduce_outer_joins 函數實現, 代碼片段如下:
/*
* If we have any outer joins, try to reduce them to plain inner joins.
* This step is most easily done after we ve done expression
* preprocessing.
*/
if (hasOuterJoins)
reduce_outer_joins(root);
reduce_outer_joins
相關的數據結構和依賴的子函數:
reduce_outer_joins_state
typedef struct reduce_outer_joins_state
{
Relids relids; /* base relids within this subtree */
bool contains_outer; /* does subtree contain outer join(s)? */
List *sub_states; /* List of states for subtree components */
} reduce_outer_joins_state;
BitmapXX
typedef struct Bitmapset
{
int nwords; /* number of words in array */
bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */
} Bitmapset;
#define WORDNUM(x) ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x) ((x) % BITS_PER_BITMAPWORD)
/* The unit size can be adjusted by changing these three declarations: */
#define BITS_PER_BITMAPWORD 32
typedef uint32 bitmapword; /* must be an unsigned type */
/*
* bms_make_singleton - build a bitmapset containing a single member
*/
Bitmapset *
bms_make_singleton(int x)
{
Bitmapset *result;
int wordnum,
bitnum;
if (x 0)
elog(ERROR, negative bitmapset member not allowed
wordnum = WORDNUM(x);
bitnum = BITNUM(x);
result = (Bitmapset *) palloc0(BITMAPSET_SIZE(wordnum + 1));
result- nwords = wordnum + 1;
result- words[wordnum] = ((bitmapword) 1 bitnum);
return result;
}