What are the differences between NOT IN and NOT EXISTS in SQL?

I'll start by creating two demo tables to make the explanation easier:

create table ljn_test1 (col number);
create table ljn_test2 (col number);

Then insert some data:

insert into ljn_test1
select level from dual connect by level <= 30000;

insert into ljn_test2
select level+1 from dual connect by level <= 30000;

commit;

Now compare the performance of NOT EXISTS and NOT IN:

select * from ljn_test1 where not exists (select 1 from ljn_test2 where ljn_test1.col = ljn_test2.col);

       COL
----------
         1

Elapsed: 00:00:00.06

select * from ljn_test1 where col not in (select col from ljn_test2);

       COL
----------
         1

Elapsed: 00:00:21.28

NOT EXISTS takes 0.06 seconds while NOT IN takes 21 seconds, roughly 350 times slower! Why? The answer is simple: the two SQL statements above are not actually equivalent.

I'll clear both tables and insert new data:

truncate table ljn_test1;

truncate table ljn_test2;

insert into ljn_test1 values(1);

insert into ljn_test1 values(2);

insert into ljn_test1 values(3);

insert into ljn_test2 values(2);

insert into ljn_test2 values(null);

commit;

Then run the two statements again:

select * from ljn_test1 where not exists (select 1 from ljn_test2 where ljn_test1.col = ljn_test2.col);

       COL
----------
         3
         1

select * from ljn_test1 where col not in (select col from ljn_test2);

no rows selected

This time NOT IN shows its true colors: it returns an empty set. Let's break down why:

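A. select * from ljn_test1 where col not in (select col from ljn_test2);

In this example, A can be rewritten as B:

B. select * from ljn_test1 where col not in (2,null);

B can be further rewritten as C:

C. select * from ljn_test1 where col <> 2 and col <> null;

Because col <> null evaluates to UNKNOWN and can never be TRUE, no row can satisfy the predicate, so the result is necessarily empty.

The conclusion: whenever the NOT IN subquery returns a null, the final result is empty!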
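The A, B, C rewrite can be checked outside Oracle. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained; SQLite follows the same three-valued logic for these predicates). The table names match the article, and the `run` helper is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ljn_test1 (col integer);
    create table ljn_test2 (col integer);
    insert into ljn_test1 values (1), (2), (3);
    insert into ljn_test2 values (2), (null);
""")

def run(sql):
    """Return the single result column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

not_exists_rows = run("""
    select col from ljn_test1
    where not exists (select 1 from ljn_test2
                      where ljn_test1.col = ljn_test2.col)
""")
not_in_rows = run("select col from ljn_test1 where col not in (select col from ljn_test2)")
# Statement C from the text: the NOT IN list expanded into <> comparisons.
expanded_rows = run("select col from ljn_test1 where col <> 2 and col <> null")

# A comparison with null is UNKNOWN; the driver returns it as None.
unknown = conn.execute("select 1 <> null").fetchone()[0]

print(not_exists_rows)  # [1, 3]
print(not_in_rows)      # []
print(expanded_rows)    # []
print(unknown)          # None
```

The expanded form C and the original NOT IN both come back empty, while NOT EXISTS still returns rows 1 and 3.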

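NOT EXISTS does not have this problem, because its subquery expresses a join between ljn_test1 and ljn_test2, and a null never satisfies an equality join condition. Nulls in ljn_test2.col therefore have no effect on the query result.

For now I'll call ljn_test1 the outer table and ljn_test2 the inner table.

A little generalization gives a more detailed conclusion:

1. For a NOT EXISTS query, nulls in the inner table do not affect the result; for a NOT IN query, a null in the inner table makes the final result empty.

2. For a NOT EXISTS query, an outer-table row containing a null is returned; for a NOT IN query, an outer-table row containing a null is filtered out (as long as the subquery returns at least one row), and the other rows are unaffected.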
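Both rules can be seen side by side in a small experiment. This is a sketch under the assumption that SQLite (through Python's sqlite3 module) is an acceptable stand-in for Oracle here. The table names `outer_t` and `inner_t` and the `run` helper are hypothetical, chosen to match the outer/inner terminology.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table outer_t (col integer);
    create table inner_t (col integer);
    insert into outer_t values (1), (2), (null);
    insert into inner_t values (2);
""")

NOT_EXISTS = """
    select col from outer_t o
    where not exists (select 1 from inner_t i where o.col = i.col)
"""
NOT_IN = "select col from outer_t where col not in (select col from inner_t)"

def run(sql):
    """Return the result column as a set; None stands for SQL NULL."""
    return {row[0] for row in conn.execute(sql)}

# Rule 2: a null in the outer table, no null in the inner table.
before_not_exists = run(NOT_EXISTS)   # the null row is returned
before_not_in = run(NOT_IN)           # the null row is filtered out

# Rule 1: now add a null to the inner table.
conn.execute("insert into inner_t values (null)")
after_not_exists = run(NOT_EXISTS)    # unchanged
after_not_in = run(NOT_IN)            # empty

print(before_not_exists == {1, None})  # True
print(before_not_in == {1})            # True
print(after_not_exists == {1, None})   # True
print(after_not_in == set())           # True
```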

With that established, I can explain why the NOT IN statement above is so much less efficient than the NOT EXISTS statement.

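The NOT EXISTS statement is plainly just a simple join of the two tables (an anti join), and rows with nulls in either table take no part in the join. Under the CBO (cost-based optimizer) the usual plan is a hash join, so its efficiency is not a problem. Here is its execution plan: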

set autot on;

select * from ljn_test1 where not exists (select 1 from ljn_test2 where ljn_test1.col = ljn_test2.col);

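       COL
----------
         3
         1

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 385135874

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     3 |    78 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |           |     3 |    78 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| LJN_TEST1 |     3 |    39 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| LJN_TEST2 |     2 |    26 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("LJN_TEST1"."COL"="LJN_TEST2"."COL")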

The plan is clear enough to need no explanation. Now look at NOT IN:

select * from ljn_test1 where col not in (select col from ljn_test2);

no rows selected

Elapsed: 00:00:00.01

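Execution Plan
----------------------------------------------------------
Plan hash value: 3267714838

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    13 |     5   (0)| 00:00:01 |
|*  1 |  FILTER            |           |       |       |            |          |
|   2 |   TABLE ACCESS FULL| LJN_TEST1 |     3 |    39 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LJN_TEST2 |     2 |    26 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LJN_TEST2" "LJN_TEST2"
              WHERE LNNVL("COL"<>:B1)))
   3 - filter(LNNVL("COL"<>:B1))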

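As you can see, the subquery is evaluated by a FILTER operation, which behaves much like a nested loop join between the two tables: two nested loops, which shows how inefficient it is. Why can't NOT IN use a hash join here? As explained above, nulls in the inner or outer table change the final result in a way that a plain hash join cannot reproduce: a hash join does not put nulls into its hash buckets, so it has no way of handling nulls in either table. When efficiency and correctness conflict, correctness has to win, so Oracle gives up efficiency and falls back to the FILTER operation. (Oracle 11g and later add a null-aware anti join, shown in plans as HASH JOIN ANTI NA, which removes this limitation; the plans in this article do not use it.)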
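To get a feel for the gap between the two strategies, here is a toy model in plain Python. It is not Oracle's implementation: the function names, the row-visit counter, and the scaled-down size `n = 300` are all assumptions made for illustration. It only counts how many inner rows each approach has to look at for the article's data shape (outer 1..n, inner 2..n+1).

```python
def filter_not_in(outer, inner):
    """Toy model of the FILTER plan for `col NOT IN (subquery)`.

    None stands for SQL NULL. Returns (surviving_rows, inner_rows_visited).
    """
    rows, visited = [], 0
    for o in outer:                      # outer loop: every row of the outer table
        keep = True
        for i in inner:                  # inner loop: rescan the inner table
            visited += 1
            # Reject the outer row on an equal value or on any NULL comparison.
            if o is None or i is None or i == o:
                keep = False
                break
        if keep:
            rows.append(o)
    return rows, visited


def hash_anti_join(outer, inner):
    """Toy model of HASH JOIN ANTI for NOT EXISTS: NULLs never match."""
    build = {i for i in inner if i is not None}               # one pass to build
    rows = [o for o in outer if o is None or o not in build]  # one probe per row
    return rows, len(inner) + len(outer)


n = 300                                  # hypothetical size, scaled down from 30000
outer = list(range(1, n + 1))            # 1 .. n
inner = list(range(2, n + 2))            # 2 .. n+1

filter_rows, filter_visits = filter_not_in(outer, inner)
hash_rows, hash_visits = hash_anti_join(outer, inner)

print(filter_rows, filter_visits)        # [1] 45150
print(hash_rows, hash_visits)            # [1] 600
```

In this model the nested scan grows quadratically: at n = 30000 the same arithmetic gives about 450 million inner-row visits against 60,000 for the hash approach, which fits the direction of the 21-second versus 0.06-second timings above.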

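There is something else of interest in this plan: LNNVL("COL"<>:B1). For an explanation of LNNVL, see the official documentation: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions078.htm

Its role here is rather clever. Oracle knows the FILTER performs poorly, so while scanning the inner table ljn_test2 it uses LNNVL to check whether ljn_test2.col contains a null. LNNVL returns TRUE when its condition is FALSE or UNKNOWN, so a null row satisfies it for every outer value. As soon as the scan reaches a null, Oracle knows the final result must be empty and there is no point in continuing, so it can stop at once. To some extent this makes up for the poor performance of the FILTER.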
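The behavior of LNNVL inside that filter can be modeled in a few lines of Python. This is an illustrative sketch, not Oracle code: `sql_ne`, `lnnvl`, and `probe` are hypothetical helpers, None stands for NULL/UNKNOWN, and the only fact taken from the Oracle documentation is that LNNVL returns TRUE when its condition is FALSE or UNKNOWN.

```python
def sql_ne(a, b):
    """SQL `a <> b` under three-valued logic; None stands for NULL/UNKNOWN."""
    if a is None or b is None:
        return None
    return a != b


def lnnvl(condition):
    """Model of LNNVL: TRUE when the condition is FALSE or UNKNOWN."""
    return condition is not True


def probe(inner_rows, b1):
    """Scan the inner rows for one outer value :B1, stopping at the first LNNVL hit.

    Returns (outer_row_survives, rows_scanned).
    """
    for scanned, col in enumerate(inner_rows, start=1):
        if lnnvl(sql_ne(col, b1)):
            return False, scanned        # NOT EXISTS fails: outer row rejected
    return True, len(inner_rows)


print(lnnvl(sql_ne(2, 2)))        # True  (equal value: condition is FALSE)
print(lnnvl(sql_ne(None, 1)))     # True  (null: condition is UNKNOWN)
print(lnnvl(sql_ne(3, 1)))        # False (different value: condition is TRUE)

print(probe([None, 2, 3], 1))     # (False, 1)  null met first, scan stops at once
print(probe([2, 3, None], 1))     # (False, 3)  null met last, whole table scanned
print(probe([2, 3], 1))           # (True, 2)   no null, outer row 1 survives
```

The last three calls show why the position of the null matters: the probe can only stop early if the null happens to come up early in the scan.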

Let me prove this with an example. First, build some data:

truncate table ljn_test1;
truncate table ljn_test2;

insert into ljn_test1
select level from dual connect by level <= 30000;

insert into ljn_test2
select level+1 from dual connect by level <= 30000;

commit;

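To let Oracle reach the row where ljn_test2.col is null as early as possible, I first find the row with the lowest physical address, because a full table scan normally reads that row first:

select col from ljn_test2 where rowid=(select min(rowid) from ljn_test2);

       COL
----------
      1982

Then I set that row to null: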

update ljn_test2 set col = null where col=1982;

commit;

Now look at the performance of the NOT IN query again:

select * from ljn_test1 where col not in (select col from ljn_test2);

no rows selected

Elapsed: 00:00:00.17

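I was delighted with this result: 0.17 seconds, against the 21 seconds the query took before!

Of course, we can't count on Oracle always hitting the null first when it scans the table. Consider the following example: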

update ljn_test2 set col = 1982 where col is null;

select col from ljn_test2 where rowid=(select max(rowid) from ljn_test2);

       COL
----------
     30001

update ljn_test2 set col = null where col=30001;

commit;

Look at the performance of the NOT IN query once more:

select * from ljn_test1 where col not in (select col from ljn_test2);

no rows selected

Elapsed: 00:00:21.11

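Once again NOT IN shows its true colors!

Relying on luck won't do. Worse still, if the inner table contains no nulls at all, the LNNVL optimization never kicks in and only adds overhead!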

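In fact, once the cause is known, the problem is easy to fix: the nulls are the troublemakers! In normal usage the user wants the same result as NOT EXISTS anyway, so all we need to do is tell Oracle that nulls are not involved.

Solution 1:

Declare the join column NOT NULL in both the inner and the outer table. The constraint cannot be added while the column still holds a null, so first restore the row that was set to null above:

update ljn_test2 set col = 30001 where col is null;

commit;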

alter table ljn_test1 modify col not null;

alter table ljn_test2 modify col not null;

Now look at the execution plan again:

set autot on;

select * from ljn_test1 where col not in (select col from ljn_test2);

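       COL
----------
         1

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 385135874

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    26 |    28   (8)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |           |     1 |    26 |    28   (8)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| LJN_TEST1 | 30000 |   380K|    13   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| LJN_TEST2 | 30000 |   380K|    13   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("COL"="COL")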

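Good! This time Oracle knows to use a hash join. Sometimes, though, the table does need to store nulls, and then the column can't be declared NOT NULL. That case is just as simple:

Solution 2:

Filter out the nulls from both the inner and the outer table in the query itself.

First change the table definitions back to allow nulls: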

alter table ljn_test1 modify col null;

alter table ljn_test2 modify col null;

Then rewrite the query:

select * from ljn_test1 where col is not null and col not in (select col from ljn_test2 where col is not null);

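       COL
----------
         1

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 385135874

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |    26 |    28   (8)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |           |     1 |    26 |    28   (8)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| LJN_TEST1 | 30000 |   380K|    13   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LJN_TEST2 | 30000 |   380K|    13   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("COL"="COL")
   2 - filter("COL" IS NOT NULL)
   3 - filter("COL" IS NOT NULL)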

OK! The hash join is back. I think that wraps up my comparison of NOT EXISTS and NOT IN.
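One detail of the second solution is worth checking: the null-filtered NOT IN matches NOT EXISTS for every non-null outer row, but it drops outer rows whose column is null, which NOT EXISTS keeps. The sketch below shows this with Python's sqlite3 module standing in for Oracle (an assumption for the sake of a self-contained example). The extra null row in ljn_test1 and the `run` helper are added for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ljn_test1 (col integer);
    create table ljn_test2 (col integer);
    insert into ljn_test1 values (1), (2), (3), (null);
    insert into ljn_test2 values (2), (null);
""")

def run(sql):
    """Return the result column as a set; None stands for SQL NULL."""
    return {row[0] for row in conn.execute(sql)}

plain_not_in = run(
    "select col from ljn_test1 where col not in (select col from ljn_test2)"
)
# The rewritten query from the second solution.
filtered_not_in = run("""
    select col from ljn_test1
    where col is not null
      and col not in (select col from ljn_test2 where col is not null)
""")
not_exists = run("""
    select col from ljn_test1
    where not exists (select 1 from ljn_test2
                      where ljn_test1.col = ljn_test2.col)
""")

print(plain_not_in == set())           # True: the inner null empties the result
print(filtered_not_in == {1, 3})       # True: nulls filtered on both sides
print(not_exists == {1, 3, None})      # True: the outer null row is kept
```

If the outer null rows are wanted in the result, NOT EXISTS is the form to use; the filtered NOT IN deliberately leaves them out.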


Posted in: Database

2023-07-20
