怎么使用PostgreSQL中的Bloom索引

146次閱讀

共計 4445 個字符，預(yù)計需要花費 12 分鐘才能閱讀完成。

這篇文章主要講解了“怎么使用 PostgreSQL 中的 Bloom 索引”，文中的講解內(nèi)容簡單清晰，易于學習與理解，下面請大家跟著丸趣 TV 小編的思路慢慢深入，一起來研究和學習“怎么使用 PostgreSQL 中的 Bloom 索引”吧！

簡介
Bloom Index 源于 Bloom filter(布隆過濾器), 布隆過濾器用于在使用少量的空間的情況下可以很快速的判定某個值是否在集合中, 其缺點是存在假陽性 False Positives, 因此需要 Recheck 來判斷該值是否在集合中, 但布隆過濾器不存在假陰性, 也就是說, 對于某個值如果過濾器返回不存在, 那就是不存在.

結(jié)構(gòu)
其結(jié)構(gòu)如下圖所示:

第一個 page 為 metadata, 然后每一行都會有一個 bit array(signature)和 TID 與其對應(yīng).

示例
創(chuàng)建數(shù)據(jù)表, 插入數(shù)據(jù)

testdb=# drop table if exists t_bloom;
DROP TABLE
testdb=# CREATE TABLE t_bloom (id int, dept int, id2 int, id3 int, id4 int, id5 int,id6 int,id7 int,details text, zipcode int);
CREATE TABLE
testdb=# 
testdb=# INSERT INTO t_bloom 
testdb-# SELECT (random() * 1000000)::int, (random() * 1000000)::int,
testdb-# (random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int,(random() * 1000000)::int, 
testdb-# (random() * 1000000)::int,(random() * 1000000)::int,md5(g::text), floor(random()* (20000-9999 + 1) + 9999) 
testdb-# from generate_series(1,16*1024*1024) g;
INSERT 0 16777216
testdb=# 
testdb=# analyze t_bloom;
ANALYZE
testdb=# 
testdb=# select pg_size_pretty(pg_table_size( t_bloom 
 pg_size_pretty 
----------------
 1619 MB
(1 row)

創(chuàng)建 Btree 索引

testdb=# 
testdb=# create index idx_t_bloom_btree on t_bloom using btree(id,dept,id2,id3,id4,id5,id6,id7,zipcode);
CREATE INDEX
testdb=# \di+ idx_t_bloom_btree
 List of relations
 Schema | Name | Type | Owner | Table | Size | Description 
--------+-------------------+-------+-------+---------+--------+-------------
 public | idx_t_bloom_btree | index | pg12 | t_bloom | 940 MB | 
(1 row)

執(zhí)行查詢

testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266;
 QUERY PLAN 
---------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2648.215..2648.215 rows=0
 loops=1)
 Index Cond: ((id4 = 305294) AND (zipcode = 13266))
 Planning Time: 3.244 ms
 Execution Time: 2659.804 ms
(4 rows)
testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198;
 QUERY PLAN 
---------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2365.533..2365.533 rows=0
 loops=1)
 Index Cond: ((id5 = 241326) AND (id6 = 354198))
 Planning Time: 1.918 ms
 Execution Time: 2365.629 ms
(4 rows)

創(chuàng)建 Bloom 索引

testdb=# create extension bloom;
CREATE EXTENSION
testdb=# CREATE INDEX idx_t_bloom_bloom ON t_bloom USING bloom(id, dept, id2, id3, id4, id5, id6, id7, zipcode) 
testdb-# WITH (length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, col8=4, col9=4);
CREATE INDEX
testdb=# \di+ idx_t_bloom_bloom
 List of relations
 Schema | Name | Type | Owner | Table | Size | Description 
--------+-------------------+-------+-------+---------+--------+-------------
 public | idx_t_bloom_bloom | index | pg12 | t_bloom | 225 MB | 
(1 row)

執(zhí)行查詢

testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=998.727..998.727 rows=0 loops=1)
 Recheck Cond: ((id4 = 305294) AND (zipcode = 13266))
 Rows Removed by Index Recheck: 12597
 Heap Blocks: exact=12235
 -  Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00..283084.16 rows=1 width=0) (actual time=234.893..234.893 rows=12597
 loops=1)
 Index Cond: ((id4 = 305294) AND (zipcode = 13266))
 Planning Time: 31.482 ms
 Execution Time: 998.975 ms
(8 rows)
testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198;
 QUERY PLAN 
-------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=1019.621..1019.621 rows=0 loops=1)
 Recheck Cond: ((id5 = 241326) AND (id6 = 354198))
 Rows Removed by Index Recheck: 13033
 Heap Blocks: exact=12633
 -  Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00..283084.16 rows=1 width=0) (actual time=204.873..204.873 rows=13033
 loops=1)
 Index Cond: ((id5 = 241326) AND (id6 = 354198))
 Planning Time: 0.441 ms
 Execution Time: 1019.811 ms
(8 rows)

從執(zhí)行結(jié)果來看, 在查詢條件中沒有非前導(dǎo)列 (上例中為 id1) 的情況下多列任意組合查詢,bloom index 會優(yōu)于 btree index.

感謝各位的閱讀，以上就是“怎么使用 PostgreSQL 中的 Bloom 索引”的內(nèi)容了，經(jīng)過本文的學習后，相信大家對怎么使用 PostgreSQL 中的 Bloom 索引這一問題有了更深刻的體會，具體使用情況還需要大家實踐驗證。這里是丸趣 TV，丸趣 TV 小編將為大家推送更多相關(guān)知識點的文章，歡迎關(guān)注！

正文完