共計 7737 個字符,預計需要花費 20 分鐘才能閱讀完成。
這篇文章主要講解了“PostgreSQL 數據庫 B -Tree 索引的物理存儲結構是怎樣的”,文中的講解內容簡單清晰,易于學習與理解,下面請大家跟著丸趣 TV 小編的思路慢慢深入,一起來研究和學習“PostgreSQL 數據庫 B -Tree 索引的物理存儲結構是怎樣的”吧!
一、測試數據
我們繼續使用上一節使用的測試數據,這一次我們追加插入 1000 行的數據。
-- 為方便對比,插入數據前先查看索引元數據頁
testdb=# select * from bt_metap( pk_t_index
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 3 | 1 | 0 | 1 | 0 | 0 | -1
(1 row)
testdb=# do $$
testdb$# begin
testdb$# for i in 19..1020 loop
testdb$# insert into t_index (id, c1, c2) values (i, # ||i|| # , # ||i|| #
testdb$# end loop;
testdb$# end $$;
testdb=# select count(*) from t_index;
count
-------
1008
(1 row)
二、索引存儲結構
插入數據后,重新查看索引元數據頁信息:
testdb=# select * from bt_metap( pk_t_index
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 3 | 3 | 1 | 3 | 1 | 0 | -1
(1 row)
root block 從原來的 block 1 變為 block 3,查看 block 3 的的 Special space:
testdb=# select * from bt_page_stats(pk_t_index ,3);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
3 | r | 3 | 0 | 13 | 8192 | 8096 | 0 | 0 | 1 | 2
(1 row)
type=r,表示 root index block,這個 block 有 3 個 index entries(live_items=3,該 index block 只是 root block(btpo_flags=BTP_ROOT)。下面我們來看看這個 block 中的 index entries:
testdb=# select * from bt_page_items(pk_t_index ,3);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (1,0) | 8 | f | f |
2 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
3 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
(3 rows)
root/branch index block 存儲的是指向其他 index block 的指針。第 1 行,index entries 指向第 1 個 index block,由于該 block 沒有 left block,因此,itemlen 只有 8 個字節,數據范圍為 1 -\x0000017b(十進制值為 379);第 2 行,index entries 指向第 2 個 index block,數據范圍為 380-\x000002e9(745);第 3 行,index entries 指向第 4 個 index block,數據范圍為大于 745 的值。
這里有個疑惑,正常來說,root index block 中的 entries 應指向 index block,但 ctid 的值(2,53)和(4,105)指向的卻是 Heap Table Block,PG11 Beta2 的 Bug?
In a B-tree leaf page, ctid points to a heap tuple. In an internal page, the block number part of ctid points to another page in the index itself, while the offset part (the second number) is ignored and is usually 1.
testdb=# select * from heap_page_items(get_raw_page( t_index ,2)) where t_ctid = (2,53)
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------------------------
53 | 5648 | 1 | 43 | 1612755 | 0 | 360 | (2,53) | 3 | 2306 | 24 | | | \x7b0100001323333739232020200d2333373923
(1 row)
testdb=# select * from heap_page_items(get_raw_page( t_index ,4)) where t_ctid = (4,105)
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
-----+--------+----------+--------+---------+--------+----------+---------+-------------+------------+--------+--------+-------+------------------------------------------
105 | 3152 | 1 | 43 | 1612755 | 0 | 726 | (4,105) | 3 | 2306 | 24 | | | \xe90200001323373435232020200d2337343523
(1 row)
回到正題,我們首先看看 index block 1 的相關數據:
testdb=# select * from bt_page_stats(pk_t_index ,1);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
1 | l | 367 | 0 | 16 | 8192 | 808 | 0 | 2 | 0 | 1
(1 row)
testdb=# select * from bt_page_items(pk_t_index ,1) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+--------+---------+-------+------+-------------------------
1 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
2 | (0,1) | 16 | f | f | 02 00 00 00 00 00 00 00
3 | (0,2) | 16 | f | f | 04 00 00 00 00 00 00 00
4 | (0,3) | 16 | f | f | 08 00 00 00 00 00 00 00
5 | (0,4) | 16 | f | f | 10 00 00 00 00 00 00 00
6 | (0,6) | 16 | f | f | 11 00 00 00 00 00 00 00
7 | (0,5) | 16 | f | f | 12 00 00 00 00 00 00 00
8 | (0,8) | 16 | f | f | 13 00 00 00 00 00 00 00
9 | (0,9) | 16 | f | f | 14 00 00 00 00 00 00 00
10 | (0,10) | 16 | f | f | 15 00 00 00 00 00 00 00
(10 rows)
第 1 個 block 的 Special space,其中 type=l,表示 leaf index block,btpo_flags=BTP_LEAF 表示該 block 僅僅為 leaf index block,block 的 index entries 指向 heap table。同時,這個 block 里面有 367 個 items,右邊兄弟 block 號是 2(btpo_next)。
值得注意到,index entries 的第 1 個條目,是最大值 \x017b,第 2 個條目才是最小值,接下來的條目是按順序存儲的其他值。源碼的 README(src/backend/access/nbtree/README)里面有解釋:
On a page that is not rightmost in its tree level, the high key is
kept in the page s first item, and real data items start at item 2.
The link portion of the high key item goes unused. A page that is
rightmost has no high key , so data items start with the first item.
Putting the high key at the left, rather than the right, may seem odd,
but it avoids moving the high key as we add data items.
官方文檔也有相關解釋:
Note that the first item on any non-rightmost page (any page with a non-zero value in the btpo_next field) is the page s“high key”, meaning its data serves as an upper bound on all items appearing on the page, while its ctid field is meaningless. Also, on non-leaf pages, the first real data item (the first item that is not a high key) is a“minus infinity”item, with no actual value in its data field. Such an item does have a valid downlink in its ctid field, however.
下面我們再來看看 index block 2 4:
testdb=# select * from bt_page_stats(pk_t_index ,2);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
2 | l | 367 | 0 | 16 | 8192 | 808 | 1 | 4 | 0 | 1
(1 row)
testdb=# select * from bt_page_items(pk_t_index ,2) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
2 | (2,53) | 16 | f | f | 7b 01 00 00 00 00 00 00
3 | (2,54) | 16 | f | f | 7c 01 00 00 00 00 00 00
4 | (2,55) | 16 | f | f | 7d 01 00 00 00 00 00 00
5 | (2,56) | 16 | f | f | 7e 01 00 00 00 00 00 00
6 | (2,57) | 16 | f | f | 7f 01 00 00 00 00 00 00
7 | (2,58) | 16 | f | f | 80 01 00 00 00 00 00 00
8 | (2,59) | 16 | f | f | 81 01 00 00 00 00 00 00
9 | (2,60) | 16 | f | f | 82 01 00 00 00 00 00 00
10 | (2,61) | 16 | f | f | 83 01 00 00 00 00 00 00
(10 rows)
testdb=# select * from bt_page_stats(pk_t_index ,4);
blkno | type | live_items | dead_items | avg_item_size | page_size | free_size | btpo_prev | btpo_next | btpo | btpo_flags
-------+------+------------+------------+---------------+-----------+-----------+-----------+-----------+------+------------
4 | l | 276 | 0 | 16 | 8192 | 2628 | 2 | 0 | 0 | 1
(1 row)
testdb=# select * from bt_page_items(pk_t_index ,4) limit 10;
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (4,105) | 16 | f | f | e9 02 00 00 00 00 00 00
2 | (4,106) | 16 | f | f | ea 02 00 00 00 00 00 00
3 | (4,107) | 16 | f | f | eb 02 00 00 00 00 00 00
4 | (4,108) | 16 | f | f | ec 02 00 00 00 00 00 00
5 | (4,109) | 16 | f | f | ed 02 00 00 00 00 00 00
6 | (4,110) | 16 | f | f | ee 02 00 00 00 00 00 00
7 | (4,111) | 16 | f | f | ef 02 00 00 00 00 00 00
8 | (4,112) | 16 | f | f | f0 02 00 00 00 00 00 00
9 | (4,113) | 16 | f | f | f1 02 00 00 00 00 00 00
10 | (4,114) | 16 | f | f | f2 02 00 00 00 00 00 00
(10 rows)
感謝各位的閱讀,以上就是“PostgreSQL 數據庫 B -Tree 索引的物理存儲結構是怎樣的”的內容了,經過本文的學習后,相信大家對 PostgreSQL 數據庫 B -Tree 索引的物理存儲結構是怎樣的這一問題有了更深刻的體會,具體使用情況還需要大家實踐驗證。這里是丸趣 TV,丸趣 TV 小編將為大家推送更多相關知識點的文章,歡迎關注!