共計 9355 個字符,預計需要花費 24 分鐘才能閱讀完成。
本篇內容介紹了“PostgreSQL 中 fsm_search 函數有什么作用”的有關知識,在實際案例的操作過程中,不少人都會遇到這樣的困境,接下來就讓丸趣 TV 小編帶領大家學習一下如何處理這些情況吧!希望大家仔細閱讀,能夠學有所成!
一、數據結構
宏定義
包括 FSM 頁面的葉子節點數 / 非葉子節點數 /FSM 樹深度等等.
#define MaxFSMRequestSize MaxHeapTupleSize
#define MaxHeapTupleSize (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)))
#define FSM_CAT_STEP (BLCKSZ / FSM_CATEGORIES)
#define FSM_CATEGORIES 256
// 塊大小為 8K 則 FSM_CAT_STEP = 32
#define SlotsPerFSMPage LeafNodesPerPage
#define LeafNodesPerPage (NodesPerPage - NonLeafNodesPerPage) = 8156 - 4095 = 4061
#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \
offsetof(FSMPageData, fp_nodes)) = 8192 - 32 - 4 = 8156
#define NonLeafNodesPerPage (BLCKSZ / 2 - 1) = 4095
* Depth of the on-disk tree. We need to be able to address 2^32-1 blocks,
* and 1626 is the smallest number that satisfies X^3 = 2^32-1. Likewise,
* 216 is the smallest number that satisfies X^4 = 2^32-1. In practice,
* this means that 4096 bytes is the smallest BLCKSZ that we can get away
* with a 3-level tree, and 512 is the smallest we support.
* 存儲在磁盤上的樹深度.
* 我們需要為 2^32 - 1 個塊定位尋址,1626 是可以滿足 X^3 = 2^32 - 1 的最小數字.
* 另外,216 是可以滿足 X^4 = 2^32 - 1 的最小數字.
* 在實踐中, 這意味著 4096 字節是三層數可以支撐的最小 BLCKSZ,512 是最小可支持的.
*/
#define FSM_TREE_DEPTH ((SlotsPerFSMPage = 1626) ? 3 : 4)
FSMAddress
內部的 FSM 處理過程以邏輯地址 scheme 的方式工作, 樹的每一個層次都可以認為是一個獨立的地址文件.
/*
* The internal FSM routines work on a logical addressing scheme. Each
* level of the tree can be thought of as a separately addressable file.
* 內部的 FSM 處理過程工作在一個邏輯地址 scheme 上.
* 樹的每一個層次都可以認為是一個獨立的地址文件.
*/
typedef struct
// 層次
int level; /* level */
// 該層次內的頁編號
int logpageno; /* page number within the level */
} FSMAddress;
/* Address of the root page. */
// 根頁地址
static const FSMAddress FSM_ROOT_ADDRESS = {FSM_ROOT_LEVEL, 0};
FSMPage
FSM page 數據結構. 詳細可參看 src/backend/storage/freespace/README.
/*
* Structure of a FSM page. See src/backend/storage/freespace/README for
* details.
* FSM page 數據結構. 詳細可參看 src/backend/storage/freespace/README.
*/
typedef struct
/*
* fsm_search_avail() tries to spread the load of multiple backends by
* returning different pages to different backends in a round-robin
* fashion. fp_next_slot points to the next slot to be returned (assuming
* there s enough space on it for the request). It s defined as an int,
* because it s updated without an exclusive lock. uint16 would be more
* appropriate, but int is more likely to be atomically
* fetchable/storable.
* fsm_search_avail()函數嘗試通過在一輪循環中返回不同的頁面到不同的后臺進程,
* 從而分散在后臺進程上分散負載.
* 該字段因為無需獨占鎖, 因此定義為整型.
* unit16 可能會更合適, 但整型看起來更適合于原子提取和存儲.
*/
int fp_next_slot;
/*
* fp_nodes contains the binary tree, stored in array. The first
* NonLeafNodesPerPage elements are upper nodes, and the following
* LeafNodesPerPage elements are leaf nodes. Unused nodes are zero.
* fp_nodes 以數組的形式存儲二叉樹.
* 第一個 NonLeafNodesPerPage 元素是上一層的節點, 接下來的 LeafNodesPerPage 元素是葉子節點.
* 未使用的節點為 0.
*/
uint8 fp_nodes[FLEXIBLE_ARRAY_MEMBER];
} FSMPageData;
typedef FSMPageData *FSMPage;
FSMLocalMap
對于小表, 不需要創建 FSM 來存儲空間信息, 使用本地的內存映射信息.
/* Either already tried, or beyond the end of the relation */
// 已嘗試或者已在表的末尾之后
#define FSM_LOCAL_NOT_AVAIL 0x00
/* Available to try */
// 可用于嘗試
#define FSM_LOCAL_AVAIL 0x01
* For small relations, we don t create FSM to save space, instead we use
* local in-memory map of pages to try. To locate free space, we simply try
* pages directly without knowing ahead of time how much free space they have.
* 對于小表, 不需要創建 FSM 來存儲空間信息, 使用本地的內存映射信息.
* 為了定位空閑空間, 我們不需要知道他們有多少空閑空間而是直接簡單的對 page 進行嘗試.
*
* Note that this map is used to the find the block with required free space
* for any given relation. We clear this map when we have found a block with
* enough free space, when we extend the relation, or on transaction abort.
* See src/backend/storage/freespace/README for further details.
* 注意這個 map 用于搜索給定表的請求空閑空間.
* 在找到有足夠空閑空間的 block/ 擴展了 relation/ 在事務回滾時, 則清除這個 map 的信息.
* 詳細可查看 src/backend/storage/freespace/README.
*/
typedef struct
BlockNumber nblocks;// 塊數
uint8 map[HEAP_FSM_CREATION_THRESHOLD];// 數組
} FSMLocalMap;
static FSMLocalMap fsm_local_map =
0,
{
FSM_LOCAL_NOT_AVAIL
}
#define FSM_LOCAL_MAP_EXISTS (fsm_local_map.nblocks 0)
通用例程
包括獲取左子節點 / 右子節點 / 父節點等
/* Macros to navigate the tree within a page. Root has index zero. */
// 在 page 中遍歷樹.Root 編號為 0
#define leftchild(x) (2 * (x) + 1)
#define rightchild(x) (2 * (x) + 2)
#define parentof(x) (((x) - 1) / 2)
* Find right neighbor of x, wrapping around within the level
* 搜索 x 右邊的鄰居, 如需要在同一個層次上需回卷
*/
static int
rightneighbor(int x)
/*
* Move right. This might wrap around, stepping to the leftmost node at
* the next level.
* 移到右邊. 這可能會引起回卷, 調到下一個層次最左邊的節點上.
*/
x++;
/*
* Check if we stepped to the leftmost node at next level, and correct if
* so. The leftmost nodes at each level are numbered x = 2^level - 1, so
* check if (x + 1) is a power of two, using a standard
* twos-complement-arithmetic trick.
* 檢查是否跳轉到下一個層次最左邊的節點上, 如是則修正 x.
* 每一個層次上最左邊的節點編號為 x = 2^level - 1,
* 因此檢查 (x+1) 是否為 2 的冪, 使用標準的 twos-complement-arithmetic 技巧即可.
*/
if (((x + 1) x) == 0)// 有符號整型, 全 1 為 0
x = parentof(x);
return x;
* Returns the physical block number of a FSM page
* 返回 FSM page 的物理塊號
*/
計算公式:
To find the physical block # corresponding to leaf page n, we need to
count the number of leaf and upper-level pages preceding page n.
This turns out to be
y = n + (n / F + 1) + (n / F^2 + 1) + ... + 1
where F is the fanout . The first term n is the number
of preceding leaf pages, the second term is the number of pages at level 1,
and so forth.
static BlockNumber
fsm_logical_to_physical(FSMAddress addr)
BlockNumber pages;// 塊號
int leafno;// 頁號
int l;// 臨時變量
/*
* Calculate the logical page number of the first leaf page below the
* given page.
* 在給定的 page 下, 計算第一個葉子頁面的邏輯頁號
*/
leafno = addr.logpageno;
for (l = 0; l addr.level; l++)
leafno *= SlotsPerFSMPage;
/* Count upper level nodes required to address the leaf page */
// 統計用于定位葉子頁面的上層節點數
pages = 0;
for (l = 0; l FSM_TREE_DEPTH; l++)
{
pages += leafno + 1;
leafno /= SlotsPerFSMPage;
}
/*
* If the page we were asked for wasn t at the bottom level, subtract the
* additional lower level pages we counted above.
* 如果請求的頁面不在底層, 減去上面技術的額外的底層頁面數.
*/
pages -= addr.level;
/* Turn the page count into 0-based block number */
// 計數從 0 開始(減一)
return pages - 1;
/*
* Return the FSM location corresponding to given heap block.
* 返回給定堆 block 的 FSM 位置.
*/
//addr = fsm_get_location(oldPage, slot);
static FSMAddress
fsm_get_location(BlockNumber heapblk, uint16 *slot)
FSMAddress addr;
addr.level = FSM_BOTTOM_LEVEL;
//#define SlotsPerFSMPage LeafNodesPerPage
//#define LeafNodesPerPage (NodesPerPage - NonLeafNodesPerPage) = 8156 - 4095 = 4061
//#define NodesPerPage (BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - \
offsetof(FSMPageData, fp_nodes)) = 8192 - 32 - 4 = 8156
//#define NonLeafNodesPerPage (BLCKSZ / 2 - 1) = 4095
addr.logpageno = heapblk / SlotsPerFSMPage;
*slot = heapblk % SlotsPerFSMPage;
return addr;
}
二、源碼解讀
fsm_search 函數搜索 FSM, 找到有足夠空閑空間 (min_cat) 的堆 page.
/*
* Search the tree for a heap page with at least min_cat of free space
* 搜索 FSM, 找到有足夠空閑空間 (min_cat) 的堆 page
*/
//return fsm_search(rel, search_cat);
static BlockNumber
fsm_search(Relation rel, uint8 min_cat)
int restarts = 0;
FSMAddress addr = FSM_ROOT_ADDRESS;
for (;;)
{
//--------- 循環
int slot;
Buffer buf;
uint8 max_avail = 0;
/* Read the FSM page. */
// 讀取 FSM page
buf = fsm_readbuf(rel, addr, false);
/* Search within the page */
// 頁內搜索
if (BufferIsValid(buf))
{ LockBuffer(buf, BUFFER_LOCK_SHARE);
// 搜索可用的 slot
slot = fsm_search_avail(buf, min_cat,
(addr.level == FSM_BOTTOM_LEVEL),
false);
if (slot == -1)
// 獲取最大可用空間
max_avail = fsm_get_max_avail(BufferGetPage(buf));
UnlockReleaseBuffer(buf);
}
else
//buffer 無效, 則設置為 -1
slot = -1;
if (slot != -1)
{
/*
* Descend the tree, or return the found block if we re at the
* bottom.
* 如在樹的底部, 則返回找到的塊.
*/
if (addr.level == FSM_BOTTOM_LEVEL)
return fsm_get_heap_blk(addr, slot);
addr = fsm_get_child(addr, slot);
}
else if (addr.level == FSM_ROOT_LEVEL)
{
/*
* At the root, failure means there s no page with enough free
* space in the FSM. Give up.
* 處于根節點, 失敗意味著 FSM 中沒有足夠空閑空間的頁面存在, 放棄.
*/
return InvalidBlockNumber;
}
else
{
uint16 parentslot;
FSMAddress parent;
/*
* At lower level, failure can happen if the value in the upper-
* level node didn t reflect the value on the lower page. Update
* the upper node, to avoid falling into the same trap again, and
* start over.
* 在低層上, 如果上層節點沒有反映更低層頁面的值則會出現失敗.
* 更新高層節點, 避免重復掉入同一個陷阱, 重新開始.
*
* There s a race condition here, if another backend updates this
* page right after we release it, and gets the lock on the parent
* page before us. We ll then update the parent page with the now
* stale information we had. It s OK, because it should happen
* rarely, and will be fixed by the next vacuum.
* 在我們釋放后, 另外的后臺進程更新這個頁面同時在我們之前鎖定了父節點的話, 會存在條件競爭.
* 然后我們使用現有已知穩定的信息更新父頁面.
* 如 OK, 因為這種很少出現, 那么會在下一個 vacuum 中修復此問題.
*/
parent = fsm_get_parent(addr, parentslot);
fsm_set_and_search(rel, parent, parentslot, max_avail, 0);
/*
* If the upper pages are badly out of date, we might need to loop
* quite a few times, updating them as we go. Any inconsistencies
* should eventually be corrected and the loop should end. Looping
* indefinitely is nevertheless scary, so provide an emergency
* valve.
* 如果上層頁面過舊, 可能需要循環很多次, 更新上層頁面信息.
* 不一致性會被周期性的糾正, 循環會停止.
* 但無限循環是很可怕的, 因此設置閾值, 超過此閾值則退出循環.
*/
if (restarts++ 10000)
return InvalidBlockNumber;
/* Start search all over from the root */
// 從 root 開始搜索
addr = FSM_ROOT_ADDRESS;
}
}
}
“PostgreSQL 中 fsm_search 函數有什么作用”的內容就介紹到這里了,感謝大家的閱讀。如果想了解更多行業相關的知識可以關注丸趣 TV 網站,丸趣 TV 小編將為大家輸出更多高質量的實用文章!
正文完
發表至: 數據庫
2023-07-24