How Cluster Snapshots Work in the Distributed Graph Database Nebula Graph




1.1 Background

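In production, Nebula Graph handles large volumes of data and high-frequency business operations. Human errors, hardware failures and application errors are inevitable in practice, and some serious errors can leave the cluster unable to run or its data invalid. When the cluster cannot start or its data is invalid, rebuilding the cluster and re-importing the data is tedious and time-consuming. To address this, Nebula Graph provides the ability to create cluster snapshots.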

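The snapshot feature lets you create a snapshot of the cluster at a given point in time in advance. If a catastrophic problem occurs later, a historical snapshot can be used to restore the cluster to a usable state conveniently.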

1.2 Terminology

This article uses the following terms:

StorageEngine: the smallest physical storage unit in Nebula Graph. RocksDB and HBase are currently supported; this article covers only RocksDB.

Partition: the smallest logical storage unit in Nebula Graph. A StorageEngine can contain multiple Partitions. Each Partition acts as either a leader or a follower, and Raftex keeps the data consistent between the leader and its followers.

GraphSpace: each GraphSpace is an independent business graph unit with its own set of tags and edges. A Nebula Graph cluster can contain multiple GraphSpaces.

checkpoint: a point-in-time snapshot of a StorageEngine, which can serve as a full backup. Checkpoint files are hard links to the sst files.
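The hard-link idea can be made concrete with a minimal C++17 sketch using std::filesystem. This is not RocksDB's own checkpoint code (RocksDB provides `rocksdb::Checkpoint` for that). The helper name `hardLinkCheckpoint` is hypothetical, and so is the rule of copying the non-SST metadata files rather than linking them. The sketch shows why a checkpoint is cheap: an immutable `.sst` file gets a second directory entry pointing at the same inode, so no table data is copied.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper (not RocksDB's API): build a checkpoint directory for a
// RocksDB-style data directory.
//  - *.sst files are immutable, so they are hard-linked: same inode, no copy.
//  - Other regular files (CURRENT, MANIFEST-*, OPTIONS-*) may still change in
//    the live directory, so this sketch copies them instead (an assumption).
// Returns the number of .sst files that were hard-linked.
int hardLinkCheckpoint(const fs::path& dataDir, const fs::path& checkpointDir) {
    assert(fs::is_directory(dataDir));
    fs::create_directories(checkpointDir);
    int linked = 0;
    for (const auto& entry : fs::directory_iterator(dataDir)) {
        if (!entry.is_regular_file()) continue;
        const fs::path target = checkpointDir / entry.path().filename();
        if (entry.path().extension() == ".sst") {
            fs::create_hard_link(entry.path(), target);  // throws on failure
            ++linked;
        } else {
            fs::copy_file(entry.path(), target, fs::copy_options::overwrite_existing);
        }
    }
    return linked;
}
```

Hard links only work within a single filesystem. The checkpoint directory therefore has to live on the same volume as the data directory, which fits with checkpoints being created under the data path.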

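snapshot: in this article, a snapshot means a point-in-time snapshot of the Nebula Graph cluster, i.e. the set of checkpoints of all StorageEngines in the cluster. A snapshot can be used to restore the cluster to the state it was in when the snapshot was created.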

wal: Write-Ahead Logging; Raftex uses it to keep the leader and followers consistent.

2 System Architecture

2.1 Overall System Architecture

2.2 Storage System Structure

2.3 Physical File Layout of the Storage System

[bright2star@hp-server storage]$ tree
└── nebula
    └── 1
        ├── checkpoints
        │   ├── SNAPSHOT_2019_12_04_10_54_42
        │   │   ├── data
        │   │   │   ├── 000006.sst
        │   │   │   ├── 000008.sst
        │   │   │   ├── CURRENT
        │   │   │   ├── MANIFEST-000007
        │   │   │   └── OPTIONS-000005
        │   │   └── wal
        │   │       ├── 1
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 2
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 3
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 4
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 5
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 6
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 7
        │   │       │   └── 0000000000000000233.wal
        │   │       ├── 8
        │   │       │   └── 0000000000000000233.wal
        │   │       └── 9
        │   │           └── 0000000000000000233.wal
        │   └── SNAPSHOT_2019_12_04_10_54_44
        │       ├── data
        │       │   ├── 000006.sst
        │       │   ├── 000008.sst
        │       │   ├── 000009.sst
        │       │   ├── CURRENT
        │       │   ├── MANIFEST-000007
        │       │   └── OPTIONS-000005
        │       └── wal
        │           ├── 1
        │           │   └── 0000000000000000236.wal
        │           ├── 2
        │           │   └── 0000000000000000236.wal
        │           ├── 3
        │           │   └── 0000000000000000236.wal
        │           ├── 4
        │           │   └── 0000000000000000236.wal
        │           ├── 5
        │           │   └── 0000000000000000236.wal
        │           ├── 6
        │           │   └── 0000000000000000236.wal
        │           ├── 7
        │           │   └── 0000000000000000236.wal
        │           ├── 8
        │           │   └── 0000000000000000236.wal
        │           └── 9
        │               └── 0000000000000000236.wal
        ├── data
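The listing suggests a simple naming scheme, which the hypothetical helpers below spell out. Two points are inferred from the listing rather than stated in the source. First, an engine's data root is taken to be `<data_path>/nebula/<spaceId>`, assuming the `1` directory is a space ID. Second, each WAL file is taken to be named after a log ID zero-padded to 19 digits. The `<data root>/checkpoints/<name>/wal/<partId>` part follows the `stringPrintf` call in section 4.2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical path helpers inferred from the directory listing above.

// Data root of one space's StorageEngine, e.g. <data_path>/nebula/1 (assumed layout)
std::string engineDataRoot(const std::string& dataPath, int spaceId) {
    return dataPath + "/nebula/" + std::to_string(spaceId);
}

// Same shape as the "%s/checkpoints/%s/wal/%d" path built in section 4.2
std::string checkpointWalDir(const std::string& dataRoot, const std::string& snapshotName,
                             int partId) {
    return dataRoot + "/checkpoints/" + snapshotName + "/wal/" + std::to_string(partId);
}

// WAL file name: a log ID zero-padded to 19 digits (assumption from the listing),
// e.g. 233 -> 0000000000000000233.wal
std::string walFileName(std::int64_t logId) {
    assert(logId >= 0);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%019lld.wal", static_cast<long long>(logId));
    return std::string(buf);
}
```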

3 Processing Logic Analysis

3.1 Logic Analysis

Create snapshot is triggered from the client API or the console. The graph server parses the AST of the create snapshot statement and sends the creation request to the meta server through the meta client. On receiving the request, the meta server first gets all active hosts and builds the request needed by the adminClient. The adminClient then sends the request to every storage host. When the storage side receives the create request, it iterates over all StorageEngines of the specified space and creates a checkpoint for each. It then hard-links the WAL of every partition in that StorageEngine. A write-blocking request has already been sent to all leader partitions beforehand, so the database is read-only while the checkpoints and WAL hard links are being created.
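The write-blocking step can be modeled as a scoped guard. The sketch below is a simplified, single-process illustration with hypothetical types `Partition` and `WriteBlockGuard`; it is not Nebula Graph code. The source only says that writes are blocked on leader partitions before checkpointing. Re-enabling them afterwards, including when the checkpoint fails, is an assumption made here.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a leader partition (not Nebula Graph's class):
// while blocked, write() rejects the request and returns false.
class Partition {
public:
    bool write() const { return !blocked_.load(); }
    void setBlocking(bool on) { blocked_.store(on); }
    bool blocked() const { return blocked_.load(); }

private:
    std::atomic<bool> blocked_{false};
};

// RAII guard: blocks writes on every leader when constructed and re-enables
// them when destroyed, so a failed checkpoint cannot leave the cluster read-only.
class WriteBlockGuard {
public:
    explicit WriteBlockGuard(std::vector<Partition*> leaders) : leaders_(std::move(leaders)) {
        for (Partition* p : leaders_) p->setBlocking(true);
    }
    ~WriteBlockGuard() {
        for (Partition* p : leaders_) p->setBlocking(false);
    }
    WriteBlockGuard(const WriteBlockGuard&) = delete;
    WriteBlockGuard& operator=(const WriteBlockGuard&) = delete;

private:
    std::vector<Partition*> leaders_;
};

// checkpointAll (checkpoints + WAL hard links) runs only while the guard is alive.
bool createSnapshotSketch(std::vector<Partition*> leaders,
                          const std::function<bool()>& checkpointAll) {
    WriteBlockGuard guard(std::move(leaders));
    return checkpointAll();
}
```

Tying the unblock step to a destructor means an early return or an exception inside the checkpoint step still restores writes.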

Snapshot names are generated automatically from the system timestamp, so there is no need to worry about name collisions. Unneeded snapshots can be deleted with the drop snapshot command.
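A name such as SNAPSHOT_2019_12_04_10_54_44 can be produced with `std::strftime`. This sketch reproduces the naming pattern seen in this article; it is not the meta server's actual code, and the function name `snapshotName` is hypothetical. The source does not say whether local time or UTC is used, so the caller passes an already broken-down time.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper (not the meta server's code) reproducing the naming
// pattern seen in this article, e.g. SNAPSHOT_2019_12_04_10_54_44.
// The caller supplies the broken-down time, so local time vs. UTC is its choice.
std::string snapshotName(const std::tm& t) {
    char buf[64];
    const std::size_t n = std::strftime(buf, sizeof(buf), "SNAPSHOT_%Y_%m_%d_%H_%M_%S", &t);
    assert(n > 0);  // strftime returns 0 if the buffer was too small
    return std::string(buf, n);
}
```

The name has one-second resolution. Uniqueness in this scheme therefore relies on two snapshots never being requested within the same second.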

3.2 Create Snapshot

3.3 Create Checkpoint

4 Key Code Implementation

4.1 Create Snapshot

folly::Future<Status> AdminClient::createSnapshot(GraphSpaceID spaceId, const std::string& name) {
    // Get the hosts of all active storage engines
    auto allHosts = ActiveHostsMan::getActiveHosts(kv_);
    storage::cpp2::CreateCPRequest req;
    // Set the spaceId. Currently every space gets a checkpoint;
    // listing the spaces is already done by the caller.
    req.set_space_id(spaceId);
    // Set the snapshot name, generated by the meta server from a timestamp,
    // e.g. SNAPSHOT_2019_12_04_10_54_44
    req.set_name(name);
    folly::Promise<Status> pro;
    auto f = pro.getFuture();
    // Send the request to all storage engines through getResponse
    getResponse(allHosts, 0, std::move(req), [] (auto client, auto request) {
        return client->future_createCheckpoint(request);
    }, 0, std::move(pro), 1 /*The snapshot operation only needs to be retried twice*/);
    return f;
}
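The real `getResponse` is asynchronous and built on folly futures. The sketch below is a synchronous version of the behavior described above: it sends the request to every active host and gives each host a bounded number of retries. `sendToAllHosts` and `SendFn` are hypothetical, and the rule that every host must succeed is an assumption based on the description in section 3.1.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one RPC: returns true if the host acknowledged.
using SendFn = std::function<bool(const std::string& host)>;

// Synchronous sketch (not Nebula's getResponse): each host gets one initial
// attempt plus up to `retryLimit` retries. Returns the hosts that still
// failed; an empty result means every host created its checkpoint.
std::vector<std::string> sendToAllHosts(const std::vector<std::string>& hosts,
                                        const SendFn& send, int retryLimit) {
    std::vector<std::string> failed;
    for (const auto& host : hosts) {
        bool ok = false;
        for (int attempt = 0; attempt <= retryLimit && !ok; ++attempt) {
            ok = send(host);
        }
        if (!ok) failed.push_back(host);
    }
    return failed;
}
```

Returning the failed hosts, rather than a single boolean, leaves room for marking the snapshot INVALID and reporting which storage hosts need attention.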

4.2 Create Checkpoint

ResultCode NebulaStore::createCheckpoint(GraphSpaceID spaceId, const std::string& name) {
    auto spaceRet = space(spaceId);
    if (!ok(spaceRet)) {
        return error(spaceRet);
    }
    auto space = nebula::value(spaceRet);
    // Iterate over all StorageEngines that belong to this space
    for (auto& engine : space->engines_) {
        // First, create a checkpoint of the StorageEngine
        auto code = engine->createCheckpoint(name);
        if (code != ResultCode::SUCCEEDED) {
            return code;
        }
        // Then hard-link the last WAL of every partition in this StorageEngine
        auto parts = engine->allParts();
        for (auto& part : parts) {
            auto ret = this->part(spaceId, part);
            if (!ok(ret)) {
                LOG(ERROR) << "Part not found. space : " << spaceId << " Part : " << part;
                return error(ret);
            }
            auto walPath = folly::stringPrintf("%s/checkpoints/%s/wal/%d",
                                               engine->getDataRoot(), name.c_str(), part);
            auto p = nebula::value(ret);
            if (!p->linkCurrentWAL(walPath.data())) {
                return ResultCode::ERR_CHECKPOINT_ERROR;
            }
        }
    }
    return ResultCode::SUCCEEDED;
}
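`linkCurrentWAL` itself is not shown in the source. Assume it hard-links the newest WAL file of a partition (the "last wal" in the comment above) into the checkpoint's `wal/<partId>` directory. Under that assumption, a minimal std::filesystem version could look like the sketch below. `latestWal` and `linkLatestWal` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: pick the newest WAL file name, assuming files are named
// <logId>.wal. The numeric log ID is compared, so padding width does not matter.
std::optional<std::string> latestWal(const std::vector<std::string>& names) {
    std::optional<std::string> best;
    long long bestId = -1;
    for (const auto& n : names) {
        if (fs::path(n).extension() != ".wal") continue;
        const long long id = std::strtoll(n.c_str(), nullptr, 10);
        if (id > bestId) {
            bestId = id;
            best = n;
        }
    }
    return best;
}

// Hypothetical sketch of "link the current WAL": hard-link the newest WAL file
// of partWalDir into targetDir. Returns false if there is no WAL file or any
// filesystem step fails.
bool linkLatestWal(const fs::path& partWalDir, const fs::path& targetDir) {
    std::vector<std::string> names;
    std::error_code ec;
    for (const auto& e : fs::directory_iterator(partWalDir, ec)) {
        if (e.is_regular_file()) names.push_back(e.path().filename().string());
    }
    if (ec) return false;
    const auto latest = latestWal(names);
    if (!latest) return false;
    fs::create_directories(targetDir, ec);
    if (ec) return false;
    fs::create_hard_link(partWalDir / *latest, targetDir / *latest, ec);
    return !ec;
}
```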

5 User Guide

5.1 CREATE SNAPSHOT

CREATE SNAPSHOT creates a snapshot of the entire cluster at the current point in time. The snapshot name is built from the meta server's timestamp.

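Snapshot creation can fail. The current version does not automatically garbage-collect the leftovers of a failed creation. A cluster checker is planned for the metaServer: it will check the cluster state in an asynchronous thread and automatically reclaim the garbage files left by failed snapshot creations.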

In the current version, if snapshot creation fails, the invalid snapshot must be removed with the DROP SNAPSHOT command.
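Until automatic cleanup exists, the manual step can be scripted. The sketch below takes rows in the shape of SHOW SNAPSHOTS output (Name, Status, Hosts) and emits a DROP SNAPSHOT statement for each INVALID one. The `SnapshotRow` type and the `cleanupStatements` function are hypothetical. Fetching the rows from the console or a client is not covered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one row of SHOW SNAPSHOTS output.
struct SnapshotRow {
    std::string name;
    std::string status;  // "VALID" or "INVALID"
    std::string hosts;
};

// Build one DROP SNAPSHOT statement per INVALID snapshot, since the current
// version does not clean up failed snapshots automatically.
std::vector<std::string> cleanupStatements(const std::vector<SnapshotRow>& rows) {
    std::vector<std::string> statements;
    for (const auto& row : rows) {
        if (row.status == "INVALID") {
            statements.push_back("DROP SNAPSHOT " + row.name + ";");
        }
    }
    return statements;
}
```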

The current version does not support snapshotting a specific space: CREATE SNAPSHOT creates snapshots of all spaces in the cluster.
CREATE SNAPSHOT syntax:

CREATE SNAPSHOT

Below, the author creates three snapshots:

(user@127.0.0.1) [default_space]> create snapshot;
Execution succeeded (Time spent: 28211/28838 us)
(user@127.0.0.1) [default_space]> create snapshot;
Execution succeeded (Time spent: 22892/23923 us)
(user@127.0.0.1) [default_space]> create snapshot;
Execution succeeded (Time spent: 18575/19168 us)

Next, we use the SHOW SNAPSHOTS command described in 5.3 to list the existing snapshots:

(user@127.0.0.1) [default_space]> show snapshots;
===========================================================
| Name | Status | Hosts |
===========================================================
| SNAPSHOT_2019_12_04_10_54_36 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_42 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
Got 3 rows (Time spent: 907/1495 us)

As SNAPSHOT_2019_12_04_10_54_36 above shows, the snapshot name is derived from the timestamp.
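Because the name encodes the creation time, it can be parsed back. Here is a minimal sketch with `std::sscanf`. `parseSnapshotTime` is a hypothetical helper that accepts only the exact SNAPSHOT_YYYY_MM_DD_HH_MM_SS form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <optional>
#include <string>

// Hypothetical helper: recover the creation time encoded in a snapshot name
// such as SNAPSHOT_2019_12_04_10_54_36. Anything else yields std::nullopt.
std::optional<std::tm> parseSnapshotTime(const std::string& name) {
    std::tm t{};
    int consumed = 0;
    const int fields = std::sscanf(name.c_str(), "SNAPSHOT_%4d_%2d_%2d_%2d_%2d_%2d%n",
                                   &t.tm_year, &t.tm_mon, &t.tm_mday,
                                   &t.tm_hour, &t.tm_min, &t.tm_sec, &consumed);
    // %n is not counted in the return value; it records how many characters
    // were consumed, which lets us reject trailing text.
    if (fields != 6 || static_cast<std::size_t>(consumed) != name.size()) {
        return std::nullopt;
    }
    t.tm_year -= 1900;  // std::tm counts years since 1900
    t.tm_mon -= 1;      // and months from 0
    return t;
}
```

Every field is fixed-width and zero-padded. Plain string comparison of two such names therefore also orders them chronologically, which is convenient when picking the most recent snapshot.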

5.2 DROP SNAPSHOT

DROP SNAPSHOT deletes the snapshot with the given name. Snapshot names can be obtained with the SHOW SNAPSHOTS command. DROP SNAPSHOT can delete both valid snapshots and snapshots whose creation failed.

Syntax:

DROP SNAPSHOT name

The author deleted SNAPSHOT_2019_12_04_10_54_36, one of the snapshots created in 5.1, and then checked the remaining snapshots with SHOW SNAPSHOTS:

(user@127.0.0.1) [default_space]> drop snapshot SNAPSHOT_2019_12_04_10_54_36;
Execution succeeded (Time spent: 6188/7348 us)
(user@127.0.0.1) [default_space]> show snapshots;
===========================================================
| Name | Status | Hosts |
===========================================================
| SNAPSHOT_2019_12_04_10_54_42 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
Got 2 rows (Time spent: 1097/1721 us)
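On the storage side, dropping a snapshot amounts to deleting its checkpoint directories. The sketch below assumes the layout from the listing in section 2.3 (`<data_path>/nebula/<spaceId>/checkpoints/<name>`) and covers only the local files. Removing the snapshot's metadata on the meta server is not modeled. `dropSnapshotFiles` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical storage-side cleanup for DROP SNAPSHOT (assumed layout):
// removes <dataPath>/nebula/<spaceId>/checkpoints/<name> for each given space.
// Returns how many files and directories were removed in total.
std::uintmax_t dropSnapshotFiles(const fs::path& dataPath, const std::vector<int>& spaceIds,
                                 const std::string& name) {
    // Refuse names that would delete the whole checkpoints directory or escape it.
    if (name.empty() || name == "." || name == ".." ||
        name.find('/') != std::string::npos) {
        return 0;
    }
    std::uintmax_t removed = 0;
    for (int spaceId : spaceIds) {
        const fs::path dir =
            dataPath / "nebula" / std::to_string(spaceId) / "checkpoints" / name;
        std::error_code ec;
        const std::uintmax_t n = fs::remove_all(dir, ec);  // 0 if dir is absent
        if (!ec) removed += n;
    }
    return removed;
}
```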

5.3 SHOW SNAPSHOTS

SHOW SNAPSHOTS lists all snapshots in the cluster. For each snapshot it shows the status (VALID or INVALID), the name, and the IP addresses of all storage servers at the time the snapshot was created.
Syntax:

SHOW SNAPSHOTS

A short example:

(user@127.0.0.1) [default_space]> show snapshots;
===========================================================
| Name | Status | Hosts |
===========================================================
| SNAPSHOT_2019_12_04_10_54_36 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_42 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
| SNAPSHOT_2019_12_04_10_54_44 | VALID | 127.0.0.1:77833 |
-----------------------------------------------------------
Got 3 rows (Time spent: 907/1495 us)

6 Notes

After the cluster's structure changes, it is best to create a snapshot right away. Examples of such changes are add host, drop host, create space, drop space and balance.

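The current version does not yet let users choose the snapshot path. Snapshots are created under data_path/nebula by default; in the listing in section 2.3 they appear under nebula/1/checkpoints/.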

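The current version does not yet provide a snapshot restore function; users need to write shell scripts suited to their own production environment. The logic is fairly simple. Copy the snapshot from each engineServer into a designated folder, set that folder as data_path, and start the cluster.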
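The article suggests a shell script; to keep to one language here, the same steps are sketched in C++17 with std::filesystem. The target layout (`<new data_path>/nebula/<spaceId>/data` and `.../wal`) is an assumption. The listing in section 2.3 shows the live `data` directory next to `checkpoints`, but it is cut off before any live WAL directory. `restoreSpaceFromSnapshot` is a hypothetical name. The sketch handles one space on one storage host and would be run on every storage host before data_path is pointed at the new directory and the cluster is started.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical restore step for one space on one storage host. Copies
//   <oldDataPath>/nebula/<spaceId>/checkpoints/<name>/{data,wal}
// to
//   <newDataPath>/nebula/<spaceId>/{data,wal}   (target layout is an assumption)
// newDataPath is then meant to be configured as data_path before restarting.
bool restoreSpaceFromSnapshot(const fs::path& oldDataPath, const fs::path& newDataPath,
                              int spaceId, const std::string& name) {
    const std::string space = std::to_string(spaceId);
    const fs::path src = oldDataPath / "nebula" / space / "checkpoints" / name;
    const fs::path dst = newDataPath / "nebula" / space;
    std::error_code ec;
    if (!fs::is_directory(src / "data", ec)) return false;  // no such snapshot
    for (const char* sub : {"data", "wal"}) {
        if (!fs::exists(src / sub, ec)) continue;
        fs::create_directories(dst / sub, ec);
        if (ec) return false;
        // Copy rather than link, so the restored files are independent of the
        // snapshot and the snapshot stays usable for another restore.
        fs::copy(src / sub, dst / sub, fs::copy_options::recursive, ec);
        if (ec) return false;
    }
    return true;
}
```

Copying into a fresh directory, instead of overwriting the current data_path in place, keeps the damaged state around in case the restore itself needs to be undone. `fs::copy` does not overwrite existing files here, so a name clash makes the function return false instead of silently mixing old and restored data.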

