Greenplum怎么創建表的分布鍵

161次閱讀

共計 7481 個字符，預計需要花費 19 分鐘才能閱讀完成。

本篇內容介紹了“Greenplum 怎么創建表的分布鍵”的有關知識，在實際案例的操作過程中，不少人都會遇到這樣的困境，接下來就讓丸趣 TV 小編帶領大家學習一下如何處理這些情況吧！希望大家仔細閱讀，能夠學有所成！

Greenplum 創建表 – 分布鍵

Greenplum 是分布式系統，創建表時需要指定分布鍵（創建表需要 CREATEDBA 權限），目的在于將數據平均分布到各個 segment。選擇分布鍵非常重要，選擇錯了會導致數據不唯一，更嚴重的是會造成 SQL 性能急劇下降。

Greenplum 有兩種分布策略：

1、hash 分布。

Greenplum 默認使用 hash 分布策略。該策略可選一個或者多個列作為分布鍵（distribution key，簡稱 DK）。分布鍵做 hash 算法來確認數據存放到對應的 segment 上。相同分布鍵值會 hash 到相同的 segment 上。表上最好有唯一鍵或者主鍵，這樣能保證數據均衡分不到各個 segment 上。語法，distributed by。

如果沒有主鍵或者唯一鍵，默認選擇第一列作為分布鍵。增加主鍵

2、隨機（randomly）分布。

數據會被隨機分不到 segment 上，相同記錄可能會存放在不同的 segment 上。隨機分布可以保證數據平均，但是 Greenplum 沒有跨節點的唯一鍵約束數據，所以無法保證數據唯一。基于唯一性和性能考慮，推薦使用 hash 分布，性能部分會另開一篇文檔詳細介紹。語法，distributed randomly。

一、hash 分布鍵

創建表，未指定分布列、分布類型，默認創建 hash 分布表，把第一列 ID 字段作為了分布鍵。

testDB=# create table t_hash(id int,name varchar(50)) distributed by (id);

CREATE TABLE

testDB=#

testDB=# \d t_hash

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) |

Distributed by: (id)

添加主鍵后，主鍵升級為分布鍵替代了 id 列。

testDB=# alter table t_hash add primary key (name);

NOTICE: updating distribution policy to match new primary key

NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index t_hash_pkey for table t_hash

ALTER TABLE

testDB=# \d t_hash

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Indexes:

t_hash_pkey PRIMARY KEY, btree (name)

Distributed by: (name)

驗證 hash 分布表可實現主鍵或者唯一鍵值的唯一性

testDB=# insert into t_hash values(1, szlsd1

INSERT 0 1

testDB=#

testDB=# insert into t_hash values(2, szlsd1

ERROR: duplicate key violates unique constraint t_hash_pkey (seg2 gp-s3:40000 pid=3855)

另外，主鍵列上依然能夠創建唯一鍵

testDB=# create unique index u_id on t_hash(name);

CREATE INDEX

testDB=#

testDB=# \d t_hash

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Indexes:

t_hash_pkey PRIMARY KEY, btree (name)

u_id UNIQUE, btree (name)

Distributed by: (name)

但是，非主鍵列無法單獨創建唯一索引，想創建的話必須包含多有分布鍵列

testDB=# create unique index uk_id on t_hash(id);

ERROR: UNIQUE index must contain all columns in the distribution key of relation t_hash

testDB=# create unique index uk_id on t_hash(id,name);

CREATE INDEX

testDB=# \d t_hash

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Indexes:

t_hash_pkey PRIMARY KEY, btree (name)

uk_id UNIQUE, btree (id, name)

Distributed by: (name)

刪除主鍵后，原 hash 分布鍵依然不變。

testDB=# alter table t_hash drop constraint t_hash_pkey;

ALTER TABLE

testDB=# \d t_hash

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Distributed by: (name)

當分布鍵不是主鍵或者唯一鍵時，我們來驗證分布鍵的相同值落在一個 segment 的結論。

下面的實驗，name 列是分布鍵，我們插入相同的 name 值，可以看到 7 條記錄都落在了 2 號 segment 節點中。

testDB=# insert into t_hash values(1, szlsd

INSERT 0 1

testDB=# insert into t_hash values(2, szlsd

INSERT 0 1

testDB=# insert into t_hash values(3, szlsd

INSERT 0 1

testDB=# insert into t_hash values(4, szlsd

INSERT 0 1

testDB=# insert into t_hash values(5, szlsd

INSERT 0 1

testDB=# insert into t_hash values(6, szlsd

INSERT 0 1

testDB=#

testDB=# select gp_segment_id,count(*) from t_hash group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 7

(1 row)

二、隨機分布鍵

創建隨機分布表需加 distributed randomly 關鍵字，具體使用哪列作為分布鍵不得而知。

testDB=# create table t_random(id int ,name varchar(100)) distributed randomly;

CREATE TABLE

testDB=#

testDB=# \d t_random

Table public.t_random

Column | Type | Modifiers

——–+————————+———–

id | integer |

name | character varying(100) |

Distributed randomly

驗證主鍵 / 唯一鍵的唯一性，可以看到隨機分布表不能創建主鍵和唯一鍵

testDB=# alter table t_random add primary key (id,name);

ERROR: PRIMARY KEY and DISTRIBUTED RANDOMLY are incompatible

testDB=#

testDB=# create unique index uk_r_id on t_random(id);

ERROR: UNIQUE and DISTRIBUTED RANDOMLY are incompatible

testDB=#

從實驗中可以看出無法實現數據的唯一性。并且，數據插入隨機分布表，并不是輪詢插入，實驗中共有 3 個 segment，但是在 1 號插入 3 條記錄，在 2 號 segment 節點插入 2 條記錄后，才在 0 號 segment 中插入數據。隨機分布表如何實現數據平均分配不得而知。這個實驗也驗證了隨機分布表的相同值分布在不同 segment 的結論。

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

1 | 1

(1 row)

testDB=#

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 1

1 | 1

(2 rows)

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 1

1 | 2

(2 rows)

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 2

1 | 2

(2 rows)

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 2

1 | 3

(2 rows)

testDB=# insert into t_random values(1, szlsd3

INSERT 0 1

testDB=# select gp_segment_id,count(*) from t_random group by gp_segment_id;

gp_segment_id | count

—————+——-

2 | 2

1 | 3

0 | 1

(3 rows)

三、CTAS 繼承原表分布鍵

Greenplum 中有兩種 CTAS 語法，無論哪種語法，都默認繼承原表的分布鍵。但是，不會繼承表的一些特殊屬性，如主鍵、唯一鍵、APPENDONLY、COMPRESSTYPE（壓縮）等。

testDB=# \d t_hash;

Table public.t_hash

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Indexes:

t_hash_pkey PRIMARY KEY, btree (name)

uk_id UNIQUE, btree (id, name)

Distributed by: (name)

testDB=#

testDB=# create table t_hash_1 as select * from t_hash;

NOTICE: Table doesn t have DISTRIBUTED BY clause — Using column(s) named name as the Greenplum Database data distribution key for this table.

HINT: The DISTRIBUTED BY clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

SELECT 0

testDB=# \d t_hash_1

Table public.t_hash_1

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) |

Distributed by: (name)

testDB=#

testDB=# create table t_hash_2 (like t_hash);

NOTICE: Table doesn t have distributed by clause, defaulting to distribution columns from LIKE table

CREATE TABLE

testDB=# \d t_hash_2

Table public.t_hash_2

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Distributed by: (name)

如果 CTAS 創建表改變分布鍵，加上 distributed by 即可。

testDB=# create table t_hash_3 as select * from t_hash distributed by (id);

SELECT 0

testDB=#

testDB=# \d t_hash_3

Table public.t_hash_3

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) |

Distributed by: (id)

testDB=#

testDB=# create table t_hash_4 (like t_hash) distributed by (id);

CREATE TABLE

testDB=#

testDB=# \d t_hash5

Did not find any relation named t_hash5 .

testDB=# \d t_hash_4

Table public.t_hash_4

Column | Type | Modifiers

——–+———————–+———–

id | integer |

name | character varying(50) | not null

Distributed by: (id)

CTAS 時，randomly 隨機分布鍵要特別注意，一定要加上 distributed randomly，不然原表是 hash 分布鍵，CTAS 新表則是隨機分布鍵。

testDB=# \d t_random

Table public.t_random

Column | Type | Modifiers

——–+————————+———–

id | integer |

name | character varying(100) |

Distributed randomly

testDB=#

testDB=# \d t_random_1

Table public.t_random_1

Column | Type | Modifiers

——–+————————+———–

id | integer |

name | character varying(100) |

Distributed by: (id)

testDB=# create table t_random_2 as select * from t_random distributed randomly;

SELECT 7

testDB=#

testDB=# \d t_random_2

Table public.t_random_2

Column | Type | Modifiers

——–+————————+———–

id | integer |

name | character varying(100) |

Distributed randomly

“Greenplum 怎么創建表的分布鍵”的內容就介紹到這里了，感謝大家的閱讀。如果想了解更多行業相關的知識可以關注丸趣 TV 網站，丸趣 TV 小編將為大家輸出更多高質量的實用文章！

正文完

發表至：數據庫

2023-07-15

轉載說明：除特殊說明外本站除技術相關以外文章皆由網絡搜集發布，轉載請注明出處。

數據庫中如何查看統計信息

phpmyadmin修改管理員密碼的方法

PostgreSQL執行查詢時獲取元組屬性值實現方法是什么

KVM MYSQL 5.7單機FOR OEL7.5安裝文檔的方法是什么

ROLLUP和CUBE的使用方法有哪些

久久精品人人爽,华人av在线,亚洲性视频网站,欧美专区一二三

Greenplum怎么創建表的分布鍵