What is the relationship between MySQL partitioned tables and HBase? This article works through the question in detail, in the hope of giving anyone facing the same problem a simpler, more practical approach.
Create the MySQL partition data
DROP TABLE ord_order;
-- Create the partitioned order table
CREATE TABLE ord_order(
order_id BIGINT NOT NULL AUTO_INCREMENT COMMENT 'Order ID',
user_id INT NOT NULL COMMENT 'User ID',
goods_id INT NOT NULL COMMENT 'Goods ID',
order_price INT NOT NULL DEFAULT 0 COMMENT 'Actual order price (in cents)',
create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
PRIMARY KEY(order_id, create_time)
)
PARTITION BY LIST (YEAR(create_time)*100 + MONTH(create_time))
(
PARTITION p201601 VALUES IN (201601),
PARTITION p201602 VALUES IN (201602),
PARTITION p201603 VALUES IN (201603),
PARTITION p201604 VALUES IN (201604),
PARTITION p201605 VALUES IN (201605),
PARTITION p201606 VALUES IN (201606),
PARTITION p201607 VALUES IN (201607),
PARTITION p201608 VALUES IN (201608),
PARTITION p201609 VALUES IN (201609),
PARTITION p201610 VALUES IN (201610),
PARTITION p201611 VALUES IN (201611),
PARTITION p201612 VALUES IN (201612)
);
-- Insert sample data
INSERT INTO ord_order VALUES
(NULL, 10000001, 11111111, 1000, '2016-01-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-01-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-01-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-01-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-01-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-02-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-02-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-02-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-02-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-02-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-03-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-03-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-03-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-03-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-03-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-04-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-04-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-04-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-04-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-04-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-05-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-05-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-05-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-05-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-05-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-06-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-06-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-06-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-06-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-06-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-07-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-07-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-07-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-07-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-07-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-08-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-08-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-08-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-08-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-08-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-09-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-09-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-09-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-09-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-09-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-10-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-10-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-10-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-10-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-10-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-11-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-11-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-11-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-11-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-11-13 05:00:50'),
(NULL, 10000001, 11111111, 1000, '2016-12-13 01:00:10'),
(NULL, 10000001, 11111112, 2000, '2016-12-13 02:00:20'),
(NULL, 10000001, 11111113, 3000, '2016-12-13 03:00:30'),
(NULL, 10000001, 11111114, 4000, '2016-12-13 04:00:40'),
(NULL, 10000001, 11111115, 5000, '2016-12-13 05:00:50');
-- View the data in partition p201601
SELECT * FROM ord_order PARTITION(p201601);
-- Preview the composed row key
SELECT CONCAT(user_id, 10000000000-UNIX_TIMESTAMP(create_time), goods_id)
FROM ord_order PARTITION(p201601);
Bringing in HBase
Create the HBase table ord_order
Because of version-compatibility issues, I need to create the corresponding HBase table in advance; otherwise the import fails with an error about not being able to create the column family automatically.
Create the ord_order table in the hbase shell:
hbase(main):033:0> create 'ord_order', {NAME => 'cf1'}
Use Sqoop to import the data of partition p201601 of the MySQL ord_order table into the HBase table:
/usr/local/sqoop/bin/sqoop import \
--connect jdbc:mysql://192.168.137.11:3306/test? \
--username HH \
--password oracle \
--query 'SELECT CONCAT(user_id, 10000000000-UNIX_TIMESTAMP(create_time), goods_id) AS order_id, order_price, create_time FROM ord_order PARTITION(p201601) WHERE $CONDITIONS' \
--hbase-table ord_order \
--hbase-create-table \
--hbase-row-key order_id \
--split-by order_id \
--column-family cf1 \
-m 1
Once the import has succeeded, the exported partition can be dropped on the MySQL side, and the partitions needed next can be created:
ALTER TABLE ord_order
ADD PARTITION (PARTITION p201701 VALUES IN (201701));
ALTER TABLE ord_order DROP PARTITION p201601;
View the data imported into HBase
hbase(main):001:0> scan 'ord_order'
ROW COLUMN+CELL
10000001854736755011111115 column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 05:00:50.0
10000001854736755011111115 column=cf1:order_price, timestamp=1479224942888, value=5000
10000001854737116011111114 column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 04:00:40.0
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
10000001854737477011111113 column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 03:00:30.0
10000001854737477011111113 column=cf1:order_price, timestamp=1479224942888, value=3000
10000001854737838011111112 column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 02:00:20.0
10000001854737838011111112 column=cf1:order_price, timestamp=1479224942888, value=2000
10000001854738199011111111 column=cf1:create_time, timestamp=1479224942888, value=2016-01-13 01:00:10.0
10000001854738199011111111 column=cf1:order_price, timestamp=1479224942888, value=1000
5 row(s) in 0.5390 seconds
Row key design in detail
The HBase row key is the concatenation of three fields: user_id, 10000000000-UNIX_TIMESTAMP(create_time), and goods_id.
The part worth noting is 10000000000-UNIX_TIMESTAMP(create_time): subtracting the creation timestamp from a fixed large constant makes newer orders produce smaller row keys, so a plain forward scan returns orders in reverse chronological order, matching the requirement that the newest data shows first.
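To make the reversal concrete, here is a small worked example (not from the original article) that reproduces the first row key in the scan output above, 10000001854736755011111115. The Asia/Shanghai time zone is an assumption; the numbers only line up if the MySQL session that evaluated UNIX_TIMESTAMP was running at UTC+8.
import java.time.LocalDateTime;
import java.time.ZoneId;

public class RowKeyDemo {
    public static void main(String[] args) {
        // UNIX_TIMESTAMP('2016-01-13 05:00:50') as evaluated in a UTC+8 session (assumption)
        long createTs = LocalDateTime.parse("2016-01-13T05:00:50")
                .atZone(ZoneId.of("Asia/Shanghai"))
                .toEpochSecond();                    // 1452632450
        // Subtracting from a constant larger than any timestamp reverses the sort order
        long reversed = 10_000_000_000L - createTs;  // 8547367550
        // user_id + reversed timestamp + goods_id
        String rowKey = "10000001" + reversed + "11111115";
        System.out.println(rowKey);                  // 10000001854736755011111115
    }
}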
For example, suppose we need to paginate the orders of user 10000001, two rows per page, sorted by time in descending order (newest order first):
hbase(main):003:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], ROWPREFIXFILTER => '10000001', LIMIT => 2}
ROW COLUMN+CELL
10000001854736755011111115 column=cf1:order_price, timestamp=1479224942888, value=5000
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
Fetching the next page:
hbase(main):004:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], LIMIT => 3, STARTROW => '10000001854737116011111114'}
ROW COLUMN+CELL
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
10000001854737477011111113 column=cf1:order_price, timestamp=1479224942888, value=3000
10000001854737838011111112 column=cf1:order_price, timestamp=1479224942888, value=2000
3 row(s) in 0.0260 seconds
The scan above returns three rows; when rendering the page, simply drop the first row (it is the last row of the page we just displayed), so what actually gets shown is:
10000001854737477011111113 column=cf1:order_price, timestamp=1479224942888, value=3000
10000001854737838011111112 column=cf1:order_price, timestamp=1479224942888, value=2000
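The same "scan one extra row from the previous page boundary, then drop it" logic can be written against the HBase Java client. This is only a sketch under assumed names (an HBase 1.4+/2.x client, table ord_order, column family cf1, and a caller-supplied pageSize plus the lastRowKey of the current page); it is not code from the original article.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class NextPage {
    // Returns the next page of orders, given the row key of the last row currently displayed.
    static List<Result> nextPage(Connection conn, String lastRowKey, int pageSize) throws IOException {
        Scan scan = new Scan()
                .withStartRow(Bytes.toBytes(lastRowKey))                        // inclusive start, so one extra row is fetched
                .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("order_price"))
                .setFilter(new PageFilter(pageSize + 1));                       // page size + the boundary row
        List<Result> rows = new ArrayList<>();
        try (Table table = conn.getTable(TableName.valueOf("ord_order"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
                rows.add(r);
            }
        }
        if (!rows.isEmpty()) {
            rows.remove(0);  // drop the boundary row, exactly as the shell example does
        }
        return rows;
    }
}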
Fetching the previous page:
hbase(main):008:0> scan 'ord_order', {COLUMNS => ['cf1:order_price'], LIMIT => 3, STARTROW => '10000001854737477011111113', REVERSED => true}
ROW COLUMN+CELL
10000001854737477011111113 column=cf1:order_price, timestamp=1479224942888, value=3000
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
10000001854736755011111115 column=cf1:order_price, timestamp=1479224942888, value=5000
3 row(s) in 0.0640 seconds
Again three rows come back. Drop the first row, then traverse the remaining rows in reverse before displaying them:
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
10000001854736755011111115 column=cf1:order_price, timestamp=1479224942888, value=5000
↓↓↓↓↓ The two rows above are the raw result set; the two rows below are that set traversed in reverse (and are what is finally displayed)
10000001854736755011111115 column=cf1:order_price, timestamp=1479224942888, value=5000
10000001854737116011111114 column=cf1:order_price, timestamp=1479224942888, value=4000
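The previous-page case (a reversed scan, drop the boundary row, then flip the collection back to newest-first) can be sketched the same way; as above, the table and column names and the HBase 1.4+/2.x client API are assumptions rather than something given in the article.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrevPage {
    // Returns the previous page of orders, given the row key of the first row currently displayed.
    static List<Result> prevPage(Connection conn, String firstRowKey, int pageSize) throws IOException {
        Scan scan = new Scan()
                .withStartRow(Bytes.toBytes(firstRowKey))                       // reversed scan walks towards smaller keys
                .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("order_price"))
                .setReversed(true)
                .setFilter(new PageFilter(pageSize + 1));
        List<Result> rows = new ArrayList<>();
        try (Table table = conn.getTable(TableName.valueOf("ord_order"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
                rows.add(r);
            }
        }
        if (!rows.isEmpty()) {
            rows.remove(0);          // drop the boundary row we are already showing
        }
        Collections.reverse(rows);   // restore newest-first order for display
        return rows;
    }
}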
That is all on how MySQL partitioned tables and HBase fit together. Hopefully the walkthrough above helps; for more on related topics, see the 丸趣 TV industry news channel.