PostgreSQL本地化設(shè)置對SQL特性的影響有哪些

158次閱讀

共計 4030 個字符，預(yù)計需要花費 11 分鐘才能閱讀完成。

這篇文章主要介紹“PostgreSQL 本地化設(shè)置對 SQL 特性的影響有哪些”，在日常操作中，相信很多人在 PostgreSQL 本地化設(shè)置對 SQL 特性的影響有哪些問題上存在疑惑，丸趣 TV 小編查閱了各式資料，整理出簡單好用的操作方法，希望對大家解答”PostgreSQL 本地化設(shè)置對 SQL 特性的影響有哪些”的疑惑有所幫助！接下來，請跟著丸趣 TV 小編一起來學(xué)習(xí)吧！

PostgreSQL 在使用 initdb 初始化數(shù)據(jù)庫時, 提供了”本地化”的參數(shù) locale, 如不指定該參數(shù)則默認(rèn)為空, 即使用 OS 的 locale 設(shè)定.
本地化設(shè)置對以下 SQL 特性有影響:
1. 排序和比較操作 : Sort order in queries using ORDER BY or the standard comparison operators on textual data
2. 內(nèi)置函數(shù) : The upper, lower, and initcap functions
3. 模式匹配 : Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressions
4.to_char 相關(guān)函數(shù) : The to_char family of functions
5.LIKE 能否使用索引 : The ability to use indexes with LIKE clauses

排序
同樣的數(shù)據(jù), 使用不同的 LC_COLLATE,SQL 輸出不同:

postgres=# SELECT name FROM unnest(ARRAY[ MYNAME ,   my_name ,  my-image.jpg ,  my-third-image.jpg]) name ORDER BY name collate  C 
 name 
--------------------
 my_name
 MYNAME
 my-image.jpg
 my-third-image.jpg
(4 rows)
postgres=# SELECT name FROM unnest(ARRAY[ MYNAME ,   my_name ,  my-image.jpg ,  my-third-image.jpg]) name ORDER BY name collate  zh_CN 
 name 
--------------------
 my-image.jpg
 my_name
 MYNAME
 my-third-image.jpg
(4 rows)

collate 指定為”C”, 則使用默認(rèn)的字符串的二進(jìn)制 ASCII 碼值進(jìn)行對比, 而指定是 zh_CN 則不是.

使用 zh_CN 其行為按不區(qū)分大小寫進(jìn)行處理

postgres=# SELECT name FROM unnest(ARRAY[ MYNAME1 ,   my_name2 ,  my-image.jpg ,  my-third-image.jpg]) name ORDER BY name collate  zh_CN 
 name 
--------------------
 my-image.jpg
 MYNAME1
 my_name2
 my-third-image.jpg
(4 rows)
postgres=# SELECT name FROM unnest(ARRAY[ myname1 ,   myname2 ,  myimage.jpg ,  mythirdimage.jpg]) name ORDER BY name collate  zh_CN 
 name 
------------------
 myimage.jpg
 myname1
 myname2
 mythirdimage.jpg
(4 rows)

郵件列表中的解釋如下:

The behavior of each collation comes from the operating system’s own
libc, except for the C collation, which is based on the ordering
implied by strcmp() comparisons. Generally, most implementations have
the behavior you describe, in that they assign least weight of all to
caseness and whitespace, and somewhat more weight to punctuation. I
don’t think that there is much that can be done about it in practice,
though in principal there could be a collation that has all the
properties you want.

內(nèi)置函數(shù)
如 initcap, 在法語和 C 下面會有不同

postgres=# select initcap( élysée  collate  C 
 initcap 
---------
 éLyséE
(1 row)
postgres=# select initcap( élysée  collate  fr_FR 
 initcap 
---------
 élysée
(1 row)

在中文語境下, 全角字符的小寫字母會轉(zhuǎn)換為全角的大寫字母

postgres=# select initcap( a  collate  zh_CN 
 initcap 
---------
 A
(1 row)
postgres=# select initcap( a  collate  C 
 initcap 
---------
 a
(1 row)

在 LC_COLLATE 下, 只會對 7F 以下的 ASCII 字符生效, 其他字符不生效

模式匹配

postgres=# select  élysée  ~  ^\w+$  collate  fr_FR 
 ?column? 
----------
 t
(1 row)
postgres=# select  élysée  COLLATE  C  ~  ^\w+$ 
 ?column? 
----------
 f
(1 row)

LIKE 能否使用索引

postgres=# CREATE TABLE t_sort (
postgres(# a text COLLATE  zh_CN ,
postgres(# b text COLLATE  C 
CREATE TABLE
postgres=# 
postgres=# INSERT INTO t_sort SELECT md5(n::text), md5(n::text)
postgres-# FROM generate_series(1, 1000000) n; 
INSERT 0 1000000
postgres=# CREATE INDEX ON t_sort USING btree (a);
CREATE INDEX
postgres=# CREATE INDEX ON t_sort USING btree (b);
CREATE INDEX
postgres=# ANALYZE t_sort;
ANALYZE
postgres=# SELECT * FROM t_sort LIMIT 2;
 a | b 
----------------------------------+----------------------------------
 c4ca4238a0b923820dcc509a6f75849b | c4ca4238a0b923820dcc509a6f75849b
 c81e728d9d4c2f636f067f89cc14862c | c81e728d9d4c2f636f067f89cc14862c
(2 rows)
postgres=# explain SELECT * FROM t_sort WHERE a LIKE  c4ca4238a0% 
 QUERY PLAN 
---------------------------------------------------------------------------
 Gather (cost=1000.00..18564.33 rows=100 width=66)
 Workers Planned: 2
 -  Parallel Seq Scan on t_sort (cost=0.00..17554.33 rows=42 width=66)
 Filter: (a ~~  c4ca4238a0% ::text)
(4 rows)
postgres=# explain SELECT * FROM t_sort WHERE b LIKE  c4ca4238a0% 
 QUERY PLAN 
------------------------------------------------------------------------------
 Index Scan using t_sort_b_idx on t_sort (cost=0.42..8.45 rows=100 width=66)
 Index Cond: ((b  =  c4ca4238a0 ::text) AND (b    c4ca4238a1 ::text))
 Filter: (b ~~  c4ca4238a0% ::text)
(3 rows)

使用 zh_CN 不能用上索引, 但使用 C 可以用上索引

到此，關(guān)于“PostgreSQL 本地化設(shè)置對 SQL 特性的影響有哪些”的學(xué)習(xí)就結(jié)束了，希望能夠解決大家的疑惑。理論與實踐的搭配能更好的幫助大家學(xué)習(xí)，快去試試吧！若想繼續(xù)學(xué)習(xí)更多相關(guān)知識，請繼續(xù)關(guān)注丸趣 TV 網(wǎng)站，丸趣 TV 小編會繼續(xù)努力為大家?guī)砀鄬嵱玫奈恼拢?/p>

正文完