数据库中树形结构的存储
背景
如何在数据库中存储树形结构数据?
一般比较普遍的就是四种方法:(具体见 SQL Anti-patterns这本书)
Adjacency List | 每一条记录存parent_id |
Path Enumerations | 每一条记录存整个tree path经过的node枚举 |
Nested Sets | 每一条记录存 nleft 和 nright |
Closure Table | 维护一个表,所有的tree path作为记录进行保存 |
Path Enumerations在PostgreSQL中的LTREE扩展做了实现。
初体验
服务端部署
本文直接使用了阿里云RDS for PostgreSQL产品,自带LTREE扩展。
客户端
可以使用dbeaver,下载PostgreSQL对应的插件。
也可以直接使用命令行 psql。
连接数据库
psql -h 127.0.0.1 -p 5432 -d database -U postgres
查看版本
volume=> select version();
version
-----------------
PostgreSQL 11.9
安装插件
查看服务器上可用的插件
volume=> select name from pg_available_extensions where name = 'ltree';
name
-------
ltree
查看启用的插件
volume=> select * from pg_extension;
extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
---------+----------+--------------+----------------+------------+-----------+--------------
plpgsql | 10 | 11 | f | 1.0 | |
启用ltree插件
volume=> CREATE EXTENSION IF NOT EXISTS ltree;
CREATE EXTENSION
确认插件启用成功
volume=> select * from pg_extension;
extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
---------+----------+--------------+----------------+------------+-----------+--------------
plpgsql | 10 | 11 | f | 1.0 | |
ltree | 16390 | 2200 | t | 1.1 | |
(2 rows)
基本测试
构造数据
--创建表 字段名为path,类型指定为ltree
CREATE TABLE test (path ltree);
--插入数据
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
--查询数据
select * from test;
---结果--
Top
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies
Top.Hobbies.Amateurs_Astronomy
Top.Collections
Top.Collections.Pictures
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
简单查询方法
--查找所有包含Astronomy的项
SELECT path FROM test WHERE path ~ '*.Astronomy.*';
path
-----------------------------------------------
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
--返回指定元素的后代
SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
--返回从第0个开始的后两个
SELECT subpath(path,0,2) FROM test;
--------结果-----
Top
Top.Science
Top.Science
Top.Science
Top.Science
Top.Hobbies
Top.Hobbies
Top.Collections
Top.Collections
Top.Collections
Top.Collections
Top.Collections
Top.Collections
--返回排重后的从第0个开始的后两个
SELECT distinct(subpath(path,0,2)) FROM test;
subpath
-----------------
Top
Top.Collections
Top.Hobbies
Top.Science
(4 rows)
高级技巧
批量导入数据
建表
create table speechdata_20210111 (type char(2), path ltree, size int8 not null, modified_time int8 not null);
清空表
TRUNCATE TABLE speechdata_20210111;
批量导入数据
#encoding=utf-8
import psycopg2
import io
if __name__=='__main__':
# 一次性插入数据
conn = psycopg2.connect(host=host, port=port, user=user, password=password, database=database)
cur = conn.cursor()
cur.copy_from(io.StringIO(s), table, columns=('type','path','ltreepath','size','modified_time'))
conn.commit()
cur.close()
conn.close()
print('done')
ltree插件只支持由A-Za-z0-9_组成字符串,此外.
是分隔符。因此需要对路径中的其它字符做转义。
汉字转为拼音
#encoding=utf-8
import psycopg2
import io
import pypinyin
import sys
def is_Chinese(word):
for ch in word:
if '\u4e00' <= ch <= '\u9fff':
return True
return False
def pinyin(word):
if not is_Chinese(word):
return word
else:
s = ''
for i in pypinyin.pinyin(word, style=pypinyin.NORMAL):
s += ''.join(i)
return s
过滤其它字符
def clean(item):
item = pinyin(item)
for ch in item:
if 'A' <= ch <= 'Z' or 'a' <= ch <= 'z' or '0' <= ch <= '9' or ch == '_' or ch == '/':
pass
else:
item = item.replace(ch,'_')
item = item.replace('/','.')
return item