数据库中树形结构的存储

  |   0 评论   |   0 浏览

背景

如何在数据库中存储树形结构数据?

一般比较普遍的就是四种方法:(具体见 SQL Anti-patterns这本书)

Adjacency List每一条记录存parent_id
Path Enumerations每一条记录存整个tree path经过的node枚举
Nested Sets每一条记录存 nleft 和 nright
Closure Table维护一个表,所有的tree path作为记录进行保存

Path Enumerations在PostgreSQL中的LTREE扩展做了实现。

初体验

服务端部署

本文直接使用了阿里云RDS for PostgreSQL产品,自带LTREE扩展。

客户端

可以使用dbeaver,下载PostgreSQL对应的插件。

也可以直接使用命令行 psql。

连接数据库

psql -h 127.0.0.1 -p 5432 -d database -U postgres

查看版本

volume=> select version();
     version
-----------------
 PostgreSQL 11.9

安装插件

查看服务器上可用的插件

volume=> select name from pg_available_extensions where name = 'ltree';
 name
-------
 ltree

查看启用的插件

volume=> select * from pg_extension;
 extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
---------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql |       10 |           11 | f              | 1.0        |           |

启用ltree插件

volume=> CREATE EXTENSION IF NOT EXISTS ltree;
CREATE EXTENSION

确认插件启用成功

volume=> select * from pg_extension;
 extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
---------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql |       10 |           11 | f              | 1.0        |           |
 ltree   |    16390 |         2200 | t              | 1.1        |           |
(2 rows)

基本测试

构造数据

--创建表 字段名为path,类型指定为ltree
CREATE TABLE test (path ltree);
--插入数据
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
--查询数据
select * from test;
---结果--
Top
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
Top.Hobbies
Top.Hobbies.Amateurs_Astronomy
Top.Collections
Top.Collections.Pictures
Top.Collections.Pictures.Astronomy
Top.Collections.Pictures.Astronomy.Stars
Top.Collections.Pictures.Astronomy.Galaxies
Top.Collections.Pictures.Astronomy.Astronauts

简单查询方法

--查找所有包含Astronomy的项 
SELECT path FROM test WHERE path ~ '*.Astronomy.*';
          path                      
-----------------------------------------------
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
 Top.Collections.Pictures.Astronomy
 Top.Collections.Pictures.Astronomy.Stars
 Top.Collections.Pictures.Astronomy.Galaxies
 Top.Collections.Pictures.Astronomy.Astronauts
(7 rows)
--返回指定元素的后代
SELECT path FROM test WHERE path <@ 'Top.Science';
                path                
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)
--返回从第0个开始的后两个
SELECT subpath(path,0,2)  FROM test;
--------结果-----
Top
Top.Science
Top.Science
Top.Science
Top.Science
Top.Hobbies
Top.Hobbies
Top.Collections
Top.Collections
Top.Collections
Top.Collections
Top.Collections
Top.Collections
--返回排重后的从第0个开始的后两个
SELECT distinct(subpath(path,0,2))  FROM test;
     subpath
-----------------
 Top
 Top.Collections
 Top.Hobbies
 Top.Science
(4 rows)

高级技巧

批量导入数据

建表

create table speechdata_20210111 (type char(2), path ltree, size int8 not null, modified_time int8 not null);

清空表

TRUNCATE TABLE  speechdata_20210111;

批量导入数据

#encoding=utf-8
import psycopg2
import io
if __name__=='__main__':
    # 一次性插入数据
    conn = psycopg2.connect(host=host, port=port, user=user, password=password, database=database)
    cur = conn.cursor()
    cur.copy_from(io.StringIO(s), table, columns=('type','path','ltreepath','size','modified_time'))
    conn.commit()
    cur.close()
    conn.close()
    print('done')

ltree插件只支持由A-Za-z0-9_组成字符串,此外.是分隔符。因此需要对路径中的其它字符做转义。

汉字转为拼音

#encoding=utf-8
import psycopg2
import io
import pypinyin
import sys

def is_Chinese(word):
    for ch in word:
        if '\u4e00' <= ch <= '\u9fff':
            return True
    return False

def pinyin(word):
    if not is_Chinese(word):
        return word
    else:
        s = ''
        for i in pypinyin.pinyin(word, style=pypinyin.NORMAL):
            s += ''.join(i)
        return s

过滤其它字符

def clean(item):
    item = pinyin(item)
    for ch in item:
        if 'A' <= ch <= 'Z' or 'a' <= ch <= 'z' or '0' <= ch <= '9' or ch == '_' or ch == '/':
           pass
        else:
           item = item.replace(ch,'_')
    item = item.replace('/','.')
    return item

参考