First Impressions of wetts on NVIDIA Triton
Background
We are currently building our own TTS server and deploying it with Triton.
Demo
Official getting-started example (GPU users can skip this section)
pip install git+https://github.com/wenet-e2e/wetts.git
Test
./.local/bin/wetts --text "今天天气怎么样" --wav output.wav
Notes
Running the demo downloads two resource files:
- wenet/wetts_pretrained_models/master/baker_bert_onnx.tar.gz
- wenet/wetts_pretrained_models/master/multilingual_vits_v3_onnx.tar.gz
The first is the frontend resource; the second is the multilingual VITS model resource.
├── frontend
│   ├── baker_bert_onnx.tar.gz
│   ├── final.onnx
│   ├── frontend.flags
│   ├── g2p_en
│   │   ├── cmudict.dict
│   │   ├── model.fst
│   │   ├── phones.sym
│   │   └── README.md
│   ├── lexicon
│   │   ├── lexicon.txt
│   │   ├── pinyin_dict.txt
│   │   ├── polyphone.txt
│   │   └── prosody.txt
│   ├── tn
│   │   ├── zh_tn_tagger.fst
│   │   └── zh_tn_verbalizer.fst
│   └── vocab.txt
└── multilingual
    ├── config.json
    ├── final.onnx
    ├── multilingual_vits_v3_onnx.tar.gz
    ├── phones.txt
    ├── speaker.txt
    └── vits.flags
First Impressions
Environment Setup
Code
git clone https://github.com/wenet-e2e/wetts.git
Docker image
The wetts example specifies the nvcr.io/nvidia/tritonserver:22.09-py3 image.
Dependencies
System dependencies
apt-get install libfst-dev
python3 -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple/
Python library dependencies
pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple/
# Use the version-pinned line above, not the unpinned line below.
# pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip3 install pypinyin scipy grpcio-tools tritonclient -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip3 install pynini==2.1.4 WeTextProcessing -i https://pypi.tuna.tsinghua.edu.cn/simple/
Note: the latest pynini (2.1.5) fails to install, so this article pins pynini 2.1.4.
Convert to an ONNX model (can be skipped)
Dataset | Language | Checkpoint Model | Runtime Model
---|---|---|---
Baker | CN | BERT | BERT
Multilingual | CN | VITS | VITS
This article takes the Baker model as the example; its contents are as follows:
baker_bert_exp
├── final.pt
├── lexicon
│   ├── lexicon.txt
│   ├── pinyin_dict.txt
│   ├── polyphone.txt
│   └── prosody.txt
└── vocab.txt
Environment Setup
python3 -m pip install librosa -i https://pypi.tuna.tsinghua.edu.cn/simple/
Model conversion
The conversion uses the Multilingual checkpoint model.
python3 vits/export_onnx.py \
  --checkpoint ~/tmp/multilingual_vits_v3_exp/final.pth \
  --cfg ~/tmp/multilingual_vits_v3_exp/config.json \
  --onnx_model ./generator.onnx \
  --phone_table ~/tmp/multilingual_vits_v3_exp/phones.txt \
  --speaker_table ~/tmp/multilingual_vits_v3_exp/speaker.txt \
  --providers CUDAExecutionProvider
This step failed with the following error:
2023-12-11 18:14:08 INFO Loaded checkpoint '/home/service/tmp/multilingual_vits_v3_exp/final.pth' (iteration 1386)
Traceback (most recent call last):
File "vits/export_onnx.py", line 177, in <module>
main()
File "vits/export_onnx.py", line 81, in main
net_g.dec.remove_weight_norm()
File "/home/service/tmp/wetts/wetts/vits/model/decoders.py", line 86, in remove_weight_norm
remove_weight_norm(l)
File "/home/service/.local/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py", line 153, in remove_weight_norm
raise ValueError(f"weight_norm of '{name}' not found in {module}")
ValueError: weight_norm of 'weight' not found in ParametrizedConvTranspose1d(
256, 128, kernel_size=(16,), stride=(8,), padding=(4,)
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): _WeightNorm()
)
)
)
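Judging from the ParametrizedConvTranspose1d in the traceback, weight_norm was applied through PyTorch's newer parametrization API, while decoders.py calls the legacy remove_weight_norm. A minimal compatibility shim might look like the sketch below (an assumption based on the traceback; remove_weight_norm_compat is a name made up here and is not part of wetts):

from torch.nn.utils import remove_weight_norm
from torch.nn.utils.parametrize import is_parametrized, remove_parametrizations

def remove_weight_norm_compat(module):
    # Newer PyTorch stores weight_norm as a parametrization on 'weight';
    # older PyTorch registers it via forward-pre hooks.
    if is_parametrized(module, "weight"):
        remove_parametrizations(module, "weight")
    else:
        remove_weight_norm(module)

Rather than patching the exporter, this article simply downloads the prebuilt ONNX model instead (next section).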
Download the ONNX model
Prepare the model
Use final.onnx from multilingual_vits_v3_onnx as ./generator/1/generator.onnx.
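Assembled from the paths used below, the Triton model repository ends up looking roughly like this:

model_repo
├── generator
│   ├── 1
│   │   └── generator.onnx
│   └── config.pbtxt
└── tts
    ├── 1
    │   └── model.py
    └── config.pbtxt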
Modify model_repo/generator/config.pbtxt so the input list also declares sid (the reshape: { shape: [ ] } tells Triton the model itself expects the value without the extra dimension):
{
  name: "scales"
  data_type: TYPE_FP32
  dims: [3]
},
{
  name: "sid"
  data_type: TYPE_INT64
  dims: [1]
  reshape: { shape: [ ] }
}
]
Modify model_repo/tts/1/model.py
Two changes: strip the spaces left by text normalization, and pass a speaker id (sid) to the multilingual generator:
def tokenize(self, text):
    text = self.text_normalizer.normalize(text)
    text = text.replace(' ', '')  # <-- added line
    pinyin_seq = lazy_pinyin(

# ... further down, where the request to the generator is built:

for i, seq in enumerate(seqs):
    input_ids[i][:len(seq)] = seq
    input_lengths[i] = len(seq)
input_lengths = np.expand_dims(input_lengths, axis=1)

in_0 = pb_utils.Tensor("input", input_ids)
in_1 = pb_utils.Tensor("input_lengths", input_lengths)
in_2 = pb_utils.Tensor("scales", input_scales)

spk_id = np.ones(len(total_text), dtype=np.int64)
spk_id = np.expand_dims(spk_id, axis=1)
in_3 = pb_utils.Tensor("sid", spk_id)  # <-- added in_3

inference_request = pb_utils.InferenceRequest(
    model_name='generator',
    requested_output_names=['output'],
    inputs=[in_0, in_1, in_2, in_3])

inference_response = inference_request.exec()
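The added replace(' ', '') matters because the WeTextProcessing normalizer may leave spaces in its output, which would otherwise leak into the pinyin sequence. A quick way to inspect this (a sketch; the sample sentence is illustrative):

from tn.chinese.normalizer import Normalizer

normalizer = Normalizer()
# Text normalization expands numbers, units, etc.; depending on the rules
# the result can contain spaces, which tokenize() strips before lazy_pinyin.
print(normalizer.normalize("今天气温是23℃"))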
Continue modifying model.py. When the generator runs on GPU, the response tensor may live in GPU memory, where as_numpy() is not supported, so convert through DLPack and PyTorch:
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack

# ... a helper method is added to the model class, before initialize():

def pb_tensor_to_numpy(self, pb_tensor):
    if pb_tensor.is_cpu():
        return pb_tensor.as_numpy()
    else:
        pytorch_tensor = from_dlpack(pb_tensor.to_dlpack())
        return pytorch_tensor.cpu().numpy()

# ... and where the generator's output is read back:

audios = pb_utils.get_output_tensor_by_name(inference_response,
                                            'output')
audios = self.pb_tensor_to_numpy(audios)  # <-- added line
# audios = audios.as_numpy()  <-- this line removed
Modify the configuration file
Modify model_repo/tts/config.pbtxt as follows:
{
  key: "token_dict"
  value: { string_value: "/root/tmp/multilingual_vits_v3_onnx/phones.txt" }
},
{
  key: "pinyin_lexicon"
  value: { string_value: "/root/tmp/models/baker_bert_onnx/lexicon/lexicon.txt" }
}
Here token_dict (phones.txt) comes from multilingual_vits_v3_onnx, while pinyin_lexicon (lexicon.txt) comes from baker_bert_onnx.
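These parameters reach model.py through the standard Triton Python backend mechanism: args["model_config"] holds config.pbtxt serialized as JSON. A sketch of how initialize() can read them (the key names match the config above; the surrounding code is illustrative):

import json

def initialize(self, args):
    # config.pbtxt arrives serialized as JSON in args["model_config"].
    model_config = json.loads(args["model_config"])
    parameters = model_config["parameters"]
    token_dict = parameters["token_dict"]["string_value"]
    pinyin_lexicon = parameters["pinyin_lexicon"]["string_value"]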
Launch
Run as root; otherwise the server reports "Failed to give execute permission to triton_python_backend_stub".
# CUDA_VISIBLE_DEVICES="0" tritonserver --model-repository model_repo
Client inference
Install a required package:
pip install packaging
$ python3 client.py --text text.scp --outdir test_audios
Result: two wav files are generated:
MD5 (wav1.wav) = b442c71c5ce7261566f2c876b27d1515
MD5 (wav2.wav) = db416b60f6ebd7abbbf92812227789bf
Miscellaneous
Access methods
Triton serves different protocols on different ports. The client.py above talks to the gRPC service on port 8001.
I1214 06:15:00.927244 21191 grpc_server.cc:4820] Started GRPCInferenceService at 0.0.0.0:8001
I1214 06:15:00.927582 21191 http_server.cc:3474] Started HTTPService at 0.0.0.0:8000
I1214 06:15:00.968717 21191 http_server.cc:181] Started Metrics Service at 0.0.0.0:8002
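For reference, a minimal gRPC request against port 8001 might look like the sketch below. The tensor names ("text", "wav") and shapes are assumptions for illustration, not read out of the repo's client.py:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One UTF-8 string as a BYTES tensor; name and shape are assumptions.
text = np.array([["今天天气怎么样".encode("utf-8")]], dtype=object)
inp = grpcclient.InferInput("text", list(text.shape), "BYTES")
inp.set_data_from_numpy(text)

out = grpcclient.InferRequestedOutput("wav")
result = client.infer(model_name="tts", inputs=[inp], outputs=[out])
audio = result.as_numpy("wav")  # raw audio samples from the generator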
Appendix
Version compatibility table
From [3].
NVIDIA version problem
Error
The NVIDIA driver on your system is too old (found version 11080)
Cause
The NVIDIA driver and the PyTorch wheel are mismatched: the driver supports CUDA 11.8 (hence "found version 11080"), while the installed wheel targets CUDA 12.1.
$ nvidia-smi
Tue Dec 12 21:20:41 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:0A.0 Off | 0 |
| N/A 37C P0 26W / 70W | 1079MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
The PyTorch version is:
>>> import torch
>>> print(torch.__version__)
2.1.1+cu121
Downgrade PyTorch to 1.12.1:
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
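After reinstalling, a quick sanity check confirms that the wheel and the driver agree (torch.version.cuda is the CUDA toolkit the wheel was built against, which must not exceed what the driver supports):

import torch

print(torch.__version__)          # expect 1.12.1+cu116 after the downgrade
print(torch.version.cuda)         # toolkit the wheel targets, e.g. 11.6
print(torch.cuda.is_available())  # True once driver and wheel are compatible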
Multiprocessing problem
Error
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
task = get()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'msg'
The pool's result-handler thread crashes while unpickling an exception raised in a worker (the exception class cannot be re-constructed because its __init__ requires a msg argument). I removed the multiprocessing logic and ran the jobs serially:
for idx, per_split in enumerate(splits):
    cur_files = per_split.tolist()
    tasks.append((idx, cur_files))

# with Pool(processes=num_workers) as pool:
#     predictions = pool.map(single_job, tasks)

os.makedirs(FLAGS.outdir, exist_ok=True)
for task in tasks:
    predictions = single_job(task)
    # predictions = [item for sub_pred in predictions for item in sub_pred]
    for audio_name, result in predictions:
        assert len(result.shape) == 1, result.shape
        wavfile.write(FLAGS.outdir + "/" + audio_name + ".wav",
                      FLAGS.sampling_rate, result.astype(np.int16))
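If you would rather keep the pool, another option is to make worker failures picklable instead of dropping parallelism. A sketch (safe_job is a name introduced here; single_job and tasks come from the script):

def safe_job(task):
    try:
        return single_job(task)
    except Exception as e:
        # RuntimeError(str(e)) can always be re-constructed by pickle,
        # unlike exception classes whose __init__ needs extra arguments.
        raise RuntimeError(str(e)) from None

# with Pool(processes=num_workers) as pool:
#     predictions = pool.map(safe_job, tasks)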
References