First Experience with WeTTS on NVIDIA Triton

Background

We are currently building our own TTS server and deploying it with Triton.

Demo

Official quick-start example (GPU users can skip this section)

pip install git+https://github.com/wenet-e2e/wetts.git

Test

./.local/bin/wetts --text "今天天气怎么样" --wav output.wav

Notes

This step downloads two resource archives:

  • wenet/wetts_pretrained_models/master/baker_bert_onnx.tar.gz
  • wenet/wetts_pretrained_models/master/multilingual_vits_v3_onnx.tar.gz

The first contains the frontend resources, the second the multilingual model:

├── frontend
│   ├── baker_bert_onnx.tar.gz
│   ├── final.onnx
│   ├── frontend.flags
│   ├── g2p_en
│   │   ├── cmudict.dict
│   │   ├── model.fst
│   │   ├── phones.sym
│   │   └── README.md
│   ├── lexicon
│   │   ├── lexicon.txt
│   │   ├── pinyin_dict.txt
│   │   ├── polyphone.txt
│   │   └── prosody.txt
│   ├── tn
│   │   ├── zh_tn_tagger.fst
│   │   └── zh_tn_verbalizer.fst
│   └── vocab.txt
└── multilingual
    ├── config.json
    ├── final.onnx
    ├── multilingual_vits_v3_onnx.tar.gz
    ├── phones.txt
    ├── speaker.txt
    └── vits.flags

First Experience with Triton

Environment setup

Code

git clone https://github.com/wenet-e2e/wetts.git

Docker image

The WeTTS example specifies the nvcr.io/nvidia/tritonserver:22.09-py3 image.

Dependencies

System dependencies

apt-get install libfst-dev
python3 -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple/

Python dependencies

pip3 install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple/

# Use the pinned versions above rather than the unpinned line below.
# pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116 -i https://pypi.tuna.tsinghua.edu.cn/simple/

pip3 install pypinyin scipy grpcio-tools tritonclient -i https://pypi.tuna.tsinghua.edu.cn/simple/

pip3 install pynini==2.1.4 WeTextProcessing -i https://pypi.tuna.tsinghua.edu.cn/simple/

Note: the latest pynini (2.1.5) fails to install, so this post pins pynini 2.1.4.

Converting to an ONNX model (can be skipped)

Dataset        Language   Checkpoint Model   Runtime Model
Baker          CN         BERT               BERT
Multilingual   CN         VITS               VITS

This post uses the Baker model as an example; its contents are as follows:

baker_bert_exp
├── final.pt
├── lexicon
│   ├── lexicon.txt
│   ├── pinyin_dict.txt
│   ├── polyphone.txt
│   └── prosody.txt
└── vocab.txt

Environment setup

python3 -m pip install librosa -i https://pypi.tuna.tsinghua.edu.cn/simple/

Model conversion

The Multilingual checkpoint model is used for the conversion.

python3 vits/export_onnx.py \
  --checkpoint ~/tmp/multilingual_vits_v3_exp/final.pth \
  --cfg ~/tmp/multilingual_vits_v3_exp/config.json \
  --onnx_model ./generator.onnx \
  --phone_table ~/tmp/multilingual_vits_v3_exp/phones.txt \
  --speaker_table ~/tmp/multilingual_vits_v3_exp/speaker.txt \
  --providers CUDAExecutionProvider

This step fails with the following error:

2023-12-11 18:14:08 INFO     Loaded checkpoint '/home/service/tmp/multilingual_vits_v3_exp/final.pth' (iteration 1386)
Traceback (most recent call last):
  File "vits/export_onnx.py", line 177, in <module>
    main()
  File "vits/export_onnx.py", line 81, in main
    net_g.dec.remove_weight_norm()
  File "/home/service/tmp/wetts/wetts/vits/model/decoders.py", line 86, in remove_weight_norm
    remove_weight_norm(l)
  File "/home/service/.local/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py", line 153, in remove_weight_norm
    raise ValueError(f"weight_norm of '{name}' not found in {module}")
ValueError: weight_norm of 'weight' not found in ParametrizedConvTranspose1d(
  256, 128, kernel_size=(16,), stride=(8,), padding=(4,)
  (parametrizations): ModuleDict(
    (weight): ParametrizationList(
      (0): _WeightNorm()
    )
  )
)

Downloading the ONNX model

Preparing the model

Since the export step failed, fall back to the pre-converted model: use final.onnx from multilingual_vits_v3_onnx as ./generator/1/generator.onnx in the model repository.
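A minimal Python sketch of this copy step (the source path is illustrative and assumes the archive was extracted to ~/tmp/multilingual_vits_v3_onnx; adjust it to your own layout):

import os
import shutil

src = os.path.expanduser("~/tmp/multilingual_vits_v3_onnx/final.onnx")  # extracted archive (assumed path)
dst = "model_repo/generator/1/generator.onnx"  # Triton expects <model>/<version>/<file>
os.makedirs(os.path.dirname(dst), exist_ok=True)
shutil.copy(src, dst)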

Modify model_repo/generator/config.pbtxt so that the sid (speaker id) input is declared:

33   {
34     name: "scales"
35     data_type: TYPE_FP32
36     dims: [3]
37   },
38   {
39     name: "sid"
40     data_type: TYPE_INT64
41     dims: [1]
42     reshape: { shape: [ ] }
43   }
44 ]

Modify model_repo/tts/1/model.py

Two changes are needed: strip spaces from the normalized text, and pass a speaker id (sid) to the multi-speaker generator:

59     def tokenize(self, text):
 60         text = self.text_normalizer.normalize(text)
 61         text = text.replace(' ', '') <-- add this line
 62         pinyin_seq = lazy_pinyin(


127         for i, seq in enumerate(seqs):
128             input_ids[i][:len(seq)] = seq
129             input_lengths[i] = len(seq)
130         input_lengths = np.expand_dims(input_lengths, axis=1)
131
132         in_0 = pb_utils.Tensor("input", input_ids)
133         in_1 = pb_utils.Tensor("input_lengths", input_lengths)
134         in_2 = pb_utils.Tensor("scales", input_scales)
135
136         spk_id = np.ones(len(total_text), dtype=np.int64) 
137         spk_id = np.expand_dims(spk_id, axis=1)
138         in_3 = pb_utils.Tensor("sid", spk_id) <-- added in_3
139
140         inference_request = pb_utils.InferenceRequest(
141             model_name='generator',
142             requested_output_names=['output'],
143             inputs=[in_0, in_1, in_2, in_3])
144
145         inference_response = inference_request.exec()

Continue modifying model.py: when the generator runs on GPU, its output tensor may live in GPU memory, so convert it back to NumPy via DLPack.

3
  4 from torch.utils.dlpack import to_dlpack
  5 from torch.utils.dlpack import from_dlpack
  6

13     """
 14
 15     def pb_tensor_to_numpy(self, pb_tensor):
 16         if pb_tensor.is_cpu():
 17             return pb_tensor.as_numpy()
 18         else:
 19             pytorch_tensor = from_dlpack(pb_tensor.to_dlpack())
 20             return pytorch_tensor.cpu().numpy()
 21
 22     def initialize(self, args):


159             audios = pb_utils.get_output_tensor_by_name(inference_response,
160                                                         'output')
161             audios = self.pb_tensor_to_numpy(audios) <-- add this line
162             # audios = audios.as_numpy() <-- remove this line

Modify the configuration file

In model_repo/tts/config.pbtxt, change the following:

{
    key: "token_dict"
    value: { string_value: "/root/tmp/multilingual_vits_v3_onnx/phones.txt"}
  },
  {
    key: "pinyin_lexicon"
    value: { string_value: "/root/tmp/models/baker_bert_onnx/lexicon/lexicon.txt"}
  }

Here token_dict comes from multilingual_vits_v3_onnx, and lexicon.txt comes from baker_bert_onnx.

Starting the server

The server is started as root here; otherwise it reports "Failed to give execute permission to triton_python_backend_stub".

# CUDA_VISIBLE_DEVICES="0" tritonserver --model-repository model_repo
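Once the server is up, a quick readiness check can be done from Python (a sketch using tritonclient over the default HTTP port 8000; the model name "tts" comes from the repository layout above):

import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint and check server/model state.
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.is_server_live())        # True once the server process is up
print(client.is_server_ready())       # True once all models have loaded
print(client.is_model_ready("tts"))   # True once the tts BLS model is ready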

Client inference

Required package:

pip install packaging
$ python3 client.py --text text.scp --outdir test_audios

Result: two WAV files are generated:

MD5 (wav1.wav) = b442c71c5ce7261566f2c876b27d1515
MD5 (wav2.wav) = db416b60f6ebd7abbbf92812227789bf

Miscellaneous

Access methods

Triton serves different protocols on different ports; the client.py above uses the gRPC service on port 8001. (A minimal client sketch follows the startup log below.)

I1214 06:15:00.927244 21191 grpc_server.cc:4820] Started GRPCInferenceService at 0.0.0.0:8001
I1214 06:15:00.927582 21191 http_server.cc:3474] Started HTTPService at 0.0.0.0:8000
I1214 06:15:00.968717 21191 http_server.cc:181] Started Metrics Service at 0.0.0.0:8002
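Along the lines of client.py, a gRPC request can be made with tritonclient. This is only a sketch: the tensor names "text" and "wav", the input shape, and the sampling rate are assumptions, not taken from the repo; check model_repo/tts/config.pbtxt and client.py for the actual values.

import numpy as np
import scipy.io.wavfile as wavfile
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# BYTES tensors are passed as numpy object arrays of UTF-8 strings.
text = np.array([["今天天气怎么样".encode("utf-8")]], dtype=object)
inp = grpcclient.InferInput("text", list(text.shape), "BYTES")   # input name assumed
inp.set_data_from_numpy(text)

out = grpcclient.InferRequestedOutput("wav")                     # output name assumed
resp = client.infer(model_name="tts", inputs=[inp], outputs=[out])

audio = resp.as_numpy("wav").squeeze()
wavfile.write("output.wav", 16000, audio.astype(np.int16))       # 16 kHz assumed; client.py uses FLAGS.sampling_rate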

Appendix

Version compatibility table

From [3]; the version compatibility screenshots are not reproduced here.

NVIDIA version issue

Error

The NVIDIA driver on your system is too old (found version 11080)

Cause

The installed PyTorch build requires a newer NVIDIA driver than the one on the machine.

The driver on the machine supports CUDA 11.8:

$ nvidia-smi
Tue Dec 12 21:20:41 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:0A.0 Off |                    0 |
| N/A   37C    P0    26W /  70W |   1079MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The installed PyTorch, however, is a CUDA 12.1 build:

>>> import torch
>>> print(torch.__version__)
2.1.1+cu121

Downgrade PyTorch to 1.12.1:

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
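After reinstalling, a quick sanity check (sketch) confirms the CUDA build matches the driver:

import torch

print(torch.__version__)          # expect 1.12.1+cu116
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # should be True if the driver is compatible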

Multiprocessing issue

Error

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
    task = get()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 1 required positional argument: 'msg'

Here I removed the multiprocessing Pool and ran the jobs sequentially instead:

    for idx, per_split in enumerate(splits):
        cur_files = per_split.tolist()
        tasks.append((idx, cur_files))

#    with Pool(processes=num_workers) as pool:
#        predictions = pool.map(single_job, tasks)

    os.makedirs(FLAGS.outdir, exist_ok=True)

    # Run the jobs one by one and collect every (audio_name, audio) pair.
    predictions = []
    for task in tasks:
        predictions.extend(single_job(task))

#    predictions = [item for sub_pred in predictions for item in sub_pred]

    for audio_name, result in predictions:
        assert len(result.shape) == 1, result.shape
        wavfile.write(FLAGS.outdir + "/" + audio_name + ".wav",
                      FLAGS.sampling_rate, result.astype(np.int16))

References

  1. WeTTS
  2. WeTTS TTS Triton Server
  3. NVIDIA Triton Inference Server Container Versions