ASR and TTS Glossary

2021-09-21

ASR

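T: the modeling unit used in end-to-end (E2E) training. It is typically the character for Chinese, and the character or a BPE (byte-pair encoding) subword for English. In CTC-based systems, the T transducer maps the network's frame-level output, including blanks and repeated labels, onto sequences of these units.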
L: the lexicon, which is very simple here: each word only needs to be split into its sequence of modeling units.
For example, the word “我们” is split into the two characters “我 们”, and the word “APPLE” into the five letters “A P P L E”. No phonemes are involved, so there is no need to hand-design pronunciations.
G: the language model, i.e. an n-gram model compiled into the standard WFST representation.
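In CTC-based E2E systems, T encodes how frame-level network output collapses to modeling units: consecutive repeated labels are merged, then blank labels are dropped. The real T is a WFST that accepts all such frame sequences; the minimal Python sketch below only applies the same mapping to one sequence. The blank symbol `<blank>` and the helper name `ctc_collapse` are hypothetical choices for illustration.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical spelling of the CTC blank symbol


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level CTC label sequence to its modeling-unit sequence.

    Same mapping the T transducer encodes: first merge consecutive
    repeated labels, then remove blanks.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# A blank between two "P" frames keeps them as two separate units.
frames = ["A", "A", BLANK, "P", BLANK, "P", "P", "L", BLANK, "E", "E"]
print(ctc_collapse(frames))  # ['A', 'P', 'P', 'L', 'E']
```

The blank is what allows a genuinely doubled letter, as in “APPLE”, to survive the merging of repeats.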
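Because the lexicon only splits words into units, it can be generated mechanically. A minimal Python sketch for the character/letter setup described for L; the helper names and the “WORD unit1 unit2 …” line layout are assumptions for illustration (BPE units would instead require a trained subword model).

```python
def split_into_units(word):
    """Split a word into modeling units, one unit per character.

    Chinese words become characters and English words become letters;
    no phonemes or hand-designed pronunciations are involved.
    """
    return list(word)


def build_lexicon(words):
    """Return lexicon lines in an assumed 'WORD unit1 unit2 ...' layout."""
    return [f"{w} {' '.join(split_into_units(w))}" for w in words]


for line in build_lexicon(["我们", "APPLE"]):
    print(line)
# 我们 我 们
# APPLE A P P L E
```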
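To show what “compiling the n-gram to a WFST” means, here is a minimal Python sketch for a toy bigram model. Each history word becomes a state, each probability P(w | h) becomes an arc h → w labelled w:w with cost −ln P (tropical semiring), and P(</s> | h) becomes the final cost of state h. The output lines follow the OpenFst text layout (source, destination, input label, output label, cost; final lines are state and cost). The helper name and the probabilities are hypothetical, and back-off arcs, which a real ARPA-to-WFST conversion needs, are deliberately omitted.

```python
import math


def bigram_to_wfst_lines(bigram_probs, start="<s>", end="</s>"):
    """Compile a toy bigram LM (no back-off) into WFST text lines.

    bigram_probs maps (history, word) -> P(word | history).
    """
    states = {start: 0}  # the sentence-start history is state 0

    def state_id(name):
        if name not in states:
            states[name] = len(states)
        return states[name]

    arcs, finals = [], []
    for (hist, word), p in sorted(bigram_probs.items()):
        cost = math.log(1.0 / p)  # -ln P, written so that P = 1 gives +0.0
        if word == end:
            finals.append(f"{state_id(hist)} {cost:.4f}")
        else:
            arcs.append(f"{state_id(hist)} {state_id(word)} {word} {word} {cost:.4f}")
    return arcs + finals


probs = {("<s>", "你"): 1.0, ("你", "好"): 0.5, ("你", "</s>"): 0.5, ("好", "</s>"): 1.0}
for line in bigram_to_wfst_lines(probs):
    print(line)
# 0 1 你 你 0.0000
# 1 2 好 好 0.6931
# 1 0.6931
# 2 0.0000
```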

Decoding runs the standard Viterbi beam search algorithm over the composed TLG graph.
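Viterbi beam search can be sketched as token passing. At each frame, every surviving token follows each outgoing arc whose input label the acoustic model scored. Only the cheapest path into each state is kept (the Viterbi step), and tokens costlier than the best by more than a beam are pruned. The minimal Python sketch below assumes a toy graph stored as a dict of arcs, per-frame acoustic costs given as negative log-likelihoods, and no epsilon arcs. The function name and graph layout are hypothetical, and production WFST decoders additionally handle epsilon transitions, lattices, and far larger graphs.

```python
def viterbi_beam_search(arcs, start, finals, frame_costs, beam=10.0):
    """Token-passing Viterbi beam search over a toy decoding graph.

    arcs: state -> list of (next_state, input_label, output_label, cost);
        output_label is None when an arc emits no word.
    finals: state -> final cost; only these states may end a path.
    frame_costs: one dict per frame, input_label -> acoustic cost;
        arcs whose label is missing from a frame are not taken.
    Assumes every arc consumes exactly one frame (no epsilon arcs).
    Returns (total cost, output labels), or None if no path survives.
    """
    tokens = {start: (0.0, [])}  # state -> (best cost so far, outputs)
    for costs in frame_costs:
        new_tokens = {}
        for state, (cost, out) in tokens.items():
            for nxt, ilabel, olabel, arc_cost in arcs.get(state, []):
                if ilabel not in costs:
                    continue
                total = cost + arc_cost + costs[ilabel]
                # Viterbi step: keep only the cheapest path into each state.
                if nxt not in new_tokens or total < new_tokens[nxt][0]:
                    new_out = out + [olabel] if olabel is not None else out
                    new_tokens[nxt] = (total, new_out)
        if not new_tokens:
            return None
        # Beam pruning: drop tokens costlier than the best by more than `beam`.
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    ended = [(c + finals[s], out) for s, (c, out) in tokens.items() if s in finals]
    return min(ended, key=lambda t: t[0]) if ended else None


# Toy graph: state 1 absorbs repeated "a" frames, state 2 repeated "b" frames.
arcs = {
    0: [(1, "a", "A", 0.0), (2, "b", "B", 0.0)],
    1: [(1, "a", None, 0.0), (2, "b", "B", 0.0)],
    2: [(2, "b", None, 0.0)],
}
finals = {2: 0.0}
frame_costs = [{"a": 0.1, "b": 2.0}, {"a": 0.2, "b": 1.5}, {"a": 3.0, "b": 0.1}]
print(viterbi_beam_search(arcs, 0, finals, frame_costs))
```

A narrower beam makes decoding faster but risks pruning the path that would have won; here the best path (“A B”) survives even with `beam=1.0`.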

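TTS

MOS (Mean Opinion Score): an expert-level, subjective evaluation. Listeners rate on a scale from 1 to 5, with 5 being best, and the MOS is the mean of those ratings.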
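Since MOS is simply the arithmetic mean of individual 1-5 ratings, it is easy to compute; results are commonly reported together with a confidence interval. A minimal Python sketch, where the helper name `mos_with_ci`, the normal-approximation interval (z = 1.96), and the ratings are illustrative assumptions rather than any standard's exact procedure:

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    ratings: individual listener scores on the 1-5 scale (5 = best).
    The interval uses a normal approximation; with very few ratings
    a Student-t quantile would be more appropriate.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for an interval")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = mean(ratings)  # MOS is the arithmetic mean of the ratings
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


ratings = [4, 5, 4, 3, 5, 4, 4, 5]  # made-up scores from one listening test
mos, hw = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} ± {hw:.2f}")  # MOS = 4.25 ± 0.49
```

The interval only reflects the spread of ratings within a single panel. It says nothing about how a different group of listeners would score the same system.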

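Note: Microsoft Xiaoice publicly advertises a score of 4.3, but some people in the industry argue that this does not prove it is “definitively” better than iFlytek, because a different panel of experts judges each evaluation. Put bluntly, the AI industry is still at the stage where every company claims its own system is the best.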