ASR and TTS Glossary
ASR
CTC WFST-based search
- T: the modeling unit used in E2E training. Typically this is the char for Chinese, and the char or BPE for English.
- L: the lexicon, which is very simple: each word is just split into its sequence of modeling units. For example, the word “我们” is split into two chars “我 们”, and the word “APPLE” is split into five letters “A P P L E”. No phonemes are involved, so there is no need to hand-design pronunciations.
- G: the language model, namely an n-gram compiled into the standard WFST representation.
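The lexicon rule described above (split each word into its modeling units) can be sketched in a few lines. This is only an illustration assuming char-level units; the helper names `build_lexicon` and `to_lexicon_lines` are hypothetical and are not taken from any toolkit, and the one-entry-per-line text layout is an assumption as well.

```python
def build_lexicon(words):
    """Map each word to its modeling-unit sequence (assumption: char units)."""
    lexicon = {}
    for word in words:
        # A Python str iterates by character, so this handles both
        # Chinese characters and English letters.
        lexicon[word] = list(word)
    return lexicon


def to_lexicon_lines(lexicon):
    """Render entries as 'WORD unit1 unit2 ...' lines (assumed layout)."""
    return [f"{word} {' '.join(units)}" for word, units in lexicon.items()]


lexicon = build_lexicon(["我们", "APPLE"])
lines = to_lexicon_lines(lexicon)
for line in lines:
    print(line)
```

With BPE units the split would come from a trained subword model instead of `list(word)`; that case is not shown here.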
Decoder
The decoder uses the standard Viterbi beam search algorithm.
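To make the idea of Viterbi beam search concrete, here is a minimal frame-synchronous sketch over a toy graph. It is a simplification under stated assumptions, not the decoder of any particular toolkit: the graph layout, the function name `viterbi_beam_search`, and the toy probabilities are all hypothetical, and CTC blank handling and epsilon arcs are left out.

```python
import math


def viterbi_beam_search(frames, arcs, start, finals, beam=5.0):
    """Toy Viterbi beam search.

    frames: list of {unit: log_prob} dicts, one per acoustic frame.
    arcs:   {state: [(unit, next_state, output_word_or_None, graph_cost)]}.
    Returns (best_log_score, output_words) or None if no path survives.
    """
    # Viterbi: keep only the single best token per state.
    tokens = {start: (0.0, ())}
    for frame in frames:
        new_tokens = {}
        for state, (score, out) in tokens.items():
            for unit, nxt, word, cost in arcs.get(state, []):
                if unit not in frame:
                    continue
                new_score = score + frame[unit] - cost
                new_out = out + (word,) if word else out
                if nxt not in new_tokens or new_score > new_tokens[nxt][0]:
                    new_tokens[nxt] = (new_score, new_out)
        if not new_tokens:
            return None
        # Beam pruning: drop tokens too far below the current best score.
        best = max(score for score, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] >= best - beam}
    final_tokens = [t for s, t in tokens.items() if s in finals]
    return max(final_tokens, key=lambda t: t[0]) if final_tokens else None


# Hypothetical toy graph: "A B" spells the word AB, "A C" spells AC.
arcs = {
    0: [("A", 1, None, 0.0)],
    1: [("B", 2, "AB", 0.0), ("C", 2, "AC", 0.0)],
}
frames = [
    {"A": math.log(0.9), "B": math.log(0.05), "C": math.log(0.05)},
    {"A": math.log(0.1), "B": math.log(0.6), "C": math.log(0.3)},
]
result = viterbi_beam_search(frames, arcs, start=0, finals={2})
print(result)
```

Both arcs out of state 1 reach state 2, so the Viterbi step keeps only the higher-scoring one, which is why the path through "B" wins in this toy input.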
TTS
MOS score
MOS (Mean Opinion Score): a subjective evaluation by expert raters on a scale of 1 to 5, where 5 is the best.
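Note: Microsoft XiaoIce has publicly advertised a score of 4.3, but some people in the industry argue that this alone does not show it is "absolutely" better than iFLYTEK, because the panel of expert raters is different in every evaluation. Put plainly, across the AI industry today each vendor still tends to claim that its own system is the best.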
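Since a MOS is just the arithmetic mean of the individual 1-5 ratings, a short sketch shows why scores from different panels are hard to compare. The ratings below are made up purely for illustration, and the function name `mean_opinion_score` is hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of 1-5 ratings into a MOS."""
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range 1-5: {r}")
    return mean(ratings)


# Hypothetical ratings of the SAME system from two different panels.
panel_a = [4, 5, 4, 4, 5]
panel_b = [4, 4, 3, 4, 4]

mos_a = mean_opinion_score(panel_a)
mos_b = mean_opinion_score(panel_b)
print(mos_a, mos_b)
```

In this made-up case the two panels disagree by more than half a point on identical audio, so a published MOS is only meaningful relative to systems rated by the same panel under the same conditions.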