ASR and TTS Terminology Explained


ASR

CTC WFST based search

Source: CTC WFST based search

  • T: the modeling unit used in E2E training. Typically this is the char for Chinese, and the char or BPE for English.
  • L: the lexicon, which is very simple: each word is just split into its sequence of modeling units.
    For example, the word “我们” is split into two chars, “我 们”, and the word “APPLE” is split into five letters, “A P P L E”. There are no phonemes, so there is no need to design pronunciations by hand.
  • G: the language model, namely the n-gram compiled into a standard WFST representation.
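The lexicon rule described for L can be sketched in a few lines. This is only an illustration for char-level units; the function names `word_to_units` and `build_lexicon` are hypothetical and do not come from any toolkit, and BPE units would need a trained subword model instead.

```python
def word_to_units(word: str) -> list[str]:
    """Split a word into char-level modeling units (assumption: units are chars)."""
    return list(word)


def build_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map each word to its modeling-unit sequence, as a lexicon file would."""
    return {word: word_to_units(word) for word in words}


lexicon = build_lexicon(["我们", "APPLE"])
for word, units in lexicon.items():
    # One lexicon entry per line: the word, then its units separated by spaces.
    print(word, " ".join(units))
```

Because every entry is derived mechanically from the spelling, no pronunciation dictionary has to be maintained.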

Decoder

Decoding uses the standard Viterbi beam search algorithm.
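To make the idea concrete, here is a minimal sketch of time-synchronous Viterbi search with beam pruning. It runs over a small dense state graph with made-up probabilities, not over a compiled TLG WFST, so it only illustrates the principle; the function names and the toy numbers are assumptions for this example.

```python
import math


def prune(hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the best one."""
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam_search(log_init, log_trans, log_emit, beam):
    """Return (best_path, best_log_score).

    log_init[s]     : log prob of starting in state s
    log_trans[p][s] : log prob of moving from state p to state s
    log_emit[t][s]  : log prob of frame t given state s
    """
    n_states = len(log_init)
    # active maps state -> (score, path); Viterbi keeps one best path per state.
    active = {s: (log_init[s] + log_emit[0][s], [s]) for s in range(n_states)}
    active = prune(active, beam)
    for t in range(1, len(log_emit)):
        expanded = {}
        for p, (score, path) in active.items():
            for s in range(n_states):
                cand = score + log_trans[p][s] + log_emit[t][s]
                if s not in expanded or cand > expanded[s][0]:
                    expanded[s] = (cand, path + [s])
        active = prune(expanded, beam)
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][1], active[best_state][0]


# Toy model: 2 states, 3 frames (all numbers are illustrative).
log_init = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.4), math.log(0.6)]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.2), math.log(0.8)],
            [math.log(0.1), math.log(0.9)]]

path, score = viterbi_beam_search(log_init, log_trans, log_emit, beam=10.0)
print(path, math.exp(score))
```

A smaller `beam` discards more hypotheses at each frame, which speeds up the search at the risk of pruning the path that would have turned out best.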

TTS

MOS score

MOS (Mean Opinion Score): a subjective, expert-level evaluation on a scale of 1 to 5, where 5 is best.

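Note: Microsoft XiaoIce publicly advertises a score of 4.3, but some people in the industry argue that this alone does not show it is "absolutely" better than iFLYTEK, because the panel of expert raters differs from one evaluation to the next. Put bluntly, across the AI industry today each vendor still claims its own system is the best.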
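As a rough illustration of how a MOS is obtained, the sketch below averages a set of 1-5 ratings and adds a normal-approximation 95% confidence interval. The ratings are invented for the example, and `mos_with_ci` is a hypothetical helper, not part of any evaluation toolkit.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (mean opinion score, half-width of an approximate 95% interval)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    score = mean(ratings)
    # With a single rating there is no spread to estimate.
    half_width = z * stdev(ratings) / math.sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return score, half_width


# Hypothetical ratings from eight listeners for one TTS system.
ratings = [4, 5, 4, 3, 5, 4, 4, 5]
score_mos, half_width = mos_with_ci(ratings)
print(f"MOS = {score_mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean makes it easier to see why scores from different rater panels are hard to compare directly.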