CTC WFST based search
- T: the modeling unit used in E2E training. Typically it is a character for Chinese, and a character or BPE token for English.
- L: the lexicon, which is very simple. Each word is just split into its sequence of modeling units.
For example, the word “我们” is split into two characters “我 们”, and the word “APPLE” is split into five letters “A P P L E”. No phonemes are involved, so there is no need to design pronunciations by hand.
- G: the language model, i.e. an n-gram compiled into a standard WFST representation.
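
The lexicon rule above (split each word into its modeling units) can be sketched in a few lines. This is a minimal illustration that assumes character-level units; BPE units would need a trained tokenizer, which is not shown. The names `split_word` and `build_lexicon` are hypothetical and do not come from any toolkit.

```python
def split_word(word):
    """Split a word into character-level modeling units (assumed unit type)."""
    return list(word)


def build_lexicon(words):
    """Map each word to its modeling-unit sequence, as an L entry would."""
    return {word: split_word(word) for word in words}


lexicon = build_lexicon(["我们", "APPLE"])

# Print entries in the usual "word unit1 unit2 ..." lexicon layout.
for word, units in lexicon.items():
    print(word, " ".join(units))
```

Because every entry is derived mechanically from the spelling, adding a new word only requires adding it to the word list.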
Decoding uses the standard Viterbi beam search algorithm.
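
To make the idea concrete, here is a toy frame-synchronous Viterbi beam search over a tiny hand-written graph. It is only a sketch under simplifying assumptions: every arc consumes one frame, there are no epsilon arcs, no CTC blank handling, and no lattice generation, so it is not the decoder of any real toolkit. The function name, graph layout, and scores are all made up for illustration.

```python
import math


def viterbi_beam_search(arcs, finals, frames, beam=10.0, start=0):
    """Return (cost, outputs) of the best path, or None if nothing survives.

    arcs:   {state: [(label, next_state, graph_cost, output_or_None), ...]}
    finals: {final_state: final_cost}
    frames: one dict per frame mapping label -> log-probability
    Costs are negative log-probabilities, so lower is better.
    """
    tokens = {start: (0.0, [])}  # state -> (best cost so far, outputs so far)
    for frame in frames:
        new_tokens = {}
        for state, (cost, outputs) in tokens.items():
            for label, nxt, graph_cost, out in arcs.get(state, []):
                if label not in frame:
                    continue
                new_cost = cost + graph_cost - frame[label]
                # Viterbi step: keep only the best token per state.
                if nxt not in new_tokens or new_cost < new_tokens[nxt][0]:
                    new_out = outputs + [out] if out is not None else outputs
                    new_tokens[nxt] = (new_cost, new_out)
        if not new_tokens:
            return None
        # Beam pruning: drop tokens much worse than the current best.
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    ended = [(c + finals[s], o) for s, (c, o) in tokens.items() if s in finals]
    return min(ended, key=lambda t: t[0]) if ended else None


# Toy graph accepting "A B" (word "AB") or "A C" (word "AC").
arcs = {0: [("A", 1, 0.0, None)],
        1: [("B", 2, 0.0, "AB"), ("C", 3, 0.0, "AC")]}
finals = {2: 0.0, 3: 0.0}
frames = [{"A": math.log(0.9), "B": math.log(0.05), "C": math.log(0.05)},
          {"A": math.log(0.1), "B": math.log(0.7), "C": math.log(0.2)}]

result = viterbi_beam_search(arcs, finals, frames)
print(result)
```

A narrower `beam` prunes more hypotheses per frame, trading accuracy for speed.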
MOS (Mean Opinion Score): a subjective evaluation by expert listeners, scored from 1 to 5, where 5 is best.
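
As a small worked example, a MOS value is simply the arithmetic mean of the individual listener ratings. The ratings below are invented for illustration, and the helper name `mean_opinion_score` is hypothetical.

```python
def mean_opinion_score(ratings):
    """Average a list of listener ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


ratings = [4, 5, 3, 4, 4]  # made-up scores from five listeners
mos = mean_opinion_score(ratings)
print(mos)
```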