Key Concepts
Sequence discriminative training, such as maximum mutual information (MMI) and minimum Bayes risk (MBR) training, is strongly correlated with internal language model (ILM) subtraction: both improve the performance of neural transducers in similar ways.
Abstract
The paper investigates the relationship between ILM subtraction and sequence discriminative training for neural transducers.
Theoretically, the authors derive that the global optimum of MMI training takes a form similar to ILM subtraction applied during decoding. Empirically, they show that sequence discriminative training and ILM subtraction achieve similar effects across a wide range of experiments on the Librispeech dataset, covering both MMI and MBR criteria as well as neural transducers and language models of different context sizes.
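For concreteness, the decoding-time ILM subtraction referenced above is the standard combination of transducer, external LM, and ILM log-scores. A minimal sketch follows; the function name and the default weights are illustrative placeholders, not values from the paper.

```python
def combined_log_score(log_p_trans: float,
                       log_p_lm: float,
                       log_p_ilm: float,
                       lambda_lm: float = 0.6,
                       lambda_ilm: float = 0.4) -> float:
    """Hypothesis score with external-LM shallow fusion and ILM subtraction:

    score(y) = log p_trans(y|x) + lambda_lm * log p_LM(y)
               - lambda_ilm * log p_ILM(y)
    """
    return log_p_trans + lambda_lm * log_p_lm - lambda_ilm * log_p_ilm

# A hypothesis whose transducer score is inflated by the internal prior
# is penalized by the ILM term, e.g.:
score = combined_log_score(-12.3, -8.1, -6.7)
```

The paper's theoretical result is that MMI training drives the model toward a posterior of this subtracted form, which is why training and decoding-time subtraction end up interchangeable.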
Furthermore, the authors provide an in-depth study showing that sequence discriminative training has minimal effect on the commonly used zero-encoder ILM estimate; instead, it acts jointly on the encoder and the prediction + joint network to reshape the posterior output, suppressing both the ILM effect and the blank probability.
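The zero-encoder ILM estimation mentioned above feeds a zero vector in place of the encoder output, so the label distribution depends only on the prediction + joint network. A minimal PyTorch-style sketch, assuming generic prediction_net and joint_net modules with a broadcast joint-network signature (all placeholders, not the paper's code):

```python
import torch

@torch.no_grad()
def zero_encoder_ilm_log_probs(prediction_net, joint_net, labels, enc_dim, blank=0):
    """Zero-encoder ILM estimation: replace the encoder output with a zero
    vector so the joint network's output depends only on the label history
    seen by the prediction network (the internal LM).

    Assumes joint_net(enc, pred) broadcasts [B, 1, D_enc] x [B, U, D_pred]
    to logits [B, 1, U, V]; signatures differ between toolkits.
    """
    pred_out = prediction_net(labels)                             # [B, U, D_pred]
    zero_enc = pred_out.new_zeros(pred_out.size(0), 1, enc_dim)   # h_enc = 0
    logits = joint_net(zero_enc, pred_out).squeeze(1)             # [B, U, V]
    # Drop the blank logit and renormalize over real labels, since the
    # ILM is a distribution over label sequences only (assumed blank=0).
    label_logits = torch.cat([logits[..., :blank], logits[..., blank + 1:]], dim=-1)
    return torch.log_softmax(label_logits, dim=-1)
```

The study's point is that this estimate barely moves under sequence discriminative training, even though the full posterior is visibly reshaped.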
Statistics
The paper reports the following key metrics:
Word error rates (WERs) of various neural transducer models trained with different criteria (CE, MMI, MBR) and evaluated with different language model integration methods on the Librispeech dataset.
Perplexities (PPLs) of the zero-encoder ILMs extracted from the full-context transducer models trained with different criteria.
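For reference, PPL here is the usual exponentiated average negative log-likelihood of the extracted ILM on a text set; a minimal sketch (the helper name is illustrative):

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum_i log p(w_i | history)); natural-log inputs."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Sanity check: a constant per-token probability of 0.1 gives PPL 10.
assert round(perplexity([math.log(0.1)] * 20), 6) == 10.0
```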
Quotes
"Theoretically, we show a similar effect of ILM subtraction and MMI training by deriving the global optimum of MMI criterion."
"Empirically, we perform a series of comparisons between ILM subtraction and sequence discriminative training across different settings. Experimental results on Librispeech demonstrate that sequence discriminative training shares similar effects as ILM subtraction."
"Experimental results show a joint effect on both encoder and prediction + joint network to reshape posterior output including both label distribution and blank."