The paper explores the application of reinforcement learning (RL) to edit-based non-autoregressive neural machine translation (NAR NMT) models, focusing on the Levenshtein Transformer (LevT) architecture.
The key highlights are:
Two RL approaches are introduced and analyzed: an episodic approach that maximizes a sentence-level reward computed on the finished translation, and a stepwise approach that maximizes rewards assigned after each edit operation.
The episodic reward maximization approach is shown to significantly outperform the baseline LevT model, even approaching the performance of distillation-based models. The stepwise approach, however, exhibits more limited improvements.
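To make the episodic setting concrete, here is a minimal REINFORCE-style sketch of such an objective. The `model.sample` interface, its returned `log_probs`, and `reward_fn` (e.g., sentence-level BLEU) are hypothetical placeholders used for illustration, not the paper's actual implementation.

```python
def episodic_rl_loss(model, src, ref, reward_fn, temperature=1.0):
    # Sample one complete translation from the LevT policy, recording the
    # log-probabilities of every sampled edit action (insertions/deletions).
    # `model.sample` and its return signature are assumed for illustration;
    # `log_probs` is assumed to be a tensor of per-action log-probabilities.
    hyp, log_probs = model.sample(src, temperature=temperature)

    # Episodic setting: a single sentence-level reward (e.g., BLEU) is
    # computed only once, on the finished hypothesis.
    reward = reward_fn(hyp, ref)

    # REINFORCE: maximizing E[reward] corresponds to minimizing
    # -reward * sum_t log pi(a_t), with the reward treated as a constant.
    return -(reward * log_probs.sum())
```

A stepwise variant would instead assign a reward to each intermediate edit and weight the corresponding log-probabilities individually rather than using one trajectory-level scalar.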
The paper also investigates the impact of temperature on softmax sampling during RL training. Both constant temperatures and annealing schedules are explored, highlighting the importance of proper temperature control for NAR models.
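As one illustration of what temperature control can look like in practice, the PyTorch sketch below scales logits by a temperature before sampling and linearly anneals the temperature over training. The linear schedule shape and the endpoint values are assumptions for illustration, not the settings reported in the paper.

```python
import torch

def sample_with_temperature(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Sample token indices from temperature-scaled logits.

    temperature < 1 sharpens the distribution (closer to greedy decoding);
    temperature > 1 flattens it, encouraging exploration during RL training.
    """
    probs = torch.softmax(logits / temperature, dim=-1)         # (batch, vocab)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)  # (batch,)

def annealed_temperature(step: int, total_steps: int,
                         t_start: float = 1.0, t_end: float = 0.1) -> float:
    """Linearly anneal the sampling temperature from t_start to t_end.

    The linear shape and endpoint values here are illustrative assumptions,
    not the annealing schedule reported in the paper.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return t_start + frac * (t_end - t_start)
```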
The experiments demonstrate that RL can effectively address the challenges faced by NAR models, such as the large decoding space and the difficulty of capturing dependencies between target words. The proposed methods are orthogonal to existing research on NAR architectures, suggesting broad applicability.