The paper introduces an Anti-Language Model (Anti-LM) decoding objective with exponential decay to address a key weakness of zero-shot in-context machine translation. The idea is to penalize the logits of tokens that would continue the source sentence, discouraging the model from simply carrying on in the source language instead of translating into the target language.
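To make the mechanism concrete, here is a minimal sketch of Anti-LM greedy decoding built on the Hugging Face transformers API. The checkpoint, prompt format, and the discount factor gamma are illustrative assumptions, not details taken from the paper; following the description above, the penalty term conditions on the source sentence alone.

```python
# Hedged sketch of Anti-LM greedy decoding with exponential decay.
# Checkpoint, prompt format, and gamma = 0.9 are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/xglm-564M"  # smallest XGLM; any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def anti_lm_greedy(prompt: str, source: str, gamma: float = 0.9,
                   max_new_tokens: int = 64) -> str:
    """Greedy decoding that subtracts decayed source-continuation logits."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        # Anti-LM term: the model's next-token logits when asked only to
        # continue the source sentence (no translation prompt).
        anti = model(tokenizer(source, return_tensors="pt").input_ids).logits[0, -1]
    out = []
    for t in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        # Penalize source-language continuations; the weight decays as gamma**t,
        # so the penalty matters most at the first decoding steps.
        next_id = int((logits - gamma**t * anti).argmax())
        if next_id == tokenizer.eos_token_id:
            break
        out.append(next_id)
        ids = torch.cat([ids, torch.tensor([[next_id]])], dim=-1)
    return tokenizer.decode(out, skip_special_tokens=True)

# Illustrative usage; the paper's actual prompt templates may differ.
print(anti_lm_greedy("French: Bonjour le monde.\nEnglish:", "Bonjour le monde."))
```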
The authors evaluate the proposed method across three model types and sizes, three language directions, and both greedy decoding and beam search. The Anti-LM objective outperforms other state-of-the-art decoding objectives, improving on the default objective by up to 20 BLEU points in some settings.
The analysis reveals that the majority of the gains come from addressing the "failure to translate" cases, where the model either generates in the source language or does not generate any output. The authors also find that the Anti-LM objective is particularly effective for the XGLM model, and that the choice of decay function for the discount factor can impact performance.
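As a small illustration of the decay-function choice, the following compares a few plausible schedules for the discount factor; these functional forms and gamma = 0.9 are assumptions for illustration, not the exact configurations studied in the paper.

```python
# Candidate decay schedules for the discount factor at decoding step t.
# The functional forms and gamma = 0.9 are illustrative assumptions.
schedules = {
    "exponential": lambda gamma, t: gamma ** t,                    # fades quickly
    "linear":      lambda gamma, t: max(0.0, 1 - (1 - gamma) * t),
    "constant":    lambda gamma, t: gamma,                         # no-decay baseline
}
for name, fn in schedules.items():
    print(f"{name:>11}:", [round(fn(0.9, t), 3) for t in range(6)])
```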
Additionally, the authors investigate the effect of providing the instructions in the source language versus the target language, and find that the Anti-LM approach is more beneficial when the instructions are given in the source language. This suggests that without the Anti-LM calibration, the true zero-shot capabilities of large language models may be underreported.