The paper presents evidence that the performance gap between large language models (LLMs) and specialized neural machine translation (NMT) systems may be closing, particularly for low-resource language pairs. The key findings are:
The authors find signs of data contamination: the FLORES-200 benchmark appears to have leaked into the training data of the Claude 3 Opus LLM, calling into question the validity of evaluating Claude on this dataset.
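The paper's exact contamination analysis is not reproduced here, but a minimal sketch of one common style of probe (the prompt wording, model ID, and variable names below are assumptions, not the authors' protocol) is to ask the model to continue the first half of a benchmark reference verbatim and measure how closely its continuation matches the held-out half:

```python
import difflib

import anthropic  # official Anthropic SDK; reads ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()

def completion_overlap(reference, model="claude-3-opus-20240229"):
    """Feed the model the first half of a benchmark sentence and measure how
    closely its continuation matches the true second half; consistently high
    overlap across many sentences suggests the benchmark leaked into training."""
    words = reference.split()
    prefix = " ".join(words[: len(words) // 2])
    suffix = " ".join(words[len(words) // 2:])
    resp = client.messages.create(
        model=model,
        max_tokens=64,
        messages=[{"role": "user",
                   "content": "Continue this sentence exactly as you have "
                              f"seen it before:\n{prefix}"}],
    )
    continuation = resp.content[0].text.strip()
    return difflib.SequenceMatcher(None, continuation, suffix).ratio()

# Hypothetical usage over a sample of FLORES-200 references:
# mean_overlap = sum(map(completion_overlap, flores_refs[:50])) / 50
```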
By creating new, unseen evaluation benchmarks using BBC News articles, the authors show that Claude outperforms strong baselines like Google Translate and NLLB-54B on 25% of language pairs when translating into English. This includes both low-resource and high-resource language pairs.
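Once such a benchmark exists, scoring systems against it is routine with standard MT metrics. A minimal sketch using sacreBLEU, with toy stand-in outputs (the sentences are illustrative, and the paper's exact metric configuration may differ):

```python
import sacrebleu

# Toy stand-ins for system outputs and references on a fresh test set.
claude_hyps = [
    "The parliament passed the bill on Tuesday.",
    "Floods displaced thousands in the region.",
]
references = [
    "Parliament passed the bill on Tuesday.",
    "The floods displaced thousands of people in the region.",
]

# sacreBLEU expects a list of reference streams, hence the extra nesting.
bleu = sacrebleu.corpus_bleu(claude_hyps, [references])
chrf = sacrebleu.corpus_chrf(claude_hyps, [references])
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```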
Unlike previous LLMs, Claude demonstrates remarkable resource efficiency: when English is the target language, its translation performance depends less on the resource level of the language pair than that of the NLLB-54B NMT model.
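One way to make "depends less on resource level" concrete (a sketch with illustrative, made-up numbers, not the paper's data) is to fit each system's score against the log of available parallel data and compare slopes; a flatter slope means quality degrades less on low-resource pairs:

```python
import numpy as np

# Illustrative chrF scores for five xx->en pairs ordered by resource size
# (millions of parallel sentence pairs); these numbers are made up.
resource_m = np.array([0.1, 0.5, 2.0, 10.0, 50.0])
claude_chrf = np.array([48.0, 50.5, 52.0, 55.0, 56.5])
nllb_chrf = np.array([38.0, 45.0, 51.0, 56.0, 58.0])

log_size = np.log10(resource_m)

def slope(y):
    return float(np.polyfit(log_size, y, 1)[0])

# A flatter slope = performance depends less on how much data the pair has.
print(f"chrF gained per 10x more data - Claude: {slope(claude_chrf):.1f}, "
      f"NLLB: {slope(nllb_chrf):.1f}")
```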
The authors also find that when translating from English into low-resource languages, a large gap still exists between LLMs and state-of-the-art NMT systems on most languages. However, they show that Claude outperforms strong baselines for two such language pairs.
The authors demonstrate that Claude's translation abilities can be leveraged to advance the state of the art in traditional NMT: they generate a parallel corpus from Claude translations and fine-tune an inexpensive model on it. Their approach uses Claude's long context window to reduce distillation costs while improving translation quality.
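A minimal sketch of such a distillation pipeline, assuming a toy monolingual corpus, Yoruba→English as an illustrative pair, an assumed student model (NLLB-200 600M), and a simple pack-many-sentences-per-request scheme to exploit the long context window; the paper's exact prompts and training recipe may differ:

```python
import anthropic
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

client = anthropic.Anthropic()

def chunks(xs, n):
    for i in range(0, len(xs), n):
        yield xs[i:i + n]

def translate_batch(sentences):
    """Pack many source sentences into one long-context request so each
    API call amortizes over dozens of sentences, cutting distillation cost."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(sentences))
    resp = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=4096,
        messages=[{"role": "user",
                   "content": "Translate each numbered Yoruba sentence into "
                              f"English, keeping the numbering:\n{numbered}"}],
    )
    lines = resp.content[0].text.strip().splitlines()
    return [line.split(". ", 1)[1] for line in lines if ". " in line]

# Toy stand-in for a real monolingual Yoruba corpus.
mono_sentences = ["Báwo ni o ṣe wà?", "Ọjà náà kún fún ènìyàn."]

pairs = [(src, tgt)
         for batch in chunks(mono_sentences, 50)
         for src, tgt in zip(batch, translate_batch(batch))]

# Fine-tune an inexpensive student on the distilled corpus.
student = "facebook/nllb-200-distilled-600M"  # assumed student model
tok = AutoTokenizer.from_pretrained(student, src_lang="yor_Latn", tgt_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(student)

def encode(ex):
    return tok(ex["src"], text_target=ex["tgt"], truncation=True, max_length=256)

ds = Dataset.from_dict({"src": [s for s, _ in pairs],
                        "tgt": [t for _, t in pairs]}).map(encode)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="distilled-yor-eng",
                                  per_device_train_batch_size=16,
                                  num_train_epochs=3),
    train_dataset=ds,
    data_collator=DataCollatorForSeq2Seq(tok, model=model),
    tokenizer=tok,
)
trainer.train()
```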