Core Concepts
Iterative translation refinement with large language models can effectively reduce "translationese" in the output, yielding translations of comparable or better quality than the initial machine translations and even human references.
Abstract
The paper proposes an iterative translation refinement method that uses large language models (LLMs) such as GPT-3.5 to produce more natural and fluent translations. The key insights are:
Iterative refinement: The authors prompt the LLM to refine the initial translation over multiple rounds, allowing the model to rewrite the translation from scratch rather than merely fix errors (see the sketch after this list).
Anchoring to source and initial translation: The refinement process is anchored to both the source input and the initial translation, ensuring the refined output maintains quality and relevance.
Leveraging target-side language modeling: LLMs have seen orders of magnitude more target-side data than typical translation or post-editing datasets, enabling them to generate more natural target language.
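To make the loop concrete, here is a minimal Python sketch of the anchored refinement process. It is a sketch under stated assumptions: call_llm is a hypothetical stand-in for any chat-style LLM client (e.g. GPT-3.5), and the prompt wording is illustrative, not the paper's exact template.

```python
# Minimal sketch of iterative, anchored translation refinement.
# `call_llm` is a hypothetical helper standing in for a real LLM API client.

def call_llm(prompt: str) -> str:
    """Stand-in for a call to an LLM such as GPT-3.5; replace with a real client."""
    raise NotImplementedError

def refine_translation(source: str, initial_translation: str, rounds: int = 3) -> str:
    """Iteratively rewrite a translation, anchored to the source and the previous output."""
    current = initial_translation
    for _ in range(rounds):
        prompt = (
            f"Source text:\n{source}\n\n"
            f"Current translation:\n{current}\n\n"
            "Rewrite the translation so it reads as natural, fluent target-language text "
            "while preserving the meaning of the source."
        )
        current = call_llm(prompt)  # each round may rewrite the translation from scratch
    return current
```

Because every round sees both the source and the latest translation, the model is free to restructure the output for fluency while the source keeps it semantically grounded.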
Experiments on high-resource language pairs (EN-DE, EN-ZH) and low/medium-resource pairs (EN-JA, DE-FR, SAH-RU, UK-CS) show that refined translations achieve comparable or higher neural metric scores than the initial LLM translations, despite significant drops in string-based metrics such as BLEU. Human evaluations further show that the refined outputs are preferred over both the initial LLM translations and human references, exhibiting less "translationese", that is, unnatural language caused by source interference and the translation process.
The authors also investigate different refinement strategies, finding that starting from a reasonable initial translation and anchoring the process to the source input are crucial for high-quality results; paraphrasing without the source input, by contrast, leads to semantic drift. The prompt variants below contrast these two setups.
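For illustration, the two strategies can be sketched as prompt templates. The wording below is assumed for exposition and is not taken from the paper.

```python
# Anchored refinement: the source text is included, which constrains meaning
# across rounds (the setup the authors find crucial for quality).
ANCHORED_PROMPT = (
    "Source text:\n{source}\n\n"
    "Current translation:\n{translation}\n\n"
    "Rewrite the translation into natural, fluent target-language text, "
    "staying faithful to the source."
)

# Unanchored paraphrasing: the source is absent, so repeated rounds have
# nothing to stay faithful to and the meaning can drift.
PARAPHRASE_PROMPT = (
    "Text:\n{translation}\n\n"
    "Paraphrase this text so it reads more naturally and fluently."
)
```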
Overall, the paper presents a simple yet effective method to leverage the strengths of LLMs for more natural and fluent translation, going beyond just fixing errors.
Example
Two translations of the same source sentence; the second reads more naturally:
A new regulation stipulates that in Campania, indoor public places must wear masks, with a maximum fine of 1000 euros for those who violate the rule.
According to a new decree, people must wear masks in indoor public places in Campania from now on, and offenders can be fined up to 1,000 euros.
Quotes
"Our method offers two strengths for combating translationese: 1) LLM prompting allows for iterative and arbitrary re-writing compared to APE which is limited to error fixing without style improvement (Ive et al., 2020); 2) incorporating natural language data leads to more natural translations (Sennrich et al., 2016; Freitag et al., 2019), and LLMs have seen target-side data orders of magnitude larger than datasets for translation or post-editing."