Key Concepts
Direct Preference Optimization (DPO) can be used to fine-tune Multilingual Large Language Models (MLLMs) so that they match the gains of Minimum Bayes Risk (MBR) decoding without any additional computation at inference time.
Abstract
The authors propose a novel self-supervised fine-tuning method based on Direct Preference Optimization (DPO) to improve the translation performance of Multilingual Large Language Models (MLLMs).
The key insights are:
- MBR decoding can significantly boost the translation performance of MLLMs, but it is computationally expensive.
- The authors show how DPO can be used to fine-tune MLLMs to learn the same ranking preferences as MBR decoding, without any additional computation during inference.
- The DPO fine-tuning process uses a small monolingual dataset to create a preference dataset, where translation hypotheses are ranked by their MBR scores. The DPO algorithm is then used to fine-tune the MLLM to prefer the higher-ranked translations over the lower-ranked ones.
- The DPO MBR fine-tuned models, when decoded with beam search, achieve performance comparable to MBR decoding of the original model, and outperform the base model on multiple NMT test sets.
- The authors investigate different preference pair selection strategies and the impact of the size of the hypothesis set used for DPO fine-tuning. They find that the method is robust to the selection strategy and can achieve strong performance with a smaller hypothesis set compared to MBR decoding.
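The preference-dataset construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `utility` function is a toy token-overlap F1 standing in for a learned metric such as BLEURT or COMET, and `best_worst_pair` implements one plausible selection strategy (pairing the top- and bottom-ranked hypotheses as chosen/rejected).

```python
def utility(hyp: str, ref: str) -> float:
    # Toy stand-in for a learned utility metric (e.g. BLEURT, COMET):
    # token-overlap F1 between a hypothesis and a pseudo-reference.
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

def mbr_scores(hypotheses):
    # MBR score of each hypothesis = average utility against the other
    # sampled hypotheses, which act as pseudo-references.
    scores = []
    for h in hypotheses:
        refs = [r for r in hypotheses if r is not h]
        scores.append(sum(utility(h, r) for r in refs) / len(refs))
    return scores

def best_worst_pair(hypotheses):
    # One selection strategy: take the top-ranked hypothesis as the
    # preferred ("chosen") translation and the bottom-ranked one as
    # the dispreferred ("rejected") translation for DPO.
    ranked = sorted(zip(mbr_scores(hypotheses), hypotheses),
                    key=lambda t: t[0], reverse=True)
    return ranked[0][1], ranked[-1][1]

# Hypothetical sampled translations for a single source sentence.
hyps = [
    "the cat sat on the mat",
    "the cat sits on the mat",
    "a dog ran in the park",
]
chosen, rejected = best_worst_pair(hyps)
```

Because the source side only needs sampled hypotheses and a utility metric, this is why a small monolingual dataset suffices: no human references are required to build the preference pairs.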
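Given (chosen, rejected) pairs, the model is then optimized with the standard DPO objective, which pushes the policy to assign a higher relative log-probability to the preferred translation than the frozen reference model does. A minimal sketch on scalar sequence log-probabilities (variable names are illustrative; in practice these come from the fine-tuned and base MLLMs):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    # DPO objective for one preference pair:
    #   -log sigmoid(beta * ((log pi_w - log ref_w)
    #                      - (log pi_l - log ref_l)))
    # where pi_* are policy log-probs and ref_* are reference log-probs
    # of the chosen (w) and rejected (l) translations.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; the loss shrinks as the policy increasingly prefers the MBR-chosen translation over the rejected one.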
Statistics
MBR decoding can significantly boost the translation performance of MLLMs, outperforming greedy decoding and beam search.
DPO MBR fine-tuned models, when decoded with beam search, achieve performance comparable to MBR decoding of the original model.
DPO MBR fine-tuning improves the translation performance of BLOOMZ and BLOOMZ-mt across a range of test sets, with gains of up to 4 BLEURT points and 2 COMET points over the base models.
Quotes
"Our goal is to fine-tune a base MLLM so that it has the same single-pass decoding performance as MBR decoding."
"MLLMs optimized for MBR preference achieve significantly better translation performance when decoded with beam search, achieving translation quality on par with MBR decoding of the original model."