Improving Neural Machine Translation Performance through Direct Preference Optimization and Minimum Bayes Risk Decoding
Direct Preference Optimization (DPO) can fine-tune Multilingual Large Language Models (MLLMs) to capture the quality gains of Minimum Bayes Risk (MBR) decoding without incurring MBR's additional computation at inference time.
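To make the inference-time cost concrete, the following is a minimal sketch of sampling-based MBR decoding: the model produces a pool of candidate translations, each candidate is scored by its average utility against all other candidates, and the highest-scoring one is selected. The unigram-F1 utility here is an illustrative stand-in for a real metric such as BLEU or COMET, and the candidate list is a toy example; neither is taken from the paper.

```python
def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap (stand-in for BLEU/COMET)."""
    h, r = hyp.split(), ref.split()
    common = sum(min(h.count(w), r.count(w)) for w in set(h))
    if not h or not r or not common:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates: list[str], utility=unigram_f1) -> str:
    """Return the candidate with the highest expected utility,
    estimated against the rest of the candidate pool.
    Cost is quadratic in the number of candidates."""
    best, best_score = candidates[0], float("-inf")
    for hyp in candidates:
        score = sum(utility(hyp, ref) for ref in candidates if ref is not hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Hypothetical candidate pool sampled from an MT model:
cands = [
    "the cat sat on the mat",
    "a cat sat on the mat",
    "the dog ran away",
]
print(mbr_decode(cands))  # prints "the cat sat on the mat"
```

DPO-based distillation targets exactly this overhead: the pairwise utility computation over a large candidate pool is moved into training, so that at inference a single greedy or beam-search pass suffices.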