This paper proposes a comprehensive approach to localizing large language models for Arabic: further pre-training on Arabic text, supervised fine-tuning on native Arabic instructions and responses, and reinforcement learning with a reward model aligned to local culture and values. The resulting model, AceGPT, sets a new state of the art among open Arabic language models across various benchmarks.


Localizing Large Language Models for the Arabic Language: Developing AceGPT