Localizing Large Language Models for the Arabic Language: Developing AceGPT
This paper proposes a comprehensive solution to localize large language models for the Arabic language, including further pre-training with Arabic texts, supervised fine-tuning with native Arabic instructions and responses, and reinforcement learning with a reward model aligned to local culture and values. The resulting model, AceGPT, sets a new state-of-the-art standard for open Arabic language models across various benchmarks.