ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data to Develop Tailored Language Models for Angolan Languages
This paper introduces four multilingual pre-trained language models (PLMs) tailored to five Angolan languages via a Multilingual Adaptive Fine-tuning (MAFT) approach. The authors show that informed embedding initialization with the OFA method, combined with synthetic data, significantly improves the downstream-task performance of the resulting MAFT models.
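To make "informed embedding initialization" concrete, the sketch below illustrates the general idea in Python: tokens shared with the source PLM's vocabulary reuse their existing embeddings, while new target-language tokens are initialized as similarity-weighted averages of source embeddings, using an external multilingual vector space to measure similarity. This is only an illustrative approximation, not the paper's exact procedure; the actual OFA method also factorizes embeddings into lower-dimensional components, which is omitted here, and all function and parameter names (`informed_embedding_init`, `ext_vectors`, `top_k`) are hypothetical.

```python
import numpy as np

def informed_embedding_init(source_emb, src_vocab, tgt_vocab, ext_vectors, top_k=10):
    """Sketch of OFA-style informed initialization (simplified, hypothetical API).

    source_emb : (|src_vocab|, d) embedding matrix of the source PLM.
    src_vocab  : dict mapping source tokens to row indices in source_emb.
    tgt_vocab  : dict mapping target tokens to row indices in the new matrix.
    ext_vectors: dict mapping tokens to external multilingual word vectors,
                 used to estimate cross-lingual subword similarity.
    """
    d = source_emb.shape[1]
    tgt_emb = np.zeros((len(tgt_vocab), d), dtype=source_emb.dtype)

    # Pre-compute normalized external vectors for source tokens that have one.
    src_tokens = [t for t in src_vocab if t in ext_vectors]
    src_ext = np.stack([ext_vectors[t] for t in src_tokens])
    src_ext /= np.linalg.norm(src_ext, axis=1, keepdims=True) + 1e-12
    src_rows = np.array([src_vocab[t] for t in src_tokens])

    for tok, row in tgt_vocab.items():
        if tok in src_vocab:
            # Overlapping tokens: copy the source embedding directly.
            tgt_emb[row] = source_emb[src_vocab[tok]]
        elif tok in ext_vectors:
            # New tokens: cosine-weighted average of the embeddings of the
            # most similar source tokens in the external vector space.
            q = ext_vectors[tok]
            q = q / (np.linalg.norm(q) + 1e-12)
            sims = src_ext @ q
            top = np.argsort(-sims)[:top_k]
            w = np.clip(sims[top], 0.0, None)
            if w.sum() > 0:
                tgt_emb[row] = (w[:, None] * source_emb[src_rows[top]]).sum(0) / w.sum()
            else:
                tgt_emb[row] = source_emb.mean(axis=0)
        else:
            # Tokens with no external vector fall back to the mean source embedding.
            tgt_emb[row] = source_emb.mean(axis=0)
    return tgt_emb
```

The intuition is that starting MAFT from embeddings that already encode cross-lingual similarity gives the Angolan-language tokens a much better starting point than random initialization, which is the effect the paper attributes to OFA.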