Evaluating and Improving Multilingual Large Language Models for Underrepresented Languages
This thesis presents a comprehensive evaluation of multilingual large language models (LLMs) on underrepresented languages, revealing limitations in how well these models generalize across languages and cultures. It then proposes data-efficient methods that improve the inclusivity and diversity of multilingual LLMs, enabling better performance on underrepresented languages without sacrificing high-resource language capabilities.