INDICGENBENCH is a comprehensive benchmark for evaluating the generation capabilities of large language models (LLMs) across a diverse set of Indic languages. It consists of four user-facing tasks: cross-lingual summarization, machine translation, multilingual question answering, and cross-lingual question answering.
The benchmark covers 29 Indic languages across 13 writing scripts and 4 language families. The languages are grouped into higher-, medium-, and lower-resource tiers based on how much web text is available for each. INDICGENBENCH extends existing datasets (CrossSum, FLORES, XQuAD, and XorQA) to these Indic languages through high-quality human translations, and for as many as 18 of them it provides the first-ever evaluation datasets.
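The pairing between each task and the existing dataset it extends can be written down as a small lookup table. The sketch below is illustrative only: the class name `BenchmarkTask`, the constants, and the helper `source_for` are hypothetical and are not part of any official INDICGENBENCH loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTask:
    name: str            # task as named in the summary above
    source_dataset: str  # existing dataset the task extends to Indic languages


# Hypothetical in-memory description of the benchmark's composition.
INDICGENBENCH_TASKS = [
    BenchmarkTask("cross-lingual summarization", "CrossSum"),
    BenchmarkTask("machine translation", "FLORES"),
    BenchmarkTask("multilingual question answering", "XQuAD"),
    BenchmarkTask("cross-lingual question answering", "XorQA"),
]

# Resource tiers used to group the 29 languages by web-text availability.
RESOURCE_TIERS = ("higher", "medium", "lower")


def source_for(task_name: str) -> str:
    """Return the existing dataset that a given task extends (hypothetical helper)."""
    for task in INDICGENBENCH_TASKS:
        if task.name == task_name:
            return task.source_dataset
    raise KeyError(task_name)
```

For example, `source_for("machine translation")` returns `"FLORES"`.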
The authors evaluate a wide range of proprietary and open-source LLMs on INDICGENBENCH, including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM, and LLaMA. The largest PaLM-2 models perform best overall, but every model shows a significant performance gap between English and Indic languages. This gap highlights the need for further research toward more inclusive multilingual language models.
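One simple way to quantify such a gap is to subtract each resource tier's mean score from the model's English score on the same task. The sketch below assumes this definition, which is an illustration and not necessarily the paper's exact metric. The function name `tier_gaps` is hypothetical, and the scores in the usage example are made-up placeholders, not reported results.

```python
from statistics import mean


def tier_gaps(english_score: float,
              scores_by_tier: dict[str, list[float]]) -> dict[str, float]:
    """Return English score minus the mean per-language score of each tier.

    Assumed definition for illustration only; a larger value means a
    wider gap between English and that group of Indic languages.
    """
    return {tier: english_score - mean(scores)
            for tier, scores in scores_by_tier.items()}


# Placeholder scores for one hypothetical model on one task (NOT paper results).
toy_scores = {
    "higher": [50.0, 40.0],
    "medium": [30.0],
    "lower": [20.0, 10.0],
}
gaps = tier_gaps(60.0, toy_scores)
```

With these toy values the gap widens from the higher-resource tier (15.0) through the medium tier (30.0) to the lower-resource tier (45.0). Averaging within a tier first keeps tiers with many languages from dominating the comparison.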