INDICGENBENCH is a comprehensive benchmark for evaluating the generation capabilities of large language models (LLMs) on a diverse set of Indic languages. It consists of 5 user-facing tasks spanning four task types: cross-lingual summarization, machine translation, multilingual question answering, and cross-lingual question answering.
The benchmark covers 29 Indic languages across 13 scripts and 4 language families, with languages grouped into higher-, medium-, and lower-resource categories based on web text availability. INDICGENBENCH extends existing datasets such as CrossSum, FLORES, XQuAD, and XorQA to these languages through high-quality human translations, providing the first-ever evaluation datasets for up to 18 Indic languages.
The authors evaluate a wide range of proprietary and open-source LLMs, including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM, and LLaMA, on INDICGENBENCH. They find that while the largest PaLM-2 models perform the best overall, there is a significant performance gap between English and Indic languages across all models, highlighting the need for further research to develop more inclusive multilingual language models.
Key Insights Distilled From
by Harman Singh... at arxiv.org 04-26-2024
https://arxiv.org/pdf/2404.16816.pdf