Bibliographic Information: Verma, S., Khan, M.S.U.R., Kumar, V., Murthy, R., & Sen, J. (2024). MILU: A Multi-task Indic Language Understanding Benchmark. arXiv preprint arXiv:2411.02538v1.
Research Objective: This paper introduces MILU, a novel benchmark designed to evaluate the cultural understanding and linguistic capabilities of LLMs in 11 Indic languages.
Methodology: The researchers curated a dataset of multiple-choice questions from over 1500 competitive exams in India, covering 8 domains and 42 subjects. They evaluated 45 LLMs, including proprietary, open-source, and language-specific models, using zero-shot, one-shot, and five-shot settings.
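To make the zero-shot, one-shot, and five-shot settings concrete, here is a minimal sketch of how k-shot multiple-choice prompts are typically assembled for this kind of evaluation. The field names (`question`, `choices`, `answer`) and the four-option A–D format are illustrative assumptions, not the paper's actual data schema.

```python
# Minimal sketch of k-shot MCQ prompt construction, as commonly used in
# benchmark evaluations. Field names and the A-D option format are
# illustrative assumptions, not MILU's actual schema.

def format_mcq(item: dict) -> str:
    """Render one question with lettered answer options."""
    lines = [f"Question: {item['question']}"]
    for label, choice in zip("ABCD", item["choices"]):
        lines.append(f"{label}. {choice}")
    return "\n".join(lines) + "\n"

def build_prompt(test_item: dict, exemplars: list[dict]) -> str:
    """Concatenate k solved exemplars (k = 0, 1, or 5) before the test question."""
    parts = []
    for ex in exemplars:  # zero-shot: exemplars is an empty list
        parts.append(format_mcq(ex) + f"Answer: {ex['answer']}\n")
    parts.append(format_mcq(test_item) + "Answer:")
    return "\n".join(parts)
```

In the zero-shot setting the exemplar list is empty; in the one- and five-shot settings it holds one or five solved examples prepended to the test question.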
Key Findings: The study found that current LLMs, even those specifically trained for Indic languages, struggle with MILU. GPT-4o achieved the highest average accuracy at 72%, while other models, particularly language-specific ones, performed closer to random baselines. The research also revealed that models perform better in high-resource languages and struggle with culturally specific content in domains like Arts & Humanities and Law & Governance.
Main Conclusions: The authors conclude that existing LLMs lack sufficient understanding of Indic languages and cultures. They emphasize the need for more inclusive training datasets and culturally relevant benchmarks like MILU to guide the development of more culturally aware LLMs.
Significance: This research contributes to NLP by introducing a much-needed benchmark that evaluates both the linguistic and cultural understanding of LLMs in Indic languages.
Limitations and Future Research: The study acknowledges limitations, including coverage of only 11 Indic languages, computational constraints that prevented evaluating larger models, and reliance on log-likelihood-based evaluation (sketched below). Future work could expand language coverage, explore alternative evaluation methods, and investigate how different training datasets affect cultural understanding.
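As a reference point, below is a hedged sketch of log-likelihood evaluation for multiple-choice questions using the Hugging Face transformers library: each answer option is scored by the summed log-probability of its tokens conditioned on the prompt, and the highest-scoring option is taken as the model's prediction. The model name is a placeholder, and this is a generic illustration of the technique, not the paper's exact evaluation code.

```python
# Sketch of log-likelihood MCQ scoring with Hugging Face transformers.
# The model name is a placeholder; swap in any causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-here"  # placeholder assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def option_logprob(prompt: str, option: str) -> float:
    """Sum of token log-probs of `option` conditioned on `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits  # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Simplification: assumes the prompt's tokenization is a prefix of the
    # full sequence's tokenization (usually true, not guaranteed).
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()  # logits at pos-1 predict pos
    return total

def predict(prompt: str, options: list[str]) -> int:
    """Index of the option with the highest conditional log-likelihood."""
    scores = [option_logprob(prompt, opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)
```

This scoring style avoids free-form generation entirely, which is one reason alternative evaluation methods (e.g., parsing generated answers) are worth exploring as future work.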