Core Concepts
Suvach is a novel benchmark for evaluating Hindi extractive question-answering models, generated using large language models to avoid the biases and inaccuracies of machine-translated datasets.
Abstract
The paper introduces Suvach, a new benchmark for evaluating extractive question answering (QA) in Hindi. The key points are:
Current Indic language QA benchmarks often rely on machine translation of English datasets, which can introduce biases and inaccuracies.
To address this, the authors propose a methodology to generate a high-quality Hindi QA dataset using large language models (LLMs). This involves:
Creating prompts with relevant context from Hindi Wikipedia dumps
Using LLMs to generate question-answer pairs from the prompts
Validating the generated content for relevance, accuracy, and clarity
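The three steps above can be sketched as a simple pipeline. This is a minimal illustration, not the authors' actual implementation: the LLM call is a deterministic stub, and the prompt wording, function names, and validation rule (the answer must be a span of the passage, as in extractive QA) are assumptions.

```python
# Minimal sketch of the generation pipeline: prompt -> LLM -> validation.
# call_llm() is a stand-in stub; a real run would query a model endpoint.

def build_prompt(passage: str) -> str:
    """Step 1: wrap a Hindi Wikipedia passage in a QA-generation prompt."""
    return (
        "Read the passage and write one question whose answer is a span "
        f"of the passage, plus that answer.\n\nPassage:\n{passage}"
    )

def call_llm(prompt: str) -> dict:
    """Step 2: placeholder for a real LLM call (hypothetical stub)."""
    # A real implementation would send `prompt` to a model; here we just
    # echo the first word of the passage as a dummy extractive answer.
    passage = prompt.split("Passage:\n", 1)[1]
    return {"question": "What does the passage describe?",
            "answer": passage.split()[0]}

def validate(passage: str, pair: dict) -> bool:
    """Step 3: keep only non-empty questions whose answer occurs verbatim
    in the source passage (a cheap relevance/accuracy check)."""
    return bool(pair["question"]) and pair["answer"] in passage

def generate_qa_pairs(passages):
    """Run all three steps over a list of passages."""
    pairs = []
    for passage in passages:
        pair = call_llm(build_prompt(passage))
        if validate(passage, pair):
            pairs.append({"context": passage, **pair})
    return pairs
```

In practice the validation step would be richer (relevance, accuracy, and clarity checks, per the paper), but the span-containment test shown here is the natural first filter for extractive QA.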
The resulting Suvach dataset contains over 100,000 question-answer pairs, with an average of 1,200 tokens per question. It provides three levels of difficulty:
Question only
Question with context
Question with context and multiple-choice options
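The three difficulty levels amount to progressively revealing more of each record to the model. A hypothetical record layout makes this concrete; the field names below are illustrative, not the dataset's actual schema.

```python
# Illustrative view of one benchmark record and the three evaluation
# settings derived from it (field names are assumptions, not Suvach's schema).
record = {
    "question": "प्रश्न ...",          # the Hindi question
    "context": "संदर्भ अनुच्छेद ...",   # supporting passage
    "options": ["क", "ख", "ग", "घ"],  # multiple-choice candidates
    "answer": "क",
}

# Level 1: question only (model relies on its own knowledge)
level1 = {"question": record["question"]}
# Level 2: question plus the supporting passage (extractive QA)
level2 = {"question": record["question"], "context": record["context"]}
# Level 3: question, passage, and multiple-choice options
level3 = {**level2, "options": record["options"]}
```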
The authors argue that this LLM-powered approach to benchmark generation can be generalized to create high-quality datasets for other Indic languages, fostering advancements in Indic NLP research.
Stats
"This dataset consists of over 100k question answers in Hindi, with 1200 tokens per question on average."
Quotes
"Recent breakthroughs in Large Language Models (LLMs), particularly those following the advent of ChatGPT, were transformative."
"While machine translation offers a temporary solution, it is not a sustainable approach for developing long-term, large-scale benchmarks across all Indian languages."
"This finding suggests a promising avenue for LLM-powered benchmark creation for low-resource languages."