Core Concepts
DIALECTBENCH is the first comprehensive benchmark for evaluating NLP systems on language varieties, highlighting performance disparities and promoting research on non-standard dialects.
1. Introduction
Large-scale multilingual benchmarks like XTREME have focused on standard language varieties.
DIALECTBENCH aims to fill the gap by including non-standard dialects and language varieties.
Provides evidence of performance disparities between standard and non-standard language varieties.
2. DIALECTBENCH
Unifies dialectal datasets across tasks for research on language varieties.
Its construction involves selecting language varieties, mapping them into dialect clusters, and covering a diverse set of tasks.
Evaluates models using in-cluster fine-tuning, zero-shot transfer, and combined fine-tuning.
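The three evaluation setups above can be sketched as simple training-data selection rules. This is a minimal illustration with hypothetical cluster and variety names, not the benchmark's actual code or data:

```python
# Hypothetical mapping of dialect clusters to their member varieties;
# the first entry in each list stands in for the "standard" variety.
CLUSTERS = {
    "arabic": ["msa", "egyptian", "gulf"],
    "norwegian": ["bokmaal", "nynorsk"],
}

def select_training_varieties(setup, target, cluster):
    """Pick which varieties' data to fine-tune on, per evaluation setup."""
    members = CLUSTERS[cluster]
    if setup == "in-variety":    # fine-tune and test on the same variety
        return [target]
    if setup == "zero-shot":     # fine-tune on the standard variety only
        return [members[0]]
    if setup == "combined":      # pool training data from the whole cluster
        return list(members)
    raise ValueError(f"unknown setup: {setup}")
```

The point of the contrast is that a gap between the `zero-shot` and `in-variety` settings for the same target variety quantifies how poorly standard-variety training transfers to non-standard dialects.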
3. Experiments
Evaluates models like mBERT, XLM-R, and Mistral 7B across tasks.
Applies training setups such as in-variety fine-tuning and zero-shot evaluation.
4. Results
Reports maximum obtainable scores for different tasks across language clusters.
Highlights performance gaps between high-resource and low-resource varieties within clusters.
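A within-cluster performance gap of this kind can be measured as the spread between the best- and worst-scoring varieties in the cluster. The scores below are illustrative placeholders, not results from the paper:

```python
def cluster_gap(scores):
    """Spread between the highest- and lowest-scoring variety in a cluster."""
    return max(scores.values()) - min(scores.values())

# Hypothetical F1 scores for varieties within one dialect cluster.
arabic_scores = {"msa": 0.91, "egyptian": 0.78, "gulf": 0.74}
```

Aggregating this gap across clusters gives a single disparity figure per task, which is the kind of summary the results section reports alongside maximum obtainable scores.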
5. Discussion
Discusses challenges in model hyperparameter tuning and observes positive zero-shot transfer among Latin-script varieties.
Evaluates LLM performance via in-context learning and addresses limitations in evaluation metrics.
6. Conclusion
Proposes future improvements for DIALECTBENCH to enhance data quality and expand task coverage.
Stats
DIALECTBENCH is a large-scale NLP benchmark covering 281 language varieties.
Quotes
"Language technologies should be judged on their usefulness in real-world use cases." - Fahim Faisal et al.