The paper presents the PORTULAN ExtraGLUE benchmark for Portuguese, which consists of machine-translated versions of several well-known English language understanding tasks, spanning single-sentence tasks, similarity tasks, inference tasks, question-answering tasks, and reasoning tasks. The authors discuss the challenges and limitations of creating these datasets via machine translation, such as issues with pronoun resolution, gendered nouns, and named entity translation.
To validate the datasets, the authors fine-tune the Albertina language model, a state-of-the-art open encoder model for Portuguese, on 10 of the PORTULAN ExtraGLUE tasks using low-rank adaptation (LoRA). The resulting fine-tuned models are made available as baselines for future research.
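The appeal of LoRA for producing these baselines is that it trains only a small low-rank update on top of frozen weights. A minimal sketch of the parameter savings, with illustrative dimensions that are not taken from the Albertina checkpoints:

```python
# Sketch of low-rank adaptation (LoRA): instead of updating a full
# d_out x d_in weight matrix W, LoRA freezes W and learns an additive
# low-rank update B @ A, where A has shape (r, d_in) and B has shape
# (d_out, r). Only A and B are trained.

def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA update on a d_out x d_in layer."""
    return r * d_in + d_out * r


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_out * d_in


# Illustrative hidden size and a commonly used LoRA rank.
d, r = 1536, 8
full = full_finetune_params(d, d)
lora = lora_trainable_params(d, d, r)
print(f"full fine-tuning: {full:,} params per layer")
print(f"LoRA (rank {r}):  {lora:,} params per layer ({lora / full:.2%} of full)")
```

For a square 1536-dimensional layer at rank 8, the adapter trains roughly 1% of the parameters a full fine-tune would, which is why releasing per-task LoRA adapters as baselines is comparatively cheap.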
The authors compare the performance of the Albertina LoRA models on the PORTULAN ExtraGLUE datasets against the multilingual XLM-RoBERTa-XL model and the English DeBERTa-V2-XXLarge model. While the Albertina LoRA models lag behind the English model, they outperform the multilingual one, demonstrating the benefit of a monolingual model for Portuguese.
The authors acknowledge the limitations of machine-translated datasets and call for future work to improve the benchmark through manual curation and the development of new datasets from scratch to better reflect the Portuguese language and its cultural nuances.