This paper presents the AAdaM system developed for the SemEval-2024 Task 1 on Semantic Textual Relatedness (STR) for African and Asian languages. The task aims to measure the semantic relatedness between pairs of sentences in a range of under-represented languages.
The key highlights and insights are:
Data Augmentation: To address the challenge of limited training data for non-English languages, the authors augment the training data by machine-translating English STR resources into the target languages (see the translation sketch after this list).
Task-Adaptive Pre-training: The authors apply task-adaptive pre-training on unlabeled task data to better adapt the pre-trained language model to the STR task (a pre-training sketch follows the list).
Model Tuning: The authors explore both full fine-tuning and adapter-based tuning, and find that adapter-based tuning achieves performance comparable to full fine-tuning while updating far fewer parameters (an adapter-tuning sketch follows the list).
Cross-lingual Transfer: For cross-lingual transfer in subtask C, the authors use the MAD-X framework, which enables efficient zero-shot transfer: the task adapter is kept fixed while the language-specific adapter is swapped for that of the target language (see the MAD-X sketch after this list).
Evaluation: In subtask A (supervised learning), the system ranks first among 40 teams on average and achieves the best performance on Spanish. In subtask C (cross-lingual transfer), it ranks first among 18 teams on average and achieves the best performance on Indonesian and Punjabi.
Analysis: The authors provide a fine-grained analysis, revealing that capturing nuanced semantic relationships remains a challenge, especially for languages with lower relatedness scores.
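The following sketches illustrate the techniques listed above. First, a minimal sketch of translation-based data augmentation using an NLLB model through the Hugging Face transformers pipeline; the summary does not name the MT system the authors used, so the model choice, target-language code, and the assumption that translation preserves the relatedness score are all illustrative.

```python
# Sketch: augment STR training data by machine-translating English sentence pairs.
# The MT model (NLLB) and the target-language code are assumptions; the summary
# does not specify which translation system the authors used.
from transformers import pipeline

# English -> Hausa translator ("hau_Latn" is the NLLB code for Hausa).
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="hau_Latn",
)

def translate_pairs(pairs):
    """Translate (sentence1, sentence2, score) English STR examples.

    The relatedness score is copied unchanged, on the assumption that
    translation preserves how related the two sentences are.
    """
    augmented = []
    for s1, s2, score in pairs:
        t1 = translator(s1, max_length=256)[0]["translation_text"]
        t2 = translator(s2, max_length=256)[0]["translation_text"]
        augmented.append((t1, t2, score))
    return augmented

english_pairs = [
    ("A man is playing a guitar.", "Someone is performing music.", 0.78),
]
print(translate_pairs(english_pairs))
```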
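Task-adaptive pre-training is typically implemented as continued masked-language-model (MLM) training on the unlabeled task sentences; a minimal sketch with the transformers Trainer is shown below. The backbone model name and all hyperparameters are placeholders, not the authors' exact configuration.

```python
# Sketch: task-adaptive pre-training via continued masked language modeling on
# unlabeled STR sentences. Backbone and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"  # assumed multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Unlabeled task text: in practice, the sentences from the STR pairs themselves.
sentences = [
    "A man is playing a guitar.",
    "Someone is performing music.",
]
dataset = Dataset.from_dict({"text": sentences})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Standard MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tapt-checkpoint",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=5e-5,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the adapted checkpoint is then fine-tuned on labeled STR data
```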
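Next, a minimal sketch of adapter-based tuning with the adapters library, treating STR as a regression task with a single relatedness score per sentence pair. The backbone and the bottleneck adapter configuration ("seq_bn") are assumptions rather than the authors' exact setup.

```python
# Sketch: adapter-based tuning for STR regression. Only the adapter and the
# regression head are updated; the backbone stays frozen. Backbone and adapter
# configuration are assumptions.
import adapters
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "xlm-roberta-base"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=1  # single regression output = relatedness score
)

adapters.init(model)                       # enable adapter support on the HF model
model.add_adapter("str", config="seq_bn")  # bottleneck adapter for the STR task
model.train_adapter("str")                 # freeze the backbone; train adapter + head
model.set_active_adapters("str")

# The model can now be trained with the usual Trainer or a custom loop; for full
# fine-tuning one would skip the adapter calls and update all parameters instead.
encoded = tokenizer(
    "A man is playing a guitar.", "Someone is performing music.",
    return_tensors="pt",
)
print(model(**encoded).logits)  # predicted relatedness score (untrained here)
```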
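Finally, the MAD-X recipe stacks a language adapter under the task adapter and swaps only the language adapter at inference time. The sketch below shows this composition with the adapters library; the adapter names are illustrative placeholders (in MAD-X the language adapters are pre-trained with MLM on monolingual text, e.g. loaded from AdapterHub), not the exact adapters the authors used.

```python
# Sketch of MAD-X-style zero-shot transfer: the task adapter is trained on top of
# a source-language adapter, then the language adapter is swapped for the target
# language while the task adapter stays fixed. Adapter names are illustrative.
import adapters
from adapters.composition import Stack
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=1  # assumed backbone with a regression head
)
adapters.init(model)

# Language adapters (placeholders here; normally MLM-pre-trained per language).
model.add_adapter("en", config="seq_bn")   # source-language adapter
model.add_adapter("ind", config="seq_bn")  # target-language adapter, e.g. Indonesian

# Task adapter for STR, trained while stacked on the source-language adapter.
model.add_adapter("str", config="seq_bn")
model.train_adapter("str")                  # only the task adapter is trainable
model.active_adapters = Stack("en", "str")
# ... train on source-language (English) STR data here ...

# Zero-shot transfer: swap in the target-language adapter, keep the task adapter.
model.active_adapters = Stack("ind", "str")
# ... evaluate on the target language here ...
```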