This paper presents AAdaM, the system developed for SemEval-2024 Task 1 on Semantic Textual Relatedness (STR) for African and Asian languages. The task is to measure the semantic relatedness of sentence pairs across a range of under-represented languages.
The key highlights and insights are:
Data Augmentation: To address the shortage of training data for non-English languages, the authors augment the training set by machine-translating English resources into the target languages.
Task-Adaptive Pre-training: The authors apply task-adaptive pre-training on unlabeled task data to better adapt the pre-trained language model to the STR task.
Model Tuning: The authors compare full fine-tuning with adapter-based tuning and find that adapter-based tuning can achieve performance comparable to full fine-tuning while training far fewer parameters.
Cross-lingual Transfer: For cross-lingual transfer in subtask C, the authors use the MAD-X framework, which enables efficient zero-shot transfer: the task adapter is reused unchanged, and only the language-specific adapter is replaced with that of the target language.
Evaluation: In subtask A (supervised learning), the authors' system ranks first on average out of 40 teams and performs best in Spanish. In subtask C (cross-lingual transfer), it ranks first on average among 18 teams and achieves the best performance in Indonesian and Punjabi.
Analysis: The authors provide a fine-grained analysis, revealing that capturing nuanced semantic relationships remains a challenge, especially for languages with lower relatedness scores.
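The Data Augmentation step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `augment_with_translation` and `toy_translate` are hypothetical names, the lookup table stands in for a real machine-translation system, and copying the English gold score onto each translated pair is an assumption of the sketch.

```python
from typing import Callable, List, Tuple

# One labelled STR example: (sentence 1, sentence 2, gold relatedness score).
Pair = Tuple[str, str, float]


def augment_with_translation(
    english_pairs: List[Pair],
    translate: Callable[[str, str], str],
    target_langs: List[str],
) -> List[Pair]:
    """Machine-translate every English pair into each target language.

    The gold score is copied unchanged, assuming translation preserves how
    related the two sentences are.
    """
    augmented: List[Pair] = []
    for lang in target_langs:
        for s1, s2, score in english_pairs:
            augmented.append((translate(s1, lang), translate(s2, lang), score))
    return augmented


# Hypothetical stand-in for a real MT system: a tiny lookup table.
_TOY_MT = {
    ("A man is playing a guitar.", "es"): "Un hombre está tocando una guitarra.",
    ("A person plays music.", "es"): "Una persona toca música.",
}


def toy_translate(sentence: str, lang: str) -> str:
    # Falls back to the untranslated sentence when the table has no entry.
    return _TOY_MT.get((sentence, lang), sentence)


english = [("A man is playing a guitar.", "A person plays music.", 0.75)]
spanish = augment_with_translation(english, toy_translate, ["es"])
print(spanish)
```

Passing the translator in as a function keeps the augmentation logic independent of whichever MT system is actually used.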
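Task-adaptive pre-training, as mentioned above, typically continues the model's masked-language-modelling (MLM) objective on the unlabelled task sentences before fine-tuning. The sketch below shows only the masking step in plain Python. The 15% rate and 80/10/10 replacement split are the standard BERT recipe, assumed here rather than taken from the summary; `mask_for_mlm` is a hypothetical helper, and the token ids are made up.

```python
import random
from typing import List, Tuple

IGNORE_INDEX = -100  # label value the MLM loss is told to skip


def mask_for_mlm(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mask_prob: float = 0.15,
    special_ids: Tuple[int, ...] = (),
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking for continued MLM pre-training.

    Each non-special position becomes a prediction target with probability
    mask_prob; a target is replaced by mask_id 80% of the time, by a random
    token 10% of the time, and left unchanged 10% of the time.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # not a prediction target; no loss at this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise keep the original token in the input


    return inputs, labels


# Hypothetical token ids for one unlabelled task sentence ([CLS] ... [SEP]).
sentence_ids = [101, 2023, 2003, 1037, 7099, 6251, 102]
# mask_prob is raised to 0.5 only so that masking is visible on 7 tokens.
inputs, labels = mask_for_mlm(
    sentence_ids, mask_id=103, vocab_size=30522, mask_prob=0.5, special_ids=(101, 102)
)
print(inputs)
print(labels)
```

The model is then trained to predict `labels` from `inputs`; because the data is the task's own text, the encoder adapts to its domain and languages without needing any relatedness labels.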
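Adapter-based tuning, compared with full fine-tuning under Model Tuning above, freezes the pre-trained encoder and trains only small modules inserted into each layer. The sketch below shows one common design, a residual bottleneck adapter (down-projection, ReLU, up-projection), in plain Python, together with a parameter count. The summary does not specify the adapter architecture or model size, so the bottleneck form, the zero initialisation, and the numbers d = 768, 12 layers and r = 48 (reduction factor 16) are assumptions chosen for illustration.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[out_dim][in_dim]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up(ReLU(W_down h + b_down)) + b_up."""

    def __init__(self, d: int, r: int, seed: int = 0):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(r)]
        self.down_bias = [0.0] * r
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so adding it leaves the frozen pre-trained model's output unchanged.
        self.up = [[0.0] * r for _ in range(d)]
        self.up_bias = [0.0] * d

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, a + b) for a, b in zip(matvec(self.down, h), self.down_bias)]
        delta = [a + b for a, b in zip(matvec(self.up, z), self.up_bias)]
        return [x + dx for x, dx in zip(h, delta)]


def adapter_param_count(d: int, r: int, layers: int) -> int:
    # Per layer: W_down (r x d) + b_down (r) + W_up (d x r) + b_up (d).
    return layers * (2 * d * r + d + r)


adapter = BottleneckAdapter(d=8, r=2)
h = [0.1 * i for i in range(8)]
print(adapter(h) == h)  # True: identity at initialisation
# Assumed base-size encoder: d = 768, 12 layers, reduction factor 16 (r = 48).
print(adapter_param_count(d=768, r=48, layers=12))  # 894528
```

Under these assumptions the adapters add 894,528 trainable weights, well under one percent of a base-size multilingual encoder with hundreds of millions of parameters, which is where the parameter efficiency comes from.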
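The MAD-X transfer described under Cross-lingual Transfer above can be illustrated structurally: inside each layer a language adapter is applied first and a task adapter is stacked on top, and zero-shot transfer swaps only the language adapter. The sketch below is a toy, not the MAD-X implementation. `MadXLayer` and `shift_adapter` are hypothetical names, the adapters are trivial functions standing in for trained modules, MAD-X's invertible embedding adapters are omitted, and "en" as the source language is an assumption ("id" is Indonesian, one of the subtask C targets).

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Module = Callable[[Vector], Vector]


def shift_adapter(offset: float) -> Module:
    """Toy stand-in for a trained language adapter: adds a constant."""
    return lambda h: [x + offset for x in h]


class MadXLayer:
    """One layer's output path under MAD-X-style stacking:
    frozen base layer -> language adapter -> task adapter."""

    def __init__(self, base: Module, lang_adapters: Dict[str, Module], task_adapter: Module):
        self.base = base                    # frozen pre-trained weights
        self.lang_adapters = lang_adapters  # one per language (trained with MLM)
        self.task_adapter = task_adapter    # trained once, on the source language
        self.active_lang: Optional[str] = None

    def set_language(self, lang: str) -> None:
        # Zero-shot transfer: only the language adapter is swapped.
        if lang not in self.lang_adapters:
            raise KeyError(f"no language adapter for {lang!r}")
        self.active_lang = lang

    def __call__(self, h: Vector) -> Vector:
        if self.active_lang is None:
            raise RuntimeError("call set_language() first")
        h = self.base(h)
        h = self.lang_adapters[self.active_lang](h)
        return self.task_adapter(h)


layer = MadXLayer(
    base=lambda h: h,  # identity stands in for the frozen layer
    lang_adapters={"en": shift_adapter(1.0), "id": shift_adapter(-1.0)},
    task_adapter=lambda h: [2.0 * x for x in h],  # toy task adapter
)
layer.set_language("en")  # the task adapter would be trained in this configuration
print(layer([1.0, 2.0]))  # [4.0, 6.0]
layer.set_language("id")  # zero-shot: same task adapter, target-language adapter
print(layer([1.0, 2.0]))  # [0.0, 2.0]
```

Because the task adapter never sees target-language labels, supporting a new language only requires a language adapter for it, which can be trained on unlabelled text.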
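The results under Evaluation are based on the official metric of SemEval-2024 Task 1: the Spearman rank correlation between a system's predicted relatedness scores and the gold scores. A minimal pure-Python version that averages ranks across ties is sketched below; the function names and the four example scores are made up for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    # Spearman's rho is the Pearson correlation of the (tie-averaged) ranks.
    return pearson(average_ranks(pred), average_ranks(gold))


gold = [0.9, 0.1, 0.5, 0.7]
pred = [0.8, 0.2, 0.4, 0.85]
rho = spearman(pred, gold)
print(round(rho, 6))  # 0.8
```

Because only ranks matter, any monotonic rescaling of the predictions leaves the score unchanged. On tie-free inputs the result matches the textbook formula 1 - 6Σd²/(n(n²-1)), here 1 - 6·2/(4·15) = 0.8.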
Key insights distilled from:
by Miaoran Zhan... at arxiv.org 04-03-2024
https://arxiv.org/pdf/2404.01490.pdf