The MasonTigers team participated in all three tracks of the SemEval-2024 Task 1 on Semantic Textual Relatedness, which covered 14 diverse languages.
In the supervised track, the team leveraged an ensemble of statistical machine learning approaches, including ElasticNet and Linear Regression, combined with language-specific BERT models and the LaBSE sentence transformer. This ensemble approach achieved rankings ranging from 11th to 21st place across the 9 languages in this track.
For the unsupervised track, the team relied on embeddings generated by language-specific BERT models, along with TF-IDF and PPMI, and performed an average ensemble to predict the semantic relatedness. This method resulted in rankings from 1st to 8th place across the 12 languages in this track.
In the cross-lingual track, the team utilized training data from other languages (excluding the target language) and applied a similar ensemble approach using statistical machine learning models and language-specific BERT embeddings not aligned with the target language. The team's performance ranged from 5th to 12th place across the 12 languages in this track.
The key insights from the team's experiments include the effectiveness of ensemble techniques in leveraging the strengths of different approaches, the importance of language-specific models in capturing nuances across diverse languages, and the challenges posed by limited data, domain specificity, and cultural differences in accurately predicting semantic textual relatedness.
다른 언어로
소스 콘텐츠 기반
arxiv.org
더 깊은 질문