Key Concepts
Our system uses supervised and unsupervised techniques built on BERT-based language models and achieves competitive performance on SemEval-2024 Task 1 (semantic textual relatedness) for Arabic dialects and Modern Standard Arabic.
Summary
The paper presents our contributions to the SemEval-2024 shared task on semantic textual relatedness (STR). We focused on three Arabic datasets: Algerian, Moroccan, and Modern Standard Arabic (MSA).
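SemEval-2024 Task 1 ranks submissions by the Spearman rank correlation between predicted and gold relatedness scores. Below is a minimal plain-Python sketch of that metric, with average ranks assigned to ties. It is not the organizers' official scorer, and the function names are our own. In practice, `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the ranks."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(pred), average_ranks(gold))


gold = [0.9, 0.1, 0.5, 0.7]
pred = [0.8, 0.2, 0.4, 0.9]
print(round(spearman(pred, gold), 4))  # prints 0.8
```

Because only ranks matter, any monotonic rescaling of a system's scores leaves its Spearman correlation unchanged.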
For the supervised track (A), we fine-tuned BERT-based models (ArBERTv2 and AraBERTv2) on the provided training data. To enrich the training data, we augmented the Moroccan dataset with additional sentence pairs generated by Google's Gemini model, which improved performance on the Moroccan dialect.
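The summary does not say how the Gemini-generated pairs were cleaned before being added to the Moroccan training set. The sketch below shows one hypothetical way to merge them, under three assumptions of our own: each pair is a `(sentence1, sentence2, score)` tuple, scores must lie in [0, 1], and pairs that repeat an existing pair in either order are dropped. The helpers `pair_key` and `merge_augmented` are illustrative names, not part of the described system.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str, float]  # (sentence1, sentence2, relatedness score)


def pair_key(s1: str, s2: str) -> Tuple[str, str]:
    """Order-insensitive key after collapsing whitespace (hypothetical helper)."""
    a, b = " ".join(s1.split()), " ".join(s2.split())
    return (a, b) if a <= b else (b, a)


def merge_augmented(train: Iterable[Pair], generated: Iterable[Pair]) -> List[Pair]:
    """Append generated pairs to the training data, skipping invalid or duplicate ones."""
    merged: List[Pair] = []
    seen: Set[Tuple[str, str]] = set()
    for s1, s2, score in train:
        seen.add(pair_key(s1, s2))
        merged.append((s1, s2, score))
    for s1, s2, score in generated:
        key = pair_key(s1, s2)
        if not key[0] or not key[1]:
            continue  # assumption: drop pairs with an empty sentence
        if not 0.0 <= score <= 1.0:
            continue  # assumption: scores outside [0, 1] are generation errors
        if key in seen:
            continue  # drop duplicates of existing pairs, in either order
        seen.add(key)
        merged.append((s1, s2, score))
    return merged


train = [("a b", "c", 0.5)]
generated = [("c", "a  b", 0.9), ("x", "y", 1.2), ("x", "y", 0.3), ("y", "x", 0.4), ("", "z", 0.1)]
print(merge_augmented(train, generated))
```

Keeping the original training pairs first means that, on a collision, the gold label always wins over a generated one.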
For the unsupervised track (B), where training on labeled data is not allowed, we computed the cosine similarity between average-pooled (mean-pooled) sentence embeddings from the BERT-based models. Our approaches achieved competitive results, ranking 1st for MSA, 5th for Moroccan, and 12th for Algerian.
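The unsupervised scoring step can be sketched in plain Python. Each sentence's token vectors are averaged, with padding tokens ignored via the attention mask, and the relatedness score is the cosine similarity of the two pooled vectors. The toy two-dimensional vectors stand in for real model outputs. Which hidden layer the authors pooled is not stated here, so using the final layer is our assumption.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for d in range(dim):
                total[d] += vec[d]
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (nu * nv)


# Toy token embeddings; the third token of s1 is padding (mask 0).
s1 = mean_pool([[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]], [1, 1, 0])  # [2.0, 0.0]
s2 = mean_pool([[1.0, 1.0], [3.0, 3.0]], [1, 1])                  # [2.0, 2.0]
print(round(cosine(s1, s2), 4))  # prints 0.7071
```

With the Hugging Face `transformers` library, the token vectors would typically be the model output's `last_hidden_state` and the mask would be the tokenizer's `attention_mask`. Masking matters because, without it, padded positions would pull the average toward the padding embeddings.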
The key highlights of our work include:
- Leveraging generative models for data augmentation to improve performance on the Moroccan dialect.
- Exploring the suitability of different BERT-based models for the Arabic dialects and MSA.
- Demonstrating the effectiveness of unsupervised techniques, such as cosine similarity, for the STR task in the absence of labeled training data.
Quotes
"Semantic textual relatedness (STR) is a broader concept of semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts."
"While the former task checks for the presence of similar meaning or paraphrase, STR takes a more comprehensive approach, evaluating relatedness across multiple dimensions, spanning topical similarity, conceptual overlap, contextual coherence, pragmatic connection, themes, scopes, ideas, stylistic conditions, ontological relations, entailment, temporal relation, as well as semantic similarity itself."