This report presents the GMUNLP team's approach to the DIALECT-COPA shared task, which aims to evaluate the commonsense reasoning capabilities of large language models (LLMs) on South Slavic micro-dialects.
The key highlights and insights are:
The authors explore the potential of data augmentation techniques to enhance the performance of language models on dialectal commonsense reasoning tasks. They utilize a diverse set of language models, including smaller models suitable for low-resource settings, mid-size models that balance task-specific performance and language understanding, and closed-source models that generate high-quality synthetic task data.
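Below is a minimal sketch of the synthetic-data-generation idea, assuming a closed-source model is queried through the OpenAI chat API; the model name, prompt wording, and JSON output format are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: generate synthetic COPA-style instances with a closed-source model.
# The prompt, model name, and parsing are assumptions for illustration only.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "Write one new commonsense-reasoning example in the style of COPA in {language}. "
    "Return JSON with the keys: premise, choice1, choice2, "
    "question ('cause' or 'effect'), and label (1 or 2 for the more plausible choice)."
)

def generate_synthetic_instance(language: str) -> dict:
    """Request a single synthetic COPA-style instance and parse the JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed closed-source generator
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(language=language)}],
        temperature=0.9,  # encourage diverse examples
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    # e.g. augment a Croatian training split with extra synthetic examples
    synthetic = [generate_synthetic_instance("Croatian") for _ in range(5)]
    print(synthetic[0])
```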
The authors achieve the highest scores across all three test datasets in the open-source model category. Their solution also performs on par with the GPT-4 zero-shot iterative prompting approach employed by one of the teams, demonstrating the competitiveness of the proposed approach against state-of-the-art closed-source models.
The authors observe that increasing data quantity through various augmentation techniques generally improves performance for most languages and low-resource dialects. However, discarding instances written in the Cyrillic script boosts performance for some languages and dialects while hindering it for others.
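A minimal sketch of the script-based filtering idea follows: drop training instances whose text contains Cyrillic characters. The field names (premise, choice1, choice2) follow the standard COPA layout and are assumptions about the exact data schema.

```python
# Detect Cyrillic characters via the Unicode Cyrillic block and filter a dataset.
import re

CYRILLIC = re.compile(r"[\u0400-\u04FF]")

def is_cyrillic(example: dict) -> bool:
    """Return True if any text field of the instance contains Cyrillic characters."""
    text = " ".join(str(example.get(k, "")) for k in ("premise", "choice1", "choice2"))
    return bool(CYRILLIC.search(text))

def drop_cyrillic(dataset: list[dict]) -> list[dict]:
    """Keep only instances written in the Latin script."""
    return [ex for ex in dataset if not is_cyrillic(ex)]

# Example: one Latin-script and one Cyrillic-script instance
data = [
    {"premise": "Čovjek je otvorio kišobran.", "choice1": "Padala je kiša.", "choice2": "Sunce je sjalo."},
    {"premise": "Човекот го отвори чадорот.", "choice1": "Врнеше дожд.", "choice2": "Сонцето светеше."},
]
print(len(drop_cyrillic(data)))  # -> 1
```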
The authors experiment with cross-lingual mix-and-match strategies but find no conclusive pattern that this approach consistently makes the model more language-agnostic: it helps in some cases and hurts performance in others.
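As a rough illustration of the mix-and-match idea, the sketch below builds a training set by combining per-language splits in different proportions; the split names and sampling ratios are assumptions, not the exact combinations explored in the paper.

```python
# Combine per-language training splits in configurable proportions.
import random

def mix(splits: dict[str, list[dict]], ratios: dict[str, float], seed: int = 0) -> list[dict]:
    """Sample each language's split at the given ratio and shuffle the combined result."""
    rng = random.Random(seed)
    mixed: list[dict] = []
    for lang, ratio in ratios.items():
        data = splits[lang]
        k = int(len(data) * ratio)
        mixed.extend(rng.sample(data, k))
    rng.shuffle(mixed)
    return mixed

# e.g. all of the Croatian data plus half of the Macedonian and Serbian data
# train = mix(splits, {"hr": 1.0, "mk": 0.5, "sr": 0.5})
```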
The authors find that full fine-tuning of the comparatively smaller, non-instruction-tuned, but language-specific BERTić model cannot surpass the performance of the multilingual, instruction-tuned Aya-101 model. However, applying the same data combinations to perform instruction tuning on the Aya-101 model leads to an overall performance boost.
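The sketch below shows how a COPA instance can be verbalized into an (instruction, target) pair for instruction tuning a seq2seq model such as Aya-101; the prompt template and field names are illustrative assumptions, and the paper's exact setup (templates, compute, training loop) may differ.

```python
# Verbalize a COPA instance into an instruction-style input and target answer,
# then compute a seq2seq training loss on the pair.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def verbalize(example: dict) -> tuple[str, str]:
    """Turn a COPA instance into an instruction-style source string and a target string."""
    question = (
        "What was the cause of this?"
        if example["question"] == "cause"
        else "What happened as a result?"
    )
    source = (
        f"Premise: {example['premise']} {question}\n"
        f"1: {example['choice1']}\n2: {example['choice2']}\nAnswer with 1 or 2."
    )
    return source, str(example["label"])

# Note: Aya-101 is a large (13B) model; loading it requires substantial memory.
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/aya-101")
model = AutoModelForSeq2SeqLM.from_pretrained("CohereForAI/aya-101")

src, tgt = verbalize({
    "premise": "The man opened the umbrella.",
    "choice1": "It was raining.",
    "choice2": "The sun was shining.",
    "question": "cause",
    "label": 1,
})
batch = tokenizer(src, text_target=tgt, return_tensors="pt")
loss = model(**batch).loss  # training signal for instruction tuning
```

Pairs built this way can be fed to a standard seq2seq fine-tuning loop, e.g. with transformers' Seq2SeqTrainer.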