The study evaluates two data augmentation techniques, cross-lingual transfer and machine translation, for monolingual semantic textual similarity (STS). The comparison covers Japanese and Korean, two languages that are linguistically distant from English. The research aims to identify which technique is better suited to monolingual STS by addressing specific research questions. The findings suggest that both techniques yield similar performance on monolingual STS tasks. Surprisingly, cross-lingual transfer of Wikipedia data outperforms machine translation in certain scenarios. This points to native Wikipedia data as a potentially effective training resource for improving sentence embeddings.
Key Insights Distilled From
by Sho Hoshino,... at arxiv.org 03-11-2024
https://arxiv.org/pdf/2403.05257.pdf