This study investigates the effectiveness of using LLMs to generate culturally relevant commonsense QA datasets for the Indonesian and Sundanese languages. The authors create datasets for these languages using three methods: adapting existing English data (LLM_ADAPT), manually generating data with human annotators (HUMAN_GEN), and automatically generating data with LLMs (LLM_GEN).
The key findings are:
Automatically adapting data from English (LLM_ADAPT) is the least effective approach, especially for the lower-resource Sundanese language. The performance gap between Indonesian and Sundanese highlights the challenges of transferring knowledge across languages with different morphological features.
When directly generating data in the target languages, GPT-4 Turbo can produce questions with adequate general knowledge in both Indonesian and Sundanese, but the cultural "depth" is not as strong as human-generated data.
LLMs perform better on their own generated data (LLM_GEN) compared to human-generated data (HUMAN_GEN), indicating the former is less challenging. However, many open-source LLMs still struggle to answer LLM-generated questions, suggesting significant room for improvement.
Analysis of lexical diversity shows that human annotators generate more unique and culturally specific terms than LLMs, which tend to use more general concepts.
While LLM-generated data may have lower quality, it can still be a practical and cost-effective solution, especially for low-resource languages, when combined with human curation and revision.
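The lexical-diversity comparison above can be illustrated with simple corpus statistics. The sketch below is not the study's actual analysis pipeline; it just shows two common metrics (type-token ratio and vocabulary overlap) on hypothetical tokenized questions:

```python
def type_token_ratio(tokens):
    """Type-token ratio: unique tokens / total tokens (higher = more lexically diverse)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def vocabulary_overlap(tokens_a, tokens_b):
    """Fraction of corpus A's vocabulary that also appears in corpus B."""
    vocab_a, vocab_b = set(tokens_a), set(tokens_b)
    return len(vocab_a & vocab_b) / len(vocab_a) if vocab_a else 0.0

# Toy, invented examples (illustrative only, not from the paper's data):
# a question using a culturally specific term vs. one using general concepts.
human_tokens = "nasi tumpeng disajikan saat upacara adat apa".split()
llm_tokens = "makanan apa yang sering disajikan saat acara apa".split()

print(type_token_ratio(human_tokens))  # 1.0 — every token is unique
print(type_token_ratio(llm_tokens))    # lower — "apa" repeats
```

A real analysis would aggregate these metrics over the full HUMAN_GEN and LLM_GEN question sets rather than single sentences.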
Source: Rifki Afina ..., arxiv.org, 04-17-2024
https://arxiv.org/pdf/2402.17302.pdf