Evaluating Cultural Commonsense Reasoning Across Diverse Indonesian Provinces
Even the best open-source language model struggles to understand the diverse cultures of eleven Indonesian provinces, reaching an accuracy of only 53.2%. Adding location context to the input significantly improves performance, especially for larger models such as GPT-4.
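
To make the idea of location context concrete, the following is a minimal sketch of how a province name might be prepended to a multiple-choice item before it is sent to a model, and how accuracy could then be computed. The prompt wording and the names `build_prompt` and `accuracy` are hypothetical, and the example item uses placeholder text; none of this is taken from the evaluation itself.

```python
import string
from typing import Optional, Sequence


def build_prompt(premise: str, options: Sequence[str],
                 province: Optional[str] = None) -> str:
    """Format a multiple-choice item, optionally prefixed with location context."""
    lines = []
    if province:
        # Assumed wording: the source does not specify the actual template.
        lines.append(f"Location: {province}, Indonesia")
    lines.append(f"Premise: {premise}")
    # Label the options A, B, C, ... in order.
    for label, option in zip(string.ascii_uppercase, options):
        lines.append(f"{label}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted labels that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Placeholder item, not real benchmark data.
    options = ["<option 1>", "<option 2>", "<option 3>"]
    print(build_prompt("<premise text>", options))
    print()
    print(build_prompt("<premise text>", options, province="Bali"))
```

Running the same items through a model with and without the `province` argument, then comparing the two `accuracy` values, is one way the reported effect of location context could be measured.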