Core Concepts
ChatGPT's language identification abilities vary significantly across languages, with poor performance on African languages and better support for high-resource languages and those written in distinct scripts.
Abstract
The paper investigates the language identification (LID) capabilities of ChatGPT, a large language model, across a diverse dataset of 670 languages from 24 language families and 30 different scripts. The authors curate the Babel-670 dataset and design a series of experiments to evaluate ChatGPT's ability to identify language names and language codes under zero-shot, few-shot, and label-provided settings.
The key findings are:
ChatGPT performs better at identifying language names than language codes, suggesting it has better knowledge of language names from pretraining.
There are significant performance disparities across difficulty levels: accuracy is much higher in the easier settings, where a label set is provided, than in the hard setting with no label set.
Compared to smaller finetuned LID tools, ChatGPT lags behind, especially on African languages.
Languages utilizing distinct scripts tend to achieve higher F1 scores, while languages sharing scripts like Latin have lower performance.
Geographically, African languages receive the least support from ChatGPT, highlighting the model's limitations in serving diverse linguistic communities.
The authors conclude that current large language models like ChatGPT would benefit from further development to improve their language identification capabilities, especially for low-resource and underrepresented languages.
Stats
"ChatGPT has poor performance on African languages."
"Languages utilizing distinct scripts generally achieve higher F1 score."
"There is a significant negative correlation between the number of languages utilizing a particular script and the average F1 score of those languages."
Quotes
"ChatGPT's ability varies remarkably between low-resource and high-resource languages and among different regions."
"The provision of a label set boosts confidence by eliminating numerous potential candidates."
"Left-behinds and scrapping-bys languages have exceptionally limited data for NLP work."