Comprehensive Benchmarking of Confidence Calibration in Multilingual Question Answering Large Language Models
Multilingual pre-trained Large Language Models (LLMs) are highly effective at Question Answering (QA), but their confidence estimates are often poorly calibrated, meaning the confidence they express does not track their actual answer accuracy, especially for languages other than English. Effective strategies are therefore needed to improve the confidence calibration of these models across diverse languages.
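To make the notion of calibration concrete, the following is a minimal sketch of Expected Calibration Error (ECE), a standard calibration metric: predictions are grouped into confidence bins, and the gap between average confidence and average accuracy is weighted by bin size. The function name, bin count, and binning scheme here are illustrative choices, not taken from the benchmark described above.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Compute ECE: the bin-size-weighted average of |accuracy - confidence|
    over equal-width confidence bins. A perfectly calibrated model has ECE 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; samples with confidence 0 fall in the first bin.
        mask = (confidences > lo) & (confidences <= hi) if lo > 0.0 else (confidences <= hi)
        if not mask.any():
            continue
        accuracy = correct[mask].mean()      # fraction answered correctly in this bin
        avg_conf = confidences[mask].mean()  # mean stated confidence in this bin
        ece += mask.mean() * abs(accuracy - avg_conf)
    return ece
```

For example, a model that answers every question correctly while reporting 0.95 confidence is slightly underconfident (ECE 0.05), whereas one that reports 0.9 confidence but is right only half the time is badly overconfident (ECE 0.4).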