Multicalibration Techniques for Reliable Confidence Scoring in Large Language Models
This paper proposes using "multicalibration" to produce interpretable and reliable confidence scores for outputs generated by large language models (LLMs); such scores can help detect hallucinations. Multicalibration requires that scores be calibrated not only on average but simultaneously across many (possibly overlapping) subpopulations of the data.
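To make the idea concrete, below is a minimal sketch of the classic iterative-patching multicalibration algorithm applied to confidence scores; this is an illustration of the general technique, not the paper's implementation. The function name, the representation of groups as boolean masks, and the use of binary correctness labels (1 if the LLM output was judged correct) are assumptions made for the example.

```python
import numpy as np

def multicalibrate(scores, labels, groups, n_bins=10, alpha=0.02, max_iters=1000):
    """Iteratively patch confidence scores until they are approximately
    multicalibrated: within every (group, score-bin) cell, the mean score
    matches the empirical rate of correct outputs to within alpha.

    scores : (n,) float array of initial confidence scores in [0, 1]
    labels : (n,) binary array, 1 if the LLM output was correct
    groups : list of (n,) boolean masks defining the subpopulations
    """
    s = scores.astype(float).copy()
    for _ in range(max_iters):
        updated = False
        for g in groups:
            # Bin the current scores and compare each cell to the outcomes.
            bins = np.clip((s * n_bins).astype(int), 0, n_bins - 1)
            for b in range(n_bins):
                cell = g & (bins == b)
                if not cell.any():
                    continue
                gap = labels[cell].mean() - s[cell].mean()
                if abs(gap) > alpha:
                    # Patch: shift the cell's scores toward the empirical rate.
                    s[cell] = np.clip(s[cell] + gap, 0.0, 1.0)
                    updated = True
        if not updated:
            break  # every (group, bin) cell is calibrated within alpha
    return s
```

The key design choice, relative to ordinary calibration, is the inner loop over groups: a patch applied to one subpopulation may unbalance another, so the algorithm keeps sweeping until no (group, bin) cell deviates by more than the tolerance alpha.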