CLAIRA is a simple, flexible method that uses the zero-shot capabilities of large language models (LLMs) to evaluate candidate audio captions: it asks the LLM directly for a semantic distance score, together with an interpretable justification for that score.
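
The idea of asking an LLM directly for a score and a justification can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the method itself: the prompt wording, the 0 to 100 scale and its direction, the JSON answer format, and the helper names `build_prompt`, `parse_response`, and `score_caption`. The `llm` argument is a hypothetical callable that takes a prompt string and returns the model's text reply, so no particular LLM API is implied.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical prompt wording; the scale and JSON format are assumptions.
PROMPT_TEMPLATE = (
    "You are evaluating an audio caption.\n"
    "Candidate caption: {candidate}\n"
    "Reference captions:\n{references}\n"
    "On a scale from 0 to 100, rate how closely the candidate matches the "
    "references in meaning, and explain your rating.\n"
    'Answer with JSON of the form {{"score": <number>, "reason": "<text>"}}.'
)


def build_prompt(candidate: str, references: List[str]) -> str:
    """Fill the template with the candidate and a bulleted list of references."""
    ref_block = "\n".join(f"- {ref}" for ref in references)
    return PROMPT_TEMPLATE.format(candidate=candidate, references=ref_block)


def parse_response(text: str) -> Tuple[float, str]:
    """Extract (score in [0, 1], reason) from the first-to-last brace span."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the LLM reply")
    data = json.loads(text[start : end + 1])
    score = float(data["score"])
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    return score / 100.0, str(data["reason"])


def score_caption(
    candidate: str,
    references: List[str],
    llm: Callable[[str], str],
) -> Tuple[float, str]:
    """Ask the supplied LLM callable for a score and its justification."""
    return parse_response(llm(build_prompt(candidate, references)))
```

Because the LLM is passed in as a plain callable, the sketch can be exercised with a stub that returns a canned reply; a real deployment would also need to handle malformed replies, for example by retrying.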



CLAIRA: A Simple and Interpretable Measure for Evaluating Audio Captions Using Large Language Models