CLAIRA: A Simple and Interpretable Measure for Evaluating Audio Captions Using Large Language Models
CLAIRA is a simple and flexible method that leverages the zero-shot capabilities of large language models (LLMs) to evaluate candidate audio captions. It directly asks an LLM for a semantic distance score, and the LLM also returns an interpretable justification for that score.
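The scoring loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CLAIRA implementation. The prompt wording, the 0 to 100 scale (with higher meaning semantically closer), the JSON reply format with `score` and `reason` fields, and the names `build_prompt`, `parse_response`, and `evaluate_caption` are all hypothetical. The LLM itself is passed in as any callable that maps a prompt string to a reply string, so no particular LLM API is assumed.

```python
import json
import re
from typing import Callable, List, Tuple


def build_prompt(candidate: str, references: List[str]) -> str:
    """Assemble a hypothetical evaluation prompt (wording is illustrative only)."""
    refs = "\n".join(f"- {r}" for r in references)
    return (
        "You are evaluating an audio caption.\n"
        f"Candidate caption: {candidate}\n"
        f"Reference captions:\n{refs}\n"
        "On a scale from 0 to 100, how semantically close is the candidate "
        "to the references? Answer only with JSON of the form "
        '{"score": <int>, "reason": "<short justification>"}.'
    )


def parse_response(text: str) -> Tuple[float, str]:
    """Extract the score (rescaled to [0, 1]) and the justification from the reply."""
    # Assumption: the reply contains exactly one JSON object, possibly with extra text.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM response")
    data = json.loads(match.group(0))
    score = min(max(float(data["score"]), 0.0), 100.0)  # clamp out-of-range answers
    return score / 100.0, str(data["reason"])


def evaluate_caption(
    candidate: str, references: List[str], llm: Callable[[str], str]
) -> Tuple[float, str]:
    """Ask the (hypothetical) LLM callable for a score plus justification."""
    return parse_response(llm(build_prompt(candidate, references)))
```

For example, with a stub standing in for a real model, `evaluate_caption("A dog barks", ["A dog is barking loudly"], lambda p: '{"score": 85, "reason": "Both mention a barking dog."}')` yields a normalized score of 0.85 together with the model's stated reason. Keeping the LLM behind a plain callable makes it easy to swap models, and returning the reason alongside the score preserves the interpretability that motivates the method.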