Measuring the Faithfulness of Free-Text Explanations in Large Language Models
Explanations provided by large language models may not faithfully capture the factors responsible for their predictions. This work introduces a novel metric, Correlational Explanatory Faithfulness (CEF), to better assess the faithfulness of free-text explanations by accounting for both the impact of input features on model predictions and the frequency with which explanations mention those features.
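To make the idea concrete, the following is a minimal sketch of how a correlational faithfulness score of this kind could be computed. It is an illustration, not the definition used in this work. It rests on three assumptions that the text above does not state: the impact of an intervened input feature is measured as the total variation distance between the model's predicted class distributions before and after the intervention, mentions are recorded as a binary indicator per example, and the score is the Pearson correlation between the two. The names `total_variation`, `pearson`, `correlational_faithfulness`, and the record keys `before`, `after`, and `mentioned` are hypothetical.

```python
from math import sqrt


def total_variation(p, q):
    """Total variation distance between two discrete distributions
    defined over the same ordered set of classes."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def correlational_faithfulness(records):
    """Correlate prediction impact with explanation mentions.

    Each record (hypothetical schema) describes one intervention:
      before    -- predicted class probabilities on the original input
      after     -- predicted class probabilities on the intervened input
      mentioned -- True if the explanation mentions the intervened feature
    """
    # Assumption: impact is the shift in the predicted distribution.
    impacts = [total_variation(r["before"], r["after"]) for r in records]
    # Assumption: mentions are encoded as 0/1 indicators.
    mentions = [1.0 if r["mentioned"] else 0.0 for r in records]
    return pearson(impacts, mentions)


# Toy data: the explanation mentions exactly the features that move the prediction.
toy_records = [
    {"before": [0.5, 0.5], "after": [1.0, 0.0], "mentioned": True},
    {"before": [0.5, 0.5], "after": [0.0, 1.0], "mentioned": True},
    {"before": [0.5, 0.5], "after": [0.5, 0.5], "mentioned": False},
    {"before": [0.2, 0.8], "after": [0.2, 0.8], "mentioned": False},
]
score = correlational_faithfulness(toy_records)
```

Under these assumptions, a score near 1 means explanations tend to mention the features that change the prediction and omit those that do not, a score near 0 means mentions are unrelated to impact, and a negative score means explanations preferentially mention features with little effect. Using a correlation rather than a raw mention rate is what lets the score account for both quantities at once.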