Core Concepts
Contextualized evaluations supply relevant context, such as clarifying details about the user and their intent, when judging language model responses. Providing this context can significantly alter evaluation outcomes, yielding more reliable assessments and clearer insights into model behavior.
Malaviya, C., Chang, J. C., Roth, D., Iyyer, M., Yatskar, M., & Lo, K. (2024). Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations. arXiv preprint arXiv:2411.07237.
This research investigates the impact of incorporating context during language model evaluations, particularly for the underspecified queries common in benchmark datasets. The authors examine whether providing context changes evaluation conclusions, which criteria evaluators apply, and whether models can be assessed on how well they adapt to diverse user contexts.
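The contrast between context-free and contextualized judging can be sketched concretely. The snippet below is a minimal, hypothetical illustration (all function and variable names are illustrative, not from the paper): an underspecified query is judged either alone or alongside clarifying question-answer pairs that pin down the user's intent.

```python
def build_judge_prompt(query, response, context_qa=None):
    """Assemble an evaluation prompt; context_qa is an optional list of
    (clarifying question, user answer) pairs supplying missing context."""
    parts = [f"Query: {query}"]
    if context_qa:
        # Contextualized variant: make the intended user explicit.
        parts.append("User context:")
        for question, answer in context_qa:
            parts.append(f"- {question} -> {answer}")
    parts.append(f"Response to evaluate: {response}")
    parts.append("Rate how well the response serves this user (1-5).")
    return "\n".join(parts)

# Underspecified query: "best" depends entirely on the user's situation.
query = "What is the best programming language?"
response = "Python is a good choice for beginners."

# Context-free evaluation leaves the judge to guess the user's intent.
prompt_plain = build_judge_prompt(query, response)

# Contextualized evaluation resolves the ambiguity before judging.
prompt_ctx = build_judge_prompt(
    query,
    response,
    context_qa=[
        ("What is your experience level?", "Complete beginner"),
        ("What do you want to build?", "Small data-analysis scripts"),
    ],
)
```

Under this sketch, the same response may be rated quite differently depending on whether the judge sees the clarified user context.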