Guerdan, L., Wallach, H., Barocas, S., & Chouldechova, A. (2024). A Framework for Evaluating LLMs Under Task Indeterminacy. In NeurIPS 2024 Workshops on Evaluating Evaluations (EvalEval) and Statistical Foundations of LLMs and Foundation Models (SFLLM).
This study investigates how to evaluate the performance of large language models (LLMs) more accurately in the presence of task indeterminacy.
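It offers a new perspective and methodology for LLM evaluation that supports more accurate assessment of LLM performance and, in turn, the further development of LLM technology.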
Key insights extracted from arxiv.org, by Luke Guerdan... (11-22-2024)
https://arxiv.org/pdf/2411.13760.pdf