Introducing HaluEval-Wild: A Benchmark for Evaluating Hallucinations in Large Language Models
Large language models (LLMs) hallucinate in real-world interactions, motivating a benchmark such as HaluEval-Wild to assess and improve their reliability.