Core Concepts
LLM Robustness Evaluation in Multilingual Retrieval-Augmented Generation
Stats
Models such as LLAMA-2, Orca-2, and FLAN-T5 exhibit high hallucination rates.
Mistral has a lower hallucination rate but a high error rate.
GPT-4 provides the best tradeoff between hallucination rate and error rate across both subsets (non-relevant and relevant).
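The two rates above can be made concrete with a minimal sketch. It assumes a definition that this summary does not spell out: hallucination rate is the share of queries in the non-relevant subset (no retrieved passage contains the answer) where the model answers anyway instead of abstaining, and error rate is the share of queries in the relevant subset where the model abstains even though a relevant passage is present. The function names and the toy data are hypothetical and are not results from the source.

```python
from typing import Sequence


def hallucination_rate(abstained: Sequence[bool]) -> float:
    """Share of non-relevant-subset queries where the model did NOT abstain.

    Assumed definition: every query here has no relevant passage, so any
    answer other than an abstention counts as a hallucination.
    """
    if not abstained:
        raise ValueError("need at least one prediction")
    return sum(not a for a in abstained) / len(abstained)


def error_rate(abstained: Sequence[bool]) -> float:
    """Share of relevant-subset queries where the model abstained.

    Assumed definition: every query here has a relevant passage, so an
    abstention counts as an error.
    """
    if not abstained:
        raise ValueError("need at least one prediction")
    return sum(abstained) / len(abstained)


# Toy predictions (hypothetical): True means the model abstained.
non_relevant_abstained = [True, False, False, True]
relevant_abstained = [False, False, True, False]

print(hallucination_rate(non_relevant_abstained))  # 0.5
print(error_rate(relevant_abstained))              # 0.25
```

Under this reading, a model that always answers scores a hallucination rate of 1.0 and an error rate of 0.0, while a model that always abstains scores the reverse, which is why the two rates are reported together as a tradeoff.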
Quotes
"RAG instills information from reliable knowledge corpora to generate accurate and faithful responses."
"LLMs are the de-facto choice for generation in RAG."
"NoMIRACL can serve as a valuable dataset towards LLM robustness evaluation."