RAFT: Adapting Language Model to Domain Specific RAG

Core Concepts
RAFT improves language models for in-domain question answering by training them to ignore distractor documents and focus on relevant information.
RAFT (Retrieval Augmented Fine Tuning) is a training recipe that improves a language model's ability to answer questions in an "open-book" setting within a specific domain. During fine-tuning, the model sees each question together with a set of retrieved documents, and it learns to ignore the distractor documents and extract the key information from the relevant ones. The recipe improves performance on PubMed, HotpotQA, and the Gorilla API Bench datasets. Because RAFT combines supervised fine-tuning with retrieval augmented generation (RAG), the resulting model can reason over the provided context and give accurate answers grounded in domain-specific knowledge.
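To make the recipe concrete, here is a minimal sketch of how a single training example of this kind might be assembled. It is an illustration, not the paper's implementation: the `RaftExample` class, the `build_raft_example` function, the prompt layout, and the sample documents are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class RaftExample:
    prompt: str  # context documents followed by the question
    target: str  # answer the model is fine-tuned to produce


def build_raft_example(question, oracle_doc, distractor_docs, answer, rng=None):
    """Format one training example with the oracle document hidden among distractors."""
    if rng is None:
        rng = random.Random()
    docs = [oracle_doc, *distractor_docs]
    rng.shuffle(docs)  # shuffle so document position carries no relevance signal
    context = "\n".join(f"[Document {i + 1}] {doc}" for i, doc in enumerate(docs))
    return RaftExample(prompt=f"{context}\n\nQuestion: {question}\nAnswer:", target=answer)


# Hypothetical usage with made-up documents.
example = build_raft_example(
    question="Which enzyme unwinds DNA during replication?",
    oracle_doc="Helicase unwinds the DNA double helix at the replication fork.",
    distractor_docs=[
        "Ribosomes translate messenger RNA into protein.",
        "Insulin lowers blood glucose levels.",
    ],
    answer="Helicase",
    rng=random.Random(0),
)
print(example.prompt)
```

Shuffling the documents is a design choice of this sketch: if the relevant document always appeared first, the model could learn its position instead of learning to read the context.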
RAFT consistently outperforms supervised fine-tuning, both with and without RAG, across PubMed (Dernoncourt & Lee, 2017), HotpotQA (Yang et al., 2018), and the HuggingFace Hub, Torch Hub, and TensorFlow Hub Gorilla datasets (Patil et al., 2023). Compared with domain-specific fine-tuning (DSF) on the same dataset, RAFT is better at relying on the provided context to solve the problem. The gains are largest on HotpotQA (30.87%) and HuggingFace (31.41%).
"RAFT aims to not only enable models to learn domain-specific knowledge through fine-tuning but also ensure robustness against inaccurate retrievals."
"In RAFT, we train the model to answer the question from Document(s) to generate an answer."
"RAFT consistently outperforms the baselines across various specialized domains."

Key Insights Distilled From

by Tianjun Zhan... at 03-18-2024

Deeper Inquiries

Should large language models always be trained with oracle context for Retrieval-Augmented Generation?

Not necessarily. In Retrieval-Augmented Generation (RAG), the oracle context is the set of documents that actually contains the answer. Training a large language model (LLM) only on oracle context teaches it to extract information from relevant sources, but it may limit the model's ability to generalize to the imperfect retrievals it will see at test time. Training with a mix of relevant and irrelevant documents makes the model more robust: it learns to tell useful content from distractors, which leads to more accurate responses when the retrieved context is noisy.
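The idea of mixing relevant and irrelevant context can be sketched at the dataset level. The snippet below is a hedged illustration only: the `assign_contexts` function and its `oracle_fraction` and `num_distractors` parameters are hypothetical, and the default values are placeholders, not settings taken from the text above. Dropping the oracle document from some examples is likewise an assumption of this sketch.

```python
import random


def assign_contexts(examples, corpus, oracle_fraction=0.8, num_distractors=3, seed=0):
    """Attach context documents to each (question, oracle_doc, answer) triple.

    With probability `oracle_fraction` the oracle document stays in the context;
    otherwise the context holds distractors only.
    """
    rng = random.Random(seed)
    dataset = []
    for question, oracle_doc, answer in examples:
        # Distractors are sampled from every corpus document except the oracle.
        pool = [doc for doc in corpus if doc != oracle_doc]
        docs = rng.sample(pool, min(num_distractors, len(pool)))
        keep_oracle = rng.random() < oracle_fraction
        if keep_oracle:
            docs.append(oracle_doc)
        rng.shuffle(docs)
        dataset.append(
            {"question": question, "docs": docs, "answer": answer, "has_oracle": keep_oracle}
        )
    return dataset


# Hypothetical toy corpus and question set.
corpus = ["doc A", "doc B", "doc C", "doc D", "doc E"]
examples = [("question 1", "doc A", "answer 1"), ("question 2", "doc B", "answer 2")]
mixed = assign_contexts(examples, corpus, oracle_fraction=0.8)
```

Setting `oracle_fraction=1.0` reproduces oracle-always training with added distractors, so the same function can be used to compare both regimes.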

Does training with a mix of relevant and irrelevant documents improve a model's performance in real-world applications?

Yes. Training a model with a combination of relevant (oracle) and irrelevant (distractor) documents can improve its performance in real-world applications, because it builds robustness against the noisy or misleading information the model is likely to encounter during inference. The model learns to filter out irrelevant details and to ground its answer in the meaningful parts of the provided context, which helps it respond accurately when it is given new or unseen information at test time.

How can models be made more robust against irrelevant text in retrieval pipelines during test time?

Several strategies can make models more resilient to irrelevant text in retrieval pipelines at test time:
Diverse Training Data: Train models on a diverse set of both relevant and irrelevant documents so that they learn to distinguish useful information from noise.
Fine-Tuning Techniques: Use fine-tuning methods that expose models to different levels of noise or distraction during training, so that they adapt better to similar conditions at inference.
Regularization Techniques: Apply regularization such as dropout or weight decay during training, which can help prevent overfitting to noisy data.
Ensemble Methods: Train multiple versions of the same base model on slightly varied datasets containing different combinations of relevant and distracting texts, then combine their outputs.
Together, these approaches help models filter out irrelevant text and extract the essential details from retrieved documents at test time.
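As one illustration of the ensemble strategy, the sketch below combines the answers of several model variants by majority vote. Majority voting is an assumption here, since the text does not say how the ensemble outputs should be combined. The `ensemble_answer` and `normalize` functions are hypothetical, and each "model" is a plain callable standing in for a fine-tuned variant.

```python
from collections import Counter


def normalize(answer):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def ensemble_answer(models, question, docs):
    """Query every model and return (most common answer, share of models agreeing).

    `models` is a list of callables taking (question, docs) and returning a string.
    """
    if not models:
        raise ValueError("ensemble_answer needs at least one model")
    votes = Counter(normalize(model(question, docs)) for model in models)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(models)


# Hypothetical stand-ins for three fine-tuned variants.
stub_models = [
    lambda question, docs: "Paris",
    lambda question, docs: "  paris ",
    lambda question, docs: "Lyon",
]
answer, agreement = ensemble_answer(stub_models, "What is the capital of France?", [])
print(answer, agreement)
```

The agreement share can serve as a rough confidence signal: a low value suggests the variants were pulled in different directions by the retrieved context.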