DeHaven, M. (2024). MARAGS: A Multi-Adapter System for Multi-Task Retrieval Augmented Generation Question Answering. In Proceedings of the KDD Cup 2024 (pp. TBD).
This paper presents MARAGS, a novel multi-adapter system designed for multi-task retrieval augmented generation (RAG) in question answering. The research aims to address the limitations of traditional RAG systems in handling diverse question types, dynamic answers, and varying topic popularity, as highlighted by the CRAG benchmark.
MARAGS uses a four-stage pipeline: webpage processing, API call generation, candidate ranking, and retrieval augmented generation. Web pages are segmented using BeautifulSoup4, while API calls are generated using a LoRA adapter trained on Llama 3. Candidate ranking employs a cross-encoder model, and final answer generation uses Llama 3 8B with task-specific LoRA adapters. To mitigate hallucinations, the training data is relabeled to encourage "I don't know" responses when relevant information is unavailable.
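The relabeling idea can be illustrated with a minimal sketch. The summary does not say how the authors decide that relevant information is unavailable, so the substring check below is an assumed stand-in heuristic, and the names `Example`, `answer_is_supported`, `relabel`, and `IDK` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List

# Target string for unanswerable examples (assumed wording).
IDK = "i don't know"


@dataclass(frozen=True)
class Example:
    question: str
    candidates: List[str]  # retrieved text segments kept after ranking
    answer: str            # gold answer from the training set


def answer_is_supported(answer: str, candidates: List[str]) -> bool:
    """Assumed heuristic: the gold answer appears verbatim in some candidate."""
    needle = answer.strip().lower()
    return bool(needle) and any(needle in c.lower() for c in candidates)


def relabel(examples: List[Example]) -> List[Example]:
    """Swap the gold answer for IDK when no candidate supports it."""
    return [
        ex if answer_is_supported(ex.answer, ex.candidates)
        else replace(ex, answer=IDK)
        for ex in examples
    ]
```

Fine-tuning on targets relabeled this way teaches the model to abstain when the retrieved context does not contain the answer. A real system would likely need a more tolerant support check than exact substring matching, for example one that handles paraphrases and numeric formats.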
The paper demonstrates the effectiveness of MARAGS in handling various question answering tasks, achieving 2nd place in Task 1 and 3rd place in Task 2 of the KDD Cup 2024 CRAG competition. The results highlight the benefits of using a multi-adapter approach, task-specific fine-tuning, and strategies to reduce hallucinations. The study also identifies challenges related to specific domains (e.g., finance), question dynamism, and topic popularity.
MARAGS offers a promising solution for building robust and accurate RAG systems for complex question answering tasks. The authors emphasize the importance of addressing hallucinations in LLM-based systems and propose techniques to mitigate this issue. The paper contributes to the ongoing research on improving the reliability and trustworthiness of AI systems for real-world applications.
This research contributes to natural language processing, specifically to question answering with RAG. The MARAGS system and the insights from its evaluation offer practical guidance for developing more effective and reliable question answering systems.
The study acknowledges limitations in handling certain question types and domains and suggests further research to address them. Future work could explore larger language models, more advanced techniques for preventing catastrophic forgetting, and improved methods for handling dynamic and less popular topics.