Core Concepts
HuixiangDou is a technical assistant powered by Large Language Models (LLMs), designed to help algorithm developers by providing insightful answers to questions about open-source algorithm projects in group chat scenarios.
Abstract
The authors present HuixiangDou, a technical assistant powered by Large Language Models (LLMs) that assists algorithm developers in group chat scenarios. The key contributions include:
Designing an algorithm pipeline specifically for group chat scenarios to address unique requirements such as avoiding message flooding, eliminating hallucination, and understanding domain-specific knowledge.
Verifying the reliable performance of text2vec in the refusal-to-answer task, filtering out irrelevant messages.
Identifying three critical requirements for LLMs in technical-assistant-like products: scoring ability, In-Context Learning (ICL), and Long Context support.
The system integrates multiple components to provide effective responses in group chats:
Preprocess: Handles user input by concatenating messages, parsing images, and filtering out irrelevant content.
Rejection Pipeline: Uses text2vec and LLM scoring to identify and dismiss casual chat-like discourse, ensuring the assistant only responds to genuine technical questions.
Response Pipeline: Employs keyword extraction, feature reranking, web search, and knowledge graph integration to retrieve relevant information. It also utilizes LLM scoring to evaluate the relevance of responses and ensure safety.
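The rejection step can be illustrated with a minimal sketch: embed the incoming message, compare it against the project knowledge base, and refuse to answer when nothing is similar enough. The `embed` function below is a toy character-bigram stand-in, not the actual text2vec-large-chinese model, and the threshold value is an assumption for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy character-bigram bag-of-words; a real system would use a
    # text2vec embedding model here instead.
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-features vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def should_answer(query, kb_docs, threshold=0.3):
    # Reject casual chat: respond only if the query is close enough
    # to at least one document in the project knowledge base.
    score = max(cosine(embed(query), embed(d)) for d in kb_docs)
    return score >= threshold, score

kb = ["how to install mmdetection with pip",
      "mmdetection training config for faster rcnn"]
ok, score = should_answer("pip install mmdetection fails with error", kb)
```

In the real system an LLM relevance score is combined with this similarity check, so a message must look technical both to the embedding model and to the LLM before it is answered.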
The authors conducted extensive experiments to validate the feasibility of key technical components, including fine-tuning LLMs, evaluating text2vec performance, and optimizing long context handling. They also explored alternative approaches such as traditional NLP methods and prompt engineering, but found them to have significant limitations.
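The response pipeline described above can be sketched as a small chain of stages. All names here (`extract_keywords`, `retrieve`, `answer`) are illustrative, not the authors' actual API; the `llm_score` stub stands in for asking the LLM to rate candidate relevance.

```python
# Sketch of a response pipeline: keyword extraction -> retrieval ->
# LLM relevance scoring -> answer or refusal. Assumed structure only.

def extract_keywords(question):
    # Stand-in for LLM keyword extraction: keep longer, non-stop words.
    stop = {"with", "what", "does", "this", "that"}
    return [w for w in question.lower().split()
            if len(w) > 3 and w not in stop]

def retrieve(keywords, corpus):
    # Score each document by keyword overlap; the real system also
    # uses feature reranking, web search, and a knowledge graph.
    scored = [(sum(k in doc for k in keywords), doc) for doc in corpus]
    return [doc for hits, doc in sorted(scored, reverse=True) if hits > 0]

def answer(question, corpus, llm_score=lambda q, d: 5):
    # llm_score stubs "ask the LLM to rate relevance 0-10".
    candidates = retrieve(extract_keywords(question), corpus)
    relevant = [d for d in candidates if llm_score(question, d) >= 5]
    if not relevant:
        return None  # refuse rather than hallucinate an answer
    # The real system feeds the full source text plus the original
    # question to the LLM instead of returning a snippet.
    return relevant[0]
```

Returning `None` when nothing relevant survives scoring mirrors the paper's emphasis on refusing over creating a false impression of understanding.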
The authors conclude that an LLM with the necessary capabilities, namely understanding domain-specific terminology, supporting long context, scoring ability, and In-Context Learning, can effectively address most technical demands within group chat scenarios. However, they acknowledge that as user questions become more advanced, providing satisfactory responses becomes increasingly challenging and requires efficient further pretraining of the LLM.
Stats
Of 1,303 domain-related queries, 11.6% were identified as genuine user questions using LLM scoring.
The text2vec-large-chinese model achieved a precision of 0.99 and a recall of 0.92 in the refusal-to-answer task on manually annotated data.
The ReRoPE method, combined with dynamic quantization, enabled support for 40k token length on a single A100 80G card.
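For reference, precision and recall for the refusal-to-answer task are computed from the usual confusion counts. The counts below are hypothetical, chosen only so the arithmetic is visible; the paper's reported figures are precision 0.99 and recall 0.92.

```python
def precision_recall(tp, fp, fn):
    # precision = TP / (TP + FP): of the messages the system answered,
    # how many were real questions.
    # recall = TP / (TP + FN): of the real questions, how many the
    # system agreed to answer.
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts for illustration, not the paper's actual data.
p, r = precision_recall(tp=92, fp=1, fn=8)
```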
Quotes
"Even a single instance of hallucination could make users perceive the bot as unreliable from a product perspective. Therefore, the system is implemented to avoid creating any false impressions of understanding."
"LangChain (langchain contributors, 2023) and wenda (wenda contributors, 2023) were originally used for RAG. After repeated tests, we think their retrieval abilities are normal, but surprisingly suitable for telling whether the question deserves to be answered."
"Directly using snippet to answer questions can lead to local optima. We read the original text corresponding to the snippet and hand it over to the LLM for processing along with the original question."