Core Concepts
StreamingDialogue introduces conversational attention sinks ("conv-attn sinks") to compress dialogue history efficiently, enhancing long-term memory and reducing computational cost.
Abstract
StreamingDialogue compresses dialogue history into conv-attn sinks, improving efficiency, reducing memory usage, and enhancing long-term memory. The approach outperforms baselines on dialogue tasks and achieves a significant speedup.
LLMs struggle to handle dialogues with long contexts efficiently. The paper introduces "conversational attention sinks": designated tokens that aggregate the information of the utterances around them. By compressing each utterance into such a sink, the method can handle prolonged dialogues effectively.
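To make the resulting attention pattern concrete, below is a minimal sketch assuming each past utterance is represented by one sink position and each query also keeps a local window of recent tokens. The function name, sink positions, and window size are illustrative assumptions, not the paper's implementation.

```python
import torch

def conv_attn_sink_mask(seq_len: int, sink_positions: list[int], window: int) -> torch.Tensor:
    """Boolean causal attention mask: each query attends to conv-attn sinks
    (one per earlier utterance) plus a local window of recent tokens.
    All positions and sizes here are illustrative, not the paper's exact setup."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for q in range(seq_len):
        # Local window: the most recent `window` tokens, causally masked.
        lo = max(0, q - window + 1)
        mask[q, lo:q + 1] = True
        # Conv-attn sinks: compressed representations of earlier utterances.
        for s in sink_positions:
            if s <= q:
                mask[q, s] = True
    return mask

# Example: a 32-token dialogue whose utterances end at positions 7, 15, 23.
mask = conv_attn_sink_mask(seq_len=32, sink_positions=[7, 15, 23], window=8)
print(mask.sum().item(), "attended entries vs", 32 * 33 // 2, "for dense causal attention")
```

Each row of the mask stays small (a few sinks plus a fixed window) no matter how long the dialogue grows, which is what makes the pattern cheap to evaluate.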
Standard LLMs are constrained by the context size used during pre-training, which is especially limiting for dialogue tasks. The attention mechanism's computation grows quadratically with text length, making prolonged dialogues expensive to support. StreamingDialogue addresses this by compressing historical information into conv-attn sinks.
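As a rough worked comparison (the symbols L, s, and w are ours, not the paper's notation): dense attention over a history of L tokens touches on the order of L² query-key pairs, whereas attending only to s sinks plus a window of w recent tokens touches about L(s + w).

```latex
% Illustrative cost comparison; L, s, w are our symbols, not the paper's notation.
\underbrace{O(L^2)}_{\text{dense attention}}
\quad \text{vs.} \quad
\underbrace{O\bigl(L\,(s+w)\bigr)}_{\text{sinks + local window}},
\qquad s + w \ll L.
% Example: L = 10^4,\ s + w = 200 \;\Rightarrow\; 10^8 \text{ vs } 2\times 10^6 \text{ pairs, i.e. 50x fewer.}
```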
The proposed method outperforms existing sparse attention methods and memory-augmented baselines, achieving higher BLEU, ROUGE, and Distinct scores at lower perplexity. Human evaluation also confirms its superiority in fluency, coherence, and consistency.
Overall, StreamingDialogue offers an efficient solution for handling prolonged dialogues with enhanced long-term memory capabilities and reduced computational complexity.
Stats
Current LLMs can handle context windows of 200k tokens or more.
Our method achieves a 4× speedup and an 18× reduction in memory usage compared to dense attention with recomputation.
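A back-of-envelope sketch of where savings of this order can come from: caching keys/values only for the sinks and a recent window instead of the full history. All numbers below are hypothetical, chosen purely for intuition.

```python
# Back-of-envelope KV-cache comparison (all numbers hypothetical, for intuition only).
def kv_entries(tokens_cached: int, layers: int = 32, heads: int = 32, head_dim: int = 128) -> int:
    # 2x accounts for keys and values.
    return 2 * tokens_cached * layers * heads * head_dim

history_len = 16_000       # full dialogue history kept by dense attention
sinks, window = 200, 600   # conv-attn sinks + recent tokens kept instead

dense = kv_entries(history_len)
streaming = kv_entries(sinks + window)
print(f"cache ratio ~ {dense / streaming:.0f}x smaller")  # 16000 / 800 = 20x in this toy setting
```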
On the MSC dataset, the baselines score: the dense-attention model a PPL of 7.58, the local-attention model a BLEU of 13.34%, and Big Bird a ROUGE-L of 15.32%.
Our method achieves a BLEU-1 score of 89.19%, indicating that conv-attn sinks effectively compress dialogue information.
Ablation experiments show significant performance declines when the short-memory reconstruction (SMR) or long-memory reactivation (LMR) learning strategies are removed.
Quotes
"Our method outperforms sparse attention baselines and memory-augmented baselines."
"StreamingDialogue effectively recalls distant historical information."
"The absence of SMR results in prominent declines in BLEU and ROUGE scores."