BFloat16 precision, while computationally efficient, degrades RoPE's relative positional encoding in long-context language models, with the first token affected most; the paper addresses this by introducing AnchorAttention, a novel attention mechanism that uses a shared anchor token to improve both long-context performance and training speed.
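As a rough illustration of the masking idea behind an anchor-based scheme, the sketch below builds an attention mask for a packed sequence in which every document attends causally to its own tokens plus a single shared anchor at index 0. The `doc_ids` layout and function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an AnchorAttention-style mask (assumptions: the anchor is
# the single token at index 0, and doc_ids marks which packed document each
# token belongs to; the paper's actual construction may differ).
import numpy as np

def anchor_attention_mask(doc_ids):
    """Boolean mask where True = query may attend to key.

    Each token attends causally within its own document, and every token
    additionally attends to the shared anchor token at index 0.
    """
    n = len(doc_ids)
    q = np.arange(n)[:, None]          # query positions
    k = np.arange(n)[None, :]          # key positions
    same_doc = np.asarray(doc_ids)[:, None] == np.asarray(doc_ids)[None, :]
    causal = k <= q
    mask = same_doc & causal           # intra-document causal attention
    mask[:, 0] = True                  # everyone sees the shared anchor
    return mask

# Example: three packed documents; token 0 is the shared anchor.
doc_ids = [0, 1, 1, 1, 2, 2, 3, 3, 3]
print(anchor_attention_mask(doc_ids).astype(int))
```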
LongReward leverages an off-the-shelf LLM to provide multi-dimensional rewards for long-context model responses, enabling the use of reinforcement learning algorithms like DPO to significantly improve the performance and faithfulness of long-context LLMs.
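A minimal sketch of how such a multi-dimensional reward could feed DPO-style preference data follows. The judge call (`query_judge_llm`) is a placeholder, and the dimensions and averaging shown here are assumptions for illustration rather than the paper's exact recipe.

```python
# Hedged sketch of a LongReward-style pipeline: score sampled responses along
# several dimensions with an off-the-shelf judge LLM, then turn best/worst
# samples into DPO preference pairs.
from statistics import mean

DIMENSIONS = ["helpfulness", "logicality", "faithfulness", "completeness"]

def query_judge_llm(context, question, response, dimension):
    """Placeholder: return a 0-10 score from an off-the-shelf judge LLM."""
    raise NotImplementedError

def long_reward(context, question, response):
    # Multi-dimensional reward: average the per-dimension judge scores.
    return mean(query_judge_llm(context, question, response, d) for d in DIMENSIONS)

def build_dpo_pair(context, question, sampled_responses):
    # Rank the model's own samples by reward; best vs. worst forms one pair.
    ranked = sorted(sampled_responses, key=lambda r: long_reward(context, question, r))
    return {"prompt": (context, question), "chosen": ranked[-1], "rejected": ranked[0]}
```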
This paper introduces EM-LLM, a novel architecture inspired by human episodic memory that significantly enhances the context processing capabilities of large language models (LLMs), enabling them to handle practically infinite context lengths while maintaining computational efficiency.
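The summary does not spell out the mechanism, but EM-LLM is described as organizing the token stream into episodic events based on model surprise. The sketch below shows one way such surprise-based segmentation could look; the threshold rule and window are illustrative assumptions, and the paper's boundary refinement and retrieval steps are not shown.

```python
# Hedged sketch of surprise-based event segmentation in the spirit of EM-LLM:
# place an event boundary where a token's surprise (negative log-likelihood)
# spikes relative to the recent past.
import numpy as np

def segment_by_surprise(neg_log_likelihoods, gamma=1.0, window=64):
    """Return indices where new episodic events begin."""
    nll = np.asarray(neg_log_likelihoods, dtype=float)
    boundaries = [0]
    for t in range(1, len(nll)):
        recent = nll[max(0, t - window):t]
        # Boundary when surprise exceeds the running mean by gamma std devs.
        if nll[t] > recent.mean() + gamma * recent.std():
            boundaries.append(t)
    return boundaries
```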
SharedLLM is a novel approach to extend the context window size of large language models (LLMs) by using a hierarchical architecture with two short-context LLMs and a specialized tree-style data structure for efficient multi-grained context compression and query-aware information retrieval.
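A hedged sketch of what a multi-grained, query-aware tree over the context could look like. Class and function names are illustrative; the compressor (which would fill `compressed` with states from the lower short-context LLM) is omitted, and the real SharedLLM data structure may differ.

```python
# Illustrative tree-style context structure: coarse chunks at the top,
# finer-grained children underneath, expanded only when relevant to the query.
from dataclasses import dataclass, field

@dataclass
class ContextNode:
    text: str                      # the span of context this node covers
    compressed: object = None      # e.g. compressed states from a short-context LLM
    children: list = field(default_factory=list)

def build_tree(text, chunk_size=2048, levels=2):
    """Split text into coarse chunks, each subdivided into finer chunks."""
    node = ContextNode(text)
    if levels > 0 and len(text) > chunk_size:
        node.children = [
            build_tree(text[i:i + chunk_size], chunk_size // 4, levels - 1)
            for i in range(0, len(text), chunk_size)
        ]
    return node

def query_aware_collect(node, is_relevant):
    """Keep coarse nodes as-is, but expand children judged relevant to the query."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        if is_relevant(child.text):
            out.extend(query_aware_collect(child, is_relevant))
        else:
            out.append(child)
    return out
```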
Large language models (LLMs) struggle to effectively utilize their full context length due to a left-skewed position frequency distribution in relative position encodings, which leads to undertraining on long-range dependencies.
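To make the skew concrete: in causal attention over a length-L sequence, a relative distance d between query and key occurs L − d times, so small distances dominate the pairs the model is trained on while long-range pairs are rare. A small sketch (the document-length mix is made up for illustration):

```python
# Count how often each relative distance appears across a toy corpus,
# illustrating why long-range dependencies end up undertrained.
from collections import Counter

def relative_distance_counts(seq_lengths):
    counts = Counter()
    for L in seq_lengths:
        for d in range(L):
            counts[d] += L - d    # pairs (i, j) with i - j == d in a length-L sequence
    return counts

# Example: mostly short documents plus a few long ones.
counts = relative_distance_counts([512] * 90 + [4096] * 10)
total = sum(counts.values())
for d in (0, 256, 1024, 4000):
    print(f"distance {d:>5}: {100 * counts[d] / total:.4f}% of training pairs")
```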
Taipan is a hybrid architecture that combines Mamba's efficiency with Transformer-level performance by strengthening long-range dependency handling through selective attention layers; it maintains accuracy and efficiency at context lengths of up to one million tokens and delivers strong performance across a wide range of tasks.
Taipan, a novel hybrid architecture for language modeling, surpasses existing models in both efficiency and performance by strategically integrating Selective Attention Layers (SALs) into the Mamba-2 framework, balancing computational efficiency with enhanced handling of long-range dependencies.
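A rough structural sketch of the hybrid layout described above: mostly Mamba-2-style state-space blocks, with a selective attention layer inserted periodically that runs attention only over tokens scored as important. The scoring rule, interval, and placeholder blocks are assumptions for illustration, not Taipan's actual components.

```python
# Hedged sketch of a Taipan-like hybrid stack (not the paper's code).
import numpy as np

def mamba_block(h):
    """Placeholder for a Mamba-2 state-space block."""
    return h

def attend(x):
    """Placeholder softmax attention over the selected tokens only."""
    logits = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def selective_attention_layer(h, keep_ratio=0.25):
    """Toy SAL: apply attention only to the top fraction of tokens by importance."""
    scores = np.linalg.norm(h, axis=-1)            # stand-in for a learned scorer
    k = max(1, int(keep_ratio * len(h)))
    keep = np.argsort(scores)[-k:]                 # indices of selected tokens
    out = h.copy()
    out[keep] = out[keep] + attend(h[keep])        # long-range mixing on the selection only
    return out

def taipan_like_stack(h, n_layers=24, sal_every=6):
    for i in range(n_layers):
        h = mamba_block(h)
        if (i + 1) % sal_every == 0:               # periodic long-range refinement
            h = selective_attention_layer(h)
    return h
```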
While long-context large language models (LLMs) generally outperform Retrieval Augmented Generation (RAG) in long-context understanding tasks, RAG remains a cost-effective alternative, especially for tasks exceeding the model's context window.
This paper proposes a new method, ACER, that automatically extends a language model's capabilities to longer contexts without any human-annotated data.
ACER is a retrieval-based automatic context extension method for improving the long-text understanding of large language models (LLMs); it effectively boosts model performance without labeled data, making it useful for long-document processing tasks.
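Neither summary spells out the pipeline, but a retrieval-bootstrapped recipe in this spirit could look like the sketch below: retrieve chunks for a query, let a short-context model answer from the top chunks (a pseudo-label, so no human annotation is needed), and pack many chunks into a long training context. All function names are placeholders, and the paper's actual procedure may differ.

```python
# Hedged sketch of retrieval-based synthesis of long-context training data
# without labels; `retrieve` and `short_context_answer` are caller-supplied
# placeholders, not APIs from the paper.

def make_long_context_example(query, corpus_chunks, retrieve, short_context_answer,
                              n_context_chunks=64, n_evidence_chunks=4):
    ranked = retrieve(query, corpus_chunks)                 # most relevant chunks first
    evidence = ranked[:n_evidence_chunks]                   # fits a short context window
    answer = short_context_answer(query, evidence)          # pseudo-label, no human annotation
    long_context = "\n\n".join(ranked[:n_context_chunks])   # long input with distractors mixed in
    return {"context": long_context, "question": query, "answer": answer}
```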