Balancing Recall and Throughput in Linear Attention Language Models
The authors study the tradeoff between inference throughput and recall — the model's ability to retrieve information from earlier in the context — in linear attention language models, and propose an architecture that balances the two.
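To make the efficiency side of this tradeoff concrete, below is a minimal sketch of causal linear attention in its recurrent form. This is an illustrative NumPy implementation of the general technique, not the authors' specific architecture; the feature map `phi` (here a shifted ReLU) and all names are assumptions. The fixed-size running state is what yields O(n) time and constant memory per token, and it is also what bounds how much context the model can recall.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Causal linear attention, computed recurrently.

    Instead of O(n^2) softmax attention, maintain a fixed-size state
    S = sum_i phi(k_i) v_i^T and normalizer z = sum_i phi(k_i).
    The fixed state size gives O(n) time and O(1) memory per token,
    but caps how much of the context can be stored (limiting recall).
    """
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))  # running key-value outer-product state
    z = np.zeros(d)                # running normalizer
    out = np.zeros_like(V)
    for t in range(n):
        k, q = phi(K[t]), phi(Q[t])
        S += np.outer(k, V[t])     # fold token t into the state
        z += k
        out[t] = (q @ S) / (q @ z)  # attend using the state seen so far
    return out
```

Because the state has fixed dimension regardless of sequence length, older tokens are gradually overwritten, which is the recall limitation the paper targets.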