LOCOST: State-Space Models for Long Document Abstractive Summarization
Core Concepts
State-space models offer an efficient alternative to transformers for processing long sequences, achieving competitive results with reduced memory usage.
Abstract
State-space models provide a low-complexity alternative for encoding long texts, handling significantly longer sequences than traditional transformers. The proposed LOCOST architecture delivers competitive performance on abstractive summarization, reaching up to 96% of the performance of the best sparse transformers while saving memory during both training and inference. Thanks to its state-space encoder, the model can process input sequences exceeding 600K tokens, setting a new state of the art on full-book summarization.
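To make the complexity claim concrete, below is a minimal NumPy sketch of the mechanism behind state-space encoders: a diagonal state-space model induces a long convolution kernel that can be applied with the FFT in O(L log L), so no L×L attention matrix is ever materialized. The parameterization and function names are illustrative assumptions, not the paper's actual implementation (LOCOST's encoder is a deeper, multi-channel variant of this idea).

```python
import numpy as np

def ssm_kernel(a, b, c, L):
    """Kernel K[l] = sum_n c[n] * a[n]**l * b[n] implied by a diagonal
    discrete-time state-space model (single channel, illustrative only)."""
    powers = a[None, :] ** np.arange(L)[:, None]   # (L, N) powers of the diagonal state matrix
    return (powers * b[None, :]) @ c               # (L,) convolution kernel

def ssm_apply(u, K):
    """Apply y = K * u as a causal convolution via the FFT: O(L log L)
    instead of the O(L^2) cost of full self-attention."""
    n = 2 * len(u)                                 # zero-pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(K, n), n)
    return y[: len(u)]

# Toy usage on one feature channel of a long sequence.
rng = np.random.default_rng(0)
N, L = 16, 4096                                    # state size, sequence length
a = rng.uniform(0.5, 0.99, size=N)                 # stable diagonal state matrix (|a| < 1)
b, c = rng.normal(size=N), rng.normal(size=N)
u = rng.normal(size=L)                             # input sequence (one channel)
y = ssm_apply(u, ssm_kernel(a, b, c, L))           # sequence mixing in O(L log L)
```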
LOCOST
Stats
State-space models have a computational complexity of O(L log L) in the sequence length L (see the back-of-envelope comparison after this list).
LOCOST achieves up to 50% memory savings during training and up to 87% during inference.
The model can handle inputs exceeding 600K tokens at inference time.
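As a rough sanity check of these figures, the snippet below compares the asymptotic sequence-mixing cost of full self-attention, O(L²), with that of an FFT-based state-space layer, O(L log L), at the sequence lengths discussed above. It is a back-of-envelope operation count, not a reproduction of the paper's measured memory numbers.

```python
import math

# Back-of-envelope operation counts (up to constant factors), not measured figures.
for L in (4_096, 16_384, 600_000):
    attention_ops = L ** 2                # full self-attention: quadratic in sequence length
    ssm_ops = L * math.log2(L)            # FFT-based state-space mixing: log-linear
    print(f"L={L:>7,}  attention ~{attention_ops:.1e}  ssm ~{ssm_ops:.1e}  "
          f"ratio ~{attention_ops / ssm_ops:,.0f}x")
```

The gap grows rapidly with L, which is why inputs of hundreds of thousands of tokens remain tractable for the state-space encoder while they are far out of reach for full attention.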
Quotes
"State-space models are a low-complexity alternative to transformers for encoding long sequences."
"LOCOST demonstrates competitive performances compared to state-of-the-art sparse transformers while being significantly more memory-efficient."
"The model is able to process entire input sequences of up to 600K tokens."