Efficient Conditional Generation with Cross-Attending to Cached Context
This work introduces XC-CACHE, a method that uses cross-attention to condition language model generation on pre-computed context representations. Because the context is encoded once and only its compact representations are cached, inference requires a drastically smaller memory footprint than caching the full context.
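The core mechanism can be sketched as follows. This is a minimal, illustrative sketch (not the paper's implementation): context states are assumed to be produced offline by an encoder, and only their projected key/value tensors are cached; at generation time, decoder queries cross-attend to that cache. All variable names (`cached_k`, `cached_v`, `cross_attend`, the dimensions) are hypothetical choices for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, cached_k, cached_v):
    # queries: (n_q, d) decoder states at generation time.
    # cached_k, cached_v: (n_ctx, d) key/value states pre-computed
    # once from the context and reused for every generated token.
    d = queries.shape[-1]
    scores = queries @ cached_k.T / np.sqrt(d)  # (n_q, n_ctx)
    return softmax(scores) @ cached_v           # (n_q, d)

rng = np.random.default_rng(0)
d = 8
# Encoder outputs for the context, computed once offline.
context_states = rng.normal(size=(16, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
# Only these compact projections are cached, not the context tokens
# or their full self-attention KV states.
cached_k = context_states @ Wk
cached_v = context_states @ Wv

queries = rng.normal(size=(4, d))  # decoder queries during generation
out = cross_attend(queries, cached_k, cached_v)
print(out.shape)  # (4, 8)
```

The design point this illustrates: the cache size scales with the (fixed) context length and a single key/value projection per layer, independent of how many tokens are later generated.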