Jamba: A Hybrid Transformer-Mamba Language Model with Improved Performance and Efficiency
Jamba is a novel hybrid language model architecture that interleaves Transformer (attention) layers with Mamba (state-space) layers and adds a mixture-of-experts (MoE) component to some of its feed-forward blocks. Because the Mamba layers carry fixed-size recurrent state instead of a growing key-value cache, the design achieves higher throughput and a smaller memory footprint at long context lengths than pure Transformer models of comparable quality.
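As a rough illustration of how such a hybrid might be wired up, the PyTorch sketch below interleaves self-attention blocks with a toy diagonal state-space mixer and routes some feed-forward blocks through a top-k MoE. This is a minimal sketch under stated assumptions, not Jamba's implementation: the `ToySSM` module, the layer ratio, the expert count, and every hyperparameter here are illustrative stand-ins (real Mamba uses input-dependent dynamics and a hardware-aware parallel scan).

```python
# Illustrative sketch only -- not the Jamba implementation. A hybrid stack
# that interleaves attention layers with a toy state-space (SSM) mixer and
# replaces some dense feed-forward blocks with a top-k mixture-of-experts.
import torch
import torch.nn as nn


class ToySSM(nn.Module):
    """Minimal diagonal SSM standing in for a real Mamba layer (assumption:
    a sequential scan with fixed per-channel decays)."""
    def __init__(self, dim: int, state: int = 16):
        super().__init__()
        self.log_decay = nn.Parameter(-torch.rand(dim, state))  # decay stays in (0, 1)
        self.B = nn.Parameter(0.1 * torch.randn(dim, state))
        self.C = nn.Parameter(0.1 * torch.randn(dim, state))

    def forward(self, x):                        # x: (batch, seq, dim)
        decay = self.log_decay.exp()
        h = x.new_zeros(x.size(0), x.size(2), self.B.size(1))
        ys = []
        for t in range(x.size(1)):               # O(seq) recurrent scan, constant-size state
            h = decay * h + self.B * x[:, t].unsqueeze(-1)
            ys.append((h * self.C).sum(-1))
        return torch.stack(ys, dim=1)


class MoEFFN(nn.Module):
    """Top-k routed mixture-of-experts feed-forward block."""
    def __init__(self, dim: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                        # x: (batch, seq, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # send each token to its top-k experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class HybridBlock(nn.Module):
    """Pre-norm residual block: attention or SSM mixer, then a (MoE) FFN."""
    def __init__(self, dim: int, use_attention: bool, use_moe: bool):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.use_attention = use_attention
        self.mixer = (nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
                      if use_attention else ToySSM(dim))
        self.ffn = MoEFFN(dim) if use_moe else nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        if self.use_attention:                   # causal mask omitted for brevity
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        x = x + h
        return x + self.ffn(self.norm2(x))


# Example stack: 8 blocks, one attention layer per four, MoE in every other
# FFN. The ratio and MoE placement are arbitrary choices for this demo.
dim = 64
blocks = nn.ModuleList(HybridBlock(dim, use_attention=(i % 4 == 3),
                                   use_moe=(i % 2 == 1)) for i in range(8))
x = torch.randn(2, 16, dim)                      # (batch, seq, dim) embeddings
for block in blocks:
    x = block(x)
print(x.shape)                                   # torch.Size([2, 16, 64])
```

The design intuition the sketch tries to convey: attention layers provide exact all-pairs token mixing but pay memory proportional to context length, while the SSM layers update a fixed-size state per token, so stacking mostly SSM layers with a few attention layers keeps long-context cost low without giving up attention entirely.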