IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled Parameter-Efficient Fine-Tuning
This paper proposes the Intra- and Inter-modal Side Adapted Network (IISAN), which follows a decoupled parameter-efficient fine-tuning (DPEFT) paradigm to efficiently adapt pre-trained large-scale multimodal foundation models to downstream sequential recommendation tasks. By keeping the trainable side network separate from the frozen backbone, IISAN significantly reduces GPU memory usage and training time compared with both full fine-tuning and existing embedded PEFT methods, while maintaining comparable recommendation performance.
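IISAN's intra- and inter-modal gating and fusion details are not covered in this summary, but the core decoupled-PEFT idea can be sketched in a few lines: the backbone stays frozen, a small side network is trained on the backbone's intermediate activations, and gradients therefore never traverse the backbone (which also allows those activations to be cached). The sketch below is a minimal illustration under these assumptions; the backbone, side head, and data here are hypothetical toys, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained "backbone": two layers whose weights are never updated.
W1 = rng.normal(size=(16, 8))   # hypothetical layer-1 weights
W2 = rng.normal(size=(16, 16))  # hypothetical layer-2 weights

def backbone(x):
    """Forward pass only; no gradients ever flow through these weights."""
    h1 = np.tanh(W1 @ x)
    h2 = np.tanh(W2 @ h1)
    return h1, h2  # expose intermediate states to the side network

# Trainable side network: a lightweight head over the backbone's
# intermediate states. This is the decoupled part: the trainable path
# sits beside the backbone, so backpropagation never enters it.
V = rng.normal(scale=0.1, size=(1, 32))  # the only trainable parameters

# Toy data: random inputs with scalar targets.
X = rng.normal(size=(64, 8))
t = rng.normal(size=(64, 1))

# Because the backbone is frozen, its activations can be computed once
# and cached -- the source of the GPU-memory and training-time savings.
H = np.stack([np.concatenate(backbone(x)) for x in X])  # shape (64, 32)

def mse(V):
    return float(np.mean((H @ V.T - t) ** 2))

initial = mse(V)
for _ in range(200):
    pred = H @ V.T                        # side-network forward pass
    grad = 2 * (pred - t).T @ H / len(X)  # gradient w.r.t. side params only
    V -= 0.01 * grad                      # update touches only the side head
final = mse(V)
```

The key property to notice is that the training loop reads only the cached activations `H`; the backbone weights `W1` and `W2` appear nowhere in the gradient computation.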