Core Concepts
The Multi-modal Sequential Recommendation (MMSR) framework shows potential for improving recommendation quality by leveraging multi-modal information rather than item IDs.
Abstract
This study examines how effectively a Multi-modal Sequential Recommendation (MMSR) framework can improve recommendation quality by leveraging multi-modal information without relying on item IDs. It systematically summarizes existing multi-modal SR methods and distills them into four core components: a visual encoder, a text encoder, a multi-modal fusion module, and a sequential architecture. It then examines how to build MMSR from scratch, how to benefit from existing multi-modal pre-training paradigms, and how to address common challenges such as cold start and domain transfer. Experimental results across four real-world recommendation scenarios demonstrate the potential of ID-agnostic multi-modal sequential recommendation.
Abstract:
- Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions.
- A Multi-modal Sequential Recommendation (MMSR) framework is constructed from multi-modal information alone, without item IDs.
- Existing multi-modal SR methods are systematically summarized into four core components: visual encoder, text encoder, multi-modal fusion module, and sequential architecture.
- The study explores building MMSR from scratch, benefiting from existing multi-modal pre-training paradigms, and addressing common challenges such as cold start and domain transfer.
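To make the four-component decomposition concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Every function name, the embedding width `DIM`, and the toy hash-based encoders are assumptions made for illustration; a real system would plug in pretrained text and vision encoders, a learned fusion module, and a trained sequential model in their place.

```python
import hashlib

DIM = 8  # toy embedding width (assumption, chosen only for illustration)


def _hash_vec(token, dim=DIM):
    """Deterministic pseudo-embedding for a token (stand-in for a pretrained encoder)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]


def text_encoder(text):
    """Toy text encoder: mean of per-word pseudo-embeddings."""
    words = text.lower().split() or [""]
    vecs = [_hash_vec(w) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def visual_encoder(pixels, dim=DIM):
    """Toy visual encoder: mean intensity of every dim-th pixel in a flat list."""
    chunks = (pixels[i::dim] for i in range(dim))
    return [sum(c) / len(c) if c else 0.0 for c in chunks]


def fuse(text_vec, image_vec):
    """Placeholder fusion module: element-wise average of the two modalities."""
    return [(t + v) / 2.0 for t, v in zip(text_vec, image_vec)]


def encode_item(item):
    """Item representation built only from content, never from an item ID."""
    return fuse(text_encoder(item["text"]), visual_encoder(item["pixels"]))


def sequence_encoder(item_vecs):
    """Placeholder sequential architecture: recency-weighted mean of the history."""
    weights = [i + 1 for i in range(len(item_vecs))]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, item_vecs)) / total
            for j in range(len(item_vecs[0]))]


def recommend(history, candidates):
    """Return the index of the candidate whose representation best matches the user."""
    user_vec = sequence_encoder([encode_item(it) for it in history])

    def score(item):
        return sum(u * c for u, c in zip(user_vec, encode_item(item)))

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))
```

Because items are described only by their text and image content, the same `recommend` function can score candidates that never appeared in any interaction history.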
Introduction:
- SR models aim to recommend the next item of interest based on users' past interactions.
- Mainstream SR methods rely on user and item IDs, which limits their transferability and their performance in cold-start scenarios.
- Multi-modal Sequential Recommendation (MMSR) leverages multi-modal information to achieve stronger transferability and to address cold-start issues.
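The cold-start contrast described above can be illustrated with a small sketch. It is a toy example under stated assumptions, not the paper's method: `id_table`, `id_lookup`, and `content_embedding` are hypothetical names, and the hash-based embedding merely stands in for a real content encoder.

```python
import hashlib

# Hypothetical ID embedding table: vectors exist only for items seen during training.
id_table = {
    "item_1": [0.1, -0.3, 0.5, 0.2],
    "item_2": [0.4, 0.0, -0.2, 0.7],
}


def id_lookup(item_id):
    """ID-based lookup: returns None for any item that was never trained on."""
    return id_table.get(item_id)


def content_embedding(description, dim=4):
    """Toy content encoder: derives a vector from the item's text alone."""
    digest = hashlib.sha256(description.lower().encode("utf-8")).digest()
    return [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]


# A brand-new item has no learned ID vector, but its description still
# yields a usable representation.
new_item_id = "item_new"
new_item_text = "waterproof hiking jacket"
id_vector = id_lookup(new_item_id)
content_vector = content_embedding(new_item_text)
```

Here `id_vector` is `None` because the new item was never observed, while `content_vector` is always defined. The same reasoning applies to domain transfer: content encoders carry over to a new catalogue, whereas an ID table does not.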
Experiments:
- Various text and vision encoders are explored, with RoBERTa and Swin performing the best.
- Different fusion approaches are investigated, with merge-attention outperforming co-attention.
- MMSR is competitive across different SR architectures and surpasses traditional ID-based SR methods.
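The two fusion styles compared above differ in how the modalities attend to each other. The sketch below follows the common usage of these terms in the vision-language literature and is an assumption about what the summary means: merge-attention runs self-attention over the concatenated text and image tokens, while co-attention keeps the two streams separate and lets each one attend to the other. Learned query, key, and value projections, multiple heads, and the per-stream self-attention layers are omitted to keep the example short.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention with identity projections (a simplification)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(d)])
    return out


def merge_attention(text_tokens, image_tokens):
    """Concatenate both modalities and apply self-attention over the joint sequence."""
    joint = text_tokens + image_tokens
    return attend(joint, joint)


def co_attention(text_tokens, image_tokens):
    """Keep two streams: text queries attend to image tokens, and vice versa."""
    text_out = attend(text_tokens, image_tokens)
    image_out = attend(image_tokens, text_tokens)
    return text_out, image_out


text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three toy text tokens
image = [[0.2, 0.8], [0.9, 0.1]]              # two toy image patches
merged = merge_attention(text, image)          # one sequence of 5 fused tokens
text_out, image_out = co_attention(text, image)
```

In this sketch merge-attention yields a single fused sequence in which every token can see every other token, whereas co-attention yields one output per modality, each informed only by the other stream.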
Quotes
"Our framework can be found at: https://github.com/MMSR23/MMSR."