Core Concept
Leveraging Multimodal Large Language Models (MLLMs) to integrate multimodal item data and capture the dynamic evolution of user preferences, thereby improving the accuracy and interpretability of sequential recommendation.
Summary
The paper introduces the Multimodal Large Language Model-enhanced Multimodal Sequential Recommendation (MLLM-MSR) framework, which aims to address the challenges of integrating multimodal data and modeling the temporal dynamics of user preferences in sequential recommendation systems.
Key highlights:
- Multimodal Item Summarization: The framework employs an MLLM to summarize each item's textual and visual information into a unified textual description, sidestepping the limitations of MLLMs in processing multiple ordered image inputs (see the first sketch after this list).
- Recurrent User Preference Inference: A prompted sequence modeling approach iteratively captures the dynamic evolution of user preferences while keeping prompts over long multimodal interaction sequences manageable (second sketch below).
- Supervised Fine-Tuning of MLLM-based Recommender: The framework fine-tunes an open-source MLLM as the recommendation model, leveraging the enriched item summaries and inferred user preferences to enhance personalization and accuracy (third sketch below).
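The item-summarization step could look roughly like the following minimal Python sketch. The `query_mllm` callable, the `Item` fields, and the prompt wording are illustrative assumptions rather than the paper's actual interface; the point is simply that each item's image and metadata are fused into one textual description before any sequence-level reasoning happens.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical MLLM backend: takes a text prompt plus a list of image paths/URLs
# and returns the model's text response. A real deployment would wrap an actual
# multimodal API or a locally hosted open-source MLLM here.
MLLMFn = Callable[[str, List[str]], str]


@dataclass
class Item:
    item_id: str
    title: str
    description: str
    image: str  # path or URL of the item image


def summarize_item(item: Item, query_mllm: MLLMFn) -> str:
    """Fuse an item's image and text into a single textual summary.

    Converting every item to text up front means downstream preference-inference
    and recommendation prompts never need to carry multiple ordered images.
    """
    prompt = (
        "You are given a product image and its textual metadata.\n"
        f"Title: {item.title}\n"
        f"Description: {item.description}\n"
        "Write one concise paragraph describing the item, covering both its "
        "visual appearance and its textual attributes."
    )
    return query_mllm(prompt, [item.image])
```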
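Building on those per-item summaries, here is a hedged sketch of the recurrent preference-inference loop. The text-only `query_llm` callable, the block size of five items per step, and the prompt wording are assumptions for illustration, not values taken from the paper.

```python
from typing import Callable, List

LLMFn = Callable[[str], str]  # hypothetical text-only (M)LLM call


def infer_user_preference(
    item_summaries: List[str],  # textual item summaries in interaction order
    query_llm: LLMFn,
    block_size: int = 5,        # items folded into the prompt per step (assumed value)
) -> str:
    """Iteratively roll a user-preference summary over the interaction sequence.

    Instead of packing the whole (possibly long) history into one prompt, each
    step sees only the previous preference summary plus the next few item
    summaries and produces an updated summary, so prompts stay short while the
    summary tracks how preferences evolve over time.
    """
    preference = "No preference information yet."
    for start in range(0, len(item_summaries), block_size):
        block = item_summaries[start:start + block_size]
        prompt = (
            "Current summary of the user's preferences:\n"
            f"{preference}\n\n"
            "The user then interacted with these items (in order):\n"
            + "\n".join(f"- {s}" for s in block)
            + "\n\nUpdate the preference summary to reflect these newer "
              "interactions, keeping it to a short paragraph."
        )
        preference = query_llm(prompt)
    return preference
```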
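Finally, a minimal sketch of how training data for the fine-tuned MLLM recommender might be assembled, assuming a yes/no formulation over a positive item and sampled negatives; the instruction template and negative-sampling scheme are placeholders, not the paper's exact setup, and the training loop itself (e.g., parameter-efficient fine-tuning) is omitted.

```python
import json
import random
from typing import Dict, List


def build_sft_examples(
    user_preference: str,                # output of the recurrent inference step
    positive_item_summary: str,          # summary of the next item the user interacted with
    negative_item_summaries: List[str],  # summaries of sampled non-interacted items
) -> List[Dict[str, str]]:
    """Turn inferred preferences and item summaries into prompt/response pairs.

    Each example asks whether the user would engage with a candidate item, so
    the fine-tuned model acts as a binary (Yes/No) recommender at inference time.
    """
    candidates = [(positive_item_summary, "Yes")] + [
        (neg, "No") for neg in negative_item_summaries
    ]
    random.shuffle(candidates)

    examples = []
    for summary, label in candidates:
        prompt = (
            "User preference summary:\n"
            f"{user_preference}\n\n"
            "Candidate item:\n"
            f"{summary}\n\n"
            "Would this user be interested in the candidate item? Answer Yes or No."
        )
        examples.append({"prompt": prompt, "response": label})
    return examples


if __name__ == "__main__":
    demo = build_sft_examples(
        "Prefers lightweight trail-running shoes in bright colors.",
        "A neon-yellow trail-running shoe with a breathable mesh upper.",
        ["A formal black leather dress shoe with a polished toe cap."],
    )
    print(json.dumps(demo, indent=2))
```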
Extensive experiments across diverse datasets show that MLLM-MSR outperforms various baseline methods, validating the effectiveness of harnessing MLLM capabilities to improve multimodal sequential recommendation.
Statistics
The average sequence length of user-item interactions ranges from 11.35 to 13.65 across the datasets.
The sparsity of the datasets is around 99.93-99.96%.