Core Concepts
A novel hierarchical reinforcement learning (HRL) framework that decouples user preference modeling from listwise item ranking, enabling personalized, temporal-aware recommendations.
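The two-level split can be illustrated with a small sketch. The high-level policy maps long-term history and session-level context to a preference vector, and the low-level policy uses that vector to fill the list one slot at a time. This is not the mccHRL method: `HighLevelAgent`, `LowLevelAgent`, `act`, `build_list` and `diversity_penalty` are hypothetical names, the hand-written scoring rules stand in for learned networks, and no RL training is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class HighLevelAgent:
    """Hypothetical high-level policy: long-term history + outra-session context -> preference."""
    context_weight: float = 0.5  # assumed blending weight, not from the paper

    def act(self, history: List[Vector], context: Vector) -> Vector:
        dim = len(context)
        # Long-term preference: mean embedding of previously consumed items (assumption).
        mean = [sum(h[i] for h in history) / len(history) for i in range(dim)]
        # Blend in an encoded session-level context, e.g. time or location.
        return [m + self.context_weight * c for m, c in zip(mean, context)]


@dataclass
class LowLevelAgent:
    """Hypothetical low-level policy: fills the list slot by slot, guided by the preference."""
    diversity_penalty: float = 0.5  # assumed weight for the intra-session redundancy term

    def build_list(self, preference: Vector, candidates: Dict[str, Vector], k: int) -> List[str]:
        ranked: List[str] = []
        for _ in range(k):
            best_id: Optional[str] = None
            best_score = float("-inf")
            for item_id, emb in candidates.items():
                if item_id in ranked:
                    continue
                # Toy intra-session state: similarity to items already placed in the list.
                redundancy = max((dot(emb, candidates[r]) for r in ranked), default=0.0)
                score = dot(preference, emb) - self.diversity_penalty * redundancy
                if score > best_score:
                    best_id, best_score = item_id, score
            if best_id is None:  # candidates exhausted
                break
            ranked.append(best_id)
        return ranked
```

For example, with `history = [[1.0, 0.0], [0.8, 0.2]]` and `context = [0.0, 1.0]`, the preference is `[0.9, 0.6]`, and candidates `a=[1.0, 0.0]`, `b=[0.9, 0.1]`, `c=[0.0, 1.0]` are ranked `["a", "c", "b"]`; with `diversity_penalty=0.0` the list becomes `["a", "b", "c"]`. The point of the decomposition is that the low-level agent never needs the long-term history itself, only the preference it receives.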
Summary
The paper proposes a hierarchical reinforcement learning (HRL) framework called mccHRL for listwise recommendation. The key ideas are:
- Decoupling user preference modeling from listwise item ranking across the two levels of the HRL framework:
  - The High-Level Agent (HRA) models the user's long-term preference and the outra-session context, i.e., context beyond the current session such as spatial-temporal information.
  - The Low-Level Agent (LRA) handles intra-session item selection, guided by the user preference encoded by the HRA.
- Leveraging edge computing to better satisfy the Markov assumption and improve sample efficiency:
  - The LRA is deployed on the user's mobile device, where it can use on-device user features and interact with low latency.
  - The HRA is trained in the cloud on aggregated user data, and the learned user preference is transmitted to the LRA.
- Offline training and evaluation:
  - A simulator-based environment is designed to mimic mobile-cloud collaboration.
  - Experiments on a large-scale industrial dataset show significant performance improvements over baselines.
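The mobile-cloud split described above can be pictured as a toy simulation loop: the cloud-side high-level agent is queried once per session, while the on-device low-level agent reacts to every click locally. This is an illustrative sketch under stated assumptions, not the paper's simulator. `CloudService`, `DeviceAgent`, `simulate`, the item categories, the click probabilities and the morning "news" boost are all hypothetical, and the simulated user simply clicks with a fixed per-category probability.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CloudService:
    """Hypothetical cloud endpoint hosting the high-level agent (HRA)."""
    long_term_pref: Dict[str, float]  # per-category affinity from aggregated data (assumed)
    calls: int = 0  # number of cloud round trips

    def get_preference(self, hour: int) -> Dict[str, float]:
        self.calls += 1
        pref = dict(self.long_term_pref)
        # Toy spatial-temporal adjustment: boost "news" before noon.
        if hour < 12:
            pref["news"] = pref.get("news", 0.0) + 0.5
        return pref


@dataclass
class DeviceAgent:
    """Hypothetical on-device low-level agent (LRA)."""
    preference: Dict[str, float]
    session_clicks: Dict[str, int] = field(default_factory=dict)

    def next_category(self, shown: List[str]) -> str:
        def score(cat: str) -> float:
            # Cloud preference + on-device click feedback - repetition penalty.
            return (self.preference[cat]
                    + 0.3 * self.session_clicks.get(cat, 0)
                    - 0.4 * shown.count(cat))
        return max(self.preference, key=score)

    def observe(self, cat: str, clicked: bool) -> None:
        # On-device feature update; no cloud call needed.
        if clicked:
            self.session_clicks[cat] = self.session_clicks.get(cat, 0) + 1


def simulate(cloud: CloudService, n_sessions: int, list_len: int,
             click_prob: Dict[str, float], seed: int = 0) -> Tuple[int, int]:
    rng = random.Random(seed)
    steps = clicks = 0
    for _ in range(n_sessions):
        hour = rng.randrange(24)
        agent = DeviceAgent(cloud.get_preference(hour))  # one cloud round trip per session
        shown: List[str] = []
        for _ in range(list_len):
            cat = agent.next_category(shown)
            clicked = rng.random() < click_prob.get(cat, 0.0)  # toy user simulator
            agent.observe(cat, clicked)
            shown.append(cat)
            steps += 1
            clicks += int(clicked)
    return steps, clicks
```

Running `simulate(CloudService({"news": 0.2, "sports": 0.6, "music": 0.4}), n_sessions=5, list_len=4, click_prob={...})` performs 20 list steps but only 5 cloud calls, which is the communication saving the on-device deployment aims for.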
The proposed mccHRL framework addresses key challenges in listwise recommendation: long-term user perception, short-term interest shifts, and sparse feedback. By decoupling user preference modeling from listwise ranking and leveraging edge computing, mccHRL provides a unified and efficient solution for personalized, temporal-aware recommendations.
Statistics
The average rating in the MovieLens dataset is 3.53.
The click-through rate (CTR) of the Alibaba dataset is around 5%.
Quotes
"We argue that such framework has a well-defined decomposition of the outra-session context and the intra-session context, which are encoded by the high-level and low-level agents, respectively."
"We embrace this benefit to further improve the modeling depth of user states and enhance the Markov assumption. We argue that the low-level HRL could be deployed on mobile devices, therefore the on-device features can be involved, training is decoupled, and the communication frequency of cloud service is reduced during model inference."