Key Concepts
This paper proposes CaDRec, a contextualized and debiased recommender model that mitigates the over-smoothing issue in graph convolutional networks (GCNs) and tackles the skewed distribution of user-item interactions caused by popularity bias and user-individual bias.
Summary
The paper introduces the CaDRec framework, which consists of two main components:
Contextualized Representation Learning:
- To address the over-smoothing issue in GCNs, CaDRec proposes a novel hypergraph convolution (HGC) operator that considers both structural and sequential contexts during message propagation.
- It integrates the self-attention (SA) correlation as a trainable perturbation on the edges, allowing the HGC to select effective neighbors and capture sequential dependencies.
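The propagation step above can be sketched as follows. This is an illustrative numpy sketch, not the paper's implementation: the function and variable names (`hgc_layer`, `incidence`, `W_q`, `W_k`) are assumptions, and the exact normalization and perturbation form in CaDRec may differ. Each hyperedge here stands for one user's set of interacted items, and the self-attention scores perturb the structural edge weights before the two-step item-to-edge-to-item propagation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hgc_layer(item_emb, incidence, W_q, W_k):
    """One hypergraph-convolution step with a self-attention (SA)
    perturbation on the edge weights (illustrative sketch).

    item_emb : (n_items, d)       item embeddings
    incidence: (n_items, n_edges) 0/1 incidence matrix; each hyperedge
                                  is one user's set of interacted items
    W_q, W_k : (d, d)             trainable projections for the SA scores
    """
    # SA correlation between items, capturing sequential/contextual affinity.
    q, k = item_emb @ W_q, item_emb @ W_k
    sa = softmax(q @ k.T / np.sqrt(item_emb.shape[1]))

    # Perturb structural edge weights with the SA scores: an item's weight
    # on a hyperedge reflects both membership and its learned correlation
    # with the edge's other members, letting propagation favor effective
    # neighbors instead of uniformly smoothing over all of them.
    edge_weight = incidence * (sa @ incidence)          # (n_items, n_edges)

    # Degree-normalized two-step propagation: items -> hyperedges -> items.
    deg_e = edge_weight.sum(axis=0, keepdims=True) + 1e-8
    deg_v = edge_weight.sum(axis=1, keepdims=True) + 1e-8
    edge_msg = (edge_weight / deg_e).T @ item_emb       # aggregate per edge
    return (edge_weight / deg_v) @ edge_msg             # redistribute to items
```

Because the SA perturbation reweights rather than replaces the incidence structure, items never receive messages from hyperedges they do not belong to.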
Debiased Representation Learning:
- To overcome the skewed distribution of user-item interactions, CaDRec introduces two debiasing strategies:
- Modeling user-individual bias as a learnable perturbation on item representations, so that the item representations are disentangled from user-specific biases.
- Encoding item popularity through positional encoding, which is plug-and-play and interpretable, to ensure that items with similar popularity are closer in the embedding space.
- CaDRec also addresses the imbalance of the gradients that update item embeddings, which can exacerbate popularity bias, by adopting regularization and weighting schemes.
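The two debiasing ideas above can be sketched together. This is a minimal numpy sketch under stated assumptions, not the paper's formulation: the sinusoidal encoding indexed by popularity rank, and the names `popularity_encoding`, `debiased_score`, and `user_bias` are all illustrative. The point is that items with adjacent popularity ranks receive similar encodings, and a per-user bias vector perturbs item representations before scoring.

```python
import numpy as np

def popularity_encoding(pop_counts, d):
    """Plug-and-play positional encoding indexed by popularity rank,
    so items with similar popularity land close in embedding space.
    Sketch only; CaDRec's exact encoding may differ. d must be even.
    """
    ranks = np.argsort(np.argsort(pop_counts))        # 0 = least popular
    pos = ranks[:, None].astype(float)
    div = np.exp(-np.log(10000.0) * np.arange(0, d, 2) / d)
    pe = np.zeros((len(pop_counts), d))
    pe[:, 0::2] = np.sin(pos * div)                   # even dims: sine
    pe[:, 1::2] = np.cos(pos * div)                   # odd dims: cosine
    return pe

def debiased_score(user_emb, item_emb, user_bias, pop_counts):
    """Score items for one user after (i) perturbing item representations
    with a learnable per-user bias vector and (ii) injecting the
    popularity encoding. Names are illustrative, not the paper's API.
    """
    pe = popularity_encoding(pop_counts, item_emb.shape[1])
    items = item_emb + user_bias + pe                 # debiased item reps
    return items @ user_emb                           # (n_items,) scores
```

Encoding popularity this way is interpretable: the rank, not the raw count, determines an item's position, so the encoding is robust to the long-tailed scale of interaction counts.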
Extensive experiments on four real-world datasets demonstrate that CaDRec outperforms state-of-the-art recommendation methods in terms of Recall@K and NDCG@K.
Statistics
| Dataset | Avg. interactions per user | Density |
|---|---|---|
| Yelp2018 | 49.3 | 0.13% |
| Foursquare | 68.2 | 0.24% |
| Douban-book | 46.5 | 0.21% |
| ML-1M | 95.3 | 2.7% |