Basic Concepts
A novel positive-dominated negative synthesizing (PDNS) strategy that mitigates the over-fitting issue caused by false negatives in hard negative sampling for recommender systems.
Summary
The paper addresses the over-fitting that occurs when hard negative sampling is used for implicit collaborative filtering (CF) in recommender systems. The authors first investigate the cause of this over-fitting and attribute it to the incorrect selection of false negative instances during hard negative sampling. They then propose a new negative sampling strategy, positive-dominated negative synthesizing (PDNS), to address it.
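A minimal sketch shows why hard negative sampling tends to pick false negatives. It uses a common top-score scheme in the style of dynamic negative sampling (DNS): draw a small random pool of unobserved items and keep the one the current model scores highest. The helper name `sample_hard_negative`, the pool size, the dot-product scoring, and the toy embeddings are illustrative assumptions. This is not the sampler used in the paper.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product score, as in MF- or LightGCN-style recommenders."""
    return sum(a * b for a, b in zip(u, v))


def sample_hard_negative(
    user_emb: Sequence[float],
    item_embs: Dict[int, Sequence[float]],
    observed: Set[int],
    pool_size: int = 4,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: draw up to `pool_size` unobserved items at random
    and return the one the current model scores highest for this user."""
    if rng is None:
        rng = random.Random()
    # Every item the user has not interacted with is treated as a negative,
    # including relevant items the user simply has not encountered yet.
    candidates: List[int] = [i for i in item_embs if i not in observed]
    pool = rng.sample(candidates, min(pool_size, len(candidates)))
    # The top-scored candidate is the "hardest" negative. It is also the
    # candidate most likely to be an unlabeled positive (a false negative).
    return max(pool, key=lambda i: dot(user_emb, item_embs[i]))


# Toy usage: item 2 closely resembles the observed item 0, so it is selected.
user = [1.0, 0.0]
items = {0: [0.9, 0.1], 1: [0.1, 0.9], 2: [0.8, 0.0], 3: [-0.5, 0.2]}
print(sample_hard_negative(user, items, observed={0}, pool_size=3, rng=random.Random(0)))  # 2
```

The harder the sampler pushes (larger pool, or picking the top score), the more often it lands on items like item 2. Such items look like positives to the model and may in fact be unlabeled positives.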
Key highlights:
- PDNS synthesizes hard negative instances by incorporating a large proportion of positive embeddings into negative embeddings, making the synthetic negatives dominated by positive information rather than negative information.
- Theoretical analysis shows that PDNS is robust to false negative instances during hard negative sampling, because it limits the gradient magnitudes assigned to the hardest negatives.
- Comprehensive experiments on three real-world datasets demonstrate that PDNS not only largely mitigates the over-fitting issue, but also outperforms state-of-the-art negative sampling approaches in terms of both effectiveness and robustness.
- PDNS can be applied to various recommendation models, including GNN-based methods such as LightGCN as well as matrix factorization (MF).
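The first two highlights can be illustrated with a minimal sketch, under explicit assumptions:

- The synthetic negative is a convex mix `alpha * e_pos + (1 - alpha) * e_neg` with `alpha > 0.5`, so positive information dominates.
- Scores are inner products.
- The loss is BPR, whose per-pair gradient weight is `sigmoid(s_neg - s_pos)`.

The mixing rule, the value of `alpha`, and the helper names `synthesize_negative` and `bpr_grad_weight` are hypothetical. They are not the exact PDNS formulation from the paper.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product score between a user and an item embedding."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def synthesize_negative(
    pos_emb: Sequence[float], neg_emb: Sequence[float], alpha: float = 0.9
) -> List[float]:
    """Hypothetical positive-dominated mix: with alpha > 0.5 the positive
    embedding supplies most of the synthetic negative."""
    if not 0.5 < alpha < 1.0:
        raise ValueError("alpha must lie in (0.5, 1) for positive dominance")
    return [alpha * p + (1.0 - alpha) * n for p, n in zip(pos_emb, neg_emb)]


def bpr_grad_weight(
    user: Sequence[float], pos: Sequence[float], neg: Sequence[float]
) -> float:
    """|dL/d(s_pos - s_neg)| for the BPR loss L = -log sigmoid(s_pos - s_neg)."""
    return sigmoid(dot(user, neg) - dot(user, pos))


user, pos = [1.0, 0.0], [1.0, 0.0]  # s_pos = 1.0
hard_neg = [4.0, 0.0]               # s_neg = 4.0: a likely false negative
synth_neg = synthesize_negative(pos, hard_neg, alpha=0.9)

print(round(bpr_grad_weight(user, pos, hard_neg), 3))   # 0.953
print(round(bpr_grad_weight(user, pos, synth_neg), 3))  # 0.574
```

Inner-product scores are linear in the item embedding, so `s_syn - s_pos = (1 - alpha) * (s_neg - s_pos)`. Under these assumptions, even the hardest negative has its score gap shrunk by the factor `1 - alpha`, which keeps its BPR gradient weight away from saturation. This is a simplified illustration of the intuition, not a reproduction of the paper's analysis.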
Statistics
The harder the selected negatives, the more likely they are to be false negatives, and the more severe the over-fitting. (Section 2.2)
Avoiding more false negatives during negative sampling can consistently mitigate the over-fitting of the recommender. (Section 4.2)
Quotes
"In implicit collaborative filtering (CF) task of recommender systems, recent works mainly focus on model structure design with promising techniques like graph neural networks (GNNs). Effective and efficient negative sampling methods that suit these models, however, remain underdeveloped."
"We suggest that the incorrect selection of false negatives contributes to the over-fitting that occurs in implicit CF when adopting hard negative sampling, and we verify it through simulation experiments."
"We demonstrate the advantages of PDNS over a set of state-of-the-art negative sampling approaches in terms of robustness and effectiveness by experimenting on three real-world datasets."