Core Concepts
The paper introduces BLAIR, a series of pretrained sentence embedding models specialized for recommendation scenarios that bridge natural language and items to enhance retrieval and recommendation tasks.
Abstract
BLAIR is a series of pretrained sentence embedding models designed to learn correlations between item metadata and natural language contexts. Evaluated across a range of tasks, the models show strong text and item representation capacity. The paper also introduces the AMAZON REVIEWS 2023 dataset, an essential resource for future research on recommendation systems.
The paper stresses the central role language plays on e-commerce platforms, where tasks such as product retrieval and recommendation depend on it. Early methods failed to capture the rich semantics of natural language, which has driven growing interest in applying large language models to language-heavy recommendation tasks.
Because integrating item corpora of practical scale into existing large language models is challenging, BLAIR is introduced as a lightweight model specialized in connecting natural language with items. Pretrained on a new dataset of over 570 million reviews and 48 million items spanning 33 categories, BLAIR demonstrates strong generalization across multiple domains and tasks.
Architecturally, BLAIR encodes sentences into embeddings using RoBERTa as the backbone. The training objective optimizes pairs of natural language context and item metadata with a contrastive loss, effectively linking items to the natural language contexts that mention them in user reviews; a sketch follows.
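A minimal sketch of this style of contrastive pretraining, assuming an InfoNCE-style loss with in-batch negatives. The checkpoint name, mean pooling, temperature, and symmetric loss are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

def embed(texts):
    """Mean-pool RoBERTa token states into L2-normalized sentence embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(pooled, dim=-1)

def contrastive_loss(contexts, metadata, temperature=0.05):
    """Pull each (context, metadata) pair together; the other items in the
    batch serve as in-batch negatives (InfoNCE-style)."""
    c, m = embed(contexts), embed(metadata)
    logits = c @ m.T / temperature                       # (B, B) similarities
    labels = torch.arange(len(contexts))                 # diagonal = positives
    # Symmetric over both retrieval directions (an assumption here).
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

loss = contrastive_loss(
    ["Great boots, kept my feet dry on a rainy hike.",
     "This blender crushes ice effortlessly."],
    ["Waterproof leather hiking boots, men's",
     "700W countertop blender with pulse mode"],
)
loss.backward()
```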
Experiments on three tasks (sequential recommendation, conventional product search, and complex product search) show that BLAIR outperforms existing methods across different domains, highlighting the effectiveness of its text-based item representations for recommendation.
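For the search tasks, a natural way to use such embeddings is nearest-neighbor retrieval: embed the query and all item metadata, then rank items by cosine similarity. A brief sketch reusing torch and the embed() helper from the block above; the query and item strings are invented for illustration.

```python
# Rank candidate items for a query by cosine similarity of embeddings.
query = "a durable stainless steel water bottle that keeps drinks cold"
items = [
    "Insulated stainless steel bottle, 24 oz, 24-hour cold retention",
    "Plastic sports bottle with flip straw",
    "Ceramic coffee mug, 12 oz",
]
with torch.no_grad():
    scores = (embed([query]) @ embed(items).T).squeeze(0)  # cosine similarities
for idx in scores.argsort(descending=True):
    print(f"{scores[idx].item():.3f}  {items[idx]}")
```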
Stats
AMAZON REVIEWS 2023 comprises over 570 million reviews and 48 million items from 33 categories.
Empirical results demonstrate that BLAIR exhibits strong text and item representation capacity.
The training objective optimizes pairs of natural language context c and item metadata m.
A contrastive loss aligns the sentence embeddings of context c and item metadata m (see the sketch after this list).
The overall training objective balances the contrastive loss with an auxiliary loss L_PT.
Experiments include sequential recommendation, conventional product search, and complex product search tasks.
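A hedged reconstruction of the objective described in the items above, written in standard contrastive-learning notation: the InfoNCE form, the temperature tau, and the weight lambda on the auxiliary term are assumptions, with sim denoting cosine similarity of the sentence embeddings.

```latex
% Contrastive alignment of B in-batch (context, metadata) pairs
\mathcal{L}_{\mathrm{CL}}
  = -\frac{1}{B} \sum_{i=1}^{B}
    \log \frac{\exp\bigl(\operatorname{sim}(\mathbf{c}_i, \mathbf{m}_i)/\tau\bigr)}
              {\sum_{j=1}^{B} \exp\bigl(\operatorname{sim}(\mathbf{c}_i, \mathbf{m}_j)/\tau\bigr)}

% Overall objective: contrastive loss plus weighted auxiliary loss
\mathcal{L} = \mathcal{L}_{\mathrm{CL}} + \lambda\, \mathcal{L}_{\mathrm{PT}}
```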
Quotes
"BLAIR improves over existing methods across multiple domains."
"Text-based methods generally achieve better performance than ID-based methods."
"The sparse retrieval method BM25 performs well on conventional product search but poorly on Amazon-C4."