The Let’s Go Shopping (LGS) dataset is a significant contribution to vision and vision-language applications, offering 15 million image-caption pairs from e-commerce websites. The dataset aims to address the limitations of existing datasets by providing clean, informative, and fluent data. Experiments demonstrate the unique characteristics of LGS images and captions, highlighting their potential for improving image classification, reconstruction, captioning, and text-to-image generation tasks.
Previous initiatives have faced challenges with noisy or subjective data sources like social media alt-texts. In contrast, LGS leverages e-commerce websites known for their cleanliness and informativeness. The dataset's focus on foreground objects with clear backgrounds sets it apart from general-domain datasets like ImageNet.
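The image-caption-pair structure described above can be sketched in a few lines. This is a minimal illustration only: the field names, example URLs, and captions below are hypothetical and do not come from the LGS release, whose actual schema is not specified in this summary.

```python
from dataclasses import dataclass

@dataclass
class ImageCaptionPair:
    """One e-commerce sample: a product image plus its descriptive caption."""
    image_url: str
    caption: str

# Hypothetical records illustrating the e-commerce style the summary describes:
# a foreground product on a clean background, with an attribute-rich caption.
pairs = [
    ImageCaptionPair("https://example.com/shoe.jpg",
                     "Women's leather ankle boot with block heel"),
    ImageCaptionPair("https://example.com/lamp.jpg",
                     "Minimalist brass desk lamp with adjustable arm"),
]

def caption_vocabulary(samples):
    """Collect the lowercase word vocabulary across all captions."""
    vocab = set()
    for sample in samples:
        vocab.update(sample.caption.lower().split())
    return vocab

vocab = caption_vocabulary(pairs)
```

Captions of this kind name concrete product attributes (material, color, form factor), which is one reason clean e-commerce text can be more informative for vision-language training than noisy social-media alt-text.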
Experiments show that models trained on LGS outperform those trained solely on ImageNet in various tasks due to the distinct distribution of e-commerce data. Additionally, LGS serves as an effective pre-training dataset for downstream tasks in both general and fine-grained settings.
The study underscores the importance of domain-specific datasets like LGS in enhancing visual understanding through efficient data collection strategies tailored to specific applications.
Key Insights Distilled From
by Yatong Bai et al. at arxiv.org, 03-07-2024
https://arxiv.org/pdf/2401.04575.pdf