LLP-Bench is introduced as a comprehensive tabular benchmark comprising 70 datasets derived from the Criteo CTR and SSCL datasets. The paper proposes metrics to assess dataset difficulty and evaluates 9 state-of-the-art (SOTA) techniques on these datasets. Notably, baseline performance varies with dataset characteristics such as bag size, label proportion variation, and bag separation.
The task of Learning from Label Proportions (LLP), in which models are trained on bags of instances annotated only with aggregate label proportions rather than instance-level labels, is crucial in privacy-sensitive applications such as online advertising and medical records anonymization. LLP-Bench addresses the need for a large-scale tabular benchmark by providing diverse datasets created from real-world sources, namely the Criteo CTR and SSCL datasets.
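To make the setting concrete, here is a minimal sketch of how LLP-style supervision can be derived from an instance-labeled table. The column names ("cat_feature", "num_feature", "label") and the single-column grouping key are illustrative assumptions, not LLP-Bench's exact bag-construction procedure.

```python
# Sketch: turn an instance-labeled table into (bag, label-proportion) pairs.
# Column names and the grouping key are hypothetical, for illustration only.
import pandas as pd

def make_bags(df: pd.DataFrame, group_col: str, label_col: str):
    """Group rows into bags and keep only the aggregate label proportion."""
    bags = []
    for _, bag in df.groupby(group_col):
        features = bag.drop(columns=[label_col])   # instance features, unlabeled
        label_proportion = bag[label_col].mean()   # fraction of positives in the bag
        bags.append((features, label_proportion))
    return bags

# Toy example: 6 instances with binary labels, bagged by a categorical feature.
toy = pd.DataFrame({
    "cat_feature": ["a", "a", "a", "b", "b", "b"],
    "num_feature": [0.1, 0.4, 0.2, 0.9, 0.7, 0.8],
    "label":       [1,   0,   1,   0,   0,   1],
})
for feats, prop in make_bags(toy, "cat_feature", "label"):
    print(len(feats), prop)   # bag size and its label proportion
```

The model only ever sees the per-bag proportions as supervision, which is what makes the setting attractive for privacy-sensitive data.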
The analysis reveals that performance on certain datasets is not well predicted by summary metrics such as MeanBagSize, LabelPropStdev, and InterIntraRatio. This underscores the importance of evaluating techniques on diverse datasets to understand their behavior under varying conditions.
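The following sketch shows simplified, illustrative versions of two of the dataset-level statistics mentioned above: mean bag size and the standard deviation of per-bag label proportions. These are plausible approximations only; the paper's formal definitions (and its InterIntraRatio, which contrasts inter-bag with intra-bag feature distances) may differ in detail. It reuses the `make_bags` helper and `toy` table from the earlier sketch.

```python
# Simplified dataset-difficulty statistics, not the paper's exact definitions.
import numpy as np

def mean_bag_size(bags):
    """Average number of instances per bag."""
    return float(np.mean([len(feats) for feats, _ in bags]))

def label_prop_stdev(bags):
    """Spread of per-bag label proportions; values near 0 mean bags look alike."""
    return float(np.std([prop for _, prop in bags]))

bags = make_bags(toy, "cat_feature", "label")   # from the previous sketch
print(mean_bag_size(bags))     # 3.0
print(label_prop_stdev(bags))  # std of [2/3, 1/3] = 1/6 ~ 0.167
```

Intuitively, larger bags and flatter label proportions give the learner less information per instance, which is why such statistics are natural candidates for characterizing dataset difficulty.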
Overall, LLP-Bench serves as a valuable resource for researchers to study and develop new LLP techniques in the context of tabular data.
Key insights distilled from the paper by Anand Brahmb... (arxiv.org, 03-06-2024): https://arxiv.org/pdf/2310.10096.pdf