The paper introduces a decision-aware dataset distance, built on Optimal Transport (OT), that incorporates features, labels, and decisions. The authors present it as the first dataset distance to integrate decisions, which addresses the distinct challenges of Predict-then-Optimize (PtO) tasks.
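To make the idea concrete, here is a minimal Python sketch of an OT distance whose ground cost is a weighted sum of feature, label, and decision terms. It is not the paper's implementation, and several choices are assumptions made only for illustration: each component uses squared Euclidean distance; the weights `alpha`, `beta`, `gamma` and every function name are hypothetical; decisions come from a hypothetical Top-K oracle applied to each label vector; and both datasets have the same number of samples with uniform weights. Under that last assumption an optimal transport plan is a permutation, so the sketch finds it by brute force, which only works for a handful of points (a practical version would use a linear-programming or Sinkhorn solver).

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, Vector]  # (features x, label vector y)


def top_k_decision(y: Vector, k: int) -> List[float]:
    """Hypothetical decision oracle: 0/1 indicator of the k largest labels
    (ties broken toward the lower index)."""
    order = sorted(range(len(y)), key=lambda i: (-y[i], i))
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(y))]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def ground_cost(p, q, alpha: float, beta: float, gamma: float) -> float:
    """Assumed weighted cost between (features, labels, decisions) triples."""
    (xp, yp, wp), (xq, yq, wq) = p, q
    return alpha * sq_dist(xp, xq) + beta * sq_dist(yp, yq) + gamma * sq_dist(wp, wq)


def exact_ot_uniform(cost: List[List[float]]) -> float:
    """Exact OT cost between two uniform empirical measures of equal size n.
    An optimal plan is then a permutation (Birkhoff-von Neumann), so all n!
    permutations are enumerated; only usable for very small n."""
    n = len(cost)
    best = min(sum(cost[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))
    return best / n


def decision_aware_distance(src: List[Sample], tgt: List[Sample], k: int,
                            alpha: float = 1.0, beta: float = 1.0,
                            gamma: float = 1.0) -> float:
    """Hypothetical decision-aware OT distance between two small datasets."""
    if len(src) != len(tgt):
        raise ValueError("this sketch assumes equal sample sizes")
    # Augment every (x, y) sample with the decision induced by its label.
    P = [(x, y, top_k_decision(y, k)) for x, y in src]
    Q = [(x, y, top_k_decision(y, k)) for x, y in tgt]
    cost = [[ground_cost(p, q, alpha, beta, gamma) for q in Q] for p in P]
    return exact_ot_uniform(cost)


if __name__ == "__main__":
    # Target labels are the source labels doubled: same ranking, same decisions.
    src = [([0.0], [3.0, 1.0, 2.0]), ([1.0], [0.5, 2.5, 1.5])]
    tgt = [([0.0], [6.0, 2.0, 4.0]), ([1.0], [1.0, 5.0, 3.0])]
    print(decision_aware_distance(src, tgt, k=1, alpha=0.0, beta=1.0, gamma=0.0))  # 11.375
    print(decision_aware_distance(src, tgt, k=1, alpha=0.0, beta=0.0, gamma=1.0))  # 0.0
```

In the usage example only the labels shift: the label-only distance is large, while the decision-only distance is zero because every Top-K decision is unchanged. Changing `alpha`, `beta`, and `gamma` is how this sketch trades off the three components.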
The key highlights and insights are:
Traditional dataset distances, which rely only on features and labels, are less informative in the PtO context, where model performance is measured by decision regret rather than by prediction error.
The proposed decision-aware dataset distance captures adaptation success in PtO settings by accounting for the impact of downstream decisions, and the paper derives a PtO adaptation bound expressed in terms of this distance.
Empirical analysis on three PtO tasks from the literature (Linear Model Top-K, Warcraft Shortest Path, and Inventory Stock Problem) shows that the decision-aware distance predicts transferability better than feature-label distances alone.
The flexibility to weight the feature, label, and decision components in the ground cost function allows the distance metric to be tailored to the specific requirements of each PtO task.
Target shift, where the label distribution changes while the feature distribution stays fixed, has a smaller impact in PtO settings than in standard supervised learning, and the decision-aware dataset distance captures this behavior.
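The gap between prediction error and decision regret can be seen in a toy Top-K example. The Python sketch below uses the usual PtO notion of regret for a maximization problem: the true objective of the optimal decision minus the true objective of the decision made from the prediction. The problem setup, the numbers, and all function names are hypothetical and not taken from the paper; the point is only that two predictions with identical mean squared error can lead to different regret.

```python
from typing import List, Sequence

Vector = Sequence[float]


def top_k_decision(y: Vector, k: int) -> List[float]:
    """Hypothetical Top-K oracle: 0/1 indicator of the k largest entries
    (ties broken toward the lower index)."""
    order = sorted(range(len(y)), key=lambda i: (-y[i], i))
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(y))]


def objective(w: Vector, y: Vector) -> float:
    """Total true value of the selected items (to be maximized)."""
    return sum(wi * yi for wi, yi in zip(w, y))


def regret(y_pred: Vector, y_true: Vector, k: int) -> float:
    """Value lost by acting on y_pred instead of the true labels y_true."""
    best = objective(top_k_decision(y_true, k), y_true)
    achieved = objective(top_k_decision(y_pred, k), y_true)
    return best - achieved


def mse(y_pred: Vector, y_true: Vector) -> float:
    """Mean squared prediction error."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


if __name__ == "__main__":
    y_true = [3.0, 2.0, 1.0]
    pred_a = [3.0, 2.0, -1.0]  # error on an item that is never selected
    pred_b = [3.0, 4.0, 1.0]   # same-size error, but it flips the top-1 choice
    print(mse(pred_a, y_true), mse(pred_b, y_true))  # both 4/3
    print(regret(pred_a, y_true, k=1), regret(pred_b, y_true, k=1))  # 0.0 1.0
```

A distance that compares only features and labels would treat these two error patterns alike, whereas a decision term separates them, which is the motivation for including decisions in the ground cost.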
Key Insights Distilled From
by Paula Rodrig... at arxiv.org 09-12-2024
https://arxiv.org/pdf/2409.06997.pdf