Zheng, X., Chang, X., Jia, R., & Tan, Y. (2024). Towards Data Valuation via Asymmetric Data Shapley. arXiv preprint arXiv:2411.00388.
This paper addresses a limitation of traditional data Shapley. Because it treats all data points symmetrically, it cannot capture the value of data points whose contributions depend on structure within the dataset. The authors propose a novel framework called "asymmetric data Shapley" to provide a more accurate and fair data valuation method for machine learning applications.
The authors take the asymmetric Shapley value from cooperative game theory and adapt it for data valuation in supervised machine learning. They introduce "weight systems" to encode inherent structures within datasets, which allows data points to be weighted differently based on their relationships. The paper presents a mathematical formulation of asymmetric data Shapley under general weight systems and proposes a specific type tailored to data valuation: the "intra-class uniform weight system" (ICU-WS). The authors also develop two efficient algorithms: a Monte Carlo method for approximating asymmetric data Shapley and a KNN-surrogate method for computing it exactly.
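The summary does not reproduce the paper's formulas, so the following is only a minimal sketch under a stated assumption. It treats an ICU-style weight system as an ordered list of groups: every admissible ordering of the data points lists earlier groups first and shuffles points uniformly within each group. With a single group, this reduces to ordinary permutation-based data Shapley. The function names, the `groups` representation and `toy_utility` are hypothetical. The sketch contrasts a Monte Carlo estimator with brute-force enumeration. It does not implement the paper's KNN-surrogate algorithm.

```python
import itertools
import random
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Sequence, Set

# A utility maps a coalition (set of training-point indices) to a score,
# e.g. validation accuracy of a model trained on those points.
Utility = Callable[[FrozenSet[int]], float]
Groups = Sequence[Sequence[int]]


def _accumulate(perm: Sequence[int], utility: Utility, totals: Dict[int, float]) -> None:
    """Add each point's marginal contribution along one ordering into `totals`."""
    coalition: Set[int] = set()
    prev = utility(frozenset(coalition))
    for i in perm:
        coalition.add(i)
        cur = utility(frozenset(coalition))
        totals[i] += cur - prev
        prev = cur


def sample_ordering(groups: Groups, rng: random.Random) -> List[int]:
    """ASSUMPTION (hypothetical ICU-style rule): groups keep their fixed order,
    and points are shuffled uniformly inside each group."""
    perm: List[int] = []
    for group in groups:
        block = list(group)
        rng.shuffle(block)
        perm.extend(block)
    return perm


def mc_asymmetric_shapley(groups: Groups, utility: Utility,
                          num_samples: int = 1000, seed: int = 0) -> Dict[int, float]:
    """Monte Carlo estimate: average marginal contributions over sampled orderings."""
    rng = random.Random(seed)
    totals: Dict[int, float] = defaultdict(float)
    for _ in range(num_samples):
        _accumulate(sample_ordering(groups, rng), utility, totals)
    return {i: v / num_samples for i, v in totals.items()}


def exact_asymmetric_shapley(groups: Groups, utility: Utility) -> Dict[int, float]:
    """Brute force over every admissible ordering (feasible only for tiny inputs)."""
    totals: Dict[int, float] = defaultdict(float)
    count = 0
    for blocks in itertools.product(*(itertools.permutations(g) for g in groups)):
        _accumulate([i for block in blocks for i in block], utility, totals)
        count += 1
    return {i: v / count for i, v in totals.items()}


# Hypothetical toy utility: point 1 duplicates point 0 (one copy is enough),
# and point 2 independently adds 0.5.
def toy_utility(s: FrozenSet[int]) -> float:
    return (1.0 if s & {0, 1} else 0.0) + (0.5 if 2 in s else 0.0)


print(exact_asymmetric_shapley([[0, 1, 2]], toy_utility))   # {0: 0.5, 1: 0.5, 2: 0.5}
print(exact_asymmetric_shapley([[0], [1, 2]], toy_utility))  # {0: 1.0, 1: 0.0, 2: 0.5}
```

In the symmetric case, the two copies split the credit equally. When point 0 is placed in an earlier group, it receives all of the credit and the later copy receives none. In both cases, the values still sum to the utility of the full dataset minus that of the empty set.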
Asymmetric data Shapley offers a more accurate and equitable approach to data valuation in machine learning compared to traditional data Shapley. By incorporating inherent data structures, the framework allows for a nuanced understanding of individual data point contributions, leading to fairer compensation in data markets and improved data augmentation strategies.
This research significantly contributes to the field of data valuation by addressing the limitations of existing methods in capturing the value of structured data. The proposed asymmetric data Shapley framework has the potential to improve fairness and transparency in data-driven applications, particularly in data markets and algorithmic decision-making.
The current work primarily focuses on ICU-WS, leaving room for exploring the application and computational efficiency of asymmetric data Shapley under general weight systems. Further research could investigate the statistical properties of asymmetric data Shapley and its robustness to different data distributions.
Key insights distilled from: Xi Zheng, Xi... (arxiv.org, 11-04-2024), https://arxiv.org/pdf/2411.00388.pdf