Key Concepts
EcoVal, an efficient data valuation framework, accelerates data valuation by clustering similar data points and propagating value among the members of each cluster.
Summary
The paper introduces EcoVal, an efficient data valuation framework for machine learning models. It addresses the high computational cost of traditional Shapley value-based methods by clustering similar data points and propagating value within each cluster. The framework is validated through a theoretical proof and an empirical evaluation on the MNIST, CIFAR10, and CIFAR100 datasets.
Abstract:
Quantifying the value of data is crucial in the ML workflow.
Existing Shapley value-based frameworks are computationally expensive.
EcoVal is introduced for fast data valuation.
Introduction:
Data valuation is pivotal in ML and analytics.
The quality of the data determines how effective the model is.
Motivation:
Existing Shapley value frameworks are computationally costly.
These inefficiencies increase the carbon footprint of data valuation.
Our Contribution:
A two-step approach whose first step values data at the cluster level.
A production-function formulation that estimates the value of individual data points.
Related Work:
Literature review of Shapley Value applications in economics and ML.
Preliminaries:
Definitions of the leave-one-out (LOO) error and the Shapley value.
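For reference, both definitions fit in a short Python sketch. The LOO value of point i is U(N) - U(N \ {i}); the Shapley value averages the marginal contribution U(S ∪ {i}) - U(S) over every subset S of the other points, weighted by |S|!(n - |S| - 1)!/n!. The utility `toy_utility` is a hypothetical stand-in for "train a model on S and report validation accuracy"; in it, points 0 and 1 are exact duplicates.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

# A utility maps a subset of training-point indices to a score,
# e.g. the validation accuracy of a model trained on that subset.
Utility = Callable[[FrozenSet[int]], float]


def loo_values(points: Sequence[int], utility: Utility) -> Dict[int, float]:
    """Leave-one-out value: U(N) - U(N minus {i})."""
    full = frozenset(points)
    return {i: utility(full) - utility(full - {i}) for i in points}


def shapley_values(points: Sequence[int], utility: Utility) -> Dict[int, float]:
    """Exact Shapley value by enumerating every subset S of the other points.

    phi_i = sum_S |S|! (n - |S| - 1)! / n! * (U(S + {i}) - U(S))
    The number of utility calls (model retrainings) grows exponentially in n.
    """
    n = len(points)
    values: Dict[int, float] = {}
    for i in points:
        others = [p for p in points if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


def toy_utility(subset: FrozenSet[int]) -> float:
    # Hypothetical utility: points 0 and 1 are duplicates worth 1.0 together,
    # and point 2 independently adds 0.5.
    score = 1.0 if (0 in subset or 1 in subset) else 0.0
    if 2 in subset:
        score += 0.5
    return score


if __name__ == "__main__":
    print(loo_values([0, 1, 2], toy_utility))      # duplicates get nothing under LOO
    print(shapley_values([0, 1, 2], toy_utility))  # duplicates split the credit
```

Under LOO both duplicates score 0, because removing either one alone changes nothing; the Shapley value instead gives each 0.5. Every `utility` call stands for a full retraining, which is why exact Shapley computation becomes infeasible as n grows.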
Proposed Method:
Leave-cluster-out technique for cluster-level valuation.
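A minimal sketch of leave-cluster-out valuation, assuming the cluster assignments are already computed (for example by k-means; the clustering algorithm is our assumption, not stated here). Each cluster's value is taken as U(N) - U(N \ C_k), so valuing k clusters costs k + 1 utility evaluations instead of one per data point. The class labels `y`, the cluster ids, and `coverage_utility` are hypothetical placeholders for a real train-and-evaluate routine.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

Utility = Callable[[FrozenSet[int]], float]


def leave_cluster_out(cluster_ids: Sequence[Hashable], utility: Utility) -> Dict[Hashable, float]:
    """Value of cluster c = U(all points) - U(all points outside c)."""
    full = frozenset(range(len(cluster_ids)))
    base = utility(full)  # one training on the full dataset
    values: Dict[Hashable, float] = {}
    for c in dict.fromkeys(cluster_ids):  # unique ids in first-seen order
        members = {i for i, cid in enumerate(cluster_ids) if cid == c}
        values[c] = base - utility(full - members)  # one training per cluster
    return values


# Hypothetical toy data: class label of each training point and its cluster id.
y = [0, 0, 1, 1, 1]
clusters = ["a", "a", "b", "b", "c"]


def coverage_utility(subset: FrozenSet[int]) -> float:
    # Stand-in for validation accuracy: fraction of the two classes still present.
    return len({y[i] for i in subset}) / 2


if __name__ == "__main__":
    print(leave_cluster_out(clusters, coverage_utility))
```

Cluster `a` holds the only class-0 points, so removing it halves the utility; clusters `b` and `c` cover the same class, so each scores 0 when removed alone. This is the same redundancy effect that LOO shows for duplicate points.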
Value propagation within a cluster using production functions.
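The summary does not say which production function EcoVal uses, so the sketch below assumes a Cobb-Douglas form Y = A ∏ x_i^α_i with Σ α_i = 1, a standard choice in economics. For this form, x_i ∂Y/∂x_i = α_i Y, and by Euler's theorem these input shares sum exactly to Y. If Y is identified with a cluster's value, each member then receives α_i times that value. The per-point `scores` used to derive the elasticities α_i are hypothetical.

```python
from typing import List, Sequence


def cobb_douglas(x: Sequence[float], alpha: Sequence[float], a: float = 1.0) -> float:
    """Y = a * prod_i x_i ** alpha_i (assumed production function)."""
    y = a
    for xi, ai in zip(x, alpha):
        y *= xi ** ai
    return y


def input_shares(x: Sequence[float], alpha: Sequence[float],
                 a: float = 1.0, h: float = 1e-6) -> List[float]:
    """x_i * dY/dx_i, with the derivative taken by central differences."""
    shares = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad = (cobb_douglas(up, alpha, a) - cobb_douglas(down, alpha, a)) / (2 * h)
        shares.append(x[i] * grad)
    return shares


def propagate_cluster_value(cluster_value: float, scores: Sequence[float]) -> List[float]:
    """Split a cluster's value using elasticities alpha_i = s_i / sum(s).

    Because the alphas sum to 1, the point values sum to cluster_value.
    """
    total = sum(scores)
    return [cluster_value * s / total for s in scores]


if __name__ == "__main__":
    x, alpha = [2.0, 3.0], [0.75, 0.25]
    print(cobb_douglas(x, alpha), sum(input_shares(x, alpha)))  # Euler: nearly equal
    print(propagate_cluster_value(0.5, [3.0, 1.0]))
```

Under this assumed form, the split is exhaustive: Euler's theorem guarantees that no value is lost or created when moving from the cluster level to individual points, and each point's share depends only on its elasticity α_i.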
Discussion: Comparison with the Original Shapley Value:
A theoretical comparison of the error margins of the original Shapley value and the proposed method is provided.
Statistics
Shapley value-based frameworks require a considerable amount of repeated model training to obtain the Shapley value.
Existing Data Shapley-based frameworks suffer from high computational cost.
EcoVal performs clustering to reduce the total number of data points during the training phase.
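A back-of-the-envelope count shows why reducing the number of units to value matters. The sketch counts the distinct training subsets (utility evaluations) needed for exact Shapley enumeration over n points, for LOO, and for exact Shapley and leave-cluster-out over k clusters. The values of n and k are arbitrary illustrations, not figures from the paper, and practical Shapley estimators use Monte Carlo sampling rather than full enumeration.

```python
from typing import Dict


def utility_evaluations(n_points: int, n_clusters: int) -> Dict[str, int]:
    """Distinct training subsets each valuation scheme must evaluate."""
    return {
        "exact_shapley_points": 2 ** n_points,      # every subset of N
        "leave_one_out": n_points + 1,              # N, plus N minus each point
        "exact_shapley_clusters": 2 ** n_clusters,  # every union of clusters
        "leave_cluster_out": n_clusters + 1,        # N, plus N minus each cluster
    }


if __name__ == "__main__":
    for name, count in utility_evaluations(n_points=30, n_clusters=5).items():
        print(f"{name:>24}: {count:,}")
```

With 30 points grouped into 5 clusters, exact point-level enumeration needs over a billion trainings, while leave-cluster-out needs 6.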