The article investigates the relationship between the hyperparameters of machine learning (ML) algorithms and the fairness of the resulting models. The authors focus on group fairness notions, specifically the average odds difference (AOD), and explore the hyperparameter space of 5 popular ML training algorithms (Decision Tree, Logistic Regression, Support Vector Machine, Random Forest, and Discriminant Analysis) across 4 fairness-sensitive datasets (Adult Census, Compas Recidivism, German Credit, and Bank Marketing).
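For concreteness, AOD is commonly defined as the mean of the true-positive-rate and false-positive-rate gaps between the unprivileged and privileged groups; a value of 0 corresponds to equalized odds. The following is a minimal sketch of that standard definition (sign conventions vary across tools), not code from the paper:

```python
import numpy as np

def average_odds_difference(y_true, y_pred, protected):
    """AOD: mean of the TPR and FPR differences between the unprivileged
    (protected == 0) and privileged (protected == 1) groups."""
    def rates(mask):
        yt, yp = y_true[mask], y_pred[mask]
        tpr = yp[yt == 1].mean() if np.any(yt == 1) else 0.0
        fpr = yp[yt == 0].mean() if np.any(yt == 0) else 0.0
        return tpr, fpr

    tpr_u, fpr_u = rates(protected == 0)
    tpr_p, fpr_p = rates(protected == 1)
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))

# Toy example: binary labels, predictions, and protected attribute
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
prot   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(average_odds_difference(y_true, y_pred, prot))  # -0.5
```

Here the privileged group enjoys both a higher TPR and a higher FPR, so the AOD is negative, signaling odds that favor the privileged group.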
The authors first use an evolutionary search algorithm to generate a dataset of hyperparameter configurations and their corresponding fairness (AOD) values. They then train four different ML regression models (Deep Neural Network, Support Vector Regressor, Tree Regressor, and XGBoost) to learn a function that can predict the fairness of hyperparameter configurations.
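The regression step can be pictured as fitting a surrogate model over configuration space. The sketch below is illustrative only: the configuration encoding and the synthetic AOD target are stand-ins for the evolutionary-search dataset described above, not the authors' actual pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical dataset: each row encodes one hyperparameter configuration
# (e.g. max_depth, min_samples_split, C, ...); the target is the AOD
# measured for the model trained with that configuration.
n = 500
configs = rng.uniform(0.0, 1.0, size=(n, 4))
# Synthetic stand-in for measured AOD values
aod = 0.2 * configs[:, 0] - 0.1 * configs[:, 1] ** 2 + 0.05 * configs[:, 2]

X_tr, X_te, y_tr, y_te = train_test_split(configs, aod, random_state=0)
reg = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out configs: {r2_score(y_te, reg.predict(X_te)):.3f}")
```

Once fitted, such a surrogate lets one screen candidate configurations for likely bias without retraining the underlying model for each one.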
The results show that Tree Regressor and XGBoost significantly outperform Deep Neural Networks and Support Vector Regressors in accurately predicting the fairness of hyperparameters, with 40% of the cases achieving an R^2 score of 0.95 or higher. However, the precision of the predictions depends on the ML training algorithm, dataset, and protected attribute.
Under temporal distribution shifts (e.g., training on 2014 data and predicting for 2015), the Tree Regressor and XGBoost models maintain reasonable accuracy in 20% of the benchmarks, particularly for the hyperparameters of Logistic Regression and Discriminant Analysis with sex as the protected attribute. Accuracy degrades significantly for the other training algorithms and for protected attributes such as race.
The authors conclude that their approach provides a sound framework for systematically examining the influence of hyperparameters on fairness, and that it can reduce the cost of training fair data-driven software by avoiding biased configurations and prioritizing promising hyperparameters. They also highlight the difficulty of making such predictions in general, identify the circumstances under which they succeed, and point to future research directions.
Source: Salvador Rob..., arxiv.org, 05-01-2024, https://arxiv.org/pdf/2404.19100.pdf