Core Concepts
Federated XGBoost using Minimal Variance Sampling (MVS) can improve accuracy and reduce regression error on federated tabular datasets compared to federated XGBoost with no sampling or with uniform sampling.
Summary
The paper proposes a federated XGBoost model that uses Minimal Variance Sampling (MVS) to select training data when building decision trees. The authors evaluate this model, called F-XGB, on a set of federated tabular datasets and compare its performance to federated XGBoost with no sampling (NS) and uniform sampling (U).
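To make the sampling step concrete, the sketch below illustrates how MVS is commonly described for stochastic gradient boosting: each instance is scored by its regularized gradient magnitude sqrt(g_i^2 + λ h_i^2), and the highest-scoring fraction is kept before growing the next tree. This is a simplified, assumption-laden Python illustration of the per-client selection step; the function and variable names are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def mvs_select(grad, hess, sample_fraction=0.5, lam=1.0):
    """Simplified top-k sketch of Minimal Variance Sampling (MVS).

    Scores each instance by sqrt(g_i^2 + lam * h_i^2) and keeps the
    `sample_fraction` of instances with the largest scores.
    Names are illustrative, not the paper's API.
    """
    scores = np.sqrt(grad ** 2 + lam * hess ** 2)
    n_keep = max(1, int(round(sample_fraction * len(scores))))
    keep = np.argsort(scores)[-n_keep:]   # indices with the largest scores
    return np.sort(keep)

# Hypothetical per-client usage before one federated boosting round:
rng = np.random.default_rng(0)
grad = rng.normal(size=1000)              # first-order gradients from the current model
hess = np.abs(rng.normal(size=1000))      # second-order gradients (Hessians)
selected_idx = mvs_select(grad, hess, sample_fraction=0.2)
print(selected_idx.shape)                 # (200,) rows used to grow the next tree
```

In this reading, lower sampling fractions keep only the instances the current model fits worst, which is consistent with the paper's observation that the best fraction depends on dataset size and task type.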
The key findings are:
F-XGB using MVS outperforms F-XGB with NS and U in almost all cases, achieving higher accuracy, F1 score, and AUC, and lower regression error.
F-XGB using MVS with a 50% sampling fraction performs best on larger and multiclass datasets, while a 10-20% sampling fraction works better on smaller and binary-classification datasets.
F-XGB using MVS outperforms centralized XGBoost in half of the studied cases.
F-XGB using MVS can improve performance on individual clients' local datasets relative to its global performance, indicating that it can better optimize for local data distributions.
The authors introduce "FedTab", a collection of federated tabular datasets for benchmarking federated learning methods.
Overall, the results demonstrate that incorporating sampling techniques like MVS can significantly enhance the performance of federated XGBoost on tabular data, outperforming both federated XGBoost without sampling and centralized XGBoost in many cases.
Statistics
Federated XGBoost using MVS with 50% sampling fraction achieves 93.5% accuracy on FEMNIST dataset, compared to 89.9% for uniform sampling and 89.7% for no sampling.
On the Insurance Premium Prediction regression task, Federated XGBoost using MVS with 20% sampling fraction achieves an RMSE of 4082, compared to 4310 for uniform sampling and 4496 for no sampling.
Federated XGBoost using MVS outperforms centralized XGBoost on 3 out of the 6 datasets studied.
Quotes
"Federated XGBoost using MVS improves performance in terms of accuracy and regression error when compared with federated XGBoost using no- or uniform sampling."
"Federated XGBoost using MVS performs similarly as centralized, and even outperforms it in half of the cases."