Core Concepts
A comparison of seven imputation techniques on healthcare datasets finds that Missforest performs best, followed by MICE.
Abstract
The study compares seven imputation techniques on three healthcare datasets (Breast Cancer, Diabetes Mellitus, and Heart Disease). Missing values were introduced at rates of 10% to 25% so that each technique's imputed values could be scored against the known originals. Missforest performs best, followed by MICE, based on RMSE and MAE comparisons across the datasets. The results also suggest that imputing before feature selection is the better order. Feature selection methods and evaluation metrics are discussed as well.
Abstract
Missing data challenges in healthcare datasets.
Comparison of seven imputation techniques.
Introduction
Real-life datasets often contain missing values.
Types of missingness and reasons for missing values.
Datasets
Breast Cancer, Diabetes Mellitus, Heart Disease datasets described.
Missing Data Imputation Techniques
Mean, Median, LOCF, KNN, Interpolation, Missforest, MICE methods explained.
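The simpler methods in this list can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names (impute_constant, impute_locf, impute_linear) and the sample column are hypothetical, missing values are assumed to be represented as None, and the model-based methods (KNN, Missforest, MICE) are omitted because they need a fitted model.

```python
from statistics import mean, median

def impute_constant(values, stat):
    """Fill every gap with one statistic (e.g. mean or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = stat(observed)
    return [fill if v is None else v for v in values]

def impute_locf(values):
    """Last observation carried forward; gaps before the first observation stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def impute_linear(values):
    """Linear interpolation between neighbouring observations; edge gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out

# Hypothetical column with three missing entries.
column = [1.0, None, 3.0, None, None, 9.0]
by_mean = impute_constant(column, mean)
by_median = impute_constant(column, median)
by_locf = impute_locf(column)
by_interp = impute_linear(column)
```

Mean and median imputation ignore the ordering of the rows, while LOCF and interpolation depend on it, so the latter two only make sense when row order carries meaning.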
Feature Selection
Importance of feature selection in machine learning models.
Evaluation Metrics
RMSE, MAE, Recall, Precision, F1-Score, Accuracy definitions provided.
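As a rough illustration of these metrics, the sketch below uses their standard textbook definitions; the summary does not reproduce the paper's own formulas, so treat this as an assumption. The function names and the sample values are hypothetical. RMSE and MAE compare imputed values against the known originals, while the remaining four score a binary classifier.

```python
import math

def rmse(true, pred):
    """Root mean squared error between original and imputed values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

def mae(true, pred):
    """Mean absolute error between original and imputed values."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": correct / len(pairs)}

# Hypothetical values: one of four imputed entries is off by 2.
original = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.0, 5.0, 4.0]
imputation_rmse = rmse(original, imputed)
imputation_mae = mae(original, imputed)

# Hypothetical classifier output on five samples.
scores = classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Because RMSE squares each error before averaging, it penalizes a few large imputation errors more heavily than MAE does, which is one reason for reporting both.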
Results and Discussion
Performance comparison of imputation methods on different datasets.
Conclusion
Summary of findings regarding the best performing imputation methods and the optimal sequence for feature selection.
Stats
Missing values were introduced into the dataset at rates of 10%, 15%, 20%, and 25%.
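One way such a setup could be reproduced is sketched below. The summary does not say how the cells were chosen, so this assumes values are removed completely at random across all cells; the function name introduce_missing, the seed, and the toy table are hypothetical.

```python
import random

def introduce_missing(rows, rate, seed=0):
    """Return a copy of a table (list of rows) with a fraction `rate` of cells set to None.

    Assumption: cells are picked uniformly at random over the whole table.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    masked = set(rng.sample(cells, round(rate * len(cells))))
    return [[None if (r, c) in masked else rows[r][c] for c in range(n_cols)]
            for r in range(n_rows)]

# Toy table with 20 rows and 5 columns (100 cells in total).
table = [[float(r * 5 + c) for c in range(5)] for r in range(20)]

# Count the masked cells at each rate listed above.
missing_counts = {}
for rate in (0.10, 0.15, 0.20, 0.25):
    damaged = introduce_missing(table, rate)
    missing_counts[rate] = sum(v is None for row in damaged for v in row)
```

Keeping the original table untouched matters here: the imputed values can only be scored with RMSE or MAE if the true values behind each masked cell are still available.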
Quotes
"Missforest imputation performs the best followed by MICE imputation."
"Due to few literature on this subject among researchers..."