This review provides a comprehensive summary and in-depth discussion of methods for handling missing data, with a focus on special missing mechanisms in tabular data. The key highlights are:
Comprehensive Review of Special Missing Mechanisms: The review covers traditional techniques such as deletion and imputation, as well as emerging methods based on representation learning. It emphasizes imputation-based approaches, noting that as modern datasets grow in size and complexity, conventional statistical and machine learning-based approaches become insufficient.
Thorough Examination of Missing Data Generation Methods: The review meticulously catalogs the different methods used to generate missing data, especially for the less frequently addressed MAR and MNAR mechanisms. This aims to raise awareness of the importance and variability of special missing mechanisms and encourage a more comprehensive exploration of these mechanisms in future studies.
Guidance for Future Research Directions: The review proposes future research directions to overcome the limitations of existing methods and promote the adoption of advanced techniques in practical settings. It identifies research gaps within the literature and suggests new applications for imputation schemes, serving as a roadmap for researchers and practitioners.
The review covers three broad categories of methods for handling missing data: Deletion, Imputation, and Representation Learning. Deletion methods, such as listwise and pairwise deletion, are straightforward but can lead to biased outcomes, especially when dealing with special missing mechanisms. Imputation methods aim to recover missing values while preserving the integrity of the complete dataset, with a focus on statistical-based, machine learning-based, and neural network-based approaches. Representation learning methods leverage the power of feature learning to improve the quality and accuracy of imputed values.
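To make the difference between deletion and imputation concrete, here is a minimal sketch in plain Python. The toy table, its column names, and the helper functions (`listwise_delete`, `available_case_mean`, `mean_impute`) are assumptions made for illustration and do not come from the review. Mean imputation stands in for the simplest statistical-based imputer; it is not one of the advanced methods the review discusses.

```python
from statistics import mean

# Toy table (made-up values); None marks a missing entry.
rows = [
    {"age": 25, "income": 40.0},
    {"age": 32, "income": None},
    {"age": None, "income": 55.0},
    {"age": 41, "income": 70.0},
]
columns = ["age", "income"]


def listwise_delete(rows):
    """Listwise deletion: keep only rows with no missing entries."""
    return [r for r in rows if all(v is not None for v in r.values())]


def available_case_mean(rows, col):
    """Pairwise (available-case) style: use every observed value of `col`,
    even when other columns in the same row are missing."""
    return mean(r[col] for r in rows if r[col] is not None)


def mean_impute(rows, columns):
    """Replace each missing entry with the observed mean of its column."""
    col_means = {c: available_case_mean(rows, c) for c in columns}
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


complete_cases = listwise_delete(rows)
imputed = mean_impute(rows, columns)
print(len(rows), "rows originally,", len(complete_cases), "after listwise deletion")
print("imputed table:", imputed)
```

Listwise deletion discards half of this toy table, while imputation keeps every row at the cost of filling in estimated values. If the missing entries are not missing completely at random, both the reduced table and the mean-filled table can give biased estimates, which is the concern the review raises for special missing mechanisms.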
The review also discusses the importance of understanding missing data generation methods, particularly for special missing mechanisms like MAR and MNAR, which are less explored in the literature. It highlights the need for standardized approaches to generate missing data in different experiments, enabling meaningful comparisons between methods.
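As a rough illustration of what generating missing data means in practice, the sketch below masks one column under three mechanisms. It relies on the standard definitions: under MCAR the missingness probability is constant, under MAR it depends only on values that remain observed, and under MNAR it depends on the value being masked. The logistic form, the `slope` parameter, and the function name `missing_mask` are assumptions chosen for this example, not a scheme taken from the review.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def missing_mask(observed, target, mechanism, rate=0.3, slope=1.0, seed=0):
    """Return a list of booleans; True means target[i] is masked as missing.

    observed : a column that always stays fully observed
    target   : the column in which values are removed
    """
    rng = random.Random(seed)
    if mechanism == "MCAR":
        # Same probability for every entry, independent of the data.
        probs = [rate for _ in target]
    elif mechanism == "MAR":
        # Probability depends only on the fully observed column.
        centre = sum(observed) / len(observed)
        probs = [sigmoid(slope * (o - centre)) for o in observed]
    elif mechanism == "MNAR":
        # Probability depends on the value being masked (self-masking).
        centre = sum(target) / len(target)
        probs = [sigmoid(slope * (t - centre)) for t in target]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    return [rng.random() < p for p in probs]


# Hypothetical columns, for illustration only.
age = [22, 35, 47, 51, 63, 29, 40, 58]
income = [30.0, 45.0, 60.0, 52.0, 80.0, 33.0, 48.0, 75.0]

for mech in ("MCAR", "MAR", "MNAR"):
    mask = missing_mask(age, income, mech, slope=0.1, seed=42)
    print(mech, [None if m else v for v, m in zip(income, mask)])
```

In this sketch only the MCAR branch controls the overall missing rate directly; the MAR and MNAR branches trade that control for a simple dependence structure. Differences of this kind between generation schemes are one reason results from different experiments can be hard to compare.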
Overall, this comprehensive review serves as a valuable resource for researchers and practitioners in the field of missing data handling, providing insights into the latest techniques and guiding future research directions.
Source: Youran Zhou,... at arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04905.pdf