This is a research paper that surveys and analyzes datasets used for advanced bankruptcy prediction.
Bibliographic Information: Wang, X., Brorsson, M., & Kräussl, Z. (2024). Datasets for Advanced Bankruptcy Prediction: A Survey and Taxonomy. Preprint submitted to Expert Systems with Applications. arXiv:2411.01928v1 [cs.CE], 4 Nov 2024.
Research Objective: This paper aims to address the lack of focus on dataset quality in bankruptcy prediction research by providing a taxonomy of commonly used datasets, analyzing their characteristics, and proposing metrics to evaluate their quality and informativeness.
Methodology: The authors conducted a literature review of bankruptcy prediction research using Google Scholar, focusing on papers published between 2013 and 2023 that used machine learning or deep learning methods. They identified 47 relevant papers and manually extracted information about the datasets used. From this they developed a taxonomy with five dataset types: accounting-based, market-based, macroeconomic, relational, and non-financial. The authors then proposed metrics to evaluate the quality and informativeness of these datasets based on data balance, volume, integrity, noise, distribution, and redundancy.
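To make the quality factors above concrete, the following is a minimal Python sketch of three of them: balance, integrity, and redundancy. It is an illustration under stated assumptions, not the metrics defined in the paper. The function names (`imbalance_ratio`, `missing_rate`, `redundant_pairs`), the 0.95 correlation threshold, and the toy feature names are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def imbalance_ratio(labels):
    """Balance: majority-class count divided by minority-class count.

    A value of 1.0 means perfectly balanced classes. This is an assumed,
    simple proxy, not the paper's definition.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def missing_rate(rows):
    """Integrity: fraction of cells that are None across all rows."""
    cells = [value for row in rows for value in row]
    if not cells:
        raise ValueError("no cells to inspect")
    return sum(value is None for value in cells) / len(cells)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.95):
    """Redundancy: feature pairs whose |correlation| reaches the threshold.

    `features` maps a feature name to a list of floats without missing
    values. The threshold is an arbitrary illustrative choice.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if abs(pearson(features[a], features[b])) >= threshold
    ]


# Toy data: 95 healthy firms (0) and 5 bankrupt firms (1).
labels = [0] * 95 + [1] * 5
rows = [[1.0, None], [2.0, 3.0]]
features = {
    "roa": [0.1, 0.2, 0.3, 0.4],
    "roe": [0.2, 0.4, 0.6, 0.8],       # exact multiple of "roa"
    "leverage": [0.9, 0.1, 0.5, 0.3],
}

report = {
    "imbalance_ratio": imbalance_ratio(labels),
    "missing_rate": missing_rate(rows),
    "redundant_pairs": redundant_pairs(features),
}
```

A report like this can be computed before any model is trained, which fits the data-centric emphasis of the survey: a very high imbalance ratio or a large share of missing cells is worth knowing about before comparing classifiers.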
Key Findings: The study found that accounting-based data remains the most commonly used data source for bankruptcy prediction, but there is a growing trend of using mixed datasets. The authors also highlighted the challenges of data imbalance, limited sample sizes, and the lack of publicly available datasets in the field.
Main Conclusions: The authors argue that the quality and informativeness of datasets are crucial for building effective bankruptcy prediction models. They emphasize the need for researchers to carefully consider the characteristics of different datasets and utilize appropriate metrics for evaluation. The proposed taxonomy and evaluation metrics provide a framework for researchers to select and assess datasets, ultimately contributing to more reliable and robust bankruptcy prediction models.
Significance: This research contributes to the field of bankruptcy prediction by shifting the focus from model-centric approaches to a greater emphasis on data quality and informativeness. The proposed taxonomy and evaluation metrics offer valuable tools for researchers to navigate the landscape of bankruptcy prediction datasets and make informed decisions about data selection and utilization.
Limitations and Future Research: The study acknowledges that relying on publicly available datasets is a limitation, since such datasets are scarce and often narrow in scope. Future research could explore alternative data sources, such as social media data or news articles, for bankruptcy prediction. Further work on standardized data quality metrics and benchmarks for bankruptcy prediction datasets would also be beneficial.