Hivert, B., Agniel, D., Thiébaut, R., & Hejblum, B. P. (2024). Running in circles: practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis. arXiv preprint arXiv:2405.13591v2.
This paper investigates the practical limitations of data fission and data thinning methods in addressing the "double-dipping" issue in post-clustering differential analysis, particularly in the context of single-cell RNA sequencing (scRNA-seq) data.
The authors theoretically analyze the impact of biased variance estimation on the Type I error rate of the t-test in the context of data fission. They propose a heteroscedastic model with individual variances and employ a non-parametric local variance estimator to address the limitations of traditional methods. The performance of this approach is evaluated through simulations and application to a real-world scRNA-seq dataset.
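To make the role of the variance estimate concrete, here is a minimal sketch of Gaussian data fission. It is not the authors' code, and it assumes a single homoscedastic Gaussian sample. The function name `gaussian_fission`, the tuning parameter `tau`, and the numeric values are illustrative assumptions. With noise Z ~ N(0, sigma_hat^2), the two parts X1 = X + tau * Z and X2 = X - Z / tau have covariance sigma^2 - sigma_hat^2, so they are uncorrelated only when the plug-in variance equals the true one. A biased plug-in leaves the clustering part and the testing part dependent, which is how the Type I error of the downstream t-test can be affected.

```python
import numpy as np

def gaussian_fission(x, sigma_hat, tau=1.0, rng=None):
    """Split x into two parts using Gaussian noise with plug-in std sigma_hat.

    Hypothetical helper for illustration: X1 = X + tau * Z, X2 = X - Z / tau,
    with Z ~ N(0, sigma_hat^2) drawn independently of X.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal(0.0, sigma_hat, size=x.shape)
    return x + tau * z, x - z / tau

rng = np.random.default_rng(0)
sigma = 2.0                              # true standard deviation (assumed)
x = rng.normal(0.0, sigma, size=200_000)

# Fission with the true variance: the two parts are uncorrelated.
x1_ok, x2_ok = gaussian_fission(x, sigma_hat=sigma, rng=rng)
# Fission with an overestimated variance: the two parts stay correlated.
x1_bad, x2_bad = gaussian_fission(x, sigma_hat=3.0, rng=rng)

cov_ok = np.cov(x1_ok, x2_ok)[0, 1]      # theory: sigma^2 - sigma^2 = 0
cov_bad = np.cov(x1_bad, x2_bad)[0, 1]   # theory: 2^2 - 3^2 = -5

print(f"covariance with true variance:   {cov_ok:.3f}")
print(f"covariance with biased variance: {cov_bad:.3f}")
```

In this sketch the overestimated variance is set by hand; in post-clustering analysis the true variance within each unknown cluster has to be estimated from the data, which is where the bias studied in the paper arises.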
The study concludes that data fission and data thinning, despite their initial promise, are of limited practical use for post-clustering inference, particularly when the cluster structure is unknown and the components overlap. The authors emphasize the need for alternative methodologies that can handle the complexities of real-world data.
This research highlights the limitations of popular data splitting techniques in post-clustering analysis, prompting further investigation into more robust and practical solutions for addressing the "double-dipping" issue.
The study focuses primarily on Gaussian and negative binomial distributions, so whether the same limitations hold for other distributions commonly used in biological data analysis remains to be explored. Two further avenues for future research are alternative methods for parameter estimation and strategies for improving local variance estimators when clusters overlap.