Automated Cleaning and Filtering of the QUILT-1M Pathology Dataset for Improved Text-Conditional Image Synthesis
The QUILT-1M dataset, a large-scale collection of pathology images and captions, contains significant impurities that reduce its usefulness for text-conditional image synthesis. An automated deep learning pipeline is proposed to detect and filter out these impurities, which substantially improves the quality of images generated by models trained on the cleaned data.
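To make the filtering step concrete, the following is a minimal sketch of how a detect-then-filter pass over image-caption pairs could look. It is not the pipeline's actual implementation: the `Sample` record, the `predict_impurities` callable (standing in for a trained impurity detector), the impurity label names, and the 0.5 threshold are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Sample:
    """One image-caption pair (hypothetical record layout)."""
    image_path: str
    caption: str


def filter_clean(
    samples: Iterable[Sample],
    predict_impurities: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,
) -> List[Sample]:
    """Keep only samples whose predicted impurity scores all fall below threshold.

    predict_impurities is a placeholder for a trained detector: it maps an
    image path to a dict of {impurity_label: probability}. The labels and
    the threshold are illustrative assumptions, not values from the source.
    """
    clean = []
    for sample in samples:
        scores = predict_impurities(sample.image_path)
        # A sample is dropped as soon as any impurity type reaches the threshold.
        if all(p < threshold for p in scores.values()):
            clean.append(sample)
    return clean


if __name__ == "__main__":
    # Stub detector with made-up scores, used only to exercise the filter.
    fake_scores = {
        "slide_a.png": {"overlay_text": 0.05, "low_quality": 0.10},
        "slide_b.png": {"overlay_text": 0.92, "low_quality": 0.20},
    }
    data = [Sample("slide_a.png", "caption A"), Sample("slide_b.png", "caption B")]
    kept = filter_clean(data, lambda path: fake_scores[path])
    print([s.image_path for s in kept])
```

Treating the detector as an injected callable keeps the filtering logic independent of any particular model or framework, so the threshold can be tuned separately from the detector itself.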