Core Concepts
CryoGEM, a novel physics-informed generative model, can generate authentic synthetic cryo-electron microscopy datasets to improve downstream tasks like particle picking and pose estimation, leading to better 3D reconstruction resolution.
Abstract
The paper introduces CryoGEM, a physics-informed generative model for cryo-electron microscopy (cryo-EM) image synthesis. Cryo-EM is a crucial technique for resolving 3D protein structures at near-atomic resolution, but its performance is limited by the lack of high-quality annotated datasets for training.
CryoGEM consists of two key components:
- Physics-based simulation: It simulates the cryo-EM imaging process based on a virtual specimen containing multiple copies of the target molecule with randomly generated locations, orientations, and conformations. This introduces physical priors into the generation process.
- Contrastive noise translation: To generate realistic noise, CryoGEM uses unpaired noise translation via contrastive learning with a novel mask-guided sampling scheme. This ensures that the positive and negative samples are semantically distinct, improving the quality of the generated noisy images.
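The mask-guided sampling idea can be illustrated with a minimal sketch. It assumes a binary particle/background mask and follows the usual patch-wise contrastive setup in which the positive for a query is the same location in the translated image; that pairing, the function name `sample_mask_guided`, and its parameters are assumptions made for illustration, not the paper's implementation. The only property taken from the text is that negatives come from locations whose mask label contrasts with the query's.

```python
import random


def sample_mask_guided(mask, num_queries, num_negatives, rng=None):
    """Pick query locations and, for each, negatives with the opposite mask label.

    mask is a 2D list of 0 (background) and 1 (particle). This is a
    hypothetical helper, not CryoGEM's actual sampling code.
    """
    rng = rng or random.Random()
    coords = [(r, c) for r in range(len(mask)) for c in range(len(mask[0]))]
    particle = [p for p in coords if mask[p[0]][p[1]] == 1]
    background = [p for p in coords if mask[p[0]][p[1]] == 0]

    samples = []
    for query in rng.sample(coords, num_queries):
        label = mask[query[0]][query[1]]
        # Negatives are drawn only from locations with the contrasting label.
        opposite = background if label == 1 else particle
        negatives = rng.sample(opposite, min(num_negatives, len(opposite)))
        # Assumption: the positive is the same location in the other image.
        samples.append({"query": query, "positive": query, "negatives": negatives})
    return samples
```

Restricting negatives to the opposite label avoids the case where a "negative" patch is in fact another particle (or another patch of background) that looks nearly identical to the query.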
Extensive experiments on five diverse cryo-EM datasets show that CryoGEM outperforms state-of-the-art generative models in visual quality, as measured by the Fréchet Inception Distance (FID). Furthermore, the synthetic datasets generated by CryoGEM significantly improve downstream tasks such as particle picking and pose estimation, improving the resolution of the final 3D reconstruction by 22% on average.
Stats
The number of particles N in the virtual specimen follows a normal distribution N(μ_N, σ_N^2), where μ_N and σ_N are derived from the actual distribution of particle count in real micrographs.
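As a rough illustration of this sampling step, the sketch below estimates the mean and standard deviation from a list of per-micrograph particle counts and then draws a count for one virtual specimen. The function names and the example counts are hypothetical, and rounding to an integer and clamping at zero are assumptions, since the summary does not say how non-integer or negative draws are handled.

```python
import random
import statistics


def fit_count_distribution(real_counts):
    """Estimate mean and sample standard deviation of particle counts."""
    return statistics.mean(real_counts), statistics.stdev(real_counts)


def sample_particle_count(mu, sigma, rng=None):
    """Draw one particle count from a normal distribution with mean mu, std sigma."""
    rng = rng or random.Random()
    # Assumption: round to the nearest integer and never return a negative count.
    return max(0, round(rng.gauss(mu, sigma)))


# Hypothetical per-micrograph counts, for illustration only.
example_counts = [80, 95, 102, 88, 110]
mu_n, sigma_n = fit_count_distribution(example_counts)
n_particles = sample_particle_count(mu_n, sigma_n, random.Random(0))
```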
The virtual specimen S(x, y, z) is a 3D density volume that contains multiple copies of the target molecule's coarse result with randomly generated locations, orientations, and conformations.
The 2D physics-based result Iphy(x, y) is obtained by applying the weak phase object approximation, ice gradient, and point spread function to the virtual specimen.
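The three steps named above can be sketched in plain Python under simplifying assumptions. Under the weak phase object approximation, image contrast is treated as approximately linear in the projected density, so the sketch sums the volume along z. The ice gradient is modeled here as a multiplicative weight that varies linearly from left to right, and the point spread function is applied as a small real-space convolution. The function names, the linear form of the ice gradient, and the kernel are all illustrative assumptions; in practice the point spread function is the real-space counterpart of the contrast transfer function and is usually applied in Fourier space.

```python
def project_along_z(volume):
    """Sum a volume indexed as volume[z][y][x] along z to get a 2D image."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[sum(volume[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]


def apply_ice_gradient(image, left_weight, right_weight):
    """Scale each column by a weight interpolated linearly from left to right."""
    width = len(image[0])
    step = (right_weight - left_weight) / max(width - 1, 1)
    weights = [left_weight + step * x for x in range(width)]
    return [[value * weights[x] for x, value in enumerate(row)] for row in image]


def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding; kernel sides must be odd."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + oy - j, x + ox - i
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out


def simulate_physics(volume, psf, left_weight=1.0, right_weight=0.8):
    """Projection, then ice-gradient weighting, then PSF convolution."""
    projection = project_along_z(volume)
    weighted = apply_ice_gradient(projection, left_weight, right_weight)
    return convolve2d(weighted, psf)
```

Calling `simulate_physics(volume, psf)` with a nested-list volume and an odd-sized kernel returns a 2D nested list with the same height and width as one z-slice of the volume.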
The synthetic image Isyn is generated by adding Gaussian random noise to the physical simulation Iphy.
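A minimal sketch of this noise step, assuming zero-mean, independent per-pixel Gaussian noise with a single standard deviation `sigma`; the function name and the choice of `sigma` are hypothetical, as the summary does not give the noise parameters.

```python
import random


def add_gaussian_noise(image, sigma, rng=None):
    """Return a copy of a 2D image with zero-mean Gaussian noise added per pixel."""
    rng = rng or random.Random()
    # Assumption: noise is independent across pixels with one shared sigma.
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]
```

Passing a seeded `random.Random` instance makes the noisy image reproducible, which is convenient when regenerating a synthetic dataset.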
Quotes
"To achieve highly controllable generation results, we introduced a simple yet highly controllable physical simulation process."
"To generate authentic noises, we adopt unpaired image-to-image translation to generate authentic noises."
"Guided by this mask, our selection process can choose positive and negative samples from locations with contrasting mask labels. This significantly improves the encoder's performance."