Leveraging Pre-trained Latent Diffusion Models for Zero-Shot Medical Phrase Grounding
A zero-shot method for medical phrase grounding that leverages the cross-attention mechanisms within a pre-trained Latent Diffusion Model to extract heatmaps indicating regions of image-text alignment, without any further fine-tuning.
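
To make the idea concrete, here is a minimal NumPy sketch of how a phrase-level heatmap could be read out of a single cross-attention layer. It is an illustration under stated assumptions, not the authors' implementation: the function name `cross_attention_heatmap`, the tensor shapes, and the choices to average over heads and phrase tokens and to min-max normalize are all hypothetical. The summary above does not say which layers or denoising timesteps are used or how maps are combined, so this covers only one layer at one step.

```python
import numpy as np

def cross_attention_heatmap(queries, keys, phrase_token_ids, grid_hw):
    """Turn one cross-attention layer into a heatmap for a text phrase.

    queries: (heads, n_pixels, dim) projections of the image latents (assumed shape)
    keys:    (heads, n_tokens, dim) projections of the text embeddings (assumed shape)
    phrase_token_ids: indices of the tokens that make up the phrase
    grid_hw: (height, width) of the latent grid, with height * width == n_pixels
    """
    heads, n_pixels, dim = queries.shape
    h, w = grid_hw
    assert n_pixels == h * w, "latent grid does not match the number of queries"

    # Scaled dot-product attention logits: (heads, n_pixels, n_tokens)
    logits = queries @ keys.transpose(0, 2, 1) / np.sqrt(dim)

    # Numerically stable softmax over the text-token axis
    logits -= logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)

    # Assumption: average over heads and over the phrase tokens
    per_pixel = attn[:, :, phrase_token_ids].mean(axis=(0, 2))
    heat = per_pixel.reshape(h, w)

    # Assumption: min-max normalize to [0, 1] so the map can be thresholded
    span = heat.max() - heat.min()
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)


# Usage with random stand-in tensors (not real model activations)
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16 * 16, 40))
k = rng.normal(size=(8, 77, 40))
heatmap = cross_attention_heatmap(q, k, phrase_token_ids=[3, 4, 5], grid_hw=(16, 16))
```

In a full pipeline, such maps would presumably be collected from several layers and timesteps, upsampled to the image resolution, and combined before being compared with a ground-truth region; those steps are left out here because the text does not specify them.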