Enhancing Image Captioning with Pyramid of Captions: Leveraging Local and Global Visual Cues for Informative and Coherent Descriptions
The Pyramid of Captions (PoCa) method generates detailed and informative image captions hierarchically, using large language models to fuse local and global visual information.
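The local/global fusion idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the source: the 2x2 grid split, the prompt wording, and the names `split_into_grid`, `build_fusion_prompt`, and `pyramid_caption`. The captioning model and the language model are passed in as hypothetical callables (`caption_region`, `fuse`), so no particular library API is implied.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_grid(width: int, height: int, rows: int = 2, cols: int = 2) -> List[Box]:
    """Return rows * cols non-overlapping boxes that tile the image (assumed split)."""
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            top, bottom = r * height // rows, (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


def build_fusion_prompt(global_caption: str, local_captions: List[str]) -> str:
    """Assemble a text prompt asking a language model to merge the captions.

    The wording is hypothetical; it only shows where each caption goes.
    """
    lines = ["Global caption: " + global_caption]
    for i, caption in enumerate(local_captions, start=1):
        lines.append(f"Local caption {i}: {caption}")
    lines.append("Merge these into one detailed, coherent caption.")
    return "\n".join(lines)


def pyramid_caption(
    width: int,
    height: int,
    caption_region: Callable[[Box], str],  # hypothetical: captions one image region
    fuse: Callable[[str], str],            # hypothetical: language model call
) -> str:
    """Caption the whole image and each patch, then fuse the results."""
    global_caption = caption_region((0, 0, width, height))
    local_captions = [caption_region(box) for box in split_into_grid(width, height)]
    return fuse(build_fusion_prompt(global_caption, local_captions))
```

To try the control flow without any model, stub functions can stand in for both callables, for example `pyramid_caption(100, 80, lambda box: f"region {box}", lambda prompt: prompt)`, which simply returns the assembled prompt.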