Key Concepts
MOFI is a vision foundation model designed to learn image representations from noisy, entity-annotated images; it achieves state-of-the-art performance on the GPR1200 image retrieval benchmark.
Statistics
MOFI achieves 86.66% mAP on the GPR1200 dataset.
The I2E dataset consists of 1 billion images and 2 million distinct entities.
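For readers unfamiliar with the mAP figure quoted above, the following is a minimal sketch of how mean average precision is commonly computed for a retrieval task. It is not taken from the MOFI code or the GPR1200 evaluation protocol; the function names and the toy relevance lists are hypothetical, and the sketch assumes every relevant item appears somewhere in each ranked list.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans in ranked order, True where the
    retrieved item is relevant to the query (hypothetical input format).
    Assumes all relevant items are present in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_query_relevance):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data, invented for illustration only.
toy_queries = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy_queries)
```

A reported score such as 86.66% mAP corresponds to this kind of average taken over all queries in the benchmark, expressed as a percentage.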
Quotes
"Through this method, we have created Image-to-Entities (I2E), a new dataset with 1 billion images and 2 million distinct entities."
"The final MOFI model achieves 86.66% mAP on the challenging GPR1200 dataset."