Key Concepts
The paper proposes a deep multi-task approach to image geolocalization that achieves state-of-the-art performance on a wide range of benchmarks while remaining robust to distribution shifts.
Summary
The paper presents two models, PIGEON and PIGEOTTO, for planet-scale image geolocalization.
PIGEON:
- Trained on a dataset of 100,000 locations from the GeoGuessr game, with 4 panoramic images per location.
- Places 40.4% of its guesses within 25 km of the target in street-level geolocalization.
- Outperforms top human players in GeoGuessr, including a world-class professional player.
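The "within 25 km" figure above is a distance-threshold accuracy. As a minimal sketch of how such a number could be computed, assuming predictions and targets are (latitude, longitude) pairs in degrees and that great-circle distance is measured with the haversine formula on a spherical Earth, consider the following. The function names `haversine_km` and `fraction_within` are hypothetical and are not taken from the paper's code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraction_within(guesses, targets, radius_km=25.0):
    """Share of guesses whose distance to the paired target is at most radius_km."""
    hits = sum(
        haversine_km(g[0], g[1], t[0], t[1]) <= radius_km
        for g, t in zip(guesses, targets)
    )
    return hits / len(targets)
```

For example, `fraction_within([(48.86, 2.35)], [(48.85, 2.29)])` compares one guess in central Paris against a target a few kilometres away and counts it as a hit at the 25 km radius.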
PIGEOTTO:
- Trained on a diverse dataset of over 4 million images from Flickr and Wikipedia.
- Achieves state-of-the-art results on a range of benchmark datasets, significantly outperforming prior work.
- Is robust to distribution shifts, performing well even on datasets made up predominantly of unseen locations.
Key contributions:
- Semantic geocell creation that preserves geographic context.
- Multi-task contrastive pretraining of the CLIP model using synthetic captions with geographic, climate, and directional information.
- A novel loss function that incorporates haversine distance to capture the hierarchical nature of geographic information.
- A hierarchical retrieval mechanism for refining location predictions within and across geocells.
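To make the haversine-based loss more concrete, here is a minimal sketch of one plausible form: instead of a one-hot target over geocells, each geocell receives a soft label that decays with how much farther its centroid is from the true location than the true geocell's centroid. The exact formulation, the temperature `tau_km` (75 km here is an assumed value), and the names `smoothed_labels` and `soft_cross_entropy` are illustrative assumptions, not the authors' implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def smoothed_labels(true_point, true_cell, centroids, tau_km=75.0):
    """Soft label per geocell: exp(-(d_i - d_true) / tau_km).

    d_i is the distance from the true location to centroid i, so the true
    geocell gets label 1 and more distant geocells get smaller labels.
    """
    d = [haversine_km(true_point, c) for c in centroids]
    return [math.exp(-(di - d[true_cell]) / tau_km) for di in d]


def soft_cross_entropy(logits, labels):
    """Cross-entropy between soft labels and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(y * (x - log_z) for x, y in zip(logits, labels))
```

With labels of this shape, predicting a neighbouring geocell is penalized less than predicting one on another continent, which a plain one-hot classification target cannot express.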
The paper also discusses the potential risks and ethical considerations around the development of accurate image geolocalization technologies.
Statistics
"Planet-scale image geolocalization remains a challenging problem due to the diversity of images originating from anywhere in the world."
"PIGEON is the first computer vision model to reliably beat the most experienced players in the game GeoGuessr, comfortably ranking within the top 0.01% of players."
"PIGEOTTO achieves state-of-the-art results across a wide range of benchmark datasets, including IM2GPS, IM2GPS3k, YFCC4k, YFCC26k, and GWS15k."
"PIGEOTTO is the first model that is robust to location and image distribution shifts by picking up general locational cues in images as evidenced by the often double-digit percentage-point increase in performance on larger evaluation radii."
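The "larger evaluation radii" mentioned above refer to reporting accuracy at several distance thresholds. The sketch below assumes the radii commonly used in the image geolocalization literature (1 km street, 25 km city, 200 km region, 750 km country, 2,500 km continent) and assumes per-image error distances in kilometres have already been computed. The function name `accuracy_at_radii` is hypothetical.

```python
# Commonly used evaluation radii in km (assumed here, keyed by scale name).
STANDARD_RADII_KM = {
    "street": 1.0,
    "city": 25.0,
    "region": 200.0,
    "country": 750.0,
    "continent": 2500.0,
}


def accuracy_at_radii(errors_km, radii_km=STANDARD_RADII_KM):
    """Fraction of predictions whose error is within each radius.

    errors_km: iterable of distances (km) between prediction and ground truth.
    Returns a dict mapping each scale name to an accuracy in [0, 1].
    """
    errors = list(errors_km)
    if not errors:
        raise ValueError("errors_km must not be empty")
    return {
        name: sum(e <= radius for e in errors) / len(errors)
        for name, radius in radii_km.items()
    }
```

Comparing two models' outputs from `accuracy_at_radii` on the same benchmark shows at which scales the difference between them is largest.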