Core Concepts
OpenStreetView-5M is a large-scale, open-access dataset of over 5.1 million geotagged street view images covering 225 countries and territories, designed to serve as a robust benchmark for evaluating computer vision models on the task of global visual geolocation.
Abstract
The authors introduce OpenStreetView-5M, a new large-scale, open-access dataset for the task of global visual geolocation. The dataset contains over 5.1 million geotagged street view images covering 225 countries and territories, making it the largest open-access dataset for this task.
Key highlights:
The dataset is designed to address limitations of existing geolocation datasets, which often contain a significant portion of noisy and non-localizable images or are proprietary and expensive to access.
OpenStreetView-5M enforces a strict train/test separation: no image in the test set lies within a 1 km radius of any image in the training set. Test performance therefore reflects how well learned geographical features generalize, rather than how well the model has memorized specific locations.
The dataset is associated with rich metadata, including administrative divisions (country, region, area, city), as well as land cover, climate, soil type, driving side, and distance to the sea.
The authors conduct an extensive benchmark of various state-of-the-art image encoders, spatial representations, and training strategies on the dataset, demonstrating its utility as a robust and reliable benchmark for computer vision models.
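The 1 km train/test separation described above can be illustrated with a small check. This is a minimal sketch, not the authors' pipeline: the function names `haversine_km` and `violates_separation`, the use of great-circle (haversine) distance, and the strict `<` comparison are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def violates_separation(train, test, min_km=1.0):
    """Return the test points that lie closer than min_km to any training point.

    Hypothetical helper: brute force, O(len(train) * len(test)).
    """
    return [
        t for t in test
        if any(haversine_km(t[0], t[1], p[0], p[1]) < min_km for p in train)
    ]


# Toy coordinates, not taken from the dataset.
train = [(48.8566, 2.3522)]
test = [(48.8570, 2.3525), (48.9566, 2.3522)]
too_close = violates_separation(train, test)  # only the first test point is flagged
```

The brute-force comparison is only practical for small samples; at the scale of millions of images, a spatial index (a k-d tree or a grid of cells, for example) would be the more realistic choice.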
Stats
OpenStreetView-5M contains over 5.1 million high-quality, geotagged street view images.
The dataset covers 225 countries and territories. The distribution of test images across countries has a normalized entropy of 0.78, where 1 would indicate a perfectly uniform distribution.
Manual inspection shows that 96.1% of the images in the dataset are localizable.
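For readers unfamiliar with normalized entropy, the following sketch shows one common way to compute it from a list of per-image country labels: the Shannon entropy of the label distribution divided by the logarithm of the number of categories. The function name `normalized_entropy`, the `num_classes` parameter, and the choice of normalizer are assumptions; the summary does not state exactly how the authors normalize.

```python
import math
from collections import Counter


def normalized_entropy(labels, num_classes=None):
    """Shannon entropy of the label distribution, scaled to the range [0, 1].

    num_classes is the number of categories used for normalization; by default
    (an assumption) it is the number of distinct labels actually observed.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    k = num_classes if num_classes is not None else len(counts)
    if k < 2:
        return 0.0  # entropy is zero (or undefined) with fewer than two categories
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(k)


# Toy labels, not taken from the dataset.
uniform = normalized_entropy(["FR", "US", "JP", "BR"])  # evenly spread labels
skewed = normalized_entropy(["FR", "FR", "FR", "US"])   # one label dominates
```

A value near 1 means images are spread almost evenly across countries, while a value near 0 means a few countries dominate.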
Quotes
"Determining the location of an image anywhere on Earth is a complex visual task, which makes it particularly relevant for evaluating computer vision algorithms."
"Despite this potential, few supervised approaches are trained and evaluated for the task of geolocation."