Core Concept
BEV-CV introduces a novel multi-branch architecture that reduces the domain gap between ground-level and aerial images by extracting semantic features at multiple resolutions and projecting them into a shared representation space, enabling efficient cross-view geo-localization.
Summary
The paper proposes BEV-CV, a novel approach to cross-view geo-localization (CVGL) that aims to reduce the domain gap between ground-level (point-of-view, POV) and aerial images. The key contributions are:
- A multi-branch architecture that extracts semantic features at multiple resolutions from both POV and aerial images, and projects them into a shared representation space for matching.
- Adjustments to benchmark datasets to better represent real-world application scenarios, such as using limited field-of-view (FOV) and road-aligned POV images.
- A focus on improving computational efficiency, reducing query times by 18% and embedding database memory requirements by 33% compared to previous state-of-the-art methods.
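As a rough illustration of the limited-FOV, road-aligned setting described in the list above, the sketch below computes which columns of a 360° panorama fall inside a crop centred on a given heading. It assumes an equirectangular panorama whose columns map linearly to headings, with column 0 at heading 0°. That convention and the name `fov_columns` are illustrative assumptions, not the paper's preprocessing code.

```python
def fov_columns(pano_width: int, heading_deg: float, fov_deg: float = 70.0) -> list[int]:
    """Return the panorama column indices covered by a limited-FOV crop.

    Assumes columns span 0..360 degrees linearly, with column 0 at
    heading 0 (an illustrative convention, not taken from the paper).
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    cols_per_deg = pano_width / 360.0
    n_cols = round(fov_deg * cols_per_deg)
    # Left edge of the crop: half the FOV to the left of the heading.
    start = round((heading_deg - fov_deg / 2) * cols_per_deg)
    # Modulo arithmetic wraps the crop around the panorama seam.
    return [(start + i) % pano_width for i in range(n_cols)]


# A 70-degree crop centred on heading 0 straddles the seam of the panorama.
cols = fov_columns(pano_width=360, heading_deg=0.0)
print(len(cols), cols[0], cols[-1])
```

Selecting these columns from each image row would give a crop aligned to the vehicle's heading, in the spirit of the 70° evaluation setting mentioned in this summary.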
The BEV-CV network consists of two main branches:
- The BEV Branch extracts features from the POV images and transforms them into a top-down bird's-eye-view (BEV) representation using a multi-scale dense transform.
- The Aerial Branch uses a U-Net architecture to extract features from the aerial images.
The extracted features from both branches are then projected into a shared representation space and matched using a normalized temperature-scaled cross-entropy loss function.
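To make the matching objective more concrete, here is a minimal pure-Python sketch of a normalized temperature-scaled cross-entropy (NT-Xent) style loss over a batch of paired embeddings. It is a simplified, one-directional variant (POV to aerial) in which pair i is the positive and every other aerial embedding in the batch is a negative. The temperature of 0.1, the function names, and the one-directional form are assumptions for illustration; the paper's exact formulation may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def nt_xent_loss(pov_embs, aerial_embs, temperature=0.1):
    """Mean NT-Xent style loss, POV-to-aerial direction only.

    pov_embs[i] and aerial_embs[i] are assumed to describe the same place.
    """
    pov = [l2_normalize(v) for v in pov_embs]
    aer = [l2_normalize(v) for v in aerial_embs]
    total = 0.0
    for i, q in enumerate(pov):
        # Cosine similarity to every aerial embedding, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, r)) / temperature for r in aer]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pair as the target class.
        total += log_denom - logits[i]
    return total / len(pov)


aligned = nt_xent_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = nt_xent_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < swapped)  # correctly paired embeddings give the lower loss
```

A lower temperature sharpens the softmax, so hard negatives (aerial tiles that look similar to the true match) contribute more to the loss.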
Evaluation on the CVUSA and CVACT datasets shows that BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates by 23% and 24% respectively for 70° FOV crops aligned to the vehicle's heading. The authors also demonstrate improved computational efficiency compared to previous works, reducing floating point operations by 6.5% and embedding dimensionality by 33%.
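The Top-1 figures above are retrieval recall rates. As a sketch of how such a metric is typically computed, the function below counts a POV query as a hit when its true aerial match ranks within the top K by similarity. It assumes embeddings are already L2-normalized (so the dot product equals cosine similarity) and that query i matches reference i. The function name and the toy data are hypothetical.

```python
def top_k_recall(pov_embs, aerial_embs, k=1):
    """Fraction of POV queries whose true aerial match ranks in the top k.

    Assumes pov_embs[i] matches aerial_embs[i] and that all vectors are
    unit length, so the dot product is the cosine similarity.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    hits = 0
    for i, query in enumerate(pov_embs):
        true_score = dot(query, aerial_embs[i])
        # Count references that score strictly higher than the true match.
        better = sum(1 for ref in aerial_embs if dot(query, ref) > true_score)
        if better < k:
            hits += 1
    return hits / len(pov_embs)


# Toy example: the third query is closer to the wrong aerial embedding.
pov = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aerial = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k_recall(pov, aerial, k=1), top_k_recall(pov, aerial, k=2))
```

Because every query is compared against the whole reference database, a smaller embedding dimensionality directly reduces both the per-query cost and the memory needed to store the database.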
Statistics
"Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints."
"BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70°crops of CVUSA and CVACT by 23% and 24% respectively."
"BEV-CV decreases computational requirements by reducing floating point operations to below previous works, and decreasing embedding dimensionality by 33% - together allowing for faster localisation capabilities."