Core Concepts
This work introduces FedVPR, the first formulation of Visual Place Recognition (VPR) in a federated learning framework, addressing key challenges such as the lack of well-defined classes and the need for computationally heavy mining over a centralized database.
Summary
The paper presents FedVPR, a novel federated learning framework for Visual Place Recognition (VPR) tasks. VPR aims to estimate the location of an image by treating it as a retrieval problem, where a database of geo-tagged images is used to find the most similar matches.
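The retrieval view of VPR can be illustrated with a minimal sketch. It assumes every image has already been mapped to a fixed-length descriptor by some model and ranks database entries by cosine similarity. The function names, the toy descriptors, and the coordinates are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve(query, database, k=1):
    """Return the geo-tags of the k database entries most similar to the query.

    database: list of (descriptor, geo_tag) pairs, where geo_tag can be any
    value, for example a (latitude, longitude) tuple.
    """
    q = l2_normalize(query)
    scored = []
    for descriptor, geo_tag in database:
        d = l2_normalize(descriptor)
        # Cosine similarity is the dot product of the unit-length vectors.
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, geo_tag))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [geo_tag for _, geo_tag in scored[:k]]


# Toy database with two-dimensional descriptors (illustrative values only).
toy_database = [
    ([1.0, 0.0], (45.0, 7.0)),
    ([0.0, 1.0], (41.9, 12.5)),
    ([0.9, 0.1], (45.1, 7.1)),
]
top_matches = retrieve([1.0, 0.0], toy_database, k=2)
```

The location estimate for the query is then derived from the geo-tags of the top-ranked matches.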
The key contributions are:
- Introducing the first formulation of VPR in a federated learning framework, which opens up a new research direction with important practical implications.
- Proposing a new splitting of the Mapillary Street-Level-Sequences (MSLS) dataset into federated clients, designed to replicate realistic scenarios with varying degrees of statistical heterogeneity.
- Addressing the challenges of client data heterogeneity through critical design decisions such as client split, local iteration scheduling, and data augmentation, achieving performance on par with centralized training while accounting for power and computational requirements.
The paper first establishes centralized baselines for VPR, exploring different model architectures and pooling layers. It then analyzes the performance of the vanilla FedAvg algorithm across the proposed federated datasets, highlighting the impact of quantity skew (clients holding very different amounts of data) and the importance of addressing it through techniques like FedVC.
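To make the role of quantity skew concrete, the sketch below shows the standard FedAvg aggregation step, in which each client's parameters are weighted by its share of the total training data. It also shows a rough reading of the virtual-client idea behind FedVC, where every client draws the same number of samples per round. Model parameters are flattened into plain lists, and the names `fedavg_aggregate`, `sample_virtual_client`, and `n_vc` are assumptions made for illustration, not code or notation from the paper.

```python
import random


def fedavg_aggregate(client_weights, client_sizes):
    """Average client parameters, weighting each client by its data size."""
    total = sum(client_sizes)
    aggregated = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        share = size / total
        for i, w in enumerate(weights):
            aggregated[i] += share * w
    return aggregated


def sample_virtual_client(indices, n_vc, rng):
    """Draw exactly n_vc sample indices from a client's local data.

    Assumed rule: sample without replacement when the client has enough data,
    and with replacement otherwise, so that all clients do equal work per round.
    """
    if len(indices) >= n_vc:
        return rng.sample(indices, n_vc)
    return [rng.choice(indices) for _ in range(n_vc)]


# A client with 3x more data pulls the average 3x harder.
global_weights = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])

rng = random.Random(0)
large_client_batch = sample_virtual_client(list(range(1000)), 8, rng)
small_client_batch = sample_virtual_client([0, 1, 2], 8, rng)
```

With equally sized virtual clients the size-based weights become uniform, which is one way to keep a few very large clients from dominating the global model.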
Furthermore, the paper investigates the effect of heterogeneous data augmentation on federated training, demonstrating the severe performance degradation caused by client-specific color jitter. It also analyzes the impact of local mining, showing that a moderate geographical scope can be beneficial for VPR, in contrast to the traditional assumption that geographical diversity is essential.
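The local mining step can be sketched as follows, assuming a triplet-style contrastive objective in which each client searches only its own images. For a query, a positive is a nearby image and a hard negative is a visually similar image taken far away. The radii, the margin, the dictionary layout, and the function names are hypothetical choices for this example and are not values reported in the paper.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mine_triplet(query, pool, pos_radius=10.0, neg_radius=25.0):
    """Pick the closest positive and the hardest negative from a client's pool.

    query and pool items are dicts with 'desc' (descriptor) and 'pos'
    (planar coordinates in metres). Returns None if no valid triplet exists.
    """
    positives, negatives = [], []
    for item in pool:
        geo_dist = math.dist(query["pos"], item["pos"])
        if geo_dist <= pos_radius:
            positives.append(item)
        elif geo_dist > neg_radius:
            negatives.append(item)
    if not positives or not negatives:
        return None

    def descriptor_gap(item):
        return squared_distance(query["desc"], item["desc"])

    # Hardest negative: far away geographically, yet closest in descriptor space.
    return min(positives, key=descriptor_gap), min(negatives, key=descriptor_gap)


def triplet_loss(query, positive, negative, margin=0.1):
    """Hinge loss that pushes the negative at least `margin` beyond the positive."""
    d_pos = squared_distance(query["desc"], positive["desc"])
    d_neg = squared_distance(query["desc"], negative["desc"])
    return max(0.0, d_pos - d_neg + margin)


query = {"desc": [1.0, 0.0], "pos": (0.0, 0.0)}
local_pool = [
    {"desc": [0.9, 0.1], "pos": (3.0, 4.0)},    # 5 m away: positive
    {"desc": [0.8, 0.2], "pos": (100.0, 0.0)},  # far but similar: hard negative
    {"desc": [0.0, 1.0], "pos": (200.0, 0.0)},  # far and dissimilar: easy negative
]
positive, negative = mine_triplet(query, local_pool)
loss = triplet_loss(query, positive, negative)
```

Because the pool here is a single client's data, the geographical scope of that client determines which negatives can be mined at all.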
Overall, the work introduces FedVPR as a new and challenging task for the federated learning research community, paving the way for future advancements in distributed visual place recognition.
Statistics
The number of sequences per client varies from 17 ± 18 to 75 ± 148, and the number of images per client ranges from 897 ± 808 to 4270 ± 6515, depending on the federated dataset split.
Quotes
"VPR data inherently lacks well-defined classes, and models are typically trained using contrastive learning, which necessitates a data mining step on a centralized database."
"Unlike the conventional FL literature that revolves around classification problems, VPR lacks a clear division of data into classes."