Key Concepts
Federated learning can leverage datasets from multiple sources to scale up biomedical vision-language pre-training, but data heterogeneity across clients can significantly degrade performance. The proposed FedRGB framework introduces a robust guidance-based local training scheme and a distribution-based min-max optimization to learn unbiased cross-modal alignment, effectively mitigating the impact of data heterogeneity.
Summary
The paper addresses the challenge of data heterogeneity in federated biomedical vision-language pre-training (VLP). Conventional federated learning approaches that simply average client models trained on heterogeneous local datasets can lead to biased cross-modal alignment and distorted feature representations.
To overcome this issue, the authors propose the FedRGB framework with two key components:
1. Guidance-based local training: FedRGB introduces a teacher alignment module that provides unbiased cross-modal alignment as guidance during local client training. This reduces the distortion of the feature encoders caused by fitting heterogeneous local datasets.
2. Distributionally robust optimization (DRO) for cross-modal alignment: FedRGB employs a DRO-based min-max algorithm to learn a robust teacher alignment module that performs well even under the worst-case local data distribution, ensuring unbiased cross-modal alignment.
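The two components above can be sketched together in a minimal, hypothetical form. The paper's actual losses and update rules are not reproduced here; this sketch assumes (a) the local objective combines a standard image-text contrastive loss with a distillation term toward the frozen teacher alignment, and (b) the DRO step is an exponentiated-gradient update that up-weights the worst-performing clients. All function names, the `alpha` weight, and the `eta` step size are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def local_distillation_loss(student_sim, teacher_sim, labels, alpha=0.5):
    """Hypothetical local objective: a contrastive cross-entropy on the
    client's image-text similarity matrix, plus a KL distillation term
    pulling the student's alignment toward the teacher's."""
    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable
        return e / e.sum(axis=1, keepdims=True)

    p_student = softmax(student_sim)
    # Contrastive term: cross-entropy against the matched-pair labels.
    ce = -np.log(p_student[np.arange(len(labels)), labels]).mean()
    # Distillation term: KL(teacher || student) over alignment distributions.
    p_teacher = softmax(teacher_sim)
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=1).mean()
    return ce + alpha * kl

def dro_client_weights(client_losses, weights, eta=0.1):
    """One mirror-ascent step of a DRO-style min-max scheme: clients with
    higher loss get exponentially larger weight, so the teacher alignment
    is trained against an adversarial mixture of local distributions."""
    w = weights * np.exp(eta * np.asarray(client_losses))
    return w / w.sum()
```

For example, starting from uniform weights over three clients, one `dro_client_weights` step shifts the most mass onto whichever client currently has the largest loss, which is the worst-case emphasis the min-max formulation is meant to capture.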
The experiments on real-world biomedical datasets show that FedRGB successfully promotes efficient federated multimodal learning by mitigating the impact of data heterogeneity. Compared to federated baselines, FedRGB achieves better performance on various downstream tasks, including image-text retrieval, classification, and segmentation. The analysis further demonstrates the robustness and transferability of the FedRGB pre-trained model.
Statistics
The paper does not provide specific numerical data or statistics in the main text. The key findings are presented through empirical analysis and comparisons of downstream task performance.
Quotes
The paper does not contain any striking quotes that directly support its key arguments.