The paper proposes a novel semi-supervised learning framework called Spatiotemporal SSL that leverages spatiotemporal metadata to enhance the quality of pseudo-labels generated for unlabeled samples. The key idea is to train a teacher model that has access to the metadata and uses it to produce high-quality pseudo-labels, which are then used to train a student model that does not receive the metadata as input.
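The pseudo-labeling step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the summary does not say how pseudo-labels are selected, so the confidence threshold used here (a common choice in semi-supervised methods such as FixMatch) and the names `select_pseudo_labels` and `teacher_probs` are hypothetical, and the teacher's class probabilities are hard-coded rather than produced by a real metadata-aware model.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    teacher_probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Keep (sample_index, class_index) pairs whose top probability passes the threshold.

    `teacher_probs` stands in for the output of a teacher that saw both the
    image and its spatiotemporal metadata (an assumption for this sketch).
    """
    selected = []
    for idx, probs in enumerate(teacher_probs):
        # Index of the most probable class for this unlabeled sample.
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] >= threshold:
            selected.append((idx, best))
    return selected


# Hypothetical teacher outputs for three unlabeled samples and three classes.
teacher_probs = [
    [0.95, 0.03, 0.02],  # confident: kept
    [0.40, 0.35, 0.25],  # uncertain: dropped
    [0.05, 0.02, 0.93],  # confident: kept
]
pseudo = select_pseudo_labels(teacher_probs, threshold=0.9)
```

The student would then be trained on the selected samples using the images alone, so it never needs metadata at inference time.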
The paper makes the following key contributions:
It introduces a teacher-student architecture where the teacher model utilizes the spatiotemporal metadata to generate improved pseudo-labels, while the student model learns from these pseudo-labels without directly accessing the metadata.
It proposes an early-fusion approach to jointly model visual features and spatiotemporal information in the teacher model, allowing the model to capture the dependency between visual appearance and spatiotemporal context.
It introduces a distillation mechanism that strengthens knowledge transfer from the teacher to the student: a dedicated distillation token in the student model is supervised to align with the spatiotemporal metatoken in the teacher model.
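As a rough sketch of the second and third contributions, the code below prepends a metadata token to a sequence of patch tokens (early fusion) and measures how far a student token is from the teacher's metadata token. Everything concrete here is an assumption made for illustration: the summary does not specify how metadata is encoded or which alignment loss is used, so the sine/cosine encoding, the mean-squared-error loss, and the names `encode_metadata`, `early_fusion_sequence`, and `token_alignment_loss` are hypothetical. A real model would also project the metadata to the embedding width with a learned layer and align the tokens after they pass through the transformer, not the raw vectors.

```python
import math
from typing import List


def encode_metadata(lat_deg: float, lon_deg: float, day_of_year: int) -> List[float]:
    """Assumed cyclic encoding of location and acquisition date (6 values)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    season = 2.0 * math.pi * day_of_year / 365.0
    return [
        math.sin(lat), math.cos(lat),
        math.sin(lon), math.cos(lon),
        math.sin(season), math.cos(season),
    ]


def early_fusion_sequence(
    patch_tokens: List[List[float]], meta_token: List[float]
) -> List[List[float]]:
    """Prepend the metadata token so attention can mix it with visual tokens."""
    return [meta_token] + patch_tokens


def token_alignment_loss(student_token: List[float], teacher_token: List[float]) -> float:
    """Mean squared error between two tokens (an assumed choice of loss)."""
    if len(student_token) != len(teacher_token):
        raise ValueError("tokens must have the same dimension")
    squared = [(s - t) ** 2 for s, t in zip(student_token, teacher_token)]
    return sum(squared) / len(teacher_token)


# Toy example: two 6-dimensional patch tokens and one metadata token.
patch_tokens = [[0.0] * 6, [1.0] * 6]
meta_token = encode_metadata(lat_deg=0.0, lon_deg=0.0, day_of_year=0)
teacher_sequence = early_fusion_sequence(patch_tokens, meta_token)

# Hypothetical student distillation token that differs in its last entry.
student_distill_token = [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
loss = token_alignment_loss(student_distill_token, meta_token)
```

Minimizing a loss of this kind pushes the student's distillation token toward the information the teacher extracted from the metadata, even though the student only receives the image.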
The authors demonstrate that Spatiotemporal SSL can be easily combined with several state-of-the-art semi-supervised learning methods, leading to consistent and significant performance improvements on the BigEarthNet and EuroSAT benchmarks.
The paper also provides a detailed analysis of the proposed approach, including ablation studies and experiments on the generalization of the models to out-of-distribution spatiotemporal contexts.