Stacking Joint-Embedding Architectures (JEAs) hierarchically enables self-supervised learning of separable and interpretable visual representations that capture hierarchical semantic concepts, leading to improved performance on downstream tasks.
Conditioning the encoders in the image-based Joint-Embedding Predictive Architecture (I-JEPA) with spatial information about the context and target windows improves representation learning, leading to better performance on image classification benchmarks, increased robustness to context window size, and improved sample efficiency during pretraining.
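The spatial conditioning described above can be sketched in a minimal form: project the normalized window coordinates and add the result to every patch token before encoding. This is a hedged illustration, not the paper's implementation; the function name, the box format, and the random stand-in for a learned projection are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def condition_tokens(tokens, window_box, proj):
    """Add a projection of the window's normalized (x1, y1, x2, y2)
    coordinates to every token, conditioning the encoder on where the
    context/target window sits in the image (illustrative sketch)."""
    pos = np.asarray(window_box, dtype=float) @ proj  # shape: (dim,)
    return tokens + pos  # broadcast the spatial signal over all tokens

dim = 8
proj = rng.normal(size=(4, dim))    # stand-in for a learned projection
tokens = rng.normal(size=(5, dim))  # 5 patch tokens from one window
out = condition_tokens(tokens, [0.1, 0.1, 0.5, 0.5], proj)
print(out.shape)  # (5, 8)
```

The same conditioning would be applied to both the context and the target windows, so the predictor knows which region it is asked to predict.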
Injecting noise into the normalized outputs of a deep neural network encoder during Deep InfoMax (DIM) training enables automatic matching of the learned representations to a chosen prior distribution (e.g., Gaussian or uniform), offering a simple and effective approach to distribution matching in representation learning.
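The noise-injection step above can be sketched as follows. This is a toy illustration only: the linear "encoder", the noise scale, and all names are assumptions, and the DIM objective itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy linear stand-in for a deep encoder (assumption), producing
    the L2-normalized outputs the summary refers to."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def inject_noise(z, sigma=0.1):
    """Add Gaussian noise to the normalized embeddings; sigma is an
    illustrative hyperparameter, not a value from the paper."""
    return z + sigma * rng.normal(size=z.shape)

x = rng.normal(size=(8, 16))
W = rng.normal(size=(16, 4))
z = inject_noise(encode(x, W))
print(z.shape)  # (8, 4)
```

Training the encoder through this noisy channel is what pushes the embedding distribution toward the selected prior.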
This research proposes a novel self-supervised learning method that enables logic operations (AND, OR, NOT) between image representations by leveraging probabilistic many-valued logic to represent the degree of feature possession within each image.
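When each coordinate of a representation is read as a degree of feature possession in [0, 1], the logic operations mentioned above have standard many-valued forms. The particular t-norm/t-conorm choices below (product and probabilistic sum) are a common convention and an assumption here, not necessarily the paper's exact operators.

```python
import numpy as np

def f_and(a, b):
    """Product t-norm: degree to which an image has both features."""
    return a * b

def f_or(a, b):
    """Probabilistic sum t-conorm: degree of having either feature."""
    return a + b - a * b

def f_not(a):
    """Standard negation on [0, 1]."""
    return 1.0 - a

a = np.array([0.9, 0.2, 0.5])  # degrees of feature possession (illustrative)
b = np.array([0.8, 0.7, 0.5])
print(f_and(a, b))
print(f_or(a, b))
print(f_not(a))
```

Applying these elementwise to two image representations yields a new representation for the composed concept (e.g., "has feature A AND feature B").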
ViC-MAE is a self-supervised model that combines masked image modeling with contrastive learning to learn effective visual representations from both images and short videos, treating short videos as temporal augmentations; it achieves state-of-the-art video-to-image transfer performance and strong results across a range of image and video classification benchmarks.
This work proposes the Multi-View Entropy Bottleneck (MVEB) objective to effectively learn the minimal sufficient representation in the unsupervised multi-view setting. MVEB reduces this problem to jointly maximizing the agreement between the embeddings of two views and the differential entropy of the embedding distribution.
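The two terms of the objective can be sketched with a simplified surrogate. This is a hedged stand-in, not MVEB's actual entropy estimator: agreement is the mean cosine similarity between paired view embeddings, and the differential-entropy term is replaced by a kernel-based spread ("uniformity") term that also grows as embeddings disperse on the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def mveb_surrogate(z1, z2, lam=1.0):
    """Simplified stand-in for the MVEB objective (assumption):
    agreement between paired views plus a spread term that proxies
    for the differential entropy of the embedding distribution."""
    agreement = np.mean(np.sum(z1 * z2, axis=1))  # mean cosine similarity
    d2 = np.sum((z1[:, None, :] - z1[None, :, :]) ** 2, axis=-1)
    spread = -np.log(np.mean(np.exp(-2.0 * d2)))  # grows as points disperse
    return agreement + lam * spread

z1 = normalize(rng.normal(size=(16, 8)))            # embeddings of view 1
z2 = normalize(z1 + 0.1 * rng.normal(size=(16, 8)))  # perturbed view 2
obj = float(mveb_surrogate(z1, z2))
print(obj)
```

Maximizing the first term aligns the two views (sufficiency); maximizing the second spreads the embeddings, discouraging collapse and discarding view-specific nuisance information (minimality).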
MLVICX introduces multi-level variance-covariance exploration for self-supervised representation learning on chest X-ray images.