Learning to Bootstrap (L2B) is a simple, effective method that lets models bootstrap themselves from their own predictions without being misled by erroneous pseudo-labels: through meta-learning, it dynamically adjusts the importance weights between the observed labels and the model-generated pseudo-labels, and also reweights individual samples.
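To make the bilevel idea concrete, here is a minimal PyTorch sketch of one meta step in the style of L2B: per-term weights `eps` (initialized to zero, as in learning-to-reweight schemes) scale the observed-label and pseudo-label losses, a virtual SGD step is taken through them, and the validation loss of the virtually updated model yields rectified weights. The function name, `lr`, and the clean batch `(x_val, y_val)` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call


def l2b_meta_weights(model, x, y_obs, x_val, y_val, lr=0.1):
    """Simplified L2B-style meta step; returns per-sample weights of shape
    (batch, 2) over the observed-label and pseudo-label loss terms."""
    params = {k: v.detach().requires_grad_(True)
              for k, v in model.named_parameters()}
    logits = functional_call(model, params, (x,))
    y_pseudo = logits.detach().argmax(dim=1)  # bootstrap targets: own predictions

    # Candidate weights, zero-initialized so the virtual step isolates
    # each term's influence on the meta objective.
    eps = torch.zeros(x.size(0), 2, requires_grad=True)
    loss_obs = F.cross_entropy(logits, y_obs, reduction="none")
    loss_pse = F.cross_entropy(logits, y_pseudo, reduction="none")
    train_loss = (eps[:, 0] * loss_obs + eps[:, 1] * loss_pse).mean()

    # Virtual SGD step, keeping the graph so eps receives gradients.
    grads = torch.autograd.grad(train_loss, list(params.values()),
                                create_graph=True)
    virtual = {k: v - lr * g for (k, v), g in zip(params.items(), grads)}

    # Meta objective: loss of the virtually updated model on clean data.
    val_loss = F.cross_entropy(functional_call(model, virtual, (x_val,)), y_val)
    eps_grad = torch.autograd.grad(val_loss, eps)[0]

    # Terms whose upweighting would lower the validation loss get larger
    # weights; negative contributions are clamped away, then normalized.
    w = torch.clamp(-eps_grad, min=0)
    return w / (w.sum() + 1e-8)
```

In use, the returned weights would reweight the same two loss terms in the actual training step, so samples whose observed labels appear noisy lean more on the model's own predictions.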
The paper introduces JEP-KD, a knowledge-distillation approach built on a joint-embedding predictive architecture: a generative network inserted at the embedding layer strengthens the video encoder's semantic feature extraction and aligns its output with audio features from a pre-trained ASR model, with the goal of progressively closing the performance gap between visual speech recognition (VSR) and automatic speech recognition (ASR).
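Below is a minimal PyTorch sketch of the alignment idea: a small predictor maps video-encoder features into the embedding space of a frozen ASR audio encoder, and an alignment loss pulls the predicted embeddings toward the teacher's. The class name, the dimensions, and the MLP standing in for the paper's generative network are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class JEPKDAlignment(nn.Module):
    """Sketch of a JEP-KD-style embedding alignment head.

    The predictor plays the role of the generative network at the
    embedding layer; the frozen ASR encoder supplies target embeddings.
    """

    def __init__(self, video_dim=512, audio_dim=768, hidden_dim=1024):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(video_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, audio_dim),
        )

    def forward(self, video_feats, audio_feats):
        # video_feats: (B, T, video_dim) from the trainable video encoder.
        # audio_feats: (B, T, audio_dim) from the frozen, pre-trained ASR
        # encoder; detached so no gradient flows into the teacher.
        pred = self.predictor(video_feats)
        return nn.functional.smooth_l1_loss(pred, audio_feats.detach())
```

This alignment loss would be added to the usual VSR training objective, so the video encoder is pushed toward the richer semantic space the ASR teacher already occupies.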