The uaMix-MAE strategy introduces an efficient Instance Discrimination (ID) tuning approach that leverages unsupervised audio mixtures to align the representations of pretrained Masked Autoencoders (MAEs), facilitating adaptation to task-specific semantics. By combining ID with MAEs, uaMix-MAE addresses downstream tasks with limited labeled data. The method optimizes the model via contrastive tuning and proposes an audio-mixing technique that manipulates audio samples in both the input and virtual label spaces. Experimental results show that uaMix-MAE achieves significant accuracy gains across various benchmarks in low/few-shot settings.
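The idea of mixing samples in both the input and virtual label spaces can be illustrated with a generic mixup-style sketch. This is only a minimal illustration of the general technique, not the paper's exact uaMix-MAE mixing strategy; the function name, `alpha` parameter, and Beta-distributed mixing coefficient are common mixup conventions assumed here, not taken from the source.

```python
import numpy as np

def mix_audio(x1, x2, y1, y2, alpha=0.4, rng=None):
    """Mixup-style interpolation of two audio clips and their
    (virtual) labels. A generic sketch of input/label-space mixing;
    uaMix-MAE's actual mixing strategy may differ in detail."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    x_mix = lam * x1 + (1.0 - lam) * x2     # input-space mix (waveforms)
    y_mix = lam * y1 + (1.0 - lam) * y2     # virtual-label-space mix
    return x_mix, y_mix, lam
```

With one-hot labels, the mixed label stays a valid probability distribution (its entries sum to 1), which is what lets the mixed pair act as a soft training target.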
Key insights extracted by Afrina Tabas... at arxiv.org, 03-15-2024
https://arxiv.org/pdf/2403.09579.pdf