Core Concepts
The authors propose MM-VAMP VAE, a novel multimodal VAE with a data-dependent mixture-of-experts prior designed to improve representation learning and generative quality.
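A plausible reading of that prior (an assumption suggested by the VampPrior-style name, not spelled out in these notes): each modality's latent is regularized toward a mixture of the unimodal encoder posteriors computed from the current sample,

  h(z | X) = (1/M) · Σ_{m=1..M} q_φ(z | x_m),

so the prior depends on the observed modalities X = (x_1, …, x_M) rather than being a fixed distribution.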
Abstract
The abstract outlines the challenges of multimodal representation learning and introduces the MM-VAMP VAE as a solution. It compares the new approach with existing methods on several datasets, showing superior learned latent representations and generative quality.
- Multimodal VAEs aim to learn shared representations across different data modalities.
- The proposed MM-VAMP VAE uses a data-dependent mixture-of-experts prior to improve the learned latent representation (see the sketch after this list).
- Experiments show that MM-VAMP outperforms baseline methods in terms of coherence and generative quality.
- The model is applied to real-world neuroscience data, demonstrating its effectiveness in capturing individual differences in brain activity.
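A minimal PyTorch sketch of the core regularizer as read here, not the authors' code: each modality's Gaussian posterior is pulled toward the data-dependent mixture-of-experts prior formed by all unimodal posteriors of the current sample. The KL to a mixture has no closed form, so it is estimated by Monte Carlo; function names, shapes, and the sample count are illustrative assumptions.

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal


def mm_vamp_kl(mu, logvar, n_samples=16):
    """Monte-Carlo estimate of sum_m KL( q(z|x_m) || h(z|X) ), where the
    data-dependent prior h(z|X) is the uniform mixture of the M unimodal
    posteriors (assumed form of the mixture-of-experts prior).

    mu, logvar: (M, D) tensors -- one diagonal-Gaussian posterior per modality.
    """
    M, D = mu.shape
    std = torch.exp(0.5 * logvar)

    # Unimodal posteriors q(z|x_m), one per modality, with a D-dim event.
    posteriors = Independent(Normal(mu, std), 1)

    # Data-dependent prior: uniform mixture over the M unimodal posteriors.
    prior = MixtureSameFamily(
        Categorical(logits=torch.zeros(M)),
        Independent(Normal(mu, std), 1),
    )

    z = posteriors.rsample((n_samples,))  # (S, M, D): S samples per modality
    log_q = posteriors.log_prob(z)        # (S, M) under each own posterior
    log_p = prior.log_prob(z)             # (S, M) under the mixture prior

    # Per-modality KL estimates, summed over modalities.
    return (log_q - log_p).mean(dim=0).sum()


if __name__ == "__main__":
    # Toy example: 3 modalities, 8-dim latent space.
    mu = torch.randn(3, 8)
    logvar = torch.zeros(3, 8)
    print(mm_vamp_kl(mu, logvar))
```

In the full objective this term would be weighted by β and combined with per-modality reconstruction losses, consistent with the β values listed under Stats below.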
Stats
In extensive experiments on multiple benchmark datasets and a challenging real-world neuroscience dataset, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.
For the PolyMNIST dataset, we chose β ∈ {2⁻⁸, 2³}.
For the bimodal CelebA dataset, we chose β ∈ {2⁻², 2²}.
For the hippocampal neural activities dataset, we trained the three VAE models on the neural data collected from five rats.
Quotes
"The proposed MM-VAMP VAE outperforms previous works regarding the coherence of generated samples while maintaining a low reconstruction error."
"Our approach enables us to describe underlying neural patterns shared across subjects while quantifying individual differences in brain activity and behavior."