Multimodal VAEs for Unsupervised Learning of Robotic Manipulation from Vision, Language, and Action
Multimodal variational autoencoders (VAEs) can map visual, language, and action data into a shared latent representation, enabling unsupervised learning of robotic manipulation tasks.
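
To make the idea of integrating modalities concrete, the sketch below shows one common way a multimodal VAE can combine per-modality encoder outputs: product-of-experts fusion of diagonal Gaussian posteriors. This is an assumption for illustration only; the text above does not state which fusion mechanism is used. The function name `product_of_experts` and the toy inputs are hypothetical, and the vision, language, and action encoders are assumed to each output a mean and log-variance of the same latent dimension.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Fuse diagonal-Gaussian experts into one joint Gaussian.

    mus, logvars: array-likes of shape (num_modalities, latent_dim).
    A standard-normal prior expert N(0, I) is always included
    (an assumption of this sketch, not something stated in the source).
    Returns the joint mean and joint log-variance, each of shape (latent_dim,).
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)

    # Prepend the prior expert: zero mean, unit variance (log-variance 0).
    mus = np.concatenate([np.zeros((1,) + mus.shape[1:]), mus], axis=0)
    logvars = np.concatenate([np.zeros((1,) + logvars.shape[1:]), logvars], axis=0)

    # A product of Gaussians has precision equal to the sum of precisions
    # and a mean equal to the precision-weighted average of the means.
    precisions = np.exp(-logvars)
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.log(joint_var)

# Hypothetical encoder outputs for vision, language, and action (latent_dim = 2).
vision = ([0.5, -1.0], [0.0, 0.0])
language = ([0.7, -0.8], [0.0, 0.0])
action = ([0.6, -1.2], [0.0, 0.0])

mus = [vision[0], language[0], action[0]]
logvars = [vision[1], language[1], action[1]]
joint_mu, joint_logvar = product_of_experts(mus, logvars)

# Dropping a modality only means passing fewer experts.
vision_only_mu, vision_only_logvar = product_of_experts(mus[:1], logvars[:1])
```

A design note on this choice: because the fused posterior is defined for any subset of experts, the same function covers the case where a modality is missing, for example inferring a latent code from vision alone.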