An Embodied Multi-Modal Agent (EMMA) is trained by imitating an LLM expert in a parallel TextWorld to efficiently complete tasks in a visual environment.