An Embodied Generalist Agent for Comprehensive 3D Scene Understanding and Interaction
LEO, an embodied multi-modal generalist agent, can perceive, ground, reason, plan, and act in the 3D world through a unified task interface, model architecture, and objective.