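This paper presents π0, a new robot control model that builds on a pre-trained vision-language model (VLM) with internet-scale semantic knowledge and generates continuous actions through a flow matching architecture, and it shows how training on large-scale, diverse robot data enables the model to perform dexterous, complex manipulation tasks.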
This paper introduces π0, a novel vision-language-action (VLA) model for general robot control that leverages flow matching for action generation and benefits from large-scale pre-training on diverse robot manipulation data, achieving state-of-the-art performance on complex, dexterous tasks.
This paper introduces Sparse Diffusion Policy (SDP), a novel approach that enhances robot learning efficiency in multitask, continual, and transfer learning scenarios by integrating Mixture of Experts (MoE) within a transformer-based diffusion policy.
GR-2 is a robot manipulation model that leverages large-scale pre-training on internet videos to generalize across a wide range of tasks and environments.