QUAR-VLA: A Vision-Language-Action Model for Enhancing Quadruped Robot Capabilities
This paper proposes QUAR-VLA, a paradigm that tightly integrates visual observations and natural language instructions to generate executable actions, merging perception, planning, and decision-making into a single pipeline and thereby raising the overall intelligence of quadruped robots.
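The vision-language-action interface described above can be sketched as a policy that fuses an image embedding with an instruction embedding and decodes a low-level quadruped command. This is a minimal illustrative sketch, not the paper's actual model: the class name, feature dimensions, action layout, and random-projection "encoders" are all assumptions standing in for learned networks.

```python
import numpy as np

ACTION_DIM = 5  # hypothetical layout: [vx, vy, yaw_rate, body_height, gait_freq]

class QuarVLAPolicySketch:
    """Toy stand-in for a vision-language-action policy (illustrative only)."""

    def __init__(self, vocab, feat_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        # Random projections stand in for learned vision/text encoders.
        self.img_proj = rng.normal(size=(feat_dim, feat_dim))
        self.txt_proj = rng.normal(size=(len(vocab), feat_dim))
        self.head = rng.normal(size=(2 * feat_dim, ACTION_DIM))

    def encode_image(self, image):
        # Flatten to a fixed-size vector, then project into feature space.
        pooled = image.reshape(-1)[: self.img_proj.shape[0]]
        return pooled @ self.img_proj

    def encode_text(self, instruction):
        # Bag-of-words over a toy vocabulary, projected into feature space.
        bow = np.zeros(len(self.vocab))
        for w in instruction.lower().split():
            if w in self.vocab:
                bow[self.vocab[w]] += 1.0
        return bow @ self.txt_proj

    def act(self, image, instruction):
        # Fuse both modalities and decode a bounded action command.
        fused = np.concatenate(
            [self.encode_image(image), self.encode_text(instruction)]
        )
        return np.tanh(fused @ self.head)
```

A usage example: `QuarVLAPolicySketch(["go", "forward", "turn", "left"]).act(np.zeros((4, 4)), "go forward")` returns a length-5 action vector with each component in [-1, 1], mirroring how a real VLA model would map a camera frame plus an instruction to a single executable command.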