Core Concepts
Incorporating the inductive bias of action locality into the design of a visuomotor policy framework can significantly boost sample efficiency in robotic manipulation tasks.
Summary
The paper introduces SGRv2, a systematic framework for visuomotor policy learning that leverages the inductive bias of action locality to enhance sample efficiency in robotic manipulation.
Key highlights:
- SGRv2 builds upon the foundation of the Semantic-Geometric Representation (SGR) framework, but integrates action locality throughout its entire design.
- The core components of SGRv2's locality-aware design include:
  - An encoder-decoder architecture for extracting point-wise features.
  - A strategy for predicting the relative target position to ensure translation equivariance.
  - The application of point-wise weights to highlight critical local regions.
  - Dense supervision to enhance learning efficiency.
- Extensive experiments in both simulated and real-world settings demonstrate that SGRv2 significantly outperforms various baselines, including SGR, PointNeXt, R3M, PerAct, and RVT, especially in data-limited scenarios.
- SGRv2 exhibits exceptional sample efficiency, achieving remarkable results with as few as 5 demonstrations, compared to SGR's performance with 100 demonstrations.
- The authors also conduct ablation studies to validate the contributions of the key components of SGRv2's locality design.
- Real-world experiments with a Franka Emika Panda robot further confirm SGRv2's capability to complete complex long-horizon tasks and its ability to generalize.
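The locality-aware components listed above (relative target position, point-wise weights, dense supervision) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `aggregate_target` and `dense_l1_loss`, the softmax weighting, the L1 loss, and the random arrays standing in for network outputs are all assumptions made for illustration. The idea shown is that each point votes for the target as its own position plus a predicted offset, the votes are combined with normalized point-wise weights, and every point's vote can be supervised individually.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def aggregate_target(points, offsets, scores):
    """Combine per-point votes into one target position.

    points  : (N, 3) point coordinates
    offsets : (N, 3) predicted displacement from each point to the target
              (hypothetical network output)
    scores  : (N,) unnormalized per-point importance
              (hypothetical network output)
    """
    weights = softmax(scores)   # point-wise weights, sum to 1
    votes = points + offsets    # each point's estimate of the target
    return weights @ votes      # weighted average, shape (3,)


def dense_l1_loss(points, offsets, target):
    """Dense supervision (assumed L1 form): penalize every point's vote."""
    votes = points + offsets
    return np.abs(votes - target).sum(axis=1).mean()


# Random stand-ins for a point cloud and for network outputs.
rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))
offsets = rng.normal(scale=0.1, size=(128, 3))
scores = rng.normal(size=128)
shift = np.array([0.5, -0.2, 1.0])

# Translating the scene translates the prediction by the same amount,
# provided offsets and scores do not depend on absolute position.
pred = aggregate_target(points, offsets, scores)
pred_shifted = aggregate_target(points + shift, offsets, scores)
print(np.allclose(pred_shifted, pred + shift))
```

Because the weights sum to one, adding the same shift to every point adds exactly that shift to the aggregated prediction, which is the translation-equivariance property the relative-position strategy is meant to provide. In a real model the offsets and scores would come from the point-wise features of the encoder-decoder rather than from random arrays.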
Statistics
The paper reports the following key results in support of its claims:
- With only 5 demonstrations, SGRv2 achieves an average success rate of 53.2% on 26 RLBench tasks, outperforming the most competitive baseline, RVT, by a factor of 1.32.
- On the ManiSkill2 and MimicGen benchmarks, SGRv2 achieves a success rate 2.54 times that of SGR when using dense control.
- In real-world experiments with 8 demonstrations, SGRv2 outperforms PerAct and RVT, achieving an average success rate of 63% across 10 sub-tasks.
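As a reading aid for the RLBench figure above, the short calculation below works out what the 1.32x factor implies, assuming it is the ratio of the two average success rates. The resulting baseline value is derived from the two numbers quoted here and is not a figure reported in this summary.

```python
# Numbers quoted in the summary.
sgrv2_success = 53.2   # average success rate in percent (26 RLBench tasks)
ratio_over_rvt = 1.32  # stated improvement factor over RVT

# Assumption: the factor is SGRv2's rate divided by RVT's rate.
implied_rvt_success = sgrv2_success / ratio_over_rvt
print(f"Implied RVT average success rate: {implied_rvt_success:.1f}%")
```

Under that assumption the implied baseline is roughly 40%, so the gap is about 13 percentage points.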
Quotes
"Central to the design of SGRv2 is the incorporation of a critical inductive bias—action locality, which posits that robot's actions are predominantly influenced by the target object and its interactions with the local environment."
"Extensive experiments in both simulated and real-world settings demonstrate that action locality is essential for boosting sample efficiency."
"SGRv2 excels in RLBench tasks with keyframe control using merely 5 demonstrations and surpasses the RVT baseline in 23 of 26 tasks."