Core Concepts
The paper presents a framework that generalizes long-horizon extrinsic manipulation tasks from a single demonstration by retargeting the contact requirements of the demonstration to diverse object and environment configurations.
Abstract
The paper proposes a method to generalize long-horizon extrinsic manipulation tasks from a single demonstration. The key insights are:
- Long-horizon extrinsic manipulation can be decomposed into a sequence of short-horizon primitives based on contact switches.
- The success of each primitive is highly dependent on satisfying the desired contact configuration.
- By retargeting the contact requirements from the demonstration to the test scene, the same primitive sequence can be executed in diverse environments.
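The first insight, splitting a long task at contact switches, can be illustrated with a minimal sketch. It assumes the demonstration is available as a per-timestep trace of active contact sets. The trace format, the contact labels, and the function name `segment_by_contact_switch` are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def segment_by_contact_switch(contact_trace):
    """Split a per-timestep trace of contact sets into segments.

    Returns a list of (contacts, start_index, end_index) tuples, where a new
    segment begins whenever the set of active contacts changes.
    """
    segments = []
    start = 0
    for contacts, group in groupby(contact_trace):
        length = sum(1 for _ in group)
        segments.append((contacts, start, start + length - 1))
        start += length
    return segments


# Hypothetical demo trace: the object slides on the ground, reaches a wall,
# then is held against the wall by the gripper.
ground = frozenset({"object-ground"})
ground_wall = frozenset({"object-ground", "object-wall"})
wall_gripper = frozenset({"object-wall", "object-gripper"})
demo_trace = [ground] * 3 + [ground_wall] * 2 + [wall_gripper] * 4

segments = segment_by_contact_switch(demo_trace)
for contacts, start, end in segments:
    print(sorted(contacts), start, end)
```

Each resulting segment is a candidate short-horizon primitive. A real system would also need to detect contacts from sensor data, which this sketch leaves out.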
The approach involves:
- Preparing a library of short-horizon, goal-conditioned primitives that are robust to object and environment variations.
- Identifying the primitive sequence from the demonstration.
- Remapping the object states from the demonstration scene to the test scene while enforcing the contact requirements of each primitive.
- Combining the retargeted primitive sequence to achieve the manipulation objective.
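The steps above can be sketched in a toy setting. This is a simplified illustration, not the paper's implementation. It assumes a one-dimensional object state and retargets each goal by preserving its offset from the environment feature the object must contact. The names `Step`, `retarget`, `retarget_sequence`, the primitive labels, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    primitive: str  # hypothetical primitive name, e.g. "push"
    contact: str    # environment feature the object must touch at the goal
    goal_x: float   # object goal position (1-D toy state)


def retarget(step, demo_env, test_env):
    """Move a goal into the test scene, keeping its offset from the contact feature."""
    offset = step.goal_x - demo_env[step.contact]
    return Step(step.primitive, step.contact, test_env[step.contact] + offset)


def retarget_sequence(steps, demo_env, test_env):
    """Retarget every step; the primitive order from the demo is unchanged."""
    return [retarget(step, demo_env, test_env) for step in steps]


# Hypothetical scenes: positions of environment features along one axis.
demo_env = {"wall": 0.5, "edge": 0.0}
test_env = {"wall": 0.8, "edge": 0.1}

# Primitive sequence identified from the single demonstration (assumed given).
demo_steps = [
    Step("push", "wall", 0.45),
    Step("pivot", "wall", 0.5),
    Step("pull", "edge", 0.02),
]

plan = retarget_sequence(demo_steps, demo_env, test_env)
for step in plan:
    print(step.primitive, step.contact, round(step.goal_x, 3))
```

The design choice illustrated here is that the goals are expressed relative to the contacted feature rather than in absolute coordinates, so the same primitive sequence remains valid when the environment geometry changes. Executing each primitive with a goal-conditioned policy is outside the scope of this sketch.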
The method was extensively validated on hardware, achieving an overall success rate of 80.5% across 4 long-horizon extrinsic manipulation tasks involving 10 objects and 6 environment configurations. Ablation studies showed that contact retargeting is the key to successfully chaining the extrinsic manipulation primitives.
Stats
The key reported result is an overall success rate of 80.5% in hardware experiments, measured across 4 long-horizon extrinsic manipulation tasks involving 10 objects and 6 environment configurations.
Quotes
"Extrinsic manipulation, the use of environment contacts to achieve manipulation objectives, enables strategies that are otherwise impossible with a parallel jaw gripper."
"We observe that most extrinsic manipulation are combinations of short-horizon primitives, each of which depend strongly on initializing from a desirable contact configuration to succeed."
"By leveraging contact retargeting, our pipeline merely takes a single task demo of any primitive combination to achieve the same task in a distinct scene."