Stem-OB is a preprocessing technique that improves the generalization of visual imitation learning policies. It uses diffusion inversion to pull visually diverse observations toward shared representations, which makes the learned policy more robust to visual perturbations without adding inference-time overhead.
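A minimal sketch of why inverting observations partway along a diffusion process can make them converge. It assumes the simple DDPM-style forward-noising view of inversion, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with a linear beta schedule and one noise sample shared by both observations. The function names, the schedule, and the toy 64-dimensional "observations" are hypothetical illustrations, not Stem-OB's implementation; real diffusion inversion (for example DDIM inversion with a pretrained model) is considerably more involved.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t for a linear DDPM beta schedule (assumed schedule)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_to_step(x0, t, alpha_bars, eps):
    """Forward-diffuse x0 to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]

def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

random.seed(0)
dim = 64
# Two hypothetical observations of the same scene under different visual perturbations:
# obs_a has a global brightness shift, obs_b has per-pixel texture noise.
base = [random.uniform(-1.0, 1.0) for _ in range(dim)]
obs_a = [b + 0.3 for b in base]
obs_b = [b + random.gauss(0.0, 0.2) for b in base]
eps = [random.gauss(0.0, 1.0) for _ in range(dim)]  # shared noise sample (assumption)
alpha_bars = linear_alpha_bars()

# With shared noise, the distance between the two noised observations is
# sqrt(abar_t) times their original distance, so it shrinks as t grows.
for t in (0, 100, 300, 600, 999):
    d = l2(noise_to_step(obs_a, t, alpha_bars, eps), noise_to_step(obs_b, t, alpha_bars, eps))
    print(f"t={t:4d}  distance={d:.4f}")
```

The trade-off this toy exposes: inverting deeper removes more of the perturbation-specific detail but also more of the task-relevant content, so a method like this would invert only to an intermediate step.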
We propose a keypoint-based learning method for bimanual manipulation tasks in visual imitation learning.
Visual imitation learning has made significant progress on unimanual tasks, but bimanual coordination remains challenging. Bi-KVIL extends keypoints-based visual imitation learning to bimanual manipulation, extracting Hybrid Master-Slave Relationships and task representations from demonstrations.