Thermal imagery can enable robust visual SLAM in challenging environments, but feature extraction and place recognition are difficult because thermal appearance changes dramatically over time. This work presents a SLAM system that uses learned feature descriptors and a specialized bag-of-words vocabulary to achieve reliable long-term localization and mapping with thermal cameras.
GraspXL is a reinforcement learning framework that can synthesize diverse grasping motions for a wide range of unseen objects while adhering to specific motion objectives such as graspable areas, heading directions, wrist rotations, and hand positions.
OAKINK2 introduces a three-level abstraction framework (Affordance, Primitive Task, and Complex Task) to structure and understand complex bimanual object manipulation tasks. The dataset provides diverse human demonstrations and annotations to support applications such as interaction reconstruction and motion synthesis.
This work uses diffusion models to predict a distribution over a person's future trajectories, conditioned on egocentric observations of the surrounding environment and the person's past walking trajectory.