Core Concept
Our method automatically generates a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery, enabling efficient training of object reconstruction methods that are robust to occlusions.
Summary
The paper introduces a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. The key insights are:
- Off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions serve as pseudo-groundtruth. Unoccluded objects are identified automatically and composited into the background in a clip-art style, which ensures realistic appearances and physically accurate occlusion configurations.
- The resulting clip-art images, paired with their pseudo-groundtruth, enable efficient training of object reconstruction methods that are robust to occlusions. Experiments show significant improvements in both 2D and 3D reconstruction, particularly for heavily occluded objects such as vehicles and people in urban scenes.
- The method requires no human labeling and scales easily, making it an effective way to automatically generate realistic training data for reconstructing dynamic objects under occlusion.
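
The clip-art compositing step described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each unoccluded object arrives as a full-frame RGB image with a binary mask at its original position, and that a scalar `depth` per object (a hypothetical field, for example derived from the 3D pose prediction) decides which object is in front. The function name `composite_clipart` and the dictionary keys are likewise illustrative.

```python
import numpy as np


def composite_clipart(background, objects):
    """Composite objects onto a background and derive occlusion labels.

    background: (H, W, 3) array.
    objects: list of dicts with keys
        "rgb"   -> (H, W, 3) full-frame image of the unoccluded object,
        "mask"  -> (H, W) boolean full (amodal) mask at its original position,
        "depth" -> float, smaller means closer to the camera (assumed input).
    Returns the composite image, one visible mask per object, and the
    fraction of each object hidden by nearer objects.
    """
    canvas = background.copy()
    occupied = np.zeros(background.shape[:2], dtype=bool)
    visible_masks = [None] * len(objects)
    occlusion = [0.0] * len(objects)

    # Process nearest objects first so that farther objects only fill
    # pixels that no nearer object has already claimed.
    near_to_far = sorted(range(len(objects)), key=lambda i: objects[i]["depth"])
    for i in near_to_far:
        obj = objects[i]
        full = obj["mask"].astype(bool)
        visible = full & ~occupied
        canvas[visible] = obj["rgb"][visible]
        occupied |= full

        visible_masks[i] = visible
        area = full.sum()
        occlusion[i] = 1.0 - visible.sum() / area if area else 0.0

    return canvas, visible_masks, occlusion
```

Because every object was originally observed unoccluded, its full mask is known before compositing. The composite image therefore comes with both the visible mask and the full mask of each object at no labeling cost, which is the kind of supervision an occlusion-robust model needs.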
Statistics
"Existing off-the-shelf methods demonstrate good accuracy in both 2D (segmentation, keypoints) and 3D (pose, shape) prediction tasks, especially on unoccluded objects."
"We start with the observation that, although not perfect, existing off-the-shelf methods demonstrate good accuracy in both 2D (segmentation [30], keypoints [64, 65]) and 3D (pose, shape [36, 52, 70]) prediction tasks, especially on unoccluded objects."
Quotes
"Expanding on WALT [59], we utilize time-lapse videos from stationary cameras to synthesize realistic occlusion scenarios by extracting unoccluded objects and composite them back into the background image at their original positions."
"Unlike WALT which focuses solely on compositing and learning 2D tasks, our approach extends to generating high-quality 3D pseudo-groundtruth data for robust 3D object reconstruction under occlusion."