The paper introduces a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusion from freely available time-lapse imagery. The key insights are:
The framework leverages off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth to automatically identify unoccluded 3D objects, which are then composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations.
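As a rough illustration of this compositing step, the sketch below pastes unoccluded object crops onto a background in depth order, so nearer objects occlude farther ones and the resulting visible masks can double as pseudo-groundtruth. The `ObjectCrop` structure, the `composite_clip_art` function, and the depth-sorting rule are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of depth-ordered clip-art compositing (illustrative only; the
# data structures and function below are assumptions, not the paper's code).
from dataclasses import dataclass
import numpy as np


@dataclass
class ObjectCrop:
    rgb: np.ndarray    # (H, W, 3) object appearance, aligned to the scene frame
    mask: np.ndarray   # (H, W) binary mask of the unoccluded object
    depth: float       # estimated distance to the camera (larger = farther)


def composite_clip_art(background: np.ndarray, objects: list[ObjectCrop]):
    """Paste unoccluded objects back-to-front so nearer ones occlude farther ones.

    Returns the composited image and a per-object visible mask, which can serve
    as pseudo-groundtruth for segmentation under occlusion.
    """
    canvas = background.copy()
    # Far-to-near order: later (nearer) pastes overwrite earlier (farther) ones.
    order = sorted(range(len(objects)), key=lambda i: -objects[i].depth)
    pasted = []  # (object index, visible mask) in paste order
    for idx in order:
        obj = objects[idx]
        m = obj.mask.astype(bool)
        canvas[m] = obj.rgb[m]
        pasted.append((idx, m.copy()))
    # Carve out the parts of each farther object covered by nearer ones.
    for i, (_, m_i) in enumerate(pasted):
        for _, m_j in pasted[i + 1:]:
            m_i &= ~m_j
    return canvas, dict(pasted)
```

In the paper's setting, the object appearances and depths would come from the off-the-shelf 2D and 3D predictions on unoccluded instances mined from time-lapse footage.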
The resulting clip-art images, together with their pseudo-groundtruth, enable efficient training of object reconstruction methods that are robust to occlusion. Experiments show significant improvements in both 2D and 3D reconstruction, particularly for heavily occluded objects such as vehicles and people in urban scenes.
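To make the training claim concrete, here is a minimal sketch of how the generated image/pseudo-groundtruth pairs could be wrapped for a standard training loop; `ClipArtDataset` and its field names are hypothetical placeholders rather than the paper's released data format.

```python
# Hypothetical wrapper around generated clip-art samples; field names are
# assumptions for illustration, not the paper's actual data format.
import numpy as np
from torch.utils.data import Dataset


class ClipArtDataset(Dataset):
    """Serves composited clip-art images with their pseudo-groundtruth targets."""

    def __init__(self, samples):
        # samples: list of dicts with "image", "visible_masks", "keypoints_3d"
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        image = s["image"].astype(np.float32) / 255.0   # HWC, scaled to [0, 1]
        targets = {
            "visible_masks": s["visible_masks"],        # occlusion-aware masks
            "keypoints_3d": s["keypoints_3d"],          # pseudo-GT 3D keypoints
        }
        return image, targets
```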
The method requires no human labeling and scales easily, making it an effective approach for automatically generating realistic training data for reconstructing dynamic objects under occlusion.
Key insights distilled from: Khiem Vuong et al., arxiv.org, 03-29-2024, https://arxiv.org/pdf/2403.19022.pdf

Deeper Inquiries