Core Concepts
Introducing DOCTR, a Disentangled Object-Centric Transformer for unified learning with multiple objects in point scene understanding.
Abstract
Point scene understanding is challenging because existing pipelines are complex and fail to leverage relationship constraints between objects.
The paper proposes DOCTR, a Disentangled Object-Centric Transformer for unified learning with multiple objects.
Introduction:
3D scene understanding is important for applications such as AR, autonomous driving, and robotics.
The point scene understanding task involves multiple sub-tasks simultaneously, such as object recognition and mesh reconstruction.
Code Availability:
Code is available at https://github.com/SAITPublic/DOCTR.
Related Work:
Previous methods such as RfD-Net and DIMR addressed the object recognition and mesh reconstruction tasks.
Methods:
The DOCTR pipeline consists of a backbone, a disentangled Transformer decoder, a prediction head, and a shape decoder.
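The four stages listed above can be sketched as a minimal data-flow stub. All names, shapes, and defaults below are illustrative assumptions, not the authors' actual implementation; each stage is stubbed only to show how the outputs chain together.

```python
# Hypothetical sketch of a DOCTR-style pipeline (names and shapes are
# assumptions for illustration; the real model uses learned networks).

def backbone(point_cloud):
    """Stub backbone: extract one feature per input point."""
    return [{"xyz": p, "feat": 0.0} for p in point_cloud]

def disentangled_decoder(point_feats, num_queries=4):
    """Stub decoder: object queries attend to point features so that each
    query comes to represent at most one object in the scene."""
    return [{"query": i, "num_points_seen": len(point_feats)}
            for i in range(num_queries)]

def prediction_head(object_queries):
    """Stub head: per-query object predictions (e.g. class and pose)."""
    return [{"query": q["query"], "class": None, "pose": None}
            for q in object_queries]

def shape_decoder(object_queries):
    """Stub shape decoder: one completed shape per object query."""
    return [{"query": q["query"], "shape": None} for q in object_queries]

scene = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9)]
queries = disentangled_decoder(backbone(scene))
predictions = prediction_head(queries)   # per-object class/pose
shapes = shape_decoder(queries)          # per-object completed geometry
```

The key design point is that every object-level output (prediction and shape) is produced per query, so a single forward pass handles all objects in the scene jointly.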
Training Design:
A hybrid bipartite matching strategy is used during training to assign ground-truth objects to object queries.
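To make the matching step concrete, the sketch below performs plain one-to-one bipartite matching between queries and ground truths by minimizing a total assignment cost. This illustrates only the basic bipartite-matching idea, not the paper's specific "hybrid" variant; the brute-force search is an assumption for tiny examples (practical detectors use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations

def bipartite_match(cost):
    """Exhaustive minimum-cost one-to-one assignment of queries to ground
    truths. cost[q][g] is the matching cost between query q and ground
    truth g; assumes #ground truths <= #queries."""
    n = len(cost)      # number of queries
    m = len(cost[0])   # number of ground-truth objects
    best, best_assign = float("inf"), None
    for perm in permutations(range(n), m):
        total = sum(cost[perm[g]][g] for g in range(m))
        if total < best:
            best = total
            best_assign = [(perm[g], g) for g in range(m)]
    return best_assign  # list of (query, ground_truth) pairs

cost = [
    [0.9, 0.1],   # query 0: cheap to match ground truth 1
    [0.2, 0.8],   # query 1: cheap to match ground truth 0
    [0.5, 0.5],   # query 2: left unmatched (predicts "no object")
]
print(bipartite_match(cost))  # [(1, 0), (0, 1)]
```

Queries left unmatched receive no object supervision, which is what lets a query-based model decide how many objects the scene contains.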
Experiment:
Evaluation on the ScanNet dataset shows superior performance compared to previous state-of-the-art methods.
Acknowledgments:
Contributions from Hui Zhang and Yi Zhou are acknowledged.