TOD3Cap: A Large-Scale Dataset and Model for 3D Dense Captioning in Outdoor Scenes
The central contribution of this work is a new task, outdoor 3D dense captioning, which aims to localize every object in a 3D outdoor scene and describe each one in natural language. To support research on this task, the authors propose the TOD3Cap dataset, the largest dataset to date for 3D dense captioning in outdoor scenes. They also develop the TOD3Cap network, a transformer-based architecture designed to address the challenges specific to outdoor 3D dense captioning.