This paper presents MSQA, a large-scale multimodal situational-awareness dataset, and uses it to introduce a benchmark that evaluates situational awareness and action prediction in 3D scenes.
This paper introduces ALOcc, a novel convolutional architecture that achieves state-of-the-art speed and accuracy in predicting 3D semantic occupancy and flow from surround-view camera data, addressing key challenges in 2D-to-3D view transformation and multi-task feature encoding for autonomous driving applications.
DistillNeRF is a self-supervised learning framework that distills knowledge from offline NeRFs and visual foundation models to predict rich 3D scene representations from limited 2D image inputs.
This paper introduces SCRREAM, a novel framework for creating highly accurate and dense 3D annotations of indoor scenes, addressing limitations in existing datasets that prioritize scale over detailed geometry.
This paper introduces a novel, learning-free method for uplifting 2D visual features to 3D Gaussian Splatting scenes, enabling efficient and effective semantic segmentation in 3D scenes without requiring computationally expensive optimization procedures.
This paper introduces a novel method for improving 3D scene understanding by using a two-stage masked token prediction framework guided by SAM (Segment Anything Model) to transfer knowledge from pre-trained 2D foundation models to 3D models.
This paper proposes a novel method for 3D scene understanding that leverages 2D panoptic segmentation information within a neural radiance field framework, guided by perceptual priors, to achieve accurate and consistent 3D panoptic segmentation.
The proposed method, BBQ, combines a 3D scene graph representation with a large language model to accurately locate objects in a 3D scene based on complex natural-language queries.
Advancing open-vocabulary 3D scene understanding through innovative methods and challenges.