Incorporating radius-normalized distance and directional vectors as additional local neighborhood features can significantly improve the classification accuracy of 3D point cloud models, particularly on real-world datasets.
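To make the feature construction concrete, here is a minimal sketch (an assumed implementation, not the paper's code) that builds per-neighbor features from k-nearest neighbors; the function name and the choice of k = 16 are illustrative.

```python
# Hypothetical sketch: augmenting k-NN neighborhood features with
# radius-normalized distances and unit direction vectors.
import torch

def neighborhood_features(points: torch.Tensor, k: int = 16) -> torch.Tensor:
    """points: (N, 3) point cloud. Returns (N, k, 7) features per neighbor:
    [relative offset (3), unit direction (3), radius-normalized distance (1)]."""
    dists = torch.cdist(points, points)                     # (N, N) pairwise distances
    knn_dist, knn_idx = dists.topk(k + 1, largest=False)    # includes the point itself
    knn_dist, knn_idx = knn_dist[:, 1:], knn_idx[:, 1:]     # drop the self-match
    neighbors = points[knn_idx]                             # (N, k, 3)
    offset = neighbors - points.unsqueeze(1)                # relative position to center
    radius = knn_dist.max(dim=1, keepdim=True).values       # local neighborhood radius
    norm_dist = (knn_dist / (radius + 1e-8)).unsqueeze(-1)  # distance normalized to [0, 1]
    direction = offset / (knn_dist.unsqueeze(-1) + 1e-8)    # unit direction vectors
    return torch.cat([offset, direction, norm_dist], dim=-1)

feats = neighborhood_features(torch.rand(1024, 3))          # (1024, 16, 7)
```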
A novel transformer-based approach reconstructs a high-resolution 3D point cloud from a single image and a small set of 3D points, enabling accurate 3D object detection from limited sensor data.
The proposed method extracts geometry-aware features from 3D Gaussian distributions to better model complex 3D deformations, improving dynamic view synthesis and dynamic 3D reconstruction.
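One common way to realize this kind of deformation modeling over Gaussian primitives is a small MLP that maps a Gaussian's center and a time step to offsets of its position and shape. The sketch below is an assumption for illustration, not necessarily the paper's exact design.

```python
# Hypothetical sketch: a deformation field over 3D Gaussian centers for dynamic scenes.
import torch
import torch.nn as nn

class GaussianDeformer(nn.Module):
    """Maps (center, time) to per-Gaussian offsets of position, scale, and rotation."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3 + 4),  # delta position, delta scale, delta quaternion
        )

    def forward(self, centers: torch.Tensor, t: float) -> dict:
        time = torch.full((centers.shape[0], 1), t, device=centers.device)
        d = self.mlp(torch.cat([centers, time], dim=-1))
        return {"d_xyz": d[:, :3], "d_scale": d[:, 3:6], "d_rot": d[:, 6:]}

offsets = GaussianDeformer()(torch.rand(1000, 3), t=0.5)
```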
The proposed hierarchical multi-label classification (HMC) training strategy enables 3D LiDAR semantic segmentation models to learn structural relationships between classes, allowing them to provide confident high-level information and well-calibrated detailed classifications in uncertain situations.
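A minimal sketch of the idea (an assumed class hierarchy and loss, not the authors' code): each point is supervised not only on its leaf class but also on every ancestor node, with an independent sigmoid per node, so the model can stay confident at the coarse level even when the fine-grained prediction is uncertain.

```python
# Hypothetical sketch: hierarchical multi-label supervision for semantic segmentation.
import torch
import torch.nn.functional as F

# Illustrative class hierarchy: leaf class index -> list of hierarchy node indices.
HIERARCHY = {
    0: [0, 4],   # car      -> vehicle
    1: [1, 4],   # truck    -> vehicle
    2: [2, 5],   # road     -> ground
    3: [3, 5],   # sidewalk -> ground
}
NUM_NODES = 6  # 4 leaf classes + 2 parent classes

def hmc_targets(leaf_labels: torch.Tensor) -> torch.Tensor:
    """Expand leaf labels (N,) into multi-hot targets (N, NUM_NODES) over the hierarchy."""
    targets = torch.zeros(leaf_labels.shape[0], NUM_NODES)
    for i, leaf in enumerate(leaf_labels.tolist()):
        targets[i, HIERARCHY[leaf]] = 1.0
    return targets

def hmc_loss(logits: torch.Tensor, leaf_labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over all hierarchy nodes (one sigmoid per node)."""
    return F.binary_cross_entropy_with_logits(logits, hmc_targets(leaf_labels))

loss = hmc_loss(torch.randn(8, NUM_NODES), torch.randint(0, 4, (8,)))
```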
SparseOcc, the first fully sparse occupancy network, achieves state-of-the-art performance on the Occ3D-nuScenes benchmark while maintaining real-time inference speed by exploiting the inherent sparsity of 3D scenes.
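The "fully sparse" idea can be illustrated with a toy representation (an assumption for exposition, not SparseOcc's actual architecture): store only occupied voxels as coordinate/feature pairs instead of a dense grid, so downstream computation scales with the number of occupied cells.

```python
# Hypothetical sketch: a sparse voxel representation that stores only occupied cells.
import torch

def dense_to_sparse(occ_grid: torch.Tensor, feats: torch.Tensor):
    """occ_grid: (X, Y, Z) boolean occupancy; feats: (X, Y, Z, C) per-voxel features.
    Returns coordinates (M, 3) and features (M, C) for the M occupied voxels only."""
    coords = occ_grid.nonzero()  # (M, 3) indices of occupied voxels
    return coords, feats[coords[:, 0], coords[:, 1], coords[:, 2]]

# Only a small fraction of a driving-scene grid is occupied, so attention or
# convolution over (coords, feats) touches far fewer elements than a dense pass.
grid = torch.rand(200, 200, 16) > 0.95
coords, sparse_feats = dense_to_sparse(grid, torch.rand(200, 200, 16, 32))
```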
SANeRF-HQ leverages the Segment Anything Model (SAM) and Neural Radiance Fields (NeRF) to achieve high-quality 3D segmentation of any target object in a given scene, producing accurate segmentation boundaries and consistent multi-view results.
GS-SLAM utilizes a 3D Gaussian scene representation coupled with a real-time differentiable splatting rendering pipeline to achieve a better balance between efficiency and accuracy in dense visual SLAM.
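A minimal sketch of what a 3D Gaussian scene representation stores per primitive (an assumed parameter layout, not GS-SLAM's code); the differentiable splatting renderer that projects and alpha-composites these primitives is omitted.

```python
# Hypothetical sketch: learnable parameters of a 3D Gaussian scene representation.
import torch
import torch.nn as nn

class GaussianScene(nn.Module):
    """Each primitive: position, anisotropic scale, rotation (quaternion), opacity, color.
    A differentiable splatting renderer would project these to the image plane and
    alpha-composite them, so every parameter receives gradients from a photometric loss."""
    def __init__(self, num_gaussians: int):
        super().__init__()
        identity_quat = torch.cat(
            [torch.ones(num_gaussians, 1), torch.zeros(num_gaussians, 3)], dim=-1)
        self.xyz      = nn.Parameter(torch.rand(num_gaussians, 3))   # centers
        self.scale    = nn.Parameter(torch.zeros(num_gaussians, 3))  # log-scales
        self.rotation = nn.Parameter(identity_quat)                  # quaternions
        self.opacity  = nn.Parameter(torch.zeros(num_gaussians, 1))  # pre-sigmoid
        self.color    = nn.Parameter(torch.rand(num_gaussians, 3))   # RGB

scene = GaussianScene(num_gaussians=100_000)
```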
SAOR, a novel self-supervised approach, can estimate the 3D shape, texture, and viewpoint of an articulated object from a single image without requiring any category-specific 3D templates or skeletons.
The authors propose efficient non-parametric and parametric frameworks, Seg-NN and Seg-PN, for few-shot 3D scene segmentation. Seg-NN is a training-free encoder that can extract discriminative representations without any learnable parameters, while Seg-PN further improves performance with a lightweight query-support transferring module.
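To illustrate what "no learnable parameters" can look like in practice, here is a sketch under the assumption of fixed sinusoidal positional encodings plus neighborhood pooling; it is not the released Seg-NN code, and all names and constants are illustrative.

```python
# Hypothetical sketch: a training-free point encoder using fixed sinusoidal encodings
# and max-pooling over k-NN neighborhoods -- no learnable parameters anywhere.
import torch

def sinusoidal_encode(xyz: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """xyz: (N, 3) -> (N, 3 * 2 * num_freqs) fixed frequency embedding."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=xyz.dtype)           # (F,)
    angles = xyz.unsqueeze(-1) * freqs                                 # (N, 3, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)

def nonparametric_encoder(points: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Concatenates each point's fixed embedding with the max-pooled embeddings
    of its k nearest neighbors."""
    emb = sinusoidal_encode(points)                                    # (N, D)
    idx = torch.cdist(points, points).topk(k, largest=False).indices  # (N, k)
    return torch.cat([emb, emb[idx].max(dim=1).values], dim=-1)       # (N, 2D)

feats = nonparametric_encoder(torch.rand(2048, 3))
```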
S2TPVFormer is a unified spatiotemporal transformer architecture that leverages temporal cues to generate temporally coherent 3D semantic occupancy embeddings, outperforming the state-of-the-art TPVFormer by 4.1% in mean Intersection over Union (mIoU).
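A rough sketch of the temporal idea (an assumed simplification; S2TPVFormer's actual attention design may differ): the current frame's tri-perspective-view (TPV) plane tokens attend to the previous frame's tokens so that occupancy embeddings stay temporally coherent.

```python
# Hypothetical sketch: fusing current and previous TPV plane tokens with cross-attention.
import torch
import torch.nn as nn

class TemporalTPVFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tpv_now: torch.Tensor, tpv_prev: torch.Tensor) -> torch.Tensor:
        """tpv_now, tpv_prev: (B, H*W, C) flattened tokens of one TPV plane.
        Current tokens query the previous frame's (ego-motion-aligned) tokens."""
        fused, _ = self.attn(query=tpv_now, key=tpv_prev, value=tpv_prev)
        return self.norm(tpv_now + fused)  # residual keeps single-frame information

fuse = TemporalTPVFusion()
out = fuse(torch.rand(2, 100 * 100, 128), torch.rand(2, 100 * 100, 128))
```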