Mask4Former directly predicts semantic instance masks and their temporal associations in a unified model, eliminating the need for non-learned clustering strategies.
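To make the contrast concrete, the clustering-style association that Mask4Former's unified model replaces can be sketched as a hand-written greedy IoU matcher between frames. This is only an illustration of the non-learned baseline, not the paper's method; all function names and the threshold are assumptions.

```python
# Illustrative (NOT Mask4Former): a non-learned temporal association
# strategy that greedily matches current-frame instance masks to
# previous-frame tracks by IoU over point indices.

def mask_iou(a: set, b: set) -> float:
    """IoU of two masks given as sets of point indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def associate(prev_masks: dict, curr_masks: list, thresh: float = 0.5) -> dict:
    """Greedily match current-frame masks to previous-frame track IDs.

    prev_masks: {track_id: set of point indices}
    curr_masks: [set of point indices, ...]
    Returns {current mask index: track_id}, spawning new IDs for
    masks left unmatched above the IoU threshold.
    """
    next_id = max(prev_masks, default=-1) + 1
    # Score every (current mask, previous track) pair, best first.
    pairs = sorted(
        ((mask_iou(m, p), i, tid)
         for i, m in enumerate(curr_masks)
         for tid, p in prev_masks.items()),
        reverse=True,
    )
    assigned, used = {}, set()
    for iou, i, tid in pairs:
        if iou < thresh:
            break
        if i in assigned or tid in used:
            continue
        assigned[i] = tid
        used.add(tid)
    # Unmatched masks start new tracks.
    for i in range(len(curr_masks)):
        if i not in assigned:
            assigned[i] = next_id
            next_id += 1
    return assigned
```

Heuristics like this require hand-tuned thresholds and fail under occlusion or fast motion, which is the motivation for learning the associations jointly with the masks.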
Leveraging asynchronous historical LiDAR data from past traversals can significantly improve the performance of monocular 3D object detectors.
HawkDrive is a novel perception system that combines hardware solutions, including a stereo camera setup and an Nvidia Jetson Xavier AGX edge computing device, with transformer-based neural networks for low-light enhancement, depth estimation, and semantic segmentation to enable robust autonomous driving in nighttime conditions.
The authors present a novel multi-modal 3D semantic occupancy prediction framework, Co-Occ, which couples explicit LiDAR-camera feature fusion with implicit volume rendering regularization to effectively leverage the complementary strengths of LiDAR and camera data.
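The "explicit fusion" half of Co-Occ can be pictured as blending per-voxel LiDAR and camera feature vectors. The gating scheme below is purely illustrative (the actual framework uses learned fusion networks plus an implicit volume-rendering regularizer, neither of which is shown here).

```python
# Illustrative sketch of explicit per-voxel LiDAR-camera feature fusion.
# A real model would predict the gate per voxel from both modalities;
# the scalar gate here is an assumption for clarity.

def fuse_voxel(lidar_feat, cam_feat, gate=0.5):
    """Blend two equal-length feature vectors with a scalar gate.

    gate near 1.0 trusts LiDAR geometry; gate near 0.0 trusts
    camera semantics. The complementary strengths of the two
    sensors motivate combining them rather than picking one.
    """
    assert len(lidar_feat) == len(cam_feat)
    return [gate * l + (1.0 - gate) * c for l, c in zip(lidar_feat, cam_feat)]
```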
This paper presents the first approach to 3D open-vocabulary panoptic segmentation in autonomous driving, leveraging large vision-language models and proposing novel loss functions for effective learning.
A two-stage method is proposed that decomposes the traditional end-to-end bird's eye view semantic segmentation task into a BEV autoencoder for generation and an RGB-BEV alignment module for perception, reducing task complexity and improving performance.
Existing multi-modal 3D object detection algorithms exhibit varying degrees of robustness depending on their specific fusion, alignment, and training strategies when faced with diverse sensor corruptions.
The proposed CRKD framework enables effective knowledge distillation from a high-performing LiDAR-camera teacher detector to a camera-radar student detector, bridging the performance gap between the two sensor configurations.
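The core idea of cross-modal distillation, training the camera-radar student to mimic the LiDAR-camera teacher's intermediate features, can be reduced to a single loss term. CRKD itself distills in BEV space with several specialized losses; the lone MSE term below is a minimal sketch, not the framework's actual objective.

```python
# Illustrative feature-level distillation loss: penalize the student's
# BEV features for deviating from the frozen teacher's. Features are
# flattened to plain lists of floats here for simplicity.

def distill_loss(teacher_feats, student_feats):
    """Mean squared error between teacher and student feature maps."""
    assert len(teacher_feats) == len(student_feats) and teacher_feats
    n = len(teacher_feats)
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / n
```

In training, this term would be added to the student's ordinary detection loss, so the cheaper camera-radar configuration inherits structure learned from the stronger sensor suite.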