The paper presents an end-to-end multi-task model, named A-YOLOM, designed for real-time autonomous driving perception. The key highlights are:
Adaptive Concatenation Module: The model introduces an adaptive concatenation module in the segmentation neck, which can adaptively determine whether to concatenate features without manual design. This enhances the model's generalization capabilities.
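The summary does not give the module's exact formulation, but the gating idea can be sketched in plain Python: a hypothetical learnable scalar (here named `alpha`) is squashed through a sigmoid and decides whether a skip feature is concatenated onto the neck feature or bypassed. This is a minimal sketch of the concept, not the paper's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adaptive_concat(neck_feat, skip_feat, alpha):
    """Hypothetical gate: a learnable scalar `alpha` decides whether the
    skip feature is concatenated to the neck feature or skipped entirely,
    removing the need to hand-design each concatenation."""
    gate = sigmoid(alpha)
    if gate > 0.5:
        # Gate open: concatenate (Python list concat stands in for
        # channel-wise feature concatenation).
        return neck_feat + skip_feat
    # Gate closed: pass the neck feature through unchanged.
    return neck_feat

# Toy 1-D "feature maps"
neck = [0.2, 0.5]
skip = [0.9, 0.1]
print(adaptive_concat(neck, skip, alpha=2.0))   # gate ~0.88: concatenated
print(adaptive_concat(neck, skip, alpha=-2.0))  # gate ~0.12: passed through
```

In a real network `alpha` would be trained end-to-end, so each concatenation point learns for itself whether the skip connection helps.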
Lightweight Segmentation Head: The segmentation head is designed to be lightweight, comprising only a series of convolutional layers. This reduces the inference time while maintaining competitive performance.
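As a rough illustration of "only a series of convolutional layers" (not the paper's actual layer configuration, which the summary does not specify), a conv-only segmentation head can be mimicked in 1-D pure Python: a small convolution, a ReLU, a 1x1 convolution, and a per-pixel sigmoid producing mask probabilities.

```python
import math

def conv1d(x, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation), no padding."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

def seg_head(feat):
    """Hypothetical lightweight head: nothing but convolutions and
    pointwise nonlinearities, ending in per-pixel mask probabilities."""
    h = conv1d(feat, [0.5, 0.5])            # smoothing conv
    h = [max(0.0, v) for v in h]            # ReLU
    h = conv1d(h, [1.0])                    # 1x1 conv to mask logits
    return [1.0 / (1.0 + math.exp(-v)) for v in h]  # sigmoid

mask = seg_head([0.0, 1.0, 2.0, 1.0])
print(mask)  # three values in (0, 1)
```

Because the head contains no attention blocks or heavy decoders, its cost is a handful of multiply-adds per pixel, which is what keeps inference fast.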
Unified Loss Function: The model uses the same loss function for tasks of the same type (detection or segmentation), further improving its generality.
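The sharing pattern can be sketched as a registry that maps tasks to loss functions, with every task of the same type pointing at the same function. The task names and the choice of binary cross-entropy here are illustrative assumptions, not the paper's exact loss.

```python
import math

def bce_loss(pred, target):
    """Pixel-wise binary cross-entropy, shared by all segmentation heads
    (a stand-in for whatever segmentation loss the model actually uses)."""
    eps = 1e-7
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for p, t in zip(pred, target)) / len(pred)

# One loss per task *type*, not per task: drivable-area and lane-line
# segmentation reuse the identical function (task names are hypothetical).
LOSSES = {
    "drivable_seg": bce_loss,
    "lane_seg": bce_loss,
}

print(LOSSES["drivable_seg"]([0.9, 0.2, 0.8], [1.0, 0.0, 1.0]))
```

Sharing one loss per task type means adding a new segmentation task requires no new loss design, which is the generality the bullet refers to.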
Competitive Performance: On the BDD100K dataset, the model achieves 81.1% mAP50 for object detection, 91.0% mIoU for drivable area segmentation, and 28.8% IoU for lane line segmentation. It also outperforms state-of-the-art methods in real-world scenarios.
Real-Time Inference: The lightweight design enables real-time inference, reaching up to 172.2 FPS on a GTX 1080 Ti GPU and making the model suitable for deployment on edge devices.
Source: Jiayuan Wang et al., arxiv.org, 04-26-2024, https://arxiv.org/pdf/2310.01641.pdf