The paper proposes an end-to-end framework for lane change classification and prediction using two approaches that leverage state-of-the-art 3D action recognition networks.
The first approach, RGB+3DN, uses only the raw RGB video captured by cameras, mimicking how human drivers anticipate lane changes from visual cues. It achieves a state-of-the-art classification accuracy of 84.79% on the PREVENTION dataset with the lightweight X3D-S model.
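To make the RGB-only pipeline concrete, here is a minimal sketch that loads a pretrained X3D-S backbone from PyTorchVideo and repurposes it as a clip classifier. The three-class output (left lane change, right lane change, no lane change) and the head replacement are assumptions about the general setup, not the authors' exact training recipe:

```python
import torch
import torch.nn as nn

# Load a pretrained X3D-S backbone via PyTorchVideo's torch.hub entry point.
model = torch.hub.load("facebookresearch/pytorchvideo", "x3d_s", pretrained=True)

# Swap the Kinetics-400 head for a 3-way classifier:
# left / right / no lane change (assumed class set).
head = model.blocks[-1]
head.proj = nn.Linear(head.proj.in_features, 3)

# X3D-S takes clips shaped (batch, channels, frames, height, width);
# 13 frames at 182x182 matches the model's published inference setting.
clip = torch.randn(1, 3, 13, 182, 182)

model.eval()
with torch.no_grad():
    probs = model(clip).softmax(dim=-1)
print(probs)  # per-clip probabilities over the three maneuver classes
```

In practice the random tensor would be replaced by a normalized clip of camera frames, and the new head would be fine-tuned on labeled lane change clips.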
The second approach, RGB+BB+3DN, incorporates vehicle bounding box information into the RGB video input to further improve performance. By leveraging the spatio-temporal feature extraction of 3D CNNs, it achieves classification and prediction accuracies above 98%.
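One simple way to picture the bounding-box augmentation is to draw the target vehicle's box directly into each frame before the clip is assembled. The sketch below uses OpenCV; the box coordinates, color, and rendering style are illustrative assumptions, and the paper may encode the box differently (e.g., as a separate mask channel):

```python
import cv2
import numpy as np

def overlay_box(frame: np.ndarray, box) -> np.ndarray:
    """Render the target vehicle's bounding box into the RGB pixels.

    Assumption: RGB+BB means the box is drawn onto each frame so the
    3D CNN receives it as part of the video.
    """
    x1, y1, x2, y2 = box
    out = frame.copy()
    cv2.rectangle(out, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=2)
    return out

# Assemble an augmented clip: one annotated frame per timestep.
frames = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(13)]  # placeholder frames
boxes = [(600, 360, 700, 430)] * 13                                     # placeholder vehicle track
clip = np.stack([overlay_box(f, b) for f, b in zip(frames, boxes)])
print(clip.shape)  # (13, 720, 1280, 3); later resized and transposed for the network
```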
The authors also investigate the spatial and temporal attention regions of the 3D models using class activation maps, showing that the models focus on key visual cues such as the target vehicle's motion and the lane markings. They further propose optimizing the temporal kernel size to better capture relevant motion information, which yields additional accuracy gains.
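The attention analysis can be reproduced in spirit with a Grad-CAM-style computation over a 3D backbone's last convolutional features, which yields a per-frame spatial heatmap. This is a sketch under assumptions: the paper does not specify its exact CAM variant, and the choice of target layer depends on the backbone:

```python
import torch
import torch.nn.functional as F

def gradcam_3d(model, target_layer, clip, class_idx):
    """Grad-CAM for a 3D CNN: weight the target layer's feature maps by the
    averaged gradients of the chosen class score.

    clip: (1, C, T, H, W) input tensor; returns a (T', H', W') heatmap
    at the feature resolution of `target_layer`.
    """
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    logits = model(clip)
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    fmap, grad = feats[0], grads[0]                    # (1, C', T', H', W')
    weights = grad.mean(dim=(2, 3, 4), keepdim=True)   # per-channel importance
    cam = F.relu((weights * fmap).sum(dim=1))[0]       # (T', H', W')
    return cam / (cam.max() + 1e-8)                    # normalize to [0, 1]

# Hypothetical usage with the X3D-S model above (layer index is an assumption):
# cam = gradcam_3d(model, model.blocks[4], clip, class_idx=0)
```

Upsampling each temporal slice of the heatmap back to the frame size and overlaying it on the video reveals which regions drive the prediction at each timestep.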
The results show that action recognition models can efficiently process visual data to accurately anticipate lane change maneuvers of surrounding vehicles, a crucial capability for autonomous driving systems.
Source: Kai Liang, Ju... et al., arxiv.org, 04-11-2024, https://arxiv.org/pdf/2208.11650.pdf