Core Concepts
The approach combines basic imitation learning for driving with Large Language Models (LLMs) that are prompted with multi-modal tokens, with the goal of improving end-to-end autonomous driving performance.
Summary
The paper proposes a framework that fuses multi-modal perception inputs, including visual and LiDAR data, into joint token representations. These tokens are then used to prompt LLMs to generate driving descriptions and actions, rather than letting the LLMs drive the vehicle directly.
The key highlights are:
- A two-stage fusion network that encodes visual and LiDAR inputs into joint multi-modal tokens.
- A prompt construction strategy that combines the multi-modal tokens, vehicle status, and driving task information to guide the LLM.
- A re-query mechanism that allows the system to re-evaluate the LLM's output if it conflicts with safety constraints.
- Incorporation of reward-guided reinforcement learning to further improve the model's waypoint prediction and control signal generation.
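The re-query mechanism listed above can be illustrated with a minimal sketch. The summary does not say how the safety constraints are checked or how many retries are allowed, so everything here is an assumption made for illustration: the `Action` type, the `is_safe` rule (no throttle when a barrier is closer than 10 m), the `propose` callable standing in for the LLM, and the full-brake fallback.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    steer_deg: float  # steering angle in degrees
    throttle: float   # fraction in [0, 1]
    brake: float      # fraction in [0, 1]


def is_safe(action: Action, barrier_distance_m: Optional[float]) -> bool:
    """Hypothetical safety rule: no throttle when a barrier is closer than 10 m."""
    if barrier_distance_m is not None and barrier_distance_m < 10.0:
        return action.throttle == 0.0
    return True


def requery(propose: Callable[[int], Action],
            barrier_distance_m: Optional[float],
            max_attempts: int = 3) -> Action:
    """Ask for an action up to max_attempts times, then fall back to full braking.

    `propose` stands in for one LLM query; it receives the attempt index so a
    caller could, for example, append the rejected answer to the next prompt.
    """
    for attempt in range(max_attempts):
        action = propose(attempt)
        if is_safe(action, barrier_distance_m):
            return action
    # Assumed fallback when every proposal conflicts with the safety rule.
    return Action(steer_deg=0.0, throttle=0.0, brake=1.0)
```

Keeping the safety check outside the LLM means a conflicting answer is never executed; it only triggers another query or the fallback.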
Experiments on the CARLA simulator show that the proposed approach can achieve driving scores comparable to state-of-the-art end-to-end driving models, while also demonstrating the potential of LLMs to enhance autonomous driving capabilities.
Statistics
The car is driving , weather condition , there are currently <2> cars ahead, #obj1 is at <23 degrees>, distance <8m>, and #obj2 is at <30 degrees> and the distance is <8.5m>. Barrier ahead <N/A> Current driving speed, throttle <20%>, traffic light conditions <N/A> pedestrians <0>
The car is driving , weather condition , there are currently <4> cars ahead, #obj1 is at <-5 degrees>, distance <10m>, #obj2 is at <18 degrees> and the distance is <15.3m>, #obj3 is at <20 degrees> and the distance is <15.8m>, #obj4 is at <35 degrees> and the distance is <28.5m>. Barrier ahead <8m> Current driving speed, throttle <0%>, traffic light conditions pedestrians <0>
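The two prompt examples above follow a fixed template with values in angle brackets. As a rough sketch of how such text could be assembled from structured perception output, the Python below renders only the fields whose values are visible in the examples (object count, per-object angle and distance, barrier distance, throttle, pedestrian count). The function names and argument layout are hypothetical, and the exact wording of the paper's template may differ.

```python
from typing import List, Optional, Tuple


def fmt(value: Optional[float], unit: str) -> str:
    """Render a value in angle brackets, or <N/A> when it is missing."""
    return "<N/A>" if value is None else f"<{value:g}{unit}>"


def describe_scene(objects: List[Tuple[float, float]],
                   barrier_m: Optional[float],
                   throttle_pct: float,
                   pedestrians: int) -> str:
    """Build a scene description from (angle_deg, distance_m) pairs and vehicle state."""
    parts = [f"there are currently <{len(objects)}> cars ahead"]
    for i, (angle_deg, dist_m) in enumerate(objects, start=1):
        parts.append(f"#obj{i} is at <{angle_deg:g} degrees>, distance <{dist_m:g}m>")
    text = ", ".join(parts) + "."
    text += f" Barrier ahead {fmt(barrier_m, 'm')}"
    text += f" throttle <{throttle_pct:g}%>"
    text += f" pedestrians <{pedestrians}>"
    return text


# Values taken from the first example above.
print(describe_scene([(23, 8), (30, 8.5)], None, 20, 0))
```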
Quotes
According to perception analysis, heading to waypoint in next four time steps <23.56, 78.24>, <28.36, 90.60>…, control action, steer <-15 degrees>, throttle < 15%> brake < 0%>
According to perception analysis, heading to waypoint in next four time steps <8.68, 12.34>, <12.51, 18.37>, <18.23, 25.17>,<22.05, 29.17>, control action, steer <-0.8 degrees>, throttle < 23%> brake < 0%>
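Because the LLM answers in free text, a downstream controller has to recover the numbers from it. The sketch below shows one possible way to do that with regular expressions, assuming responses keep the bracketed format of the two quotes above. The function name `parse_response` and both patterns are illustrative and not taken from the paper.

```python
import re
from typing import Dict, List, Tuple

NUMBER = r"-?\d+(?:\.\d+)?"
# Waypoints appear as "<x, y>" pairs.
PAIR = re.compile(rf"<\s*({NUMBER})\s*,\s*({NUMBER})\s*>")
# Controls appear as "steer <-15 degrees>", "throttle < 15%>", "brake < 0%>".
CONTROL = re.compile(rf"(steer|throttle|brake)\s*<\s*({NUMBER})\s*(?:degrees|%)\s*>")


def parse_response(text: str) -> Tuple[List[Tuple[float, float]], Dict[str, float]]:
    """Extract waypoint pairs and control values from a bracketed LLM response."""
    waypoints = [(float(x), float(y)) for x, y in PAIR.findall(text)]
    controls = {name: float(value) for name, value in CONTROL.findall(text)}
    return waypoints, controls
```

A response that omits a control value simply yields a dictionary without that key, so the caller can decide whether to re-query or fall back to a default.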