Key Concepts
Combines basic driving imitation learning with Large Language Models (LLMs), prompted with multi-modality tokens, to improve end-to-end autonomous driving performance.
Summary
The paper proposes a novel framework that incorporates multi-modality perception inputs, including visual and LiDAR data, into joint token representations. These tokens are then used to prompt LLMs to generate driving descriptions and actions, rather than directly letting the LLMs drive.
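The summary does not describe how the visual and LiDAR inputs are fused internally. As a minimal sketch under our own assumptions, one plausible two-stage scheme first projects each modality into a shared token dimension and then lets each visual token attend over the LiDAR tokens with scaled dot-product cross-attention. The names `linear`, `cross_attend` and `fuse`, the random-free toy weights, and the residual sum are all hypothetical and are not the paper's architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: (output_dim, input_dim)


def linear(weights: Matrix, x: Vector) -> Vector:
    """Bias-free linear projection y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over keys (keys double as values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def fuse(visual: List[Vector], lidar: List[Vector],
         w_vis: Matrix, w_lid: Matrix) -> List[Vector]:
    """Hypothetical two-stage fusion into joint multi-modal tokens."""
    # Stage 1 (assumed): project each modality into a shared token dimension.
    vis_tokens = [linear(w_vis, v) for v in visual]
    lid_tokens = [linear(w_lid, p) for p in lidar]
    # Stage 2 (assumed): each visual token attends over the LiDAR tokens;
    # a residual sum gives one joint token per visual token.
    return [[a + b for a, b in zip(v, cross_attend(v, lid_tokens))]
            for v in vis_tokens]


# Usage: 2 camera features (dim 3) and 3 LiDAR features (dim 2) -> shared dim 2.
visual = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
lidar = [[0.2, 0.4], [1.0, 0.0], [0.0, 1.0]]
w_vis = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
w_lid = [[1.0, 0.0], [0.0, 1.0]]
tokens = fuse(visual, lidar, w_vis, w_lid)
```

In a real model the projection weights would be learned; the toy matrices here only show the data flow from per-modality features to one joint token per visual feature.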
The key highlights are:
- A two-stage fusion network that encodes visual and LiDAR inputs into joint multi-modal tokens.
- A prompt construction strategy that combines the multi-modal tokens, vehicle status, and driving task information to guide the LLM.
- A re-query mechanism that allows the system to re-evaluate the LLM's output if it conflicts with safety constraints.
- Incorporation of reward-guided reinforcement learning to further improve the model's waypoint prediction and control signal generation.
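The re-query mechanism can be pictured as a bounded retry loop around the LLM call. This is a minimal illustration, assuming a hypothetical `query_llm` callable that already returns a parsed control action and a made-up safety rule (no throttle when a barrier is closer than 5 m); the paper's actual constraints and retry policy may differ, and the full-brake fallback is our own choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Control:
    steer_deg: float
    throttle_pct: float
    brake_pct: float


def violates_safety(ctrl: Control, barrier_m: Optional[float],
                    min_gap_m: float = 5.0) -> bool:
    """Hypothetical safety rule, not taken from the paper."""
    # Throttle and brake must be valid percentages.
    if not (0.0 <= ctrl.throttle_pct <= 100.0 and 0.0 <= ctrl.brake_pct <= 100.0):
        return True
    # No throttle when a barrier is closer than the minimum gap.
    return barrier_m is not None and barrier_m < min_gap_m and ctrl.throttle_pct > 0.0


def drive_with_requery(query_llm: Callable[[str], Control], prompt: str,
                       barrier_m: Optional[float], max_requeries: int = 2) -> Control:
    """Query the LLM, re-querying up to max_requeries times on a safety violation."""
    ctrl = query_llm(prompt)
    for _ in range(max_requeries):
        if not violates_safety(ctrl, barrier_m):
            return ctrl
        # Re-query with feedback that the previous action was rejected.
        prompt += "\nThe previous action violated a safety constraint; re-evaluate."
        ctrl = query_llm(prompt)
    if violates_safety(ctrl, barrier_m):
        # Fallback (our own choice): stop the car rather than act unsafely.
        return Control(steer_deg=0.0, throttle_pct=0.0, brake_pct=100.0)
    return ctrl
```

Bounding the number of re-queries keeps worst-case latency predictable, which matters when the loop runs inside a control cycle.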
Experiments in the CARLA simulator show that the approach achieves driving scores comparable to those of state-of-the-art end-to-end driving models, while also demonstrating the potential of LLMs to enhance autonomous driving.
Statistics
The car is driving , weather condition , there are currently <2> cars ahead, #obj1 is at <23 degrees>, distance <8m>, and #obj2 is at <30 degrees> and the distance is <8.5m>. Barrier ahead <N/A> Current driving speed, throttle <20%>, traffic light conditions <N/A> pedestrians <0>
The car is driving , weather condition , there are currently <4> cars ahead, #obj1 is at <-5 degrees>, distance <10m>, #obj2 is at <18 degrees> and the distance is <15.3m>, #obj3 is at <20 degrees> and the distance is <15.8m>, #obj4 is at <35 degrees> and the distance is <28.5m>. Barrier ahead <8m> Current driving speed, throttle <0%>, traffic light conditions pedestrians <0>
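Both prompts follow a fixed template with angle-bracketed slots. A minimal sketch of filling that template from perception results is shown below; the `DetectedObject` type and the `build_prompt` parameter names are hypothetical, the `<N/A>` convention for missing values is inferred from the examples, and the sketch uses one uniform phrasing per object rather than the slight wording variations in the examples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedObject:
    bearing_deg: float  # angle relative to the ego heading
    distance_m: float   # range to the object in metres


def slot(value: Optional[object], unit: str = "") -> str:
    """Wrap a value in angle brackets; a missing value becomes <N/A>."""
    if value is None:
        return "<N/A>"
    if isinstance(value, float):
        value = f"{value:g}"  # 8.0 -> "8", 8.5 -> "8.5"
    return f"<{value}{unit}>"


def build_prompt(road: str, weather: str, objects: List[DetectedObject],
                 barrier_m: Optional[float], speed: Optional[float],
                 throttle_pct: float, traffic_light: Optional[str],
                 pedestrians: int) -> str:
    """Fill the perception prompt template (hypothetical parameter names)."""
    obj_clauses = [
        f"#obj{i} is at {slot(o.bearing_deg, ' degrees')}, distance {slot(o.distance_m, 'm')}"
        for i, o in enumerate(objects, start=1)
    ]
    objects_text = "".join(", " + c for c in obj_clauses)
    return (
        f"The car is driving {slot(road)}, weather condition {slot(weather)}, "
        f"there are currently {slot(len(objects))} cars ahead{objects_text}. "
        f"Barrier ahead {slot(barrier_m, 'm')} "
        f"Current driving speed {slot(speed)}, throttle {slot(throttle_pct, '%')}, "
        f"traffic light conditions {slot(traffic_light)} pedestrians {slot(pedestrians)}"
    )
```

Generating the prompt from structured fields keeps the wording identical across frames, so the LLM sees changes in the scene rather than changes in phrasing.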
Quotes
According to perception analysis, heading to waypoint in next four time steps <23.56, 78.24>, <28.36, 90.60>…, control action, steer <-15 degrees>, throttle <15%> brake <0%>
According to perception analysis, heading to waypoint in next four time steps <8.68, 12.34>, <12.51, 18.37>, <18.23, 25.17>, <22.05, 29.17>, control action, steer <-0.8 degrees>, throttle <23%> brake <0%>
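Turning an answer like these into numeric waypoints and control signals requires extracting the angle-bracketed values. The following is a minimal, hypothetical parser based only on the two example answers; the regular expressions and the `DrivingOutput` type are our own assumptions, and it tolerates optional spaces inside the brackets.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

NUM = r"(-?\d+(?:\.\d+)?)"
# A waypoint is an angle-bracketed pair such as <8.68, 12.34>.
WAYPOINT_RE = re.compile(rf"<\s*{NUM}\s*,\s*{NUM}\s*>")
# A control value such as "steer <-0.8 degrees>" or "throttle < 23%>".
CONTROL_RE = re.compile(rf"\b(steer|throttle|brake)\s*<\s*{NUM}\s*(?:degrees|%)?\s*>")


@dataclass
class DrivingOutput:
    waypoints: List[Tuple[float, float]]
    steer_deg: float
    throttle_pct: float
    brake_pct: float


def parse_output(text: str) -> DrivingOutput:
    """Extract waypoints and control signals from one LLM answer."""
    waypoints = [(float(x), float(y)) for x, y in WAYPOINT_RE.findall(text)]
    controls = {name: float(value) for name, value in CONTROL_RE.findall(text)}
    missing = {"steer", "throttle", "brake"} - set(controls)
    if missing:
        raise ValueError(f"missing control fields: {sorted(missing)}")
    return DrivingOutput(waypoints, controls["steer"],
                         controls["throttle"], controls["brake"])
```

Because the first answer's waypoint list is truncated, the parser returns only the waypoints it finds; a caller could check for exactly four waypoints before acting on the result.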