Core Concepts
A trajectory planning method for autonomous vehicles using reinforcement learning that includes iterative reward prediction to stabilize the learning process and uncertainty propagation to account for uncertainties in perception, prediction, and control.
Abstract
The paper proposes a trajectory planning method for autonomous vehicles using reinforcement learning (RL) that addresses the limitations of traditional trajectory planning approaches and RL-based methods.
The key aspects of the proposed method are:
Reward Prediction (RP): The method stabilizes learning by using a model to predict expected future rewards in place of the actual returns, which have high variance (a minimal sketch of the idea follows this list).
Iterative Reward Prediction (IRP): To improve the accuracy of the reward prediction, the method iteratively plans a new trajectory at each predicted future state and predicts the reward of that trajectory as well (see the first sketch below).
Uncertainty Propagation: The method accounts for uncertainty in the perception, prediction, and control modules by propagating the uncertainty of the predicted states of the ego vehicle and the other traffic participants, using a Kalman filter-based prediction step, and checks for collisions via a Minkowski sum of the vehicle footprints (see the second sketch below).
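As a concrete illustration of RP and IRP, here is a minimal Python sketch. Everything in it is hypothetical: plan_trajectory, reward_model, and predict_next_state stand in for the paper's planner, learned reward model, and state predictor, whose actual interfaces are not given here.

```python
def iterative_reward_prediction(state, plan_trajectory, reward_model,
                                predict_next_state, horizon=3, gamma=0.99):
    """Estimate a low-variance return: instead of waiting for the actual
    (noisy) Monte Carlo return, repeatedly plan at the predicted future
    state and sum the rewards the model predicts for each new plan."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        trajectory = plan_trajectory(state)            # re-plan at the current (predicted) state
        total += discount * reward_model(state, trajectory)
        state = predict_next_state(state, trajectory)  # roll the prediction one step forward
        discount *= gamma
    return total
```

With horizon=1 this reduces to plain reward prediction (RP); larger horizons give the iterative variant (IRP), at the cost of compounding prediction error.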
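The uncertainty-propagation step can be sketched in the same spirit. The linear dynamics F, process noise Q, and the disc approximation of the vehicle footprints are simplifying assumptions made here for illustration: discs make the Minkowski sum of the two footprints another disc with the summed radii, which keeps the collision test to a single distance check.

```python
import numpy as np

def propagate_uncertainty(x, P, F, Q, steps):
    """Kalman-filter-style prediction: push the state mean x and covariance P
    through the (linearized) dynamics F, adding process noise Q each step."""
    means, covs = [], []
    for _ in range(steps):
        x = F @ x
        P = F @ P @ F.T + Q
        means.append(x.copy())
        covs.append(P.copy())
    return means, covs

def in_collision(ego_xy, ego_radius, obs_xy, obs_radius, P_ego, P_obs, k=2.0):
    """Conservative collision test at one predicted timestep. Assumes the
    first two state components are (x, y) position. The footprints' Minkowski
    sum (a disc of summed radii) is inflated by a k-sigma bound from the
    combined position covariance, so growing uncertainty enlarges the
    keep-out region."""
    combined = P_ego[:2, :2] + P_obs[:2, :2]
    sigma = float(np.sqrt(np.max(np.linalg.eigvalsh(combined))))  # worst-case position std dev
    distance = float(np.linalg.norm(np.asarray(ego_xy) - np.asarray(obs_xy)))
    return distance < ego_radius + obs_radius + k * sigma
```

A trajectory whose inflated footprint overlaps an obstacle at any predicted step can then be flagged as unsafe, which is how the propagated uncertainty feeds back into planning.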
The proposed method is evaluated in the CARLA simulator across scenarios including lane following, lane changing, and navigating around static obstacles and other traffic participants. Compared to the baseline methods, the proposed method with IRP and uncertainty propagation achieves a markedly lower collision rate and a higher average reward.
The key highlights and insights from the paper are:
Reinforcement learning-based trajectory planning can overcome the limitations of traditional heuristic and rule-based methods, but standard RL training is unstable and typically ignores the uncertainties present in a real driving stack.
The iterative reward prediction and uncertainty propagation techniques help to stabilize the learning process and make the RL agent aware of the uncertainties, leading to safer and more robust trajectory planning.
The experimental results demonstrate the effectiveness of the proposed method in various driving scenarios, highlighting its potential for real-world autonomous driving applications.
Stats
The main text does not call out standalone numerical results; performance is instead reported in figures and a table comparing the proposed method against the baseline methods across the test scenarios.
Quotes
The paper does not contain direct quotes that are particularly striking or that directly support its central arguments.