Multi-modality Scene Tokenization for Improved Motion Prediction in Autonomous Driving
Combining symbolic perception outputs with learned multi-modal scene tokens can significantly improve the accuracy and robustness of motion prediction models for autonomous driving.