Key concepts
ELM is a framework that lets agents understand driving scenes spanning large spatial and temporal ranges, and it outperforms previous approaches across several applications.
Summary
ELM, an Embodied Language Model, improves autonomous agents' understanding of driving scenarios by combining space-aware pre-training with time-aware token selection. The model outperforms state-of-the-art methods on tasks such as Tracking, Box Detection, and Traffic Sign Inquiry. Drawing on diverse data sources and extensive pre-training, ELM achieves better results across the reported evaluation metrics.
Statistics
Location: [3, 12, 0], Car
Location: [-1, 15, 0], Pedestrian
The ego vehicle has seen 1 go_straight before.
The ego vehicle has seen 1 go_straight and 1 turn_right before.
Many cars are parked and moving.
He turns the steering wheel to cross the intersection ahead.
Slow down to keep a safe distance.
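The two `Location: [x, y, z], Label` lines above follow a regular pattern, so they can be read programmatically. The following is only a minimal sketch: the pattern is inferred from those two sample lines, not from any published ELM format, and the names `LocatedObject` and `parse_location` are hypothetical. The source does not state units or axis conventions, so the numbers are kept as plain floats.

```python
import re
from typing import NamedTuple, Optional


class LocatedObject(NamedTuple):
    """One parsed sample line (hypothetical container, not part of ELM)."""
    x: float
    y: float
    z: float
    label: str


# Assumed pattern: "Location: [a, b, c], Label", inferred from the samples only.
_NUM = r"(-?\d+(?:\.\d+)?)"
_LOCATION_RE = re.compile(
    r"^Location:\s*\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM
    + r"\s*\]\s*,\s*(.+?)\s*$"
)


def parse_location(line: str) -> Optional[LocatedObject]:
    """Return the parsed object, or None if the line has a different shape."""
    match = _LOCATION_RE.match(line)
    if match is None:
        return None
    x, y, z = (float(g) for g in match.group(1, 2, 3))
    return LocatedObject(x, y, z, match.group(4))


if __name__ == "__main__":
    for sample in ("Location: [3, 12, 0], Car", "Location: [-1, 15, 0], Pedestrian"):
        print(parse_location(sample))
```

Lines that do not match the pattern, such as the free-text descriptions in this section, return `None` rather than raising an error.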
Quotes
"The scene is a road with a curvy, winding path, surrounded by trees and hills."
"The ego vehicle should follow the traffic light's instructions and wait for the light to turn green before proceeding."
"The ego vehicle should continue driving through the intersection, following the traffic light’s instructions."