Detailed Traffic Video Captioning for Vehicle and Pedestrian Safety Scenarios
TrafficVLM is a novel multi-modal dense video captioning model that localizes incidents within continuous traffic video streams and describes them in detail, covering the behavior and context of both vehicles and pedestrians.