
Detection of Micromobility Vehicles in Urban Traffic Videos: A Novel Approach Combining Single-Frame and Video Object Detection


Core Concepts
The authors introduce a novel detection model that combines single-frame precision with video object detection capabilities to accurately identify micromobility vehicles in urban environments.
Summary
The paper addresses the difficulty of detecting micromobility vehicles in urban traffic and introduces FGFA-YOLOX, a detection model that combines single-frame and video-based object detection. By incorporating spatio-temporal information, the model aims to improve detection consistency and accuracy. It was tested on a custom dataset curated for micromobility scenarios, where it showed substantial improvement over existing state-of-the-art (SOTA) methods and outperformed other SOTA object detectors on the mAP and mAP@50 metrics. The paper also describes the experimental setup, evaluation metrics, implementation details, and model training process.
Statistics
Our proposed FGFA-YOLOX model achieves an mAP of 38.6%. Inference time per frame for FGFA-RFCN is 418.1 milliseconds. YOLOv8 has an mAP@50 score of 64.2%.
Quotes
"Our approach enhances detection in challenging conditions, including occlusions."
"Our method can benefit from the large number of readily available pre-trained models for YOLOX."
"The combined strengths of our model offer a significant advancement in addressing the diverse challenges of urban traffic conditions."

Key insights distilled from

by Khal... on arxiv.org 02-29-2024

https://arxiv.org/pdf/2402.18503.pdf
Detection of Micromobility Vehicles in Urban Traffic Videos

Deeper Inquiries

How can attention mechanisms be integrated into the model to improve occlusion handling?

Attention mechanisms such as self-attention or spatial attention can improve occlusion handling by letting the model weight features by relevance: informative features are emphasized and irrelevant ones suppressed. This selective focus helps the model recognize objects that are only partially visible, because it can concentrate on the image regions where a micromobility vehicle is likely to be present.
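As a rough illustration of the spatial-attention idea, the sketch below reweights a small feature map with a softmax over its locations. It is not the paper's method: the function name `spatial_attention`, the toy feature map, and the scoring rule (mean channel activation in place of a learned attention head) are all assumptions made for the example.

```python
import math


def spatial_attention(feature_map):
    """Reweight an H x W x C feature map (nested lists) by a softmax over locations."""
    # Score each location by its mean channel activation. This scoring rule is a
    # hypothetical stand-in for a learned attention head.
    scores = [[sum(cell) / len(cell) for cell in row] for row in feature_map]

    # Numerically stable softmax over all H * W locations.
    peak = max(s for row in scores for s in row)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    weights = [[e / total for e in row] for row in exps]

    # Scale every channel at a location by that location's attention weight.
    attended = [
        [[w * v for v in cell] for cell, w in zip(f_row, w_row)]
        for f_row, w_row in zip(feature_map, weights)
    ]
    return weights, attended


# Toy 2 x 2 map with 2 channels; the bottom-right cell has the strongest response.
toy_map = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.2, 0.1], [2.0, 3.0]],
]
weights, attended = spatial_attention(toy_map)
```

In this toy case the bottom-right location receives the largest weight, so its features dominate the output while weaker locations are suppressed. In a trained detector the scores would come from learned parameters, not from a fixed rule.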

What are the implications of using several frames for detecting micromobility vehicles on inference times?

Using several frames increases inference time, because the model must process multiple frames for every detection. Temporal information from neighboring frames improves accuracy and robustness, but features have to be aligned across frames and then aggregated, which adds computation. Multi-frame models are therefore slower than single-frame models that analyze each frame independently, so the gain in detection performance is traded against inference speed.
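To make the extra work concrete, the sketch below blends feature vectors from a window of frames, weighting each frame by its cosine similarity to the reference frame. It is loosely in the spirit of feature aggregation for video detection, but the alignment (warping) step is omitted. The names `aggregate_features` and `frames_per_second` and the toy vectors are assumptions for illustration; only the 418.1 ms latency comes from the statistics above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def aggregate_features(frame_features, ref_index):
    """Blend per-frame feature vectors, weighted by similarity to the reference frame."""
    ref = frame_features[ref_index]
    # One similarity computation per frame in the window: cost grows with window size.
    sims = [cosine_similarity(f, ref) for f in frame_features]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * f[i] for w, f in zip(weights, frame_features))
        for i in range(len(ref))
    ]
    return aggregated, weights


def frames_per_second(latency_ms):
    """Convert a per-frame latency in milliseconds to throughput in frames per second."""
    return 1000.0 / latency_ms


# Hypothetical 3-frame window of 2-dimensional features; frame 0 is the reference.
window = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
aggregated, weights = aggregate_features(window, ref_index=0)

# Per-frame latency reported for FGFA-RFCN in the statistics section.
fgfa_rfcn_fps = frames_per_second(418.1)
```

Each additional frame in the window adds one similarity computation and one more term in the weighted sum, which is the source of the slowdown described above. At 418.1 ms per frame, the throughput works out to roughly 2.4 frames per second.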

How can diversifying the training dataset further enhance recognition accuracy across different urban scenarios?

A more diverse training dataset improves recognition accuracy across urban scenarios by exposing the model to a wider range of real-world variation. Training on different viewpoints, lighting conditions, object sizes, and occlusion patterns helps the model generalize. Including micromobility vehicles captured under varying weather conditions or traffic densities, for example, teaches the model features that stay reliable when the environment changes, which improves its performance and reliability in deployment.
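Collecting new footage is the direct way to diversify a dataset. A cheaper complement, shown here as an assumption and not something the paper describes, is data augmentation. The sketch below mirrors a grayscale image together with its bounding boxes and changes its brightness. The function names, the toy image, and the box convention (`x_max` treated as an exclusive pixel edge) are all hypothetical.

```python
def flip_horizontal(image, boxes):
    """Mirror an image (list of pixel rows) and its (x_min, y_min, x_max, y_max) boxes."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    # Mirroring swaps the roles of the left and right box edges.
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped, flipped_boxes


def scale_brightness(image, factor):
    """Multiply every pixel by factor, clamped to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


# Toy 2 x 4 grayscale image with one box covering the leftmost column.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
]
boxes = [(0, 0, 1, 2)]

flipped, flipped_boxes = flip_horizontal(image, boxes)  # box moves to the rightmost column
darker = scale_brightness(image, 0.5)                   # simulates a dimmer scene
```

Augmentation of this kind varies viewpoint and lighting only synthetically, so it supplements footage captured under genuinely different weather and traffic conditions without replacing it.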