Basic Concepts
Combining lightweight object detection with Large Language Models enhances safety in visual navigation for the visually impaired.
Summary
This paper explores the use of Large Language Models (LLMs) in zero-shot anomaly detection for safe visual navigation. The proposed framework leverages real-time object detection and specialized prompts to identify anomalies, provide audio descriptions, and assist in safe navigation. It addresses challenges in dynamic urban environments and emphasizes the importance of vision-language understanding for safety concerns.
Abstract:
- Explores potential of LLMs in zero-shot anomaly detection.
- Utilizes the real-time open-world object detection model Yolo-World.
- Emphasizes safe visual navigation for visually impaired individuals.
Introduction:
- Discusses advancements in accessible technologies driven by machine learning.
- Highlights impact of deep learning on object detection and segmentation models.
Methodology:
- Describes a multi-module architecture integrating object detection with LLM capabilities.
- Outlines the process of anomaly alerts and scene descriptions for users.
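The multi-module flow summarized above (object detection feeding specialized prompts to an LLM, which produces an audio-ready alert) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `detect_objects` and `query_llm` are hypothetical stand-ins for Yolo-World inference and an LLM API call, and the hazard list is invented for the example.

```python
# Sketch of the multi-module pipeline: detection -> prompt -> LLM -> alert text.
# detect_objects() and query_llm() are hypothetical stand-ins, not the paper's API.

def detect_objects(frame):
    """Stand-in for Yolo-World: return (label, confidence) pairs for one frame."""
    return [("car", 0.91), ("pothole", 0.77), ("person", 0.85)]

def build_prompt(detections, hazards):
    """Specialized prompt asking the LLM to flag anomalies among the detections."""
    scene = ", ".join(f"{label} ({conf:.2f})" for label, conf in detections)
    return (
        "You assist a visually impaired pedestrian. "
        f"Detected objects: {scene}. "
        f"Flag any of these hazards: {', '.join(hazards)}. "
        "Reply with a one-sentence audio description."
    )

def query_llm(prompt):
    """Stand-in for the LLM call; mimics a hazard-aware one-sentence reply."""
    hazard = "pothole" if "pothole" in prompt else None
    return f"Caution: {hazard} ahead." if hazard else "Path looks clear."

def navigate_frame(frame, hazards=("pothole", "car")):
    detections = detect_objects(frame)
    prompt = build_prompt(detections, hazards)
    return query_llm(prompt)  # text to be spoken to the user via TTS

print(navigate_frame(frame=None))  # → Caution: pothole ahead.
```

The prompt-building step is where the paper's emphasis on prompt engineering would live: the hazard list and instruction wording steer the LLM toward concise, safety-relevant descriptions.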
Experiments:
- Compares the proposed system with a rule-based anomaly detection baseline.
- Evaluates system optimization and detection accuracy.
Conclusion:
- Demonstrates potential of combining lightweight object detection with LLMs for enhanced accessibility.
- Emphasizes prompt engineering's role in guiding LLM responses.
Statistikk
"Latency: As shown in Table 4, we measured end-to-end system latency and individual module processing times to identify bottlenecks and optimize for real-time performance. Results indicated an average end-to-end latency of 60 ms on the mobile device (e.g., smartphone) with neural engines, ensuring timely feedback."