Core Concepts
Advancing in-vehicle gaze estimation with a novel dataset and algorithm.
Abstract
This article discusses the importance of the driver's eye gaze for intelligent vehicles and introduces IVGaze, a new dataset capturing in-vehicle gaze. It presents a vision-based solution for in-vehicle gaze collection and a dual-stream gaze pyramid transformer (GazeDPTR) for accurate estimation. It also describes a strategy for gaze zone classification, showcasing the effectiveness of the proposed methods.
Directory:
- Introduction to Driver's Eye Gaze Importance
  - Understanding driver intention is crucial for intelligent vehicles.
- Challenges in In-Vehicle Gaze Estimation Research
  - Datasets are scarce due to the confined vehicular environment.
- Comprehensive Vision-Based In-Vehicle Gaze Estimation Research
  - Introducing the IVGaze dataset, covering diverse conditions.
- Novel Approach: Dual-Stream Gaze Pyramid Transformer (GazeDPTR)
- State-of-the-art performance on IVGaze dataset.
- Strategy for Gaze Zone Classification Extension
  - Defining a foundational tri-plane and projecting gaze onto it.
- Experiment Results Comparison with SOTA Methods
  - The proposed methods outperform existing ones.
- Impact of Face Accessories on Performance
  - Glasses have less impact than sunglasses and masks.
- Ablation Study Results
  - Integrating multi-level features enhances performance.
- Additional Experiments on Normalized and Original Images
  - Combining both improves performance across different head pose ranges.
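The tri-plane gaze zone strategy listed above can be sketched geometrically: a predicted gaze ray is intersected with a small set of planes, and the zone is read off from the nearest hit. This is a minimal sketch; the plane anchors, normals, and zone labels below are hypothetical placeholders, not the paper's actual cockpit geometry.

```python
import numpy as np

def intersect_plane(origin, direction, point, normal):
    """Return ray parameter t where origin + t*direction hits the plane, or None."""
    denom = np.dot(direction, normal)
    if abs(denom) < 1e-6:  # ray parallel to plane
        return None
    t = np.dot(point - origin, normal) / denom
    return t if t > 0 else None  # only intersections in front of the eye

# Hypothetical tri-plane: (anchor point, normal, zone label) per plane.
PLANES = [
    (np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]), "windshield"),
    (np.array([-0.8, 0.0, 0.5]), np.array([1.0, 0.0, 0.0]), "left window"),
    (np.array([0.0, -0.5, 0.5]), np.array([0.0, 1.0, 0.0]), "dashboard"),
]

def classify_zone(origin, gaze_dir):
    """Project the gaze ray onto the tri-plane; return the label of the nearest hit."""
    hits = []
    for point, normal, label in PLANES:
        t = intersect_plane(origin, gaze_dir, point, normal)
        if t is not None:
            hits.append((t, label))
    return min(hits)[1] if hits else None
```

For example, a ray from the origin looking straight ahead along +z lands on the hypothetical "windshield" plane, while a ray angled left hits "left window" first.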
Stats
Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive datasets in real driving scenarios.
The IVGaze dataset, collected from 125 subjects, covers diverse conditions: varied head poses, eye movements, illumination changes, and the presence of face accessories.
The dual-stream gaze pyramid transformer (GazeDPTR) achieves state-of-the-art performance on the IVGaze dataset.
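Gaze estimation accuracy is conventionally reported as the angular error between predicted and ground-truth 3D gaze vectors; the article does not state the exact metric, but assuming this standard, it can be computed as:

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Mean angular error in degrees between batches of 3D gaze vectors."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)  # unit vectors
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)  # clamp for arccos
    return float(np.degrees(np.arccos(cos)).mean())
```

Lower is better; two orthogonal gaze vectors score 90 degrees, identical vectors score 0.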
Quotes
"Driver’s eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles."
"Our work brings two deep insights: multi-level feature is useful to capture eye region information; simultaneously leveraging original images and normalized images could achieve better performance."