
Limitations of Public Datasets for Modeling the Impact of Driving Tasks and Context on Drivers' Visual Attention


Core Concepts
Public datasets for training and evaluating models of drivers' visual attention have significant limitations in capturing the effects of driving tasks and context on attention allocation.
Abstract
The paper examines four large-scale public datasets (DR(eye)VE, BDD-A, MAAD, and LBW) used for training and evaluating algorithms for predicting drivers' gaze. It identifies several key limitations in the data collection and processing pipelines that prevent these datasets from effectively capturing the top-down influences of driving tasks and context on attention allocation. The key limitations include:

- Spatial and temporal constraints of the video data, which do not fully represent the driver's field of view and miss important events due to discontinuous recordings.
- Sparse and noisy vehicle telemetry data, which limits the ability to infer the driver's actions and intentions.
- Differences between on-road and in-lab eye-tracking data, where the latter lacks the full context and engagement of the driving task.
- Issues in processing the eye-tracking data, such as noisy gaze-to-scene mapping, inclusion of blinks and saccades, and differences in ground-truth saliency map generation across datasets.

The paper then uses new annotations for driving tasks (longitudinal and lateral maneuvers) and context (intersection types and right-of-way) to analyze the contents of the datasets. It finds that the datasets are dominated by simple scenarios where the driver maintains speed or lane, or the vehicle is stopped; more complex scenarios involving turns, lane changes, and yielding are underrepresented. Evaluating the performance of state-of-the-art gaze prediction models on the annotated subsets of the data reveals that the models struggle to capture the effects of driving tasks and context. The paper links this performance drop to the identified data limitations, arguing that simply increasing the proportion of non-trivial scenarios in the data is unlikely to be sufficient without addressing the underlying issues in data collection and processing.
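The differences in ground-truth saliency map generation mentioned above largely come down to how fixations are accumulated and blurred, and how blinks and saccades are filtered beforehand. The following is a minimal sketch of the commonly used procedure, assuming NumPy and SciPy are available; the frame size and Gaussian width are illustrative placeholders, not the parameters of DR(eye)VE, BDD-A, MAAD, or LBW:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, frame_h=720, frame_w=1280, sigma_px=40):
    """Build a ground-truth saliency map from (x, y) fixation points in frame coordinates."""
    sal = np.zeros((frame_h, frame_w), dtype=np.float32)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < frame_w and 0 <= yi < frame_h:
            sal[yi, xi] += 1.0                      # accumulate fixation counts
    sal = gaussian_filter(sal, sigma=sigma_px)      # Gaussian blur approximating foveal extent
    if sal.max() > 0:
        sal /= sal.max()                            # normalize to [0, 1]
    return sal
```

Because each dataset makes these choices differently (which gaze samples count as fixations, how wide the blur is, how gaze is aggregated over time or observers), the resulting ground truth is not directly comparable across datasets.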
Stats
"Driving is a visuomotor task where perception, especially vision [1], and motor actions are tightly intertwined [2], [3], [4]." "Despite ample experimental evidence that driving task and context affect how drivers observe the traffic scene [5], [6], [7], many of the state-of-the-art (SOTA) models are bottom-up [8]." "In DR(eye)VE, a camera is installed on the rooftop of the vehicle. Due to being at a higher vantage point, the recording often does not match what the driver was seeing, which causes issues with data processing (see Section III-D)." "In LBW videos, about 6K frames (5% of all data) are overexposed and two nighttime videos (comprising ≈2K frames) are underexposed, making it difficult to discern road markings, signs, signals, and vehicles." "DR(eye)VE and BDD-A provide only a small set of vehicle data, namely, GPS coordinates, heading, and speed, most of which are coarsely sampled at 1Hz." "LBW does not provide any vehicle information." "On-road setup is the most ecologically valid, since the driver controls the car and their decisions have consequences. At the same time, rides cannot be replicated across multiple participants and only mobile or remote eye-trackers can be used to allow for free head and body movement. As a result, recorded gaze data is sparse and has lower precision and sampling rate than what can be achieved in the lab."
Quotes
"Driving is a visuomotor task where perception, especially vision [1], and motor actions are tightly intertwined [2], [3], [4]." "Despite ample experimental evidence that driving task and context affect how drivers observe the traffic scene [5], [6], [7], many of the state-of-the-art (SOTA) models are bottom-up [8]." "In DR(eye)VE, a camera is installed on the rooftop of the vehicle. Due to being at a higher vantage point, the recording often does not match what the driver was seeing, which causes issues with data processing (see Section III-D)."

Deeper Inquiries

How can the identified limitations in the public datasets be addressed through new data collection efforts that better capture the top-down influences on drivers' attention?

The limitations identified in the public datasets, such as the lack of annotations for top-down influences on drivers' attention, sparse gaze recordings near intersections, and data loss around lateral maneuvers, can be addressed through new data collection efforts that capture the full spectrum of driving tasks and contexts. Some strategies to improve data collection:

- Longer and Continuous Recordings: Collecting longer, continuous video recordings that span several minutes provides a more comprehensive view of drivers' behavior and attention allocation, and helps capture a wider range of driving scenarios, including interactions at intersections and complex maneuvers.
- Multiple Camera Angles: Using multiple synchronized cameras with a wide field of view around the vehicle helps capture what the driver sees more accurately. This can include cameras mounted inside the vehicle for in-cabin views and external cameras covering blind spots and side mirrors.
- Accurate Telemetry Data: Collecting accurate telemetry, including GPS coordinates, speed, IMU sensor output, and steering wheel rotations. This data is crucial for understanding drivers' actions and intentions, especially during maneuvers such as lane changes and turns.
- Improved Gaze Tracking: Enhancing on-road gaze recordings by accurately matching the driver's gaze to the scene view using camera calibration information. This can involve refining the homography transformation process to reduce errors, especially during head movements and challenging lighting conditions (a sketch of this mapping step follows this answer).
- Contextual Annotations: Annotating the data with detailed information about driving tasks, such as lateral and longitudinal maneuvers, intersection types, and the driver's priority at intersections (right-of-way or yielding). This provides a richer context for modeling top-down effects on drivers' attention.
- Open Data Access: Making raw data available for re-analysis and reproducibility. Access to the original data enables researchers to validate findings, improve models, and address limitations in the existing datasets.

By implementing these strategies in new data collection efforts, researchers can create more comprehensive and representative datasets that better capture the top-down influences on drivers' attention.
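A rough sketch of the gaze-to-scene mapping step mentioned under "Improved Gaze Tracking", using OpenCV: a homography estimated from matched reference points projects gaze recorded in the eye-tracker's egocentric view into the scene camera's frame. The point correspondences and gaze coordinates below are placeholders; real pipelines obtain them from calibration targets or per-frame feature matching, and the quality of those matches is where much of the mapping noise comes from.

```python
import cv2
import numpy as np

# Matched points between the eye-tracker's view and the scene camera (placeholder values)
eye_view_pts = np.array([[100, 120], [620, 110], [610, 400], [90, 390]], dtype=np.float32)
scene_pts    = np.array([[180, 200], [1100, 190], [1080, 650], [160, 640]], dtype=np.float32)

# Estimate the homography relating the two views
H, _ = cv2.findHomography(eye_view_pts, scene_pts)

# Project a gaze sample recorded in eye-tracker coordinates into the scene frame
gaze = np.array([[[320.0, 240.0]]], dtype=np.float32)   # shape (1, 1, 2) as required
gaze_in_scene = cv2.perspectiveTransform(gaze, H)[0, 0]
print(gaze_in_scene)   # (x, y) location of the gaze point in the scene camera image
```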

How might the insights from this analysis of driver attention datasets inform the design of future intelligent transportation systems that need to understand and predict human visual attention in complex driving scenarios?

The insights gained from analyzing driver attention datasets can significantly impact the design of future intelligent transportation systems by enhancing their ability to understand and predict human visual attention in complex driving scenarios. Here are some ways in which these insights can inform the development of such systems:

- Improved Driver Assistance Systems: By understanding how drivers allocate their attention in different driving tasks and contexts, intelligent transportation systems can provide more tailored and effective assistance. For example, systems can alert drivers to potential hazards or provide guidance based on their current focus of attention.
- Enhanced Safety Features: Insights into drivers' gaze patterns and behaviors can help in the design of advanced safety features, such as collision avoidance systems and adaptive cruise control. These systems can adapt to drivers' attention levels and intervene when necessary to prevent accidents.
- Optimized Human-Machine Interaction: Understanding how drivers interact with in-vehicle technology and external stimuli can lead to better design of interfaces and communication methods. Future systems can prioritize information based on drivers' attentional needs and reduce cognitive load during complex driving tasks.
- Predictive Analytics: By modeling the impact of driving tasks and context on attention allocation, intelligent transportation systems can predict driver behavior in real time. This predictive capability can be used to anticipate potential risks and optimize driving strategies for improved safety and efficiency.
- Training and Education: Insights from driver attention datasets can also inform driver training programs and educational initiatives. By highlighting common attentional challenges and behaviors, training modules can be tailored to address specific areas where drivers may need additional support or guidance.

Overall, leveraging the insights from driver attention datasets can lead to the development of more intelligent, adaptive, and human-centric transportation systems that prioritize safety, efficiency, and driver well-being in complex driving environments.

What alternative approaches, beyond relying on public datasets, could be used to model the impact of driving tasks and context on attention allocation?

In addition to public datasets, there are alternative approaches that can be used to model the impact of driving tasks and context on attention allocation. These approaches can provide complementary insights and enhance the understanding of drivers' visual attention in various driving scenarios. Here are some alternative methods:

- Simulator Studies: Conducting driving simulator studies allows researchers to create controlled and repeatable driving scenarios with varying levels of complexity. Simulators can reproduce different driving tasks, contexts, and environmental conditions to study drivers' attention allocation in a safe and controlled environment.
- Naturalistic Driving Studies: Naturalistic driving studies involve collecting data from real-world driving using instrumented vehicles. This approach provides authentic driving behavior data in diverse and uncontrolled conditions, offering insights into how drivers allocate attention in everyday situations.
- Eye-Tracking Experiments: Conducting eye-tracking experiments in controlled laboratory settings can help isolate specific factors influencing attention allocation. By presenting drivers with stimuli related to different driving tasks and contexts, researchers can analyze gaze patterns and attentional shifts in a controlled environment (a minimal fixation-detection sketch follows this answer).
- Behavioral Observations: Observing drivers in real-world settings without interference or manipulation can provide valuable insights into natural attention allocation during driving. Behavioral observations can capture drivers' responses to dynamic traffic situations and interactions with other road users.
- Neuroimaging Studies: Neuroimaging techniques, such as fMRI and EEG, can be used to study the neural mechanisms underlying attention allocation during driving tasks. These studies can reveal brain regions involved in processing task-relevant information and guide the development of attention models.
- Crowdsourcing and Citizen Science: Engaging drivers as citizen scientists through crowdsourcing platforms can help collect large-scale driving behavior data. By incentivizing drivers to share their driving experiences and gaze patterns, researchers can gather diverse datasets for modeling attention allocation.

By integrating these alternative approaches with public datasets, researchers can gain a more comprehensive understanding of the impact of driving tasks and context on attention allocation. Each method offers unique advantages and insights that contribute to a holistic view of drivers' visual attention in complex driving scenarios.
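As a hedged companion to the eye-tracking experiments point above, the sketch below shows a minimal dispersion-threshold (I-DT) fixation detector of the kind commonly used to reduce raw gaze samples to fixations before analysis; the sampling rate, minimum duration, and dispersion threshold are assumptions for illustration, not values from the paper or the datasets it reviews.

```python
import numpy as np

def _dispersion(window):
    """Horizontal plus vertical extent of a window of (x, y) gaze samples, in pixels."""
    return (window[:, 0].max() - window[:, 0].min()) + (window[:, 1].max() - window[:, 1].min())

def idt_fixations(gaze_xy, sample_rate_hz=60, min_dur_s=0.1, max_disp_px=35):
    """Return (start, end) sample-index pairs of detected fixations in an (N, 2) gaze array."""
    min_len = int(min_dur_s * sample_rate_hz)
    fixations, i, n = [], 0, len(gaze_xy)
    while i + min_len <= n:
        j = i + min_len
        if _dispersion(gaze_xy[i:j]) <= max_disp_px:
            # Grow the window while the samples stay within the dispersion threshold
            while j < n and _dispersion(gaze_xy[i:j + 1]) <= max_disp_px:
                j += 1
            fixations.append((i, j))
            i = j
        else:
            i += 1          # slide the window past the saccade or noise sample
    return fixations
```

Blink samples (often logged as missing or off-screen coordinates) would need to be removed before running such a detector, which is one of the processing steps the paper notes is handled inconsistently across datasets.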