Core Concepts
Deep learning has revolutionized appearance-based gaze estimation, enabling more accurate and robust estimates from simple camera setups. However, the field still lacks definitive guidelines for developing deep learning-based gaze estimation algorithms, because published methods are hard to compare fairly: some predict 2D gaze positions while others predict 3D gaze vectors, and their pre-processing and post-processing steps differ.
Abstract
This paper presents a systematic review of appearance-based gaze estimation methods using deep learning. It first surveys existing algorithms along the typical gaze estimation pipeline: deep feature extraction, deep neural network architecture design, personal calibration, and platforms.
For deep feature extraction, the paper discusses methods that extract features from eye images, face images, and videos. It covers techniques like using attention mechanisms to fuse features from two eyes, extracting subject-invariant features, and leveraging generative adversarial networks to handle specific environmental factors.
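As a rough illustration of attention-based fusion of two-eye features, the sketch below weights each eye's feature vector by a softmax over per-eye scores and sums the result. The function names, the plain-list feature representation, and the idea that each score comes from some learned scoring layer are assumptions made for illustration; they are not taken from any specific method covered by the review.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_eye_features(left_feat, right_feat, left_score, right_score):
    """Fuse two equal-length feature vectors with attention weights.

    left_score and right_score are hypothetical per-eye reliability
    scores; in a real network they would be produced by a learned layer.
    """
    w_left, w_right = softmax([left_score, right_score])
    return [w_left * l + w_right * r for l, r in zip(left_feat, right_feat)]

# Equal scores give a plain average of the two eyes.
fused = fuse_eye_features([1.0, 3.0], [3.0, 5.0], 0.0, 0.0)
```

With equal scores the fusion reduces to averaging; when one score dominates (for example, because the other eye is occluded), the output approaches the higher-scored eye's features.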
In terms of deep neural network architecture design, the paper reviews supervised CNNs, semi-/self-/un-supervised CNNs, multi-task CNNs, recurrent CNNs, and CNNs that integrate prior knowledge like eye anatomy and eye movement patterns.
For personal calibration, the paper discusses methods that use domain adaptation and user-unaware data collection to improve performance on new subjects or environments.
Finally, the paper covers the use of different camera types (RGB, IR, depth) and platforms (computers, mobile devices, head-mounted displays) for gaze estimation.
To enable fair comparisons between different gaze estimation methods, the paper also summarizes data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion, and gaze origin conversion.
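The 2D/3D gaze conversion mentioned above can be sketched as follows, under stated assumptions. The sign convention for turning pitch and yaw into a 3D vector is one commonly used convention and is assumed here, not taken from the paper. The screen is modeled as a plane given by a point and a normal in camera coordinates. All function and parameter names are illustrative.

```python
import math

def pitch_yaw_to_vector(pitch, yaw):
    """Convert gaze angles (radians) to a 3D unit vector.

    Assumed convention: the camera looks along +z, so a gaze directed
    straight at the camera (pitch = yaw = 0) is (0, 0, -1).
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )

def angular_error_deg(g_pred, g_true):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(a * b for a, b in zip(g_pred, g_true))
    norm = math.sqrt(sum(a * a for a in g_pred)) * math.sqrt(sum(b * b for b in g_true))
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_sim))

def gaze_point_on_plane(origin, direction, plane_point, plane_normal):
    """3D-to-2D step: intersect the gaze ray with the screen plane.

    Returns the 3D intersection point; mapping it to pixel coordinates
    needs the screen geometry, which is outside this sketch.
    """
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        raise ValueError("gaze ray is parallel to the plane")
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    return tuple(o + t * d for o, d in zip(origin, direction))

# Eye 600 mm in front of a screen lying in the plane z = 0.
hit = gaze_point_on_plane((0.0, 0.0, 600.0), (0.1, 0.0, -1.0),
                          (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Reporting every method in the same quantity, such as angular error in degrees for 3D vectors or on-screen distance for 2D points, is what makes results comparable; the conversion above is the bridge between the two.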
Overall, this paper serves as a comprehensive reference for developing deep learning-based gaze estimation methods and a guideline for future gaze estimation research.
Stats
"Gaze estimation is crucial for various applications, such as extended reality (XR) devices, human-computer interaction, and attention analysis."
"Deep learning-based methods offer several advantages over conventional appearance-based methods, including more accurate and robust gaze estimation, reduced need for personal calibration, and expanded application range."
"Appearance-based gaze estimation suffers from many challenges, including head motion and subject differences, particularly in unconstrained environments."
Quotes
"Deep learning has revolutionized appearance-based gaze estimation, enabling more accurate and robust gaze estimation from simple camera setups."
"The unique challenges in gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, have led to a lack of definitive guidelines for developing deep learning-based gaze estimation algorithms."
"This paper serves as a comprehensive reference for developing deep learning-based gaze estimation methods and a guideline for future gaze estimation research."