Core Concepts
The Segment Anything Model (SAM) can effectively segment the pupil and iris in eye images captured in virtual reality setups, but struggles to accurately segment the sclera without extensive manual guidance.
Abstract
This study evaluates the performance of the Segment Anything Model (SAM) in segmenting key eye features (the pupil, iris, and sclera) from eye images captured in virtual reality (VR) setups.
The authors utilized two publicly available datasets, OpenEDS2019 and OpenEDS2020, which contain eye images recorded using VR head-mounted displays. They assessed SAM's zero-shot segmentation capabilities using various prompting strategies, including automatic "everything" mode, single/multiple point prompts, bounding box prompts, and combinations thereof.
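The point and bounding-box prompts described above map directly onto the inputs of the official `segment-anything` package's `SamPredictor.predict` API. The sketch below shows the expected array formats; the coordinates and the combined box-plus-point call are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Prompt formats accepted by SamPredictor.predict in the official
# segment-anything package (coordinates here are made up for illustration).

# Single foreground point, e.g. on the pupil: one (x, y) pair plus a
# label array where 1 = foreground, 0 = background.
point_coords = np.array([[320, 240]], dtype=np.float32)
point_labels = np.array([1], dtype=np.int32)

# Bounding box around a region such as the iris, in (x0, y0, x1, y1) format.
box = np.array([200, 140, 440, 360], dtype=np.float32)

# A combined box + point strategy (in the spirit of the paper's BBOXP
# variants) would pass both to a single call, roughly:
#   predictor.set_image(image)          # image: H x W x 3 uint8 array
#   masks, scores, _ = predictor.predict(
#       point_coords=point_coords,
#       point_labels=point_labels,
#       box=box,
#       multimask_output=False,
#   )

print(point_coords.shape, point_labels.shape, box.shape)
```

The "everything" mode, by contrast, uses `SamAutomaticMaskGenerator` and needs no prompts at all, which is why it serves as the zero-guidance baseline.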
The results show that SAM performs exceptionally well in segmenting the pupil, achieving an Intersection over Union (IoU) score of over 93% in both datasets. For the iris, SAM also performs strongly, reaching an IoU of over 86% when using a combination of bounding box and point prompts.
However, SAM struggled to accurately segment the sclera, with its best IoU score being only around 62%. The authors attribute this to the sclera's low contrast and blurred edges with the iris, as well as non-uniform illumination and shadows from the eyelids, which pose significant challenges for the model.
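The IoU scores reported throughout are the standard overlap metric between a predicted binary mask and the ground-truth annotation. A minimal computation on toy masks (the example arrays are hypothetical, not paper data):

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union between two binary segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks count as a perfect match.
    return float(inter) / union if union else 1.0

# Toy example: two overlapping 4x4 square "masks" on an 8x8 grid.
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True
print(round(iou(a, b), 4))  # 9 overlapping pixels / 23 in the union
```

An IoU above 93%, as reported for the pupil, means the predicted and annotated masks are nearly pixel-identical, whereas the ~62% sclera score leaves a substantial mismatched region.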
Overall, the study suggests that SAM has promising zero-shot capabilities for segmenting certain eye features, particularly the pupil, which is a crucial component in many gaze estimation pipelines. The authors recommend further research to explore fine-tuning SAM on eye image datasets or developing a bespoke foundation model tailored for eye tracking applications.
Stats
The pupil segmentation achieved an IoU of 93.34% in the OpenEDS2020 dataset using the BBOXP4 prompting strategy.
The iris segmentation achieved an IoU of 86.63% in the OpenEDS2019 dataset using the BBOXP1-1 prompting strategy.
The sclera segmentation achieved an IoU of 62.19% in the OpenEDS2019 dataset using the BBOXP4-4 prompting strategy.
Quotes
"SAM exhibits strong capabilities in pupil segmentation which can be used in pupil detection pipelines, a crucial element in numerous gaze-tracking frameworks."
"SAM struggles to recognize the sclera as a distinct object and requires extensive guidance from an annotator through the use of manual prompts."