Core Concepts
MindSet: Vision is a toolbox that provides a large, configurable set of image datasets and scripts to test deep neural networks on a wide range of well-replicated visual experiments and phenomena reported in psychology. This enables systematic evaluation of how well DNNs capture key aspects of human visual perception.
Abstract
MindSet: Vision is a toolbox aimed at facilitating the testing of deep neural networks (DNNs) on visual psychological phenomena. It provides a large, easily accessible, and highly configurable set of 30 image datasets covering a wide array of well-replicated visual experiments and phenomena reported in psychology.
The datasets span low-level vision (e.g., Weber's law), mid-level vision (e.g., Gestalt effects), visual illusions, and object recognition tasks. Each dataset can be easily regenerated with different configurations (image size, background color, stroke color, number of samples, etc.), offering great versatility for different research contexts.
To enable experimentation, the toolbox provides scripts for three testing methods: Similarity Judgment Analysis, Decoder Approach, and Out-of-Distribution classification. These methods allow researchers to systematically evaluate how well DNNs capture key aspects of human visual perception, going beyond the typical observational benchmarks.
The authors provide examples illustrating the use of these methods with a classic feed-forward CNN (ResNet-152), and the code is extensively documented to facilitate adoption and extension by the research community.
By bridging the gap between computational modeling and psychological research, MindSet: Vision aims to drive further interest in testing DNN models against key experiments reported in psychology, in order to better characterize DNN-human alignment and build better DNN models of human vision.
Stats
Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision.
Deep neural networks (DNNs) provide the best solution for visual identification of objects short of biological vision, and many researchers claim that DNNs are the best current models of human vision and object recognition.
Key evidence in support of this claim comes from the finding that DNNs perform the best on various behavioural and brain benchmarks.
Quotes
"Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision."
"Deep neural networks (DNNs) provide the best solution for visual identification of objects short of biological vision, and many researchers claim that DNNs are the best current models of human vision and object recognition."