
Automated Scoring of Memory Deficits Using Deep Learning: Improving Efficiency and Objectivity in the Rey-Osterrieth Complex Figure Test

Core Concepts
An AI-powered scoring system that outperforms human raters in objectively and efficiently assessing memory deficits using the Rey-Osterrieth Complex Figure Test.
The study presents the development and evaluation of an AI-based scoring system for the Rey-Osterrieth Complex Figure (ROCF) test, a widely used neuropsychological assessment of non-verbal visual memory. Key highlights:

- The researchers collected a large dataset of over 20,000 hand-drawn ROCF images from diverse populations, including healthy individuals and those with neurological or psychiatric disorders.
- To obtain unbiased ground-truth scores, the researchers leveraged crowdsourced human intelligence, with multiple raters scoring each ROCF drawing. This helped mitigate the subjectivity and inconsistency inherent in clinician-based scoring.
- The researchers developed a multi-head convolutional neural network that combines regression and multilabel classification approaches to automatically score the ROCF drawings. This model outperformed both amateur raters and clinicians in accuracy, objectivity, and efficiency.
- The automated scoring system was highly robust to common real-world variations, such as rotations, perspective changes, and changes in brightness and contrast of the drawings.
- The automated scoring system provides detailed, explainable scores for individual figure elements, facilitating interpretation and communication of the results.
- The findings demonstrate the potential of AI-powered tools to enhance the quality and efficiency of neuropsychological assessments, reducing reliance on subjective human ratings.
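The multi-head idea described above can be sketched numerically. The paper's exact architecture (feature dimensions, layer shapes, how the two heads are fused) is not given here, so those values and the simple head-averaging below are illustrative assumptions; the standard ROCF rubric of 18 figure elements, each scored 0, 0.5, 1, or 2 for a maximum of 36 points, is real. A random feature vector stands in for the CNN backbone's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 shared features; 18 ROCF elements, each scored
# 0, 0.5, 1, or 2 (the standard 36-point rubric).
N_ITEMS, N_CLASSES, FEAT = 18, 4, 64
ITEM_SCORES = np.array([0.0, 0.5, 1.0, 2.0])

# Stand-in for the shared CNN backbone output for one drawing.
features = rng.normal(size=FEAT)

# Regression head: predicts the total score directly.
w_reg = rng.normal(size=FEAT) * 0.01
total_reg = float(features @ w_reg + 18.0)  # bias near mid-range

# Classification head: per-item class logits -> softmax -> expected score.
W_cls = rng.normal(size=(N_ITEMS, N_CLASSES, FEAT)) * 0.01
logits = W_cls @ features                                   # shape (18, 4)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
item_scores = probs @ ITEM_SCORES                           # expected per item
total_cls = float(item_scores.sum())

# One simple way to fuse the heads: average their total-score estimates.
prediction = 0.5 * (total_reg + total_cls)
```

The per-item expected scores are what make the output explainable: each of the 18 elements gets its own value in [0, 2], which can be reported alongside the total.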
"Our estimation revealed that a single neuropsychological division (e.g. at the University Hospital Zurich) scores up to 6000 ROCF drawing per year." "The average human MSE over all images is 16.3, and the average human MAE is 2.41." "The clinician MSE over all images is 9.25 and the clinician MAE is 2.15, indicating a better performance of the clinicians compared to the average human rater." "The model performs highly unbiased as it yielded predictions very close to the ground truth and the error was similarly distributed around zero." "The final model results in a MSE of 3.00 and a MAE of 1.11, outperforming both amateur raters and clinicians."
"An automated system that offers reliable, objective, robust and standardized scoring, while saving clinicians' time, would be desirable from an economic perspective and more importantly leads to more accurate scoring and subsequently diagnosing." "Importantly, the model does not demonstrate any considerable bias towards specific figure elements. In contrast to the clinicians, the MAE is very balanced across each individual item of the figure." "Our innovative approach that combines the digitization of neuropsychological tests and the high-quality scoring using crowdsourced intelligence can provide a roadmap for how AI can be leveraged in neuropsychological testing more generally as it can be easily adapted and applied to various other neuropsychological tests."

Deeper Inquiries

How can the automated scoring system be further improved to enhance its clinical utility, such as by integrating it into a user-friendly digital platform for neuropsychological assessments?

To enhance the clinical utility of the automated scoring system for neuropsychological assessments, several improvements can be implemented:

- User-friendly interface: Integrate the system into an intuitive digital platform that is easy for clinicians to navigate, allowing seamless uploading of images, quick processing of ROCF drawings, and instant display of scores.
- Real-time feedback: Immediate results let clinicians make timely decisions during assessments, improving both the efficiency of the scoring process and patient care.
- Customization options: Let clinicians tailor the scoring system to specific patient populations or assessment requirements, improving adaptability across clinical settings.
- Integration with electronic health records (EHR): Connecting the system to EHR platforms can streamline data management and ensure seamless access to patient information, enhancing the overall efficiency and accuracy of assessments.
- Training and support: Training sessions, user guides, and technical support can help clinicians use the platform effectively and maximize its benefits.
- Continuous updates and maintenance: Regular updates, prompt handling of technical issues, and incorporation of user feedback are essential for reliable performance and a good user experience.

By implementing these enhancements, the automated scoring system can better support efficient and accurate neuropsychological assessments.
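As one illustration of the upload-and-score workflow such a platform would wrap, here is a minimal, hypothetical Python sketch. `score_drawing` and `ScoreReport` are invented names, and the fixed placeholder result stands in for a call to the trained model:

```python
from dataclasses import dataclass

@dataclass
class ScoreReport:
    """Per-element scores plus the total, for explainable reporting."""
    item_scores: dict[str, float]
    total: float

def score_drawing(image_bytes: bytes) -> ScoreReport:
    """Hypothetical entry point a digital platform could expose.

    A real implementation would decode the uploaded image and run the
    trained scoring model; here a placeholder result (all 18 elements
    scored 2.0, the rubric maximum) illustrates the interface shape.
    """
    placeholder = {f"item_{i}": 2.0 for i in range(1, 19)}
    return ScoreReport(item_scores=placeholder,
                       total=sum(placeholder.values()))

report = score_drawing(b"fake-image-bytes")
```

Returning per-element scores rather than only a total is what lets the platform display an explainable breakdown next to the drawing.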

How can the potential ethical and privacy concerns associated with the use of crowdsourced data and AI-powered scoring systems in clinical settings be addressed?

Addressing ethical and privacy concerns associated with the use of crowdsourced data and AI-powered scoring systems in clinical settings is crucial to ensure patient confidentiality and data security. Strategies include:

- Informed consent: Obtain informed consent from participants contributing to the crowdsourced data, clearly communicating the purpose of data collection, how their data will be used and shared, and the measures in place to protect their privacy.
- Anonymization and data security: Implement robust anonymization techniques to protect the identities of individuals in the crowdsourced data, and adhere to strict security protocols to prevent unauthorized access or breaches.
- Compliance with regulations: Ensure compliance with relevant data-protection regulations, such as GDPR or HIPAA, and with the ethical guidelines and standards set by regulatory bodies.
- Transparency and accountability: Be transparent about the use of AI algorithms in scoring and about the sources of crowdsourced data, and establish accountability mechanisms to monitor data usage and respond promptly to any breaches or violations.
- Ethical oversight: An ethics review board or committee can evaluate the implications of data collection, storage, and usage in clinical settings and ensure adherence to ethical standards and guidelines.
- Patient rights and consent management: Respect patient rights regarding data privacy and consent, giving patients control over their data, including the ability to access, modify, or delete their information as needed.
By implementing these strategies and prioritizing ethical considerations, healthcare institutions can mitigate potential ethical and privacy concerns associated with the use of crowdsourced data and AI-powered scoring systems in clinical settings.
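The anonymization point can be made concrete. One common technique is keyed pseudonymization: patient identifiers are replaced by stable tokens computed with a secret key, so records can still be linked across a study without exposing the original IDs. A minimal sketch (the key and function name are illustrative, and a real deployment would keep the key in a managed secret store):

```python
import hashlib
import hmac

# Illustrative only; in practice this lives in a secrets manager,
# never in source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(patient_id: str) -> str:
    """Map an identifier to a stable 16-hex-character token.

    The same ID always yields the same token (so records stay
    linkable), but the ID cannot be recovered without the key.
    """
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Unlike a plain hash, the keyed construction prevents an attacker who knows the ID format from brute-forcing identities by hashing candidate IDs.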

Given the success of this approach in the ROCF test, how can similar AI-powered scoring systems be developed and validated for a broader range of neuropsychological tests to support comprehensive cognitive assessments?

Developing and validating AI-powered scoring systems for a broader range of neuropsychological tests to support comprehensive cognitive assessments involves several key steps:

- Data collection and annotation: Gather a diverse, representative dataset of test samples covering a variety of test conditions, patient populations, and clinical settings, and ensure accurate, unbiased annotation to serve as ground truth for training.
- Algorithm development: Design deep learning architectures tailored to the specific requirements of each test, considering its complexity, its scoring criteria, and potential variations in test administration.
- Training and validation: Train the models on the annotated dataset, using techniques such as data augmentation and cross-validation to optimize performance and generalizability, and validate on independent datasets to ensure robustness and accuracy across test scenarios.
- Integration with clinical practice: Embed the scoring systems in user-friendly digital platforms that align with clinical workflows, integrating with existing assessment tools and electronic health record systems for efficient data management.
- Ethical and regulatory compliance: Adhere to ethical guidelines, data-protection regulations, and patient-privacy standards throughout development and deployment, prioritizing confidentiality, informed consent, and data security.
- Collaboration and feedback: Foster collaboration among AI developers, neuropsychologists, and other healthcare professionals, and incorporate clinician and researcher feedback to continuously improve the algorithms and their clinical utility.
By following these steps and leveraging the success of the AI-powered scoring system in the ROCF test, similar systems can be developed and validated for a broader range of neuropsychological tests, ultimately supporting comprehensive cognitive assessments in clinical practice.
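The data-augmentation and cross-validation steps in the pipeline above can be sketched with plain NumPy. All parameters here (jitter ranges, image size, dataset size, fold count) are illustrative assumptions, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Brightness/contrast jitter plus a random horizontal flip,
    the kinds of real-world variation a scorer should tolerate."""
    out = img.astype(float) * rng.uniform(0.8, 1.2)   # contrast jitter
    out = out + rng.uniform(-0.1, 0.1)                # brightness shift
    if rng.random() < 0.5:
        out = out[:, ::-1]                            # horizontal flip
    return np.clip(out, 0.0, 1.0)

img = rng.random((32, 32))   # stand-in for a scanned drawing
aug = augment(img, rng)

# 5-fold cross-validation indices over a hypothetical 100-image dataset:
# each fold serves once as the validation split.
folds = np.array_split(rng.permutation(100), 5)
```

Applying fresh augmentations each epoch, and reporting metrics averaged over the held-out folds, is one standard way to estimate how a model will generalize before validating on a truly independent dataset.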