Core Concepts
ChatGPT Vision's performance in diagnosing melanoma from dermatoscopic images falls well below that of approved AI algorithms, carrying a high risk of missing melanomas or misclassifying benign lesions as malignant.
Abstract
This summary covers a study that evaluated how well ChatGPT Vision diagnoses melanoma from dermatoscopic images. Key points:
- In September 2023, OpenAI added an image-analysis feature (ChatGPT Vision) to ChatGPT, allowing it to analyze images, including dermatoscopic ones.
- The study used 100 melanocytic lesions, including 50 melanomas and 50 benign nevi, from the International Skin Imaging Collaboration archives.
- Based on the first (most likely) diagnosis given by ChatGPT Vision, sensitivity was 32%, specificity 40%, and overall diagnostic accuracy 36%.
- When a diagnosis was counted as correct if it appeared among the top three differential diagnoses, sensitivity was 56%, specificity 53%, and overall accuracy 55%.
- For distinguishing malignant from benign lesions, sensitivity, specificity, and overall accuracy were 46%, 78%, and 62%, respectively, for the most likely diagnosis, and 78%, 47%, and 62% for the top three diagnoses.
- The study is limited by its small sample, the absence of dysplastic lesions, and its failure to account for key data such as anatomical site, nevus type, and melanoma thickness.
- The results show that ChatGPT Vision performs well below approved AI algorithms for melanoma diagnosis, making it too risky for clinical use.
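To make the malignant-versus-benign figures concrete, here is a minimal Python sketch that recomputes them from a confusion matrix. The counts are back-calculated from the reported percentages, assuming all 50 melanomas and 50 nevi were scored (46% of 50 = 23 melanomas flagged as malignant, 78% of 50 = 39 nevi correctly called benign); the class and function names are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Confusion-matrix counts where 'positive' means malignant (melanoma)."""

    tp: int  # melanomas classified as malignant
    fn: int  # melanomas classified as benign (missed)
    tn: int  # nevi classified as benign
    fp: int  # nevi classified as malignant (false alarms)


def sensitivity(c: BinaryCounts) -> float:
    """Share of melanomas that were flagged as malignant."""
    return c.tp / (c.tp + c.fn)


def specificity(c: BinaryCounts) -> float:
    """Share of benign nevi that were correctly called benign."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: BinaryCounts) -> float:
    """Share of all lesions classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def ppv(c: BinaryCounts) -> float:
    """Positive predictive value (precision): share of 'malignant' calls that were melanomas."""
    return c.tp / (c.tp + c.fp)


# Counts back-calculated (assumption) from the most-likely-diagnosis figures:
# 46% of 50 melanomas = 23 flagged, 78% of 50 nevi = 39 cleared.
most_likely = BinaryCounts(tp=23, fn=27, tn=39, fp=11)

print(
    f"sens={sensitivity(most_likely):.0%} "
    f"spec={specificity(most_likely):.0%} "
    f"acc={accuracy(most_likely):.0%} "
    f"ppv={ppv(most_likely):.0%}"
)
# prints: sens=46% spec=78% acc=62% ppv=68%
```

Because the lesion set is balanced (50/50), overall accuracy is simply the mean of sensitivity and specificity, which is why it lands at 62% here. Positive predictive value, the usual meaning of "precision", would instead be 23/34, or about 68%.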
Stats
Based on the first diagnosis provided, ChatGPT Vision's sensitivity, specificity, and overall diagnostic accuracy for melanoma in dermatoscopic images were 32%, 40%, and 36%, respectively.
When a diagnosis was counted as correct if it appeared among the top three differential diagnoses, sensitivity was 56%, specificity 53%, and overall accuracy 55%.
For distinguishing malignant from benign lesions, sensitivity, specificity, and overall accuracy were 46%, 78%, and 62%, respectively, for the most likely diagnosis, and 78%, 47%, and 62% for the top three diagnoses.
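The top-three criterion used in these figures can be sketched in Python as a top-k hit rate. This sketch assumes a case counts as a hit when the reference diagnosis appears among the first k items of the model's ranked differential, with sensitivity taken as the hit rate over melanomas and specificity as the hit rate over nevi. The function names and the four toy cases are hypothetical and are not study data.

```python
from typing import Sequence


def top_k_hit_rate(truths: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of cases whose reference diagnosis is among the first k ranked guesses."""
    if not truths or len(truths) != len(ranked):
        raise ValueError("need one ranked list per case, and at least one case")
    hits = sum(truth in guesses[:k] for truth, guesses in zip(truths, ranked))
    return hits / len(truths)


def per_class_top_k(
    truths: Sequence[str], ranked: Sequence[Sequence[str]], label: str, k: int
) -> float:
    """Top-k hit rate restricted to cases whose reference label is `label`."""
    pairs = [(t, r) for t, r in zip(truths, ranked) if t == label]
    return top_k_hit_rate([t for t, _ in pairs], [r for _, r in pairs], k)


# Hypothetical toy data, NOT taken from the study: reference label per lesion
# and the model's ranked differential diagnosis for that lesion.
truths = ["melanoma", "melanoma", "nevus", "nevus"]
ranked = [
    ["nevus", "melanoma", "seborrheic keratosis"],
    ["melanoma", "nevus", "lentigo"],
    ["melanoma", "seborrheic keratosis", "lentigo"],
    ["nevus", "melanoma", "dermatofibroma"],
]

for k in (1, 3):
    sens = per_class_top_k(truths, ranked, "melanoma", k)
    spec = per_class_top_k(truths, ranked, "nevus", k)
    print(f"k={k}: sensitivity={sens:.0%}, specificity={spec:.0%}")
# prints:
# k=1: sensitivity=50%, specificity=50%
# k=3: sensitivity=100%, specificity=50%
```

With k=1 this reduces to first-diagnosis scoring; widening to k=3 can only keep or raise each rate, since a hit on the first guess remains a hit within the first three.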
Quotes
"The risk for missing a melanoma and incorrectly classifying a lesion as malignant is too high to use ChatGPT in clinical practice."
"ChatGPT is not sufficiently effective for the diagnosis of melanoma, even if it can at least help describe images. Or should we say not yet sufficiently effective?"