The author argues that vision-language models can improve skin lesion diagnosis by using concept-based descriptions as textual embeddings, reducing the need for concept-annotated datasets.