Core Concepts
GraphVL is a framework that leverages the semantic richness of vision-language models, particularly CLIP, for generalized category discovery. It reduces bias toward known classes and strengthens discriminative feature learning by combining a graph convolutional network with metric learning.
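As a rough illustration of the graph-convolution idea mentioned above (a generic sketch, not GraphVL's actual architecture), a single graph-convolution layer over a cosine-similarity k-NN graph of feature vectors might look like the following. All names and parameters here are hypothetical:

```python
import numpy as np

def gcn_layer(features, weight, k=5):
    """One simplified graph-convolution step: build a k-NN cosine-similarity
    graph over the features, symmetrically normalize it, then propagate."""
    # Cosine similarity between all pairs of (row-normalized) features.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    # Keep only the top-k neighbors per node (self-loops included,
    # since each node's self-similarity of 1.0 is always in the top-k).
    adj = np.zeros_like(sim)
    topk = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, topk] = sim[rows, topk]
    adj = np.maximum(adj, adj.T)  # symmetrize the adjacency matrix
    # Symmetric normalization: D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1) + 1e-8)
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Propagate neighbor features, project, and apply a ReLU nonlinearity.
    return np.maximum(norm_adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 16))  # 20 nodes with 16-dim features
w = rng.normal(size=(16, 8))   # learnable projection to 8 dims
out = gcn_layer(x, w)
print(out.shape)  # (20, 8)
```

In a discovery setting, such a layer lets each sample's representation be smoothed by its nearest neighbors in feature space before a metric-learning objective is applied.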
Stats
K-means clustering on pre-trained CLIP features outperforms the same clustering on ViT features by around 10% on CIFAR-100.
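This comparison amounts to running k-means directly over frozen embeddings. A minimal sketch with scikit-learn, using synthetic unit-norm vectors as a stand-in for real CLIP features:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for L2-normalized CLIP image embeddings:
# three well-separated clusters in a 512-dim feature space.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 512))
features = np.vstack([c + 0.05 * rng.normal(size=(100, 512)) for c in centers])
features /= np.linalg.norm(features, axis=1, keepdims=True)  # unit-norm, like CLIP

# Cluster the embeddings; in category-discovery evaluation, cluster
# assignments are then matched to ground-truth classes (e.g. via the
# Hungarian algorithm) before accuracy is computed.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_
print(labels.shape)  # (300,)
```

Swapping the feature extractor (CLIP vs. a plain ViT) while keeping this clustering step fixed is what isolates the quality of the pre-trained representations.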
GraphVL outperforms baselines by at least 1.4% on CIFAR-10, 2.2% on CIFAR-100, and 0.7% on ImageNet-100.
GraphVL achieves gains over baselines of at least 10.9% on CUB-200, 11.4% on StanfordCars, and 3.5% on Aircraft for new classes.
GraphVL surpasses existing methods on the iNaturalist dataset with a margin of 2.5% across all categories, 3.1% for known classes, and 1.9% for new classes.
GraphVL uses roughly 29 times fewer parameters than GCD and about 0.93 times the parameters of PromptCAL.