Multi-Modal Proxy Learning for Personalized Visual Clustering
The proposed method, Multi-MaP, uses multi-modal models and large language models to capture a user's specific interest and to uncover the personalized clustering structures hidden in visual data.