Multi-modal Learnable Queries for Enhancing Image Aesthetics Assessment
The proposed multi-modal learnable queries (MMLQ) method efficiently extracts multi-modal aesthetic features from input images and their associated user comments using frozen pre-trained visual and textual encoders. A small set of trainable query tokens attends over the frozen features, so only the queries and a lightweight head are trained, and the method achieves new state-of-the-art performance on image aesthetics assessment.
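The learnable-queries idea can be sketched as a small set of trainable query vectors that cross-attend over features produced by frozen visual and textual encoders. The sketch below is a minimal illustrative implementation in PyTorch; the module name, feature dimensions, number of queries, and the linear regression head are all assumptions for illustration, not the authors' exact MMLQ architecture.

```python
import torch
import torch.nn as nn

class MultiModalLearnableQueries(nn.Module):
    """Hypothetical sketch of multi-modal learnable queries: trainable
    query tokens cross-attend over frozen visual and textual features,
    and a pooled summary is mapped to an aesthetic score."""

    def __init__(self, dim: int = 256, num_queries: int = 8, num_heads: int = 4):
        super().__init__()
        # Learnable query embeddings -- the only trainable "probe" tokens.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Illustrative regression head mapping pooled queries to one score.
        self.head = nn.Linear(dim, 1)

    def forward(self, visual_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (B, Nv, dim) and text_feats: (B, Nt, dim), assumed
        # to come from frozen pre-trained encoders (no gradients needed there).
        feats = torch.cat([visual_feats, text_feats], dim=1)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        attended, _ = self.cross_attn(q, feats, feats)
        # Mean-pool the query outputs and predict a scalar aesthetic score.
        return self.head(attended.mean(dim=1)).squeeze(-1)  # shape: (B,)

# Example usage with random stand-ins for frozen encoder outputs.
model = MultiModalLearnableQueries()
visual = torch.randn(2, 49, 256)   # e.g. 49 patch tokens per image
text = torch.randn(2, 16, 256)     # e.g. 16 comment tokens
scores = model(visual, text)       # shape: (2,)
```

Because the encoders stay frozen, only the query embeddings, the cross-attention weights, and the head receive gradients, which is what makes this style of adaptation parameter-efficient.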