Constructing a Comprehensive Dataset for Singing Style Captioning
The authors introduce S2Cap, a dataset for singing style captioning, a task that aims to generate textual descriptions of the vocal and musical characteristics of a singing voice. The dataset covers a diverse set of attributes: pitch, volume, tempo, mood, the singer's gender and age, musical genre, and emotional expression.
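
To make the attribute list concrete, the following Python sketch shows one way a per-clip annotation covering these attributes might be represented and turned into a caption. It is only an illustration: the class name `SingingStyleRecord`, the field names, the example values, and the caption template in `to_caption` are all assumptions, not S2Cap's actual schema or captioning method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingingStyleRecord:
    """Hypothetical per-clip annotation; field names are illustrative only."""
    pitch: str    # e.g. "high" or "low" (assumed vocabulary)
    volume: str   # e.g. "soft" or "loud" (assumed vocabulary)
    tempo: str    # e.g. "slow" or "fast" (assumed vocabulary)
    mood: str
    gender: str
    age: str
    genre: str
    emotion: str


def to_caption(rec: SingingStyleRecord) -> str:
    """Render the attributes as one descriptive sentence using a toy template."""
    return (
        f"A {rec.age} {rec.gender} singer performs a {rec.tempo}-tempo "
        f"{rec.genre} song with {rec.pitch} pitch and {rec.volume} volume, "
        f"conveying a {rec.mood} mood and {rec.emotion} expression."
    )


# Made-up example values, not taken from the dataset.
example = SingingStyleRecord(
    pitch="high", volume="soft", tempo="slow", mood="melancholic",
    gender="female", age="young adult", genre="ballad", emotion="sad",
)
caption = to_caption(example)
print(caption)
```

A fixed template like this only shows how the attributes could map to text; captions in a real dataset would typically be far more varied in wording.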