Key Concepts
A customized musical word embedding that covers both general and music-specific vocabulary can improve audio-word joint representations for music tagging and retrieval.
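Here is a minimal Python sketch of what "covering both vocabularies" means at the data level. It assumes a skip-gram (word2vec-style) objective, and the paper's exact training setup may differ. The sketch merges one everyday sentence and one music-specific sentence into a single corpus, then generates the (center, context) pairs a skip-gram model would train on. The example sentences, the `track:`/`artist:` ID token format, and the helper names are hypothetical.

```python
from collections import Counter


def build_corpus(general, music):
    """Lowercase and whitespace-tokenize general and music-specific sentences into one corpus."""
    return [sentence.lower().split() for sentence in general + music]


def skipgram_pairs(sentences, window=2):
    """Return (center, context) pairs, the training examples of a skip-gram model."""
    pairs = []
    for tokens in sentences:
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical sentences: one everyday sentence, one built from tags and ID tokens.
general = ["The band played a calm jazz set"]
music = ["track:t42 artist:a7 jazz piano calm"]

corpus = build_corpus(general, music)
vocab = Counter(token for sentence in corpus for token in sentence)
pairs = skipgram_pairs(corpus)
print(len(vocab), len(pairs))  # 10 36
```

The ID tokens co-occur with tags such as `jazz`, so a model trained on these pairs would place artists and tracks in the same vector space as everyday words. A general corpus alone has no such tokens.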
Summary
The paper presents a novel approach called Musical Word Embedding (MWE) that learns word embeddings from a combination of general and music-specific text corpora. The authors integrate the MWE into an audio-word joint representation framework for music tagging and retrieval tasks.
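This sketch shows how a shared audio-word space supports both tasks. Tagging ranks words for a given audio embedding, and retrieval ranks tracks for a given query word. It assumes the audio and word vectors have already been projected into one space. The training objective is illustrated with a triplet-style hinge loss. The toy vectors, the margin of 0.4, and the function names are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_loss(audio, pos_word, neg_word, margin=0.4):
    """Hinge loss: zero once the matching word beats the non-matching one by `margin`."""
    return max(0.0, margin - cosine(audio, pos_word) + cosine(audio, neg_word))


def tag(audio, word_vecs, k=2):
    """Tagging: return the k words closest to one audio embedding."""
    return sorted(word_vecs, key=lambda w: cosine(audio, word_vecs[w]), reverse=True)[:k]


def retrieve(query, word_vecs, track_vecs, k=2):
    """Retrieval: return the k tracks closest to a query word's embedding."""
    q = word_vecs[query]
    return sorted(track_vecs, key=lambda t: cosine(q, track_vecs[t]), reverse=True)[:k]


# Toy vectors, assumed to be already projected into the shared audio-word space.
word_vecs = {"jazz": [1.0, 0.0, 0.0], "rock": [0.0, 1.0, 0.0], "calm": [0.0, 0.0, 1.0]}
track_vecs = {"t1": [0.9, 0.1, 0.3], "t2": [0.1, 1.0, 0.0], "t3": [0.2, 0.0, 1.0]}

print(tag(track_vecs["t1"], word_vecs))         # ['jazz', 'calm']
print(retrieve("rock", word_vecs, track_vecs))  # ['t2', 't1']
```

In a trained system, the audio vectors would come from an audio encoder and the word vectors from the word embedding. A loss of this kind pulls each track toward its matching words during training.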
Key highlights:
- The authors train word embeddings using different combinations of general (e.g., Wikipedia) and music-specific (e.g., music reviews, tags, artist/track IDs) text corpora to investigate their effect on music-related tasks.
- Experiments show that, when training the audio-word joint embedding, supervising with a more musically specific word such as "track" yields better retrieval performance, while a less specific word such as "tag" yields better tagging performance.
- To balance this trade-off, the authors propose multi-prototype training, which jointly uses words with different levels of musical specificity.
- The proposed MWE-based audio-word joint embedding outperforms previous approaches based on general word embeddings on both seen and unseen tag datasets for music tagging and retrieval tasks.
- Qualitative analysis through visualization demonstrates that the MWE better captures musical context and semantics compared to general word embeddings.
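Multi-prototype training can be illustrated by sampling one supervision word per training example from one of several specificity levels (tag, artist, or track). This Python sketch makes two assumptions: levels are sampled uniformly, and negatives come from the same level. The metadata, the token formats, and the helper names are hypothetical, and the paper's actual sampling scheme may differ.

```python
import random

# Hypothetical track metadata; the "artist:"/"track:" token format is illustrative.
TRACKS = {
    "t42": {"tag": ["jazz", "calm"], "artist": ["artist:a7"], "track": ["track:t42"]},
    "t99": {"tag": ["rock"], "artist": ["artist:a3"], "track": ["track:t99"]},
}

# Specificity levels, ordered from least to most musically specific.
LEVELS = ("tag", "artist", "track")


def sample_positive(track_id, rng, levels=LEVELS):
    """Choose a specificity level uniformly (an assumption), then a word at that level."""
    level = rng.choice(levels)
    return level, rng.choice(TRACKS[track_id][level])


def sample_negative(track_id, level, rng):
    """Choose a same-level word that is not attached to this track."""
    own = set(TRACKS[track_id][level])
    pool = [
        word
        for other, meta in TRACKS.items()
        if other != track_id
        for word in meta[level]
        if word not in own
    ]
    return rng.choice(pool)


# Build a few (level, positive, negative) training triplets for one track.
rng = random.Random(0)
batch = []
for _ in range(6):
    level, pos = sample_positive("t42", rng)
    batch.append((level, pos, sample_negative("t42", level, rng)))
print(batch)
```

Because levels are mixed, the same track is sometimes anchored to a broad tag and sometimes to its own ID. This is one way to combine the tagging-friendly and retrieval-friendly behavior of the two supervision choices.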
Statistics
"Over 100 million songs in Spotify's catalog"
"9.8M unique words in the general corpus (Wikipedia 2020)"
"705,498 unique words in the music corpus (reviews, tags, IDs)"
Quotes
"Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, the word embedding may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks."
"To address this issue, we propose a new approach called Musical Word Embedding (MWE), which involves learning from various types of texts, including both everyday and music-related vocabulary."